Data Warehouse Operations: Introduction
- The raw data arriving at a data warehouse is often unstructured and contains a lot of unwanted, dirty data. Dirty data is data that is incomplete, noisy, or contains errors.
- To make this data structured and noise-free, the dirty data must be removed or corrected. This helps turn the data into useful information and is achieved through a set of data warehouse operations: the ETL (Extraction, Transformation, Loading) operations together with data cleaning and data refresh.
1. Data Cleaning
- In data cleaning, inconsistencies are removed and noisy data containing errors is rectified.
- For example: removing redundant (duplicate) data.
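The duplicate-removal example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name clean_records, the field names id and email, and the rule that an empty field makes a row incomplete are all hypothetical and not part of any particular warehouse product.

```python
def clean_records(records, required_fields):
    """Drop incomplete rows and duplicates, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for row in records:
        # Assumption: a row is "incomplete" if a required field is missing or empty.
        if any(row.get(field) in (None, "") for field in required_fields):
            continue
        # Normalise case and whitespace so near-identical copies compare equal.
        key = tuple(str(row[field]).strip().lower() for field in required_fields)
        if key in seen:
            continue  # redundant (duplicate) row
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical sample data: one duplicate and one incomplete row.
raw = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": "A@example.com "},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
]
cleaned = clean_records(raw, ["id", "email"])
```

Here the second row is dropped as a duplicate of the first and the third as incomplete, so only the rows with id 1 and id 3 remain.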
2. Data Refresh
- In the data refresh operation, the data in the data warehouse is refreshed by propagating data from multiple sources and updating it on a timely basis. This is needed because the data in the source databases changes frequently (possibly every minute), and refreshing is how those changes reach the data warehouse.
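One common way to implement a refresh is to copy only the rows that changed since the previous run. The sketch below assumes each source row carries an updated_at timestamp and that the warehouse can be modelled as a dictionary keyed by id; the names refresh, warehouse, and updated_at are hypothetical and chosen only for illustration.

```python
from datetime import datetime


def refresh(warehouse, source_rows, last_refresh):
    """Copy rows changed after last_refresh into the warehouse.

    Returns the newest timestamp seen, to be used as the next watermark.
    """
    watermark = last_refresh
    for row in source_rows:
        if row["updated_at"] > last_refresh:
            warehouse[row["id"]] = dict(row)  # insert or overwrite
            watermark = max(watermark, row["updated_at"])
    return watermark


# Hypothetical state: the warehouse was last refreshed on 2024-01-01.
warehouse = {1: {"id": 1, "qty": 5, "updated_at": datetime(2024, 1, 1)}}
source = [
    {"id": 1, "qty": 7, "updated_at": datetime(2024, 1, 3)},    # changed
    {"id": 2, "qty": 1, "updated_at": datetime(2024, 1, 2)},    # new
    {"id": 3, "qty": 9, "updated_at": datetime(2023, 12, 31)},  # older than watermark
]
new_watermark = refresh(warehouse, source, datetime(2024, 1, 1))
```

Running the refresh on a schedule and passing the returned watermark to the next run keeps the warehouse in step with the sources without reloading everything each time.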
3. Extraction of Data
- Data obtained after cleaning and refresh is still unstructured and unorganized. The data extraction process organizes it so that users can extract and retrieve the relevant data. This is helpful when a user wants to mine the data.
- Data extraction can be classified as:
4. Transformation of Data
- Data obtained from heterogeneous databases keeps the native structure of its source database, which may differ from the structure of the data warehouse. Transformation therefore reorganizes the data from these heterogeneous databases into a structure matching that of the data warehouse.
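As a rough illustration of transformation, the sketch below maps rows from two differently shaped sources onto one warehouse layout. Both source layouts (a "CRM" export and a "shop" database), their column names, and the target fields customer_id, full_name, and signup_date are invented for this example.

```python
from datetime import datetime


def from_crm(row):
    """Map a hypothetical CRM row (string id, DD/MM/YYYY date) to the target layout."""
    return {
        "customer_id": int(row["CustID"]),
        "full_name": row["Name"].strip().title(),
        "signup_date": datetime.strptime(row["Joined"], "%d/%m/%Y").date().isoformat(),
    }


def from_shop(row):
    """Map a hypothetical shop row (split name, ISO timestamp) to the target layout."""
    return {
        "customer_id": row["user_id"],
        "full_name": f'{row["first"]} {row["last"]}'.strip().title(),
        "signup_date": row["created"][:10],  # keep only the YYYY-MM-DD part
    }


crm_row = {"CustID": "42", "Name": " ada lovelace ", "Joined": "10/12/2020"}
shop_row = {"user_id": 7, "first": "grace", "last": "hopper",
            "created": "2021-03-05T08:00:00"}

unified = [from_crm(crm_row), from_shop(shop_row)]
```

After transformation both rows share the same field names, types, and date format, so they can be stored side by side in one warehouse table.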
5. Data Loading
- Data loading is responsible for loading the data into its target data repository, which may be a database, a data mart, a data warehouse, etc.
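A minimal loading step might look like the following, using Python's built-in sqlite3 module with an in-memory database standing in for the target repository. The table name dim_customer and its columns are assumptions made for the example; a real warehouse would use its own schema and bulk-loading tools.

```python
import sqlite3


def load(conn, rows):
    """Load rows into a hypothetical dim_customer table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, "
        "full_name TEXT NOT NULL, "
        "signup_date TEXT)"
    )
    # One transaction per batch: either every row is loaded or none is.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO dim_customer "
            "(customer_id, full_name, signup_date) "
            "VALUES (:customer_id, :full_name, :signup_date)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for the real target repository
batch = [
    {"customer_id": 42, "full_name": "Ada Lovelace", "signup_date": "2020-12-10"},
    {"customer_id": 7, "full_name": "Grace Hopper", "signup_date": "2021-03-05"},
]
count_first = load(conn, batch)
count_again = load(conn, batch[:1])  # reloading an existing key replaces the row
```

Using INSERT OR REPLACE means the same batch can be loaded more than once without creating duplicate rows, which is convenient when a load has to be retried.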