Data Warehouse Schema : Introduction
- Schema in general terms means “Logical Structure”. So, data warehouse schema describes the logical structure of any data warehouse containing records.
- Also, the concept behind schema of data warehouse is same as that in data bases. Relational data models are used by data bases for their logical structure while data warehouses uses schema for the same purpose.
- The schema in data warehouses are used to get the knowledge of complexity of a structure of data warehouse.
- They are basically the representation of the outer model or the way to logically deduce the results from the figure and these figures are made from combinations of fact tables and dimension tables.
- The conceptual modelling of warehouse comprises of three models. These are:-
- A fact table contains keys to dimension tables which can be referred by using the concept of foreign key.
- A dimension table is one that consists of keys to facts present in fact table and their corresponding attributes.
1. Data Warehouse Schema : The Star Schema
- Simplest structural description of any data warehouse and is the least complex among all the dimension models i.e. Snowflake and Galaxy schema.
- The star schema contains a single fact table that is connected to multiple dimension tables in the form of shape of star.
- The basic structure enables star schema to perform functionalities and can handle only simple data mining queries.
- Each dimension table contains information of attributes present in fact table.
Advantages : Star Schema
- Highly optimized performance.
- Applicable on both large scale data sources(such as databases) as well as small scale data sources(such as data marts).
- Star schema is simple and easy to maintain.
Disadvantages : Star Schema
- Lesser accuracy and consistency.
- De-normalized data.
- Data Redundancy.
2. Data Warehouse Schema : The Snowflake Schema
- The snowflake schema describes the logical structure in much more detail as compared to star schema. Snowflake schema is more complex than Star schema but less complex than Galaxy(Fact constellation) Schema.
- The major difference between snowflake and star schema is, star schema contains data in the form of fact tables which are not normalized but in case of snowflake schema, data is normalized.
- This schema is called as snowflake because it portrays the shape similar that of a snowflake with fact table connected to multiple dimension tables that are drawn out from other dimension tables. This results in more use of joins resulting in performance throttling.
Advantages : Snowflake Schema
- Lesser data redundancy.
- Normalized data usage.
- More accurate and consistent than star schema.
Disadvantages : Snowflake Schema
- Slower as compared to star schema due to use of joins.
- More complex than star schema.
- More complex queries are required because of use of joins.
3. Data Warehouse Schema : The Galaxy Schema
- Galaxy schema is also known as fact constellation. Fact constellation refers to combination of fact tables and dimension tables using joins.
- Multiple star schema are connected together to form galaxy schema.
Advantages : Galaxy Schema
- Highly flexible.
- No data redundancy.
- Low memory/space required.
Disadvantages : Galaxy Schema
- Complicated design.
- To create, implement and maintain galaxy schema is a tough job.
- More complex queries are required because of higher number of joins used to connect fact and dimension tables.
- Data analysis is difficult because of complex structure.