Time dimension: Star Schema or Snowflake?
I am preparing the design of a DW for use with Pentaho, and to revise the definition of the time dimension I have some doubts. I explain:
DW is a sales analysis. In the model I have two fact tables, one for sales, whose granularity is at the level of day, customer, product, etc, and a fact table which records the information in the sales forecast (this table has a level of different granularity, where a month and level of client channel (which is one of the attributes of the customer dimension)). The keys to the fact table are not therefore the keys to the dimension table, but a component within a dimension.
Initial Logical Model
Given this, I rather doubt arise about the best design choice to build the database:
1) Having made tables on a different level of granularity, is it necessary or advisable to spend the dimensions involved in a star schema to work properly after the consultation?.
2) What may be the case that aspects of design such as this are conditioned by the tool we will use later?. For example, if I will use Pentaho, or Microstrategy, etc, can be the case that using a tool or another is determining how to model in certain cases?. I have read in a blog for example, Microstrategy recommends switching to a snowflake schema (including the examples provided with the platform, the Analytic Modules, are built that way).
3) Could we generalize that a simple scheme to use a scheme smoothly total star, but at the time that complicate the model with more fact tables and different granularity is better to spend a snowflake?. "To follow the Kimball methodology in terms of the scale formed is also necessary to do that, and so we can address the complex models with multiple fact tables?.
I hope I throw a cable, especially those who have been visiting the construction of DW. It is an issue for me is a bit confusing (besides the examples provided in star schemas are used to illustrate simple, then do not correspond with reality more complex).
Rolling even more history, look like Microstrategy implements the time dimension in Analytic Modules, as I said before:
Not exactly a snowflake and the date we reach the month, quarter or year (it is a cyclic graph).
In the Blog The Mine Digital is an interesting entry about the dilemma. The funny thing is that we clearly no, but gives us some premises may be useful for the decision.
If you like you can contact me at email and I can gladly help you with questions you have.
Then if you like you can comment.
After thinking a lot about, reread in reference books and find information, I think I have understood and a solution would be:
Time dimension would have the following structure, including all the usual attributes of a time dimension:
As we will have a fact table at the day (level of granularity) and two fact tables at the month (different level), need to have a dimension table for the month. To solve this problem, we have two options:
- Construction of a scale derived or roll-up: create a subset of the original dimension, taking us all the attributes that affect the level of granularity or higher levels (for example, for the month, we get also the quarter and year). It is a strict subset of the time dimension. Would look like:
Dimension for the month resulting in the time dimension
- Normalize the size and pass it to snowflake: this would be another option. In the picture below, you can see how the scheme would be the case we would have decided to use this solution. As you can see, we have created a dimension for the month (the hospital is needed for monthly fact tables) and moreover we have also brought out the quarters and years (for the case that in the future have other types of analysis that required).
Logic Model - Dimension Standard Time
For our case, we are left with option 1, which is recommended by Kimball. As we see, we've created a set or subset of the time dimension, and all the attributes with the same names in both the original scale as derived dimension. With this philosophy we can build conformed dimensions that help us to implement the concept of data warehouse Bus. The technique can be used for any case we have a different level of granularity. Could also be used in case we have a datamart where we get the dimension with the same structure of attributes but limit the number of values (imagine that we have several channels of sale and a specific analysis, only the scale we get values derived from a customer of these channels).
Dimension Concept Roll-up or derived
For the construction of the new dimension, we may use a standard view or a materialized view.
I think this answered the questions raised. It is not necessary to pass the standard model when the problem of boards made with different granularity, and also the described method we can use whenever a similar scenario arises.
Thanks for sharing this post. This is a very helpful and informative material. Good post and keep it up. That’s cool stuff, anyways, a good way to get started to renovate your dreams into the world of reality. Adidas Adizero Adios
It is actually really great to hear that you've actually started on the design of a DW for use with Pentaho. Which actually helps us to revise the definition of the time dimension. This could actually prove to be one of the greatest innovatie achivements in the coustumer care and marketing department. I hope you succeed on what you are up to. Great wishes, hope you bring in great positive changes.