Menu Icon Close
Cloud

Guide to Redshift Remodeling- Friends and Foes

Featured Image of Redshift Blog
May 04, 2017
By Blue Pi

Reading Time: 3 minutes

In this series, we make an attempt to chronicle our experience and best practices with redshift having used it in ‘anger’ in many projects. In this part 1 of the series, we look for appropriate schema design for redshift, the various alternatives and the pros and cons of each.

Schema Design

Proper dimensional model is an absolute need for Redshift to perform well. Three different dimensional models work best with Redshift:

  • The Snowflake
  • The Star
  • Flat

To understand the differences between the three modeling schemes let’s get some terminology out of the way.
There are two types of tables that we create in a dimensional model Facts and Dimensions. The fact tables are the measures, while the dimension tables contain descriptive attributes. So, the facts usually have numbers and we use some numeric operations like sum, count, average on them. On the contrary, the dimensions are used to filter or categorize the data. The number of unique, in a dimension, would always be an order of magnitude smaller in comparison to the number of facts.
The difference between Star and Snowflake Schema manifest in the ‘normalization’, we do in the dimensions. In the Snowflake schema, the dimension tables are normalized iteratively whereas in Star there is only one level of normalization.
Examples
So, let’s visualize this with two tables Sales fact and Store dimension table. The Sales fact stores the amount of product sold from each store.

Star Schema

Here store and product are dimension tables while sales are the fact table.
Data in dimension tables do not change frequently and as mentioned above, contains the descriptive attributes. Here, store table stores the outlet information while the product is the collection of all types of products produced by the company.
Fact table sales, rather stores every transaction or sale of a product made by the individual store, so it is highly dynamic in nature and usually the heaviest one as well. Note all the columns in sales are of integer type, which makes it easy to do aggregations, indexing for queries to run faster.
Also, fact tables must be distributed across different redshift clusters by either stores or products to gain huge improvements in query performance.

Snowflake Schema

Do you see a problem with Star Schema? There is one, dimension table “store” is not normalized. That means if there are 1000 stores in India, country ‘India’ would be repeated 1000 times. Same is the case with city and state. This makes dimension tables unnecessary bulkier.
While denormalizing extensively increases the number of joins to fetch the data, which adds to computing time and adds to query execution.

Flat Schema

There exists another model i.e. flat model, which is essentially another level of denormalization over the star schema. The state, city, country and store all get folded into the fact table.
Here are the pros and cons of the three modeling choices.

### The three approaches compared

Type Normalization Ease of Query Storage size Joins
Snowflake High Difficult Low Many
Star Medium Medium Medium One
Flat Low Easy High None

 

Recommendations

If you are using Redshift for data marts where query performance is critical the right choice is a snowflake schema. if you are using redshift as generic data warehouse it is preferred to build a star schema.
To know more about such cutting-edge technology, interesting use cases and deep insights to dwell upon and transform your business growth, reach out to us at info@bluepi.in
Till then, keep innovating!
This blog has been written by Mr. Pronam Chatterjee.

Mr.Pronam Chatterjee, CEO

As founder and CEO of BluePi Consulting, Pronam Chatterjee has been instrumental in driving the vision and strategy of the company. Pronam firmly believes that cutting-edge technology should be used to solve complex, real-world problems. He has an eye for spotting the big technological waves early and leveraging them, much before rest of the crowd wakes up to those.
A techie at heart, he started his career with the PWC, India, as a consultant on architecture and software development; and now has close to 20 years of experience working across companies like ACS, Sapient, Polaris, AON Hewitt, VMWare, etc. He was the CTO of Xerox Products in India, before founding BluePi in 2013.

TAGS :

Similar Blogs


Blog: What is Azure HDInsight
Cloud February 07, 2019
3 Min Read
Know More
What is Amazon Athena blog
Cloud February 01, 2019
3 Min Read
Know More

Invest in better
solutions today

Contact Us