In our last blog, we tried to understand terminology around data warehouse. But data warehouse is only a central storage system, to build a complete analytical solution there are other components needed which we will be discussing in a while below.
Like we said earlier, the main hindrance to starting a Power BI solution is to extract data from operational systems and putting it in a suitable format for informational applications that run off the data warehouse. ETL is an acronym of Extract, Transform, Load and such tools perform all the conversions, summarizations, key changes, structural changes and condensations needed to transform disparate data into information that can be used by the decision support tools. The functionality includes:
There are tools which are already available for the above operations but they work only with
simpler data extracts. Frequently, customized extract routines need to be developed for the
more complicated data extraction procedures.
Meta data is data about data that describes the data warehouse. It is used for building, maintaining, managing and using the data warehouse.
The role of metadata in a warehouse is different from the warehouse data, yet it plays an important role. The various roles of metadata are explained below:
Data mart is an implementation of the data warehouse with a scope of content and data warehouse functions. Usually, data mart is restricted to a single department or part of the organization.
Some of the complexities inherent in data warehouses are usually not present in data-mart-oriented projects. Data architecture modeling, for instance, which is a crucial technique for data warehouse development, is far less required for data marts.
Modeling for the data mart is more end-user focused than modeling for a data warehouse.
The primary advantages are:
One of the key to success in building a scalable Data warehouse solution is iterative approach. Data warehouse development can easily be bogged down if scope is too broad. The simplest way to scope data warehouse development is building one data mart at a time. Each data mart supports a single organizational element and the scope of development is limited by the data mart requirements.
Initial data mart usually acts as the Data warehouse proof-of- concept, the scope must be restricted and sufficient to provide immediate and real benefits. Over a period of time, additional data marts can be developed and integrated as enterprise needs and resource availability.
Different phases of development are:
Determine hardware and software platforms
There are certain questions to be answered before correct hardware and software platform can be identified like:
Based on above information, the right set of decisions and procurement can be made.
Develop source integration and data transformation platforms
Integration and transformation programs are necessary to extract information from operational systems and databases for initial and subsequent loads. There could be a separate program needed to do the initial load depending upon the data volume and complexity.
Update programs are usually simpler and just pick the changed data since the last run. Over time, update programs will be changed to reflect changes in both operational and other data sources.
Develop data security policies and procedures
A data warehouse system is a read-only source of enterprise information, so there is no need to be concerned of creating, update, delete capabilities. Though there is a need to address the tradeoff between protecting the corporate assets against unauthorized access and making data accessible to users who need the decision power.
Security also comprises of physical security of the data warehouse. Data must be secured from loss, damage and enough redundancy and backup measures must be in place.
Data availability is one side of the coin but how easily and effectively a user can perceive that data is what defines the usability of such a solution.
To provide ease of data consumption, graphical views are preferred by most of the organizations. For performance, developers must ensure that platform supports and are optimized to cater to end-user needs.
There must be an option for the user to drill down and roll up to get more detailed views as well as a view of the overall health of the organization.
There is a number of tools that can be integrated with the existing data warehouse to get all such capabilities while certain specific and customized views will need manual development.
Find out, How India’s fastest-growing logistics company is driving key business decisions by running
real-time reporting on top of Redshift.
This blog has been written by Aashu Aggarwal, one of our core Leadership Team's Member.
Aashu is one of the rare sales people who’s still an enthusiastic coder at heart! She’s proficient at traditional goodies like Java, J2EE, Spring, RDBMS; along with new age technologies like MongoDB, Scala, Play, NodeJS, AngularJS, etc.
Often guided by the Mantra "Out of clutter, find simplicity", Aashu prefers employing simplistic approaches to wade through tough scenarios. When not hooked on to work (we don’t know when was the last time that happened), she likes to travel the world and capture it through the lens of her camera!