top of page
Search

Data Warehousing.

  • Writer: shubham samanta
    shubham samanta
  • Nov 30, 2019
  • 8 min read

What is Data Warehousing?

Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidations .


Using Data Warehouse Information

There are decision support technologies that help utilize the data available in a data warehouse. These technologies help executives to use the warehouse quickly and effectively. They can gather data, analyze it, and take decisions based on the information present in the warehouse. The information gathered in a warehouse can be used in any of the following domains −

Tuning Production Strategies − The product strategies can be well tuned by repositioning the products and managing the product portfolios by comparing the sales quarterly or yearly.

Customer Analysis − Customer analysis is done by analyzing the customer's buying preferences, buying time, budget cycles, etc.

Operations Analysis − Data warehousing also helps in customer relationship management, and making environmental corrections. The information also allows us to analyze business operations.


Integrating Heterogeneous Databases

To integrate heterogeneous databases, we have two approaches −

Query-driven Approach

Update-driven Approach


Query-Driven Approach

This is the traditional approach to integrate heterogeneous databases. This approach was used to build wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators.


Process of Query-Driven Approach

When a query is issued to a client side, a metadata dictionary translates the query into an appropriate form for individual heterogeneous sites involved.

Now these queries are mapped and sent to the local query processor.

The results from heterogeneous sites are integrated into a global answer set.

Disadvantages

Query-driven approach needs complex integration and filtering processes.

This approach is very inefficient.

It is very expensive for frequent queries.

This approach is also very expensive for queries that require aggregations.


Update-Driven Approach

This is an alternative to the traditional approach. Today's data warehouse systems follow update-driven approach rather than the traditional approach discussed earlier. In update-driven approach, the information from multiple heterogeneous sources are integrated in advance and are stored in a warehouse. This information is available for direct querying and analysis.


Advantages

This approach has the following advantages −

This approach provide high performance.

The data is copied, processed, integrated, annotated, summarized and restructured in semantic data store in advance.

Query processing does not require an interface to process data at local sources.


Functions of Data Warehouse Tools and Utilities

The following are the functions of data warehouse tools and utilities −

Data Extraction − Involves gathering data from multiple heterogeneous sources.

Data Cleaning − Involves finding and correcting the errors in data.

Data Transformation − Involves converting the data from legacy format to warehouse format.

Data Loading − Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.

Refreshing − Involves updating from data sources to warehouse.


How Datawarehouse works?

A Data Warehouse works as a central repository where information arrives from one or more data sources. Data flows into a data warehouse from the transactional system and other relational databases.

Data may be:

Structured

Semi-structured

Unstructured data

The data is processed, transformed, and ingested so that users can access the processed data in the Data Warehouse through Business Intelligence tools, SQL clients, and spreadsheets. A data warehouse merges information coming from different sources into one comprehensive database.

By merging all of this information in one place, an organization can analyze its customers more holistically. This helps to ensure that it has considered all the information available. Data warehousing makes data mining possible. Data mining is looking for patterns in the data that may lead to higher sales and profits.


Types of Data Warehouse.

Three main types of Data Warehouses are:

1. Enterprise Data Warehouse:

Enterprise Data Warehouse is a centralized warehouse. It provides decision support service across the enterprise. It offers a unified approach for organizing and representing data. It also provide the ability to classify data according to the subject and give access according to those divisions.

2. Operational Data Store:

Operational Data Store, which is also called ODS, are nothing but data store required when neither Data warehouse nor OLTP systems support organizations reporting needs. In ODS, Data warehouse is refreshed in real time. Hence, it is widely preferred for routine activities like storing records of the Employees.

3. Data Mart:

A data mart is a subset of the data warehouse. It specially designed for a particular line of business, such as sales, finance, sales or finance. In an independent data mart, data can collect directly from sources.


General stages of Data Warehouse

Earlier, organizations started relatively simple use of data warehousing. However, over time, more sophisticated use of data warehousing begun.

The following are general stages of use of the data warehouse:

Offline Operational Database:

In this stage, data is just copied from an operational system to another server. In this way, loading, processing, and reporting of the copied data do not impact the operational system's performance.

Offline Data Warehouse:

Data in the Datawarehouse is regularly updated from the Operational Database. The data in Datawarehouse is mapped and transformed to meet the Datawarehouse objectives.

Real time Data Warehouse:

In this stage, Data warehouses are updated whenever any transaction takes place in operational database. For example, Airline or railway booking system.

Integrated Data Warehouse:

In this stage, Data Warehouses are updated continuously when the operational system performs a transaction. The Datawarehouse then generates transactions which are passed back to the operational system.


Components of Data warehouse

Four components of Data Warehouses are:

Load manager: Load manager is also called the front component. It performs with all the operations associated with the extraction and load of data into the warehouse. These operations include transformations to prepare the data for entering into the Data warehouse.

Warehouse Manager: Warehouse manager performs operations associated with the management of the data in the warehouse. It performs operations like analysis of data to ensure consistency, creation of indexes and views, generation of denormalization and aggregations, transformation and merging of source data and archiving and baking-up data.

Query Manager: Query manager is also known as backend component. It performs all the operation operations related to the management of user queries. The operations of this Data warehouse components are direct queries to the appropriate tables for scheduling the execution of queries.

End-user access tools:

This is categorized into five different groups like

1. Data Reporting

2. Query Tools

3. Application development tools

4. EIS tools,

5. OLAP tools and data mining tools.


Who needs Data warehouse?

Data warehouse is needed for all types of users like:

Decision makers who rely on mass amount of data

Users who use customized, complex processes to obtain information from multiple data sources.

It is also used by the people who want simple technology to access the data

It also essential for those people who want a systematic approach for making decisions.

If the user wants fast performance on a huge amount of data which is a necessity for reports, grids or charts, then Data warehouse proves useful

.Data warehouse is a first step If you want to discover 'hidden patterns' of data-flows and groupings.



What Is a Data Warehouse Used For?

Here, are most common sectors where Data warehouse is used:

Airline:

In the Airline system, it is used for operation purpose like crew assignment, analyses of route profitability, frequent flyer program promotions, etc.

Banking:

It is widely used in the banking sector to manage the resources available on desk effectively. Few banks also used for the market research, performance analysis of the product and operations.

Healthcare:

Healthcare sector also used Data warehouse to strategize and predict outcomes, generate patient's treatment reports, share data with tie-in insurance companies, medical aid services, etc.

Public sector:

In the public sector, data warehouse is used for intelligence gathering. It helps government agencies to maintain and analyze tax records, health policy records, for every individual.

Investment and Insurance sector:

In this sector, the warehouses are primarily used to analyze data patterns, customer trends, and to track market movements.

Retain chain:

In retail chains, Data warehouse is widely used for distribution and marketing. It also helps to track items, customer buying pattern, promotions and also used for determining pricing policy.

Telecommunication:

A data warehouse is used in this sector for product promotions, sales decisions and to make distribution decisions.

Hospitality Industry:

This Industry utilizes warehouse services to design as well as estimate their advertising and promotion campaigns where they want to target clients based on their feedback and travel patterns.



Steps to Implement Data Warehouse

The best way to address the business risk associated with a Datawarehouse implementation is to employ a three-prong strategy as below

Enterprise strategy: Here we identify technical including current architecture and tools. We also identify facts, dimensions, and attributes. Data mapping and transformation is also passed.

Phased delivery: Datawarehouse implementation should be phased based on subject areas. Related business entities like booking and billing should be first implemented and then integrated with each other.

Iterative Prototyping: Rather than a big bang approach to implementation, the Datawarehouse should be developed and tested iteratively.


Here, are key steps in Datawarehouse implementation along with its deliverables .


Best practices to implement a Data Warehouse

Decide a plan to test the consistency, accuracy, and integrity of the data.

The data warehouse must be well integrated, well defined and time stamped.

While designing Datawarehouse make sure you use right tool, stick to life cycle, take care about data conflicts and ready to learn you're your mistakes.

Never replace operational systems and reports

Don't spend too much time on extracting, cleaning and loading data.

Ensure to involve all stakeholders including business personnel in Data

warehouse implementation process. Establish that Data warehousing is a joint/ team project. You don't want to create Data warehouse that is not useful to the end users.

Prepare a training plan for the end users.


Why We Need Data Warehouse? Advantages & Disadvantages

Advantages of Data Warehouse:

Data warehouse allows business users to quickly access critical data from some sources all in one place.

Data warehouse provides consistent information on various cross-functional activities. It is also supporting ad-hoc reporting and query.

Data Warehouse helps to integrate many sources of data to reduce stress on the production system.

Data warehouse helps to reduce total turnaround time for analysis and reporting.

Restructuring and Integration make it easier for the user to use for reporting and analysis.

Data warehouse allows users to access critical data from the number of sources in a single place. Therefore, it saves user's time of retrieving data from multiple sources.

Data warehouse stores a large amount of historical data. This helps users to analyze different time periods and trends to make future predictions.


Disadvantages of Data Warehouse:

Not an ideal option for unstructured data.

Creation and Implementation of Data Warehouse is surely time confusing affair.

Data Warehouse can be outdated relatively quickly

Difficult to make changes in data types and ranges, data source schema, indexes, and queries.

The data warehouse may seem easy, but actually, it is too complex for the average users.

Despite best efforts at project management, data warehousing project scope will always increase.

Sometime warehouse users will develop different business rules.

Organisations need to spend lots of their resources for training and Implementation purpose.


The Future of Data Warehousing.

Change in Regulatory constrains may limit the ability to combine source of disparate data. These disparate sources may include unstructured data which is difficult to store.

As the size of the databases grows, the estimates of what constitutes a very large database continue to grow. It is complex to build and run data warehouse systems which are always increasing in size. The hardware and software resources are available today do not allow to keep a large amount of data online.

Multimedia data cannot be easily manipulated as text data, whereas textual information can be retrieved by the relational software available today. This could be a research subject.


Data Warehouse Tools.

There are many Data Warehousing tools are available in the market. Here, are some most prominent one:


1. MarkLogic:

MarkLogic is useful data warehousing solution that makes data integration easier and faster using an array of enterprise features. This tool helps to perform very complex search operations. It can query different types of data like documents, relationships, and metadata.


2. Oracle:

Oracle is the industry-leading database. It offers a wide range of choice of data warehouse solutions for both on-premises and in the cloud. It helps to optimize customer experiences by increasing operational efficiency.


3. Amazon RedShift:

Amazon Redshift is Data warehouse tool. It is a simple and cost-effective tool to analyze all types of data using standard SQL and existing BI tools. It also allows running complex queries against petabytes of structured data, using the technique of query optimization.




Conclusion:

The data warehouse works as a central repository where information is coming from one or more data sources.

Three main types of Data warehouses are Enterprise Data Warehouse, Operational Data Store, and Data Mart.

General state of a datawarehouse are Offline Operational Database, Offline Data Warehouse, Real time Data Warehouse and Integrated Data Warehouse.

Four main components of Datawarehouse are Load manager, Warehouse Manager, Query Manager, End-user access tools

Datawarehouse is used in diverse industries like Airline, Banking, Healthcare, Insurance, Retail etc.

Implementing Datawarehosue is a 3 prong strategy viz. Enterprise strategy, Phased delivery and Iterative Prototyping.

Data warehouse allows business users to quickly access critical data from some sources all in one place.


THANK YOU





 
 
 

Recent Posts

See All

Comments


Post: Blog2_Post

Tech Blogger | @ Shubham Samanta

Subscribe Form

Thanks for submitting!

  • Facebook
  • Twitter
  • LinkedIn

©2018 by Engineering bloggers @Samanta. Proudly created with Wix.com

bottom of page