Language: EN
Pages: 154

Data warehouse architecture principles data warehousing

Basic Concepts of Data Warehousing
Introduction, Meaning and Characteristics of Data Warehousing, Online Transaction Processing (OLTP), Data Warehousing Models, Data Warehouse Architecture & Principles of Data Warehousing, Data Mining.

Building a Data Warehouse Project
Structure of the Data Warehouse, Data Warehousing and Operational Systems, Organizing for Building Data Warehousing, Important Considerations: Tighter Integration, Empowerment, Willingness; Business Considerations: Return on Investment; Design Considerations; Technical Considerations; Implementation Considerations; Benefits of Data Warehousing.


If we look at the evolution of information processing, we see that new technologies have emerged that are focused on improving the information content of data to empower the knowledge workers of today and tomorrow. Among these technologies are data warehousing, online analytical processing (OLAP), and data mining.

... information age ... employees, most organizations process an enormous number ...

However, the very size and complexity of data warehouses ...

The newest, hottest technology to address these concerns is data mining. Data mining uses sophisticated statistical analysis and modeling techniques to uncover patterns and relationships hidden in organizational databases, patterns that ordinary methods might miss.

From that perspective, this book is intended to be a handbook and guide for anyone interested in planning or working on data warehousing and related issues. The meaning and characteristics of data warehousing, data warehousing models, data warehouse architecture and the principles of data warehousing, and topics related to building a data warehouse project are discussed, along with managing and implementing a data warehouse project. Using these topics as a foundation, the book goes on to analyze important concepts: data mining, techniques of data mining, the need for OLAP, OLAP vs. OLTP, the multidimensional data model, multidimensional versus multirelational OLAP, OLAP operations, and the categorization of OLAP tools (MOLAP and ROLAP).

CONTENT

Lesson No. Topic

Data Warehousing
Lesson 1 Introduction to Data Warehousing
Lesson 2 Meaning and Characteristics of Data Warehousing
Lesson 3
Lesson 4
Lesson 5

Building a Data Warehouse Project

Managing and Implementing a Data Warehouse Project
Lesson 12
Lesson 13
Lesson 14
Lesson 15 Managing Risk: Internal and External, Critical Path Analysis

Data Mining Techniques
Lesson 22 Various Techniques of Data Mining: Nearest Neighbor
Lesson 23

OLAP

Objective
The main objective of this lesson is to introduce you to the basic concepts and terminology of data warehousing.

By the end of this lesson you will be able to understand:
• The meaning of a data warehouse
• The evolution of the data warehouse

... communication requirements, it is possible to incorporate additional or expert information ...

“The logical link between what the managers see in their decision support EIS application and the company’s operational activities.” (Johan McIntyre, SAS Institute Inc.)

To illustrate the danger of being information-underloaded, consider the children’s story of the country mouse, which is unable to cope with an environment it does not understand.

What is a cat? Is it friend or foe?

... understand what a data warehouse is and what it is not. You will learn what human resources are required, as well as the roles and responsibilities of each player. You will be given an overview of good project management techniques to help ensure that the data warehouse initiative does not fail due to poor project management. You will learn how to physically implement a data warehouse, and about some new tools currently available to help you mine the vast amounts of information stored within the warehouse. Without this ability to mine the warehouse, even the most complete warehouse would be useless.

History of Data Warehousing
Let us first review how analysis data has historically been managed and the factors that have led to the evolution of the data warehousing application class.

The disadvantage of the above approach is that it leaves the data fragmented and oriented towards very specific needs. Each individual user obtains only the information that she or he requires. The extracts are unable to address the requirements of multiple users and uses, and the time and cost involved in addressing the requirements of even a single user are large. These disadvantages led to the development of a new application class called data warehousing.

Server software: Server software is inexpensive, powerful, and easier to maintain than in the past. An example is Windows NT, which made setting up powerful systems much easier and also reduced their cost.


Discussions
• Write short notes on:
  • Legacy systems
  • Data warehouse
  • Standard business applications
• What is a data warehouse? How does it differ from a database?


CHAPTER 1: DATA WAREHOUSING
LESSON 2: MEANING AND CHARACTERISTICS OF DATA WAREHOUSING

Structure
• Objective
• Introduction
• Data warehousing
• Operational vs. Informational Systems
• Characteristics of Data Warehousing
• Subject oriented
• Integrated
• Time variant
• Non-volatile

“A data warehouse is a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context.” (Devlin 1997)

“Data warehousing is a process, not a product, for assembling and managing data from various sources for the purpose of gaining a single, detailed view of part or all of the business.” (Gardner 1998)
A data warehouse is a capability that provides comprehensive, high-integrity data in forms suitable for decision support to end users and decision makers throughout the organization. A data warehouse is managed data situated after and outside the operational systems. A complete definition requires discussion of many key attributes of a data warehouse system. Data warehousing is the result of repeated attempts by various researchers and organizations to give their organizations flexible, effective, and efficient means of getting at valuable sets of data.

On the other hand, there are other functions within the enterprise that have to do with planning, forecasting, and managing the organization. These functions are also critical to the survival of the organization, especially in today’s fast-paced world. Functions like “marketing planning”, “engineering planning”, and “financial analysis” also require information systems to support them. But these functions are different from operational ones, and the types of systems and information they require are also different. The systems that support these knowledge-based functions are informational systems.

“Informational systems” have to do with analyzing data and making decisions, often major decisions about how the
enterprise will operate, now and in the future. And not only do informational systems have a different focus from operational ones, they often have a different scope. Where operational data needs are normally focused upon a single area, informational data needs often span a number of different areas and need large amounts of related operational data.

Traditional databases support On-Line Transaction Processing (OLTP), which includes insertions, updates, and deletions, while also supporting information query requirements.

Traditional relational databases are optimized to process queries that touch a small part of the database and transactions that insert or update a few tuples per relation. Thus, they cannot be optimized for OLAP, DSS (decision support systems), or data mining. By contrast, data warehouses are designed precisely to support efficient extraction, processing, and presentation for analytic and decision-making purposes. In comparison to traditional databases, data warehouses generally contain very large amounts of data from multiple sources, which may include databases from different data models and sometimes files ...

W. H. Inmon characterized a data warehouse as “a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management’s decisions.” Data warehouses provide access to data for complex analysis, knowledge discovery, and decision-making.

Subject Oriented
Data are organized according to subject instead of application. For example, an insurance company using a data warehouse would organize its data by customer, premium, and claim, instead of by different products (auto, life, etc.). The data organized by ...

... data warehouse, but are only loaded and accessed.

Data warehouses have the following distinctive characteristics:

• Multidimensional conceptual view.

• Accessibility.

• Transparency.

• Virtual data warehouses provide views of operational databases that are materialized for efficient access.

• Data marts generally are targeted to a subset of the organization, such as a department, and are more tightly focused.

Integrated
• Constructed by integrating multiple, heterogeneous data sources, such as relational databases, flat files, and online transaction records.

• Applies data cleaning and data integration techniques.

• Requires only two operations in data accessing: initial loading of data and access of data (no data updates).
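The integration and load-only characteristics above can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the two hypothetical sources (a list of relational rows and an in-memory CSV flat file), their column names, and the hypothetical `WarehouseTable` class, which exposes only `load` and `read` and no update or delete operation.

```python
import csv
import io

# Hypothetical source 1: rows from a relational table (id, name, date, amount).
relational_rows = [(101, "ACME Corp", "2024-01-05", 1200.0)]

# Hypothetical source 2: a flat file with different column names.
flat_file = io.StringIO("customer,date,amount_usd\nGlobex,2024-01-06,800\n")


class WarehouseTable:
    """Load-and-read-only store: no method updates or deletes rows.
    (Python cannot truly enforce this; the class simply offers no update path.)"""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading only ever appends rows.
        self._rows.extend(tuple(row) for row in rows)

    def read(self):
        # Access returns an immutable snapshot of the loaded rows.
        return tuple(self._rows)


def from_relational(rows):
    # Map onto the common (customer, date, amount) layout; the source id is dropped.
    return [(name, date, amount) for _id, name, date, amount in rows]


def from_flat_file(handle):
    # Same common layout, converting the text amount to a number.
    return [(r["customer"], r["date"], float(r["amount_usd"])) for r in csv.DictReader(handle)]


table = WarehouseTable()
table.load(from_relational(relational_rows))
table.load(from_flat_file(flat_file))
print(table.read())
# (('ACME Corp', '2024-01-05', 1200.0), ('Globex', '2024-01-06', 800.0))
```

A real warehouse would also cleanse and reconcile keys during integration; this sketch only maps both sources onto one shared layout.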

Discussions
• Write short notes on:
  • Metadata
  • Operational systems
  • OLAP
  • DSS
  • Informational systems
• What is the need for a data warehouse in an organization?



Objective
The main objective of this lesson is to introduce you to Online Transaction Processing. You will learn about the importance and advantages of an OLTP system.

Introduction
Relational databases are used in the areas of operations and control, with emphasis on transaction processing. More recently, relational databases have been used to build data warehouses, which store tactical information (less than one year into the future) that answers “who” and “what” questions. In contrast, OLAP uses multidimensional (MD) views of aggregate data to provide access to strategic information. OLAP enables users to gain insight into a wide variety of possible views of information and transforms raw data to reflect the enterprise as understood by the user (e.g., analysts, managers, and executives).

OLTP vs. Data Warehouse

Feature | OLTP | Data Warehouse
Purpose | ... | Information retrieval and analysis
Structure | ... | RDBMS
Data Model | ... | ...
Access | ... | SQL plus data analysis extensions
Type of Data | ... | Historical, descriptive

The data warehouse serves a different purpose from that of OLTP systems: it allows business analysis queries to be answered, as opposed to “simple aggregation” such as ‘what is the current account balance for this customer?’ Typical data warehouse queries include such things as ‘which product line sells best in middle America, and how does this correlate to demographic data?’
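To make the contrast concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The tables, the `Midwest` region standing in for "middle America", and the sample rows are all hypothetical, and the demographic-correlation part of the warehouse question is left out.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (customer_id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE sales (product_line TEXT, region TEXT, amount REAL);
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 250.0), (2, 80.5)])
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Garden", "Midwest", 120.0), ("Garden", "Midwest", 30.0),
     ("Kitchen", "Midwest", 90.0), ("Kitchen", "West", 200.0)],
)


def current_balance(customer_id):
    """OLTP-style question: touches a single row through the primary key."""
    row = conn.execute(
        "SELECT balance FROM account WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


def best_product_line(region):
    """Warehouse-style question: scans and aggregates many rows."""
    return conn.execute(
        """SELECT product_line, SUM(amount) AS total
           FROM sales WHERE region = ?
           GROUP BY product_line
           ORDER BY total DESC LIMIT 1""",
        (region,),
    ).fetchone()


print(current_balance(1))            # 250.0
print(best_product_line("Midwest"))  # ('Garden', 150.0)
```

On real volumes, the first query stays cheap as the tables grow, while the second has to read every matching sales row, which is why the two workloads are usually served by differently designed systems.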

• Mapping from the operational environment to the data warehouse.

Data cleansing is an important aspect of creating an efficient data warehouse, in that it removes certain aspects of operational data, such as low-level transaction information, that slow down query times. The cleansing stage has to be as dynamic as possible to accommodate all types of queries, even those that may require low-level information. Data should be extracted from production sources at regular intervals and pooled centrally, but the cleansing process has to remove duplication and reconcile differences between various styles of data collection.
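The cleansing step described above (removing duplicates and reconciling differing styles of data collection) might look roughly like the following Python sketch. The two source extracts, the gender coding table, the date formats, and the rule that the first copy of a duplicate wins are hypothetical choices made for illustration.

```python
from datetime import datetime

# Hypothetical extracts from two production systems that record the same
# facts in different styles (gender coding, date format).
source_a = [
    {"cust_id": 7, "gender": "m", "joined": "2023-01-15"},
    {"cust_id": 8, "gender": "f", "joined": "2023-02-01"},
]
source_b = [
    {"cust_id": 7, "gender": "Male", "joined": "15/01/2023"},  # duplicate of customer 7
    {"cust_id": 9, "gender": "Female", "joined": "03/03/2023"},
]

# Hypothetical mapping of every source style onto one warehouse code.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def normalize(record, date_format):
    """Map one source record onto a single warehouse convention."""
    return {
        "cust_id": record["cust_id"],
        "gender": GENDER_CODES[record["gender"].strip().lower()],
        "joined": datetime.strptime(record["joined"], date_format).date().isoformat(),
    }


def cleanse(batches):
    """batches: list of (records, date_format) pairs.
    Duplicates are detected by cust_id; the first copy seen is kept."""
    seen = {}
    for records, date_format in batches:
        for record in records:
            clean = normalize(record, date_format)
            seen.setdefault(clean["cust_id"], clean)
    return sorted(seen.values(), key=lambda r: r["cust_id"])


pooled = cleanse([(source_a, "%Y-%m-%d"), (source_b, "%d/%m/%Y")])
for row in pooled:
    print(row)
```

A production pipeline would need a more careful survivorship rule than "first copy wins", for example preferring the most recently updated source.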

What is OLAP?

• Relational databases are used in the areas of operations and control with emphasis on transaction processing.

• Thus, OLAP enables strategic decision-making.

• OLAP calculations are more complex than simply summing data.

• Budgeting
• Activity-based costing
• Market research analysis
• Promotion analysis

Thus, OLAP must provide managers with the information they need for effective decision-making. The KPI (key performance indicator) of an OLAP application is to provide just-in-time (JIT) information for effective decision-making. JIT information reflects complex data relationships and is calculated on the fly. Such an approach is only practical if response times are always short. The data model must be flexible and respond to changing business requirements as needed for effective decision-making.

To achieve this in widely divergent functional areas, OLAP applications all require:

• MD views provide the foundation for analytical processing through flexible access to information.

More complex calculations are performed on other dimensions:
• Ratios and averages
• Variances on scenarios
• A complex model to compute forecasts
• Consistently quick response times to these queries are imperative to establish a server’s ability to provide MD views of information.
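The kinds of calculations listed above can be sketched in a few lines of Python. The revenue figures and the budget scenario are hypothetical, and the forecast is deliberately a naive moving average standing in for the "complex model" the text mentions, not an example of one.

```python
# Hypothetical quarterly revenue by region, plus a "budget" scenario.
actual = {"North": [100.0, 110.0, 125.0], "South": [80.0, 78.0, 90.0]}
budget = {"North": [105.0, 105.0, 120.0], "South": [85.0, 85.0, 85.0]}


def variance(region):
    """Variance against a scenario: actual minus budget, per quarter."""
    return [a - b for a, b in zip(actual[region], budget[region])]


def average(region):
    """Average quarterly revenue for one region."""
    return sum(actual[region]) / len(actual[region])


def share_of_total(region, quarter):
    """Ratio: one region's revenue as a fraction of all regions in a quarter."""
    total = sum(values[quarter] for values in actual.values())
    return actual[region][quarter] / total


def naive_forecast(region, window=2):
    """Stand-in forecast: mean of the last `window` quarters."""
    recent = actual[region][-window:]
    return sum(recent) / len(recent)


print(variance("South"))        # [-5.0, -7.0, 5.0]
print(naive_forecast("North"))  # 117.5
```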

Benefits of OLAP
• Increases the productivity of managers, developers, and whole organisations.

• Thus, OLAP enables organisations as a whole to respond more quickly to market demands, which often results in increased revenue and profitability, the goal of every organisation.

Discussions
• Write short notes on:
  • Multidimensional views
  • Operational systems
• What is the significance of an OLTP system?

• “The KPI (key performance indicator) of an OLAP
application is to provide just-in-time (JIT) information for effective decision-making”. Explain.


Introduction
Data warehousing is the process of extracting and transforming operational data into informational data and loading it into a central data store or warehouse. Once the data is loaded, decision makers can access it via desktop query and analysis tools.
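The extract, transform, and load steps described above can be sketched as follows in Python, using the standard-library `sqlite3` module as a stand-in for the central data store. The operational records, the `daily_sales` table, and the choice of revenue per day and product as the informational form are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical operational records: one row per order line.
operational_orders = [
    {"order_id": 1, "ts": "2024-03-01T09:15", "product": "Widget", "qty": 2, "unit_price": 5.0},
    {"order_id": 2, "ts": "2024-03-01T14:40", "product": "Widget", "qty": 1, "unit_price": 5.0},
    {"order_id": 3, "ts": "2024-03-02T10:05", "product": "Gadget", "qty": 3, "unit_price": 12.5},
]


def extract():
    # In practice this would read from the production system.
    return list(operational_orders)


def transform(rows):
    """Turn transaction rows into informational data: revenue per day and product."""
    totals = defaultdict(float)
    for row in rows:
        day = row["ts"][:10]
        totals[(day, row["product"])] += row["qty"] * row["unit_price"]
    return [(day, product, revenue) for (day, product), revenue in sorted(totals.items())]


def load(warehouse, facts):
    """Append the transformed facts to the central store."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales (day TEXT, product TEXT, revenue REAL)"
    )
    warehouse.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", facts)
    warehouse.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))

# A decision maker's query tool would then run queries such as this one.
print(warehouse.execute(
    "SELECT product, SUM(revenue) FROM daily_sales GROUP BY product ORDER BY product"
).fetchall())
# [('Gadget', 37.5), ('Widget', 15.0)]
```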

The Data Warehouse Model
The data warehouse model is illustrated in the following diagram.

Figure 2: The structure of data inside the data warehouse

The current detail data is central in importance as it:
• Reflects the most recent happenings, which are usually the most interesting;
• Is voluminous, as it is stored at the lowest level of granularity;
• Is almost always stored on disk storage, which is fast to access but expensive and complex to manage.


Data Modeling for Data Warehouses
Multidimensional models take advantage of inherent relationships in data to populate data in multidimensional matrices called data cubes. (These may be called hypercubes if they have more than three dimensions.) For data that lend themselves to dimensional formatting, query performance in multidimensional matrices can be much better than in the relational data model. Three examples of dimensions in a corporate data warehouse would be the corporation’s fiscal periods, products, and regions.
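A data cube can be sketched in Python as a mapping from one coordinate per dimension to a measure. This is a hypothetical, dense three-dimensional example using the three dimensions named above (fiscal period, product, region); the member names, sales figures, and the `cube_sum` helper, which treats `None` as "all members" of a dimension, are assumptions for illustration.

```python
import itertools

# Hypothetical members of the three dimensions.
periods = ["FY23-Q1", "FY23-Q2"]
products = ["Laptops", "Phones"]
regions = ["East", "West"]

# One sales figure per (period, product, region) cell.
sales_cube = {
    ("FY23-Q1", "Laptops", "East"): 40, ("FY23-Q1", "Laptops", "West"): 35,
    ("FY23-Q1", "Phones", "East"): 60, ("FY23-Q1", "Phones", "West"): 55,
    ("FY23-Q2", "Laptops", "East"): 45, ("FY23-Q2", "Laptops", "West"): 30,
    ("FY23-Q2", "Phones", "East"): 70, ("FY23-Q2", "Phones", "West"): 65,
}

# Dense cube: every combination of members has exactly one cell.
assert set(sales_cube) == set(itertools.product(periods, products, regions))


def cube_sum(period=None, product=None, region=None):
    """Sum the cells that match the given coordinates; None means 'all members'."""
    wanted = (period, product, region)
    return sum(
        value
        for key, value in sales_cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


print(cube_sum())                          # 400: the whole cube
print(cube_sum(product="Phones"))          # 250: all periods and regions
print(cube_sum("FY23-Q2", region="East"))  # 115: one period, one region
```

Adding a fourth key component (for example a sales channel) would turn this into a hypercube without changing `cube_sum` beyond its parameter list.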

A standard spreadsheet is a two-dimensional matrix. One example would be a spreadsheet of regional sales by product for a particular time period. Products could be shown as rows, with sales revenues for each region comprising the columns.
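The two-dimensional spreadsheet described above (products as rows, regions as columns, one time period) can be built from flat records with a short pivot. The records and the `pivot` helper are hypothetical.

```python
# Hypothetical sales records for one time period: (product, region, revenue).
records = [
    ("Laptops", "East", 40), ("Laptops", "West", 35),
    ("Phones", "East", 60), ("Phones", "West", 55), ("Phones", "East", 5),
]


def pivot(rows):
    """Build a 2-D matrix: one row per product, one column per region."""
    products = sorted({p for p, _, _ in rows})
    regions = sorted({r for _, r, _ in rows})
    matrix = {p: {r: 0 for r in regions} for p in products}
    for p, r, revenue in rows:
        matrix[p][r] += revenue  # several records may fall into the same cell
    return regions, matrix


regions, matrix = pivot(records)
print("Product".ljust(10) + "".join(r.rjust(8) for r in regions))
for product_name, row in matrix.items():
    print(product_name.ljust(10) + "".join(str(row[r]).rjust(8) for r in regions))
```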

A roll-up display moves up the hierarchy, grouping into larger units along a dimension (e.g., summing weekly data by quarter or by year). One of the above figures shows a roll-up display that moves from individual products to a coarser grain of product categories.
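A roll-up can be sketched as summing child members into their parents along a dimension hierarchy. The product-to-category and week-to-quarter mappings and the sales figures below are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy: product -> category.
CATEGORY_OF = {"Laptop A": "Computers", "Laptop B": "Computers",
               "Phone X": "Mobile", "Phone Y": "Mobile"}
product_sales = {"Laptop A": 120, "Laptop B": 80, "Phone X": 200, "Phone Y": 50}

# Hypothetical hierarchy on the time dimension: week -> quarter.
QUARTER_OF = {"W01": "Q1", "W02": "Q1", "W14": "Q2"}
weekly_sales = {"W01": 10, "W02": 12, "W14": 9}


def roll_up(measures, parent_of):
    """Move one level up a hierarchy by summing the children of each parent."""
    totals = defaultdict(int)
    for member, value in measures.items():
        totals[parent_of[member]] += value
    return dict(totals)


print(roll_up(product_sales, CATEGORY_OF))  # {'Computers': 200, 'Mobile': 250}
print(roll_up(weekly_sales, QUARTER_OF))    # {'Q1': 22, 'Q2': 9}
```

The same function serves any dimension; only the child-to-parent mapping changes.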

• A drill-down display provides the opposite capability, furnishing a finer-grained view, perhaps disaggregating country sales by region, then regional sales by subregion, and also breaking up products by styles.
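Drill-down can be sketched as re-aggregating detail facts at the next finer level of a hierarchy, restricted to the member currently being viewed. The geographic levels, the facts (hypothetical figures), and the `totals_at` helper are assumptions for illustration; the sketch relies on the detail data being kept at the lowest grain.

```python
from collections import defaultdict

# Hypothetical detail facts at the lowest grain: (country, region, sub_region, sales).
facts = [
    ("India", "North", "Delhi", 50), ("India", "North", "Punjab", 30),
    ("India", "South", "Kerala", 40), ("India", "South", "Tamil Nadu", 60),
    ("Nepal", "Central", "Kathmandu", 25),
]
LEVELS = ["country", "region", "sub_region"]


def totals_at(level, **parent):
    """Aggregate sales at `level`, keeping only facts under the given parent
    members (e.g. country="India")."""
    depth = LEVELS.index(level)
    result = defaultdict(int)
    for *path, sales in facts:
        if all(path[LEVELS.index(name)] == value for name, value in parent.items()):
            result[path[depth]] += sales
    return dict(result)


print(totals_at("country"))                                      # {'India': 180, 'Nepal': 25}
print(totals_at("region", country="India"))                      # {'North': 80, 'South': 100}
print(totals_at("sub_region", country="India", region="South"))  # {'Kerala': 40, 'Tamil Nadu': 60}
```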


Uploaded by : Dawn Simmons
