Post on 15-Mar-2018


Using database data

1. Queries and reports

2. Moving from summary to detailed information (drill-down) and from detailed to summary information (roll-up analysis): data warehouse technologies

3. Using "intelligent" data processing algorithms (data mining) for decision making


Data warehouse and data mart

A data warehouse (DW) is a database used for reporting.

A data warehouse is a database specifically structured for query and analysis. A data warehouse typically contains data representing the business history of an organization.

Data warehouses are enterprise business systems that serve as a central repository for the significant data an organization collects. A data warehouse is usually built on the enterprise's server. To support analytical data processing and to answer user queries, data from various online transaction processing systems, as well as data from other sources, is selectively extracted and organized in the data warehouse database. A further development of this idea is known as the data mart.

A data mart is a data store that collects operational data and other data needed by a particular group of users. This data can be obtained from the enterprise database, the data warehouse, or some other specific source. The main task of a data mart is to ensure that a particular user group receives this data in a convenient form and can perform the operations it needs on it.

The concept of data warehousing has evolved out of the need for easy access to a structured store of quality data that can be used for decision making.


Layers of a data warehouse implementation

1. Building the common data set (staging):
   1) data cleaning (cleaned);
   2) data transformation;
   3) data grouping (catalogued);
   4) preparing the data for processing.

2. Data integration.

3. Providing access to the data (access).

Key developments that shaped data warehouse technology

1960s — General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.

1970s — A. C. Nielsen and IRI provide dimensional data marts for retail sales.

1983 — Teradata introduces a database management system specifically designed for decision support.

1988 — Barry Devlin and Paul Murphy publish the article "An architecture for a business and information system" in IBM Systems Journal, where they introduce the term "business data warehouse".

1990 — Red Brick Systems introduces Red Brick Warehouse, a database management system specifically for data warehousing.

1991 — Prism Solutions introduces Prism Warehouse Manager, software for developing a data warehouse.

1992 — Bill Inmon publishes the book Building the Data Warehouse.

1995 — The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.

1996 — Ralph Kimball publishes the book The Data Warehouse Toolkit.

2000 — Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.


William H. Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse. Bill Inmon wrote the first book, held the first conference (with Arnie Barnett), wrote the first magazine column, and was the first to offer classes in data warehousing. He created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. Compared with the approach of the other pioneering architect of data warehousing, Ralph Kimball, Inmon's approach is often characterized as top-down.

He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.

A selection of his books:
1981. Effective Data Base Design. Prentice Hall.
1986. Information Systems Architecture: A System Developer's Primer. Prentice Hall.
1986. The Dynamics of Data Base. With Thomas J. Bird, Jr. Prentice Hall.
1988. Information Engineering for the Practitioner: Putting Theory into Practice. Prentice Hall.
1992. Rdb/VMS: Developing the Data Warehouse. With Chuck Kelley. QED.
1992. Building the Data Warehouse. 1st edition. Wiley and Sons.
1998. Corporate Information Factory. With Claudia Imhoff and Ryan Sousa. John Wiley and Sons.
2000. Exploration Warehousing: Turning Business Information into Business Opportunity. With R. H. Terdeman. John Wiley and Sons.
2007. Business Metadata. With Bonnie O'Neil and Lowell Fryman. Elsevier Press.
2007. Tapping Into Unstructured Data. With Tony Nesavich. Prentice Hall.
2008. DW 2.0: Architecture for the Next Generation of Data Warehousing. With Derek Strauss and Genia Neushloss. Elsevier Press.

A data warehouse is a copy of transactional data specifically structured for querying and analysis.

Business intelligence refers to the reporting and analysis of data stored in the warehouse. The data warehouse is the foundation for business intelligence. Data warehouse/business intelligence (DW/BI) refers to the complete end-to-end system.

The data warehouse systems market

It is estimated that the data warehousing market will see a compound annual growth rate of 11.5% from 2009 through 2013, reaching a total of $13.2bn in revenues.
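The growth figure follows the usual compound-annual-growth-rate relation, end value = start value × (1 + r)^n. As a small arithmetic sketch, assuming the 2009 to 2013 span counts as n = 4 yearly compounding steps (the source does not say), the implied 2009 base can be backed out of the $13.2bn target. The function names are illustrative only.

```python
def implied_start_value(end_value: float, rate: float, years: int) -> float:
    """Invert end = start * (1 + rate) ** years to recover the starting value."""
    return end_value / (1.0 + rate) ** years


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumption: 2009 -> 2013 is treated as 4 yearly compounding steps.
start_2009 = implied_start_value(13.2, 0.115, 4)
print(f"implied 2009 revenue: ${start_2009:.2f}bn")
print(f"round-trip CAGR: {cagr(start_2009, 13.2, 4):.3%}")
```

Under that assumption the implied 2009 market is roughly $8.5bn; a different period count would change the result.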

In 2011, the database market grew by 6.5%, with total revenue of $33.9 billion.

Four vendors dominate the data warehouse market, with 93.6% of total revenue in 2010. These vendors are expected to retain their advantage and generate 92.2% of revenue in 2013. Main vendors:
1. Oracle
2. IBM
3. Microsoft
4. Teradata
5. EMC/Greenplum
6. SAP/Sybase

Data warehouse applications

1. Decision support

2. Trend analysis

3. Financial forecasting

4. Logistics and inventory management

5. Agriculture data analysis

6. Biological data analysis

7. Accounting intelligence. A specialist form of business intelligence, accounting intelligence is the general name for the set of technologies used to extract, analyze, and present information from accounting and ERP applications such as JD Edwards, Oracle E-Business Suite, or SAP.

8. Business intelligence (BI) is the ability of an organization to collect, maintain, and organize knowledge. This produces large amounts of information that can help develop new opportunities. Identifying these opportunities and implementing an effective strategy can provide a competitive market advantage and long-term stability.

9. Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future events.

10. Business analytics (BA) refers to the skills, technologies, applications, and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning. Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods. In contrast, business intelligence traditionally focuses on using a consistent set of metrics both to measure past performance and to guide business planning, which is also based on data and statistical methods.

Business intelligence [1]

Business intelligence (BI) is a technology infrastructure for gaining the maximum information from available data in order to improve business processes. Typical BI infrastructure components are software solutions for gathering, cleansing, integrating, analyzing, and sharing data.

The most common kinds of Business Intelligence systems are:

EIS - Executive Information Systems
DSS - Decision Support Systems
MIS - Management Information Systems
GIS - Geographic Information Systems
OLAP - Online Analytical Processing and multidimensional analysis
CRM - Customer Relationship Management
Business Intelligence systems based on Data Warehouse technology. A Data Warehouse (DW) gathers information from a wide range of a company's operational systems, and Business Intelligence systems are built on top of it.

[1] http://datawarehouse4u.info/News.html

User information needs


Data warehouse, OLAP, and business intelligence

Online analytical processing (OLAP) is an approach to swiftly answer multi-dimensional analytical (MDA) queries.

OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting, and similar areas, with new applications emerging, such as agriculture. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).

An expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.

Figure: relationship between the data warehouse, OLAP, business intelligence, and the multi-dimensional database.

OLTP and data warehouse features

Data warehouse | Operational system (OLTP)
Subject oriented | Transaction oriented
Large (hundreds of GB up to several TB) | Small (MB up to several GB)
Historic data | Current data
De-normalized table structure (few tables, many columns per table) | Normalized table structure (many tables, few columns per table)
Batch updates | Continuous updates
Usually very complex queries | Simple to complex queries

Basic data warehouse architecture

Data warehouse with a staging area (staging tools)

Data warehouse with data marts

Data marts

A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease-of-use. In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However, most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs and that a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that may derive from a data warehouse or not and that emphasizes ease of access and usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.

Integral architecture of a data warehouse


Online analytical processing (OLAP)


In computing, online analytical processing (OLAP) is an approach to swiftly answer multi-dimensional analytical queries.

The term OLAP was created as a slight modification of the traditional database term OLTP - Online Transaction Processing.

Databases configured for OLAP use a multidimensional data model, allowing complex analytical and ad hoc queries with rapid execution times. They borrow aspects of navigational and hierarchical databases, which can be faster than relational databases for this kind of query.

The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the rows and columns of the matrix; the measures form the values.
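A minimal sketch of that matrix layout, assuming a handful of hypothetical fact rows (region, quarter, sales) that are not taken from the source: one dimension becomes the rows, another becomes the columns, and the measure is summed into each cell. The `pivot` helper is illustrative only.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, sales). Not taken from the source.
facts = [
    ("Europe", "Q1", 120),
    ("Europe", "Q2", 80),
    ("Asia", "Q1", 200),
    ("Asia", "Q2", 150),
    ("Europe", "Q1", 30),
]


def pivot(rows, row_dim, col_dim, measure):
    """Sum `measure` into a {row_member: {col_member: value}} matrix."""
    cells = defaultdict(lambda: defaultdict(int))
    for row in rows:
        cells[row[row_dim]][row[col_dim]] += row[measure]
    return cells


matrix = pivot(facts, row_dim=0, col_dim=1, measure=2)
columns = sorted({r[1] for r in facts})

# Dimensions form the rows and columns; the measure fills the cells.
print("region".ljust(8) + "".join(c.rjust(6) for c in columns))
for region in sorted(matrix):
    print(region.ljust(8) + "".join(str(matrix[region][c]).rjust(6) for c in columns))
```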


OLAP system


Building multidimensional data structures

4- and 5-dimensional cubes

OLAP systems

Multidimensional OLAP (MOLAP)
MOLAP is the 'classic' form of OLAP and is sometimes referred to simply as OLAP. MOLAP stores data in optimized multidimensional array storage rather than in a relational database. It therefore requires the pre-computation and storage of information in the cube, an operation known as processing.
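To make "processing" concrete, here is a toy sketch, not any vendor's storage format. It assumes two small dimensions (source and quarter) with hypothetical values: the cube is stored as a dense array indexed by member position, and every aggregate, including totals under a hypothetical "ALL" member, is computed up front so that queries become plain lookups.

```python
from itertools import product

# Hypothetical dimension members; "ALL" stands for the aggregated total.
SOURCES = ["Africa", "Asia"]
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

# Hypothetical base facts: (source, quarter) -> packages.
base = {
    ("Africa", "Q1"): 10, ("Africa", "Q2"): 12, ("Africa", "Q3"): 7, ("Africa", "Q4"): 9,
    ("Asia", "Q1"): 20, ("Asia", "Q2"): 18, ("Asia", "Q3"): 25, ("Asia", "Q4"): 22,
}


def process_cube(base_facts):
    """'Processing' step: precompute every cell, including ALL totals,
    into a dense 2-D array indexed by member position."""
    src_axis = SOURCES + ["ALL"]
    qtr_axis = QUARTERS + ["ALL"]
    cube = [[0] * len(qtr_axis) for _ in src_axis]
    for (s, q), value in base_facts.items():
        i, j = SOURCES.index(s), QUARTERS.index(q)
        # Add the fact to its own cell and to every total it belongs to.
        for si, qj in product((i, len(SOURCES)), (j, len(QUARTERS))):
            cube[si][qj] += value
    return src_axis, qtr_axis, cube


src_axis, qtr_axis, cube = process_cube(base)


def lookup(source, quarter):
    """Queries are array lookups; nothing is aggregated at query time."""
    return cube[src_axis.index(source)][qtr_axis.index(quarter)]


print(lookup("Asia", "ALL"), lookup("ALL", "Q1"), lookup("ALL", "ALL"))
```

The trade-off is the one described above: storage and processing time are spent in advance in exchange for fast reads.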

Relational OLAP (ROLAP)
ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. It depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each slicing and dicing action is equivalent to adding a "WHERE" clause to the SQL statement.
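The WHERE-clause equivalence can be sketched with Python's built-in `sqlite3` module. The `imports` table, its columns, and the rows below are hypothetical; each slice or dice simply appends one predicate to the same aggregate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imports (source TEXT, route TEXT, quarter TEXT, packages INTEGER)")
conn.executemany(
    "INSERT INTO imports VALUES (?, ?, ?, ?)",
    [
        ("Asia", "sea", "Q1", 20), ("Asia", "air", "Q1", 5),
        ("Asia", "sea", "Q2", 18), ("Africa", "road", "Q1", 10),
        ("Africa", "sea", "Q2", 12),
    ],
)


def query(filters):
    """Build an aggregate query; every slice/dice adds one WHERE predicate.
    Column names come from a fixed whitelist, values are bound parameters."""
    allowed = {"source", "route", "quarter"}
    clauses, params = [], []
    for column, value in filters.items():
        if column not in allowed:
            raise ValueError(f"unknown dimension: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT quarter, SUM(packages) FROM imports"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " GROUP BY quarter ORDER BY quarter"
    return conn.execute(sql, params).fetchall()


print(query({}))                                  # whole cube, by quarter
print(query({"source": "Asia"}))                  # slice: fix one dimension
print(query({"source": "Asia", "route": "sea"}))  # dice: two predicates
```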

Hybrid OLAP (HOLAP)
The database divides data between relational and specialized storage. For example, for some vendors, a HOLAP database uses relational tables to hold the larger quantities of detailed data and specialized storage for at least some aspects of the smaller quantities of more aggregated or less detailed data.
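A toy sketch of that split, with hypothetical data and names: the smaller aggregated totals are precomputed into an in-memory dictionary (standing in for the specialized storage), detailed rows stay in a relational SQLite table, and a simple router decides which store answers each request. Real HOLAP products partition data in their own ways; this only illustrates the idea.

```python
import sqlite3
from collections import defaultdict

# Relational side: detailed rows stay in a SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imports (source TEXT, quarter TEXT, packages INTEGER)")
conn.executemany(
    "INSERT INTO imports VALUES (?, ?, ?)",
    [("Asia", "Q1", 20), ("Asia", "Q2", 18), ("Africa", "Q1", 10), ("Africa", "Q2", 12)],
)

# Specialized side: the smaller, aggregated data is precomputed once.
totals_by_source = defaultdict(int)
for source, qty in conn.execute("SELECT source, packages FROM imports"):
    totals_by_source[source] += qty


def packages(source, quarter=None):
    """Route the request: totals come from the aggregate store, details from SQL."""
    if quarter is None:
        return totals_by_source[source]
    row = conn.execute(
        "SELECT SUM(packages) FROM imports WHERE source = ? AND quarter = ?",
        (source, quarter),
    ).fetchone()
    return row[0] or 0


print(packages("Asia"), packages("Africa", "Q2"))
```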

Other OLAP systems

WOLAP - Web-based OLAP

DOLAP - Desktop OLAP

RTOLAP - Real-Time OLAP


Dimensions, their hierarchies, facts, and aggregates

Measures

The values within the cube cells represent the two measures, Packages and Last. The Packages measure represents the number of imported packages, and the Sum function is used to aggregate the facts. The Last measure represents the date of receipt, and the Max function is used to aggregate the facts.

Dimensions

The Route dimension represents the means by which the imports reach their destination. Members of this dimension include ground, nonground, air, sea, road, or rail. The Source dimension represents the locations where the imports are produced, such as Africa or Asia. The Time dimension represents the quarters and halves of a single year.

Aggregates

Business users of a cube can determine the value of any measure for each member of every dimension, regardless of the member's level within the dimension, because Analysis Services aggregates values at upper levels as needed. For example, the measure values in the preceding illustration can be aggregated according to a standard calendar hierarchy by using the Calendar Time hierarchy in the Time dimension, as illustrated in the following diagram.
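The two aggregation functions and the roll-up along a calendar hierarchy can be sketched in a few lines. The quarterly cell values below are hypothetical (the illustration referred to above is not reproduced), and the quarter-to-half mapping assumes a standard calendar year. Packages is summed, while Last takes the maximum date. Drilling down is simply reading the child cells again.

```python
from datetime import date

# Hypothetical leaf cells for one source: quarter -> (packages, last receipt date).
cells = {
    "Q1": (60, date(1999, 3, 30)),
    "Q2": (75, date(1999, 6, 28)),
    "Q3": (40, date(1999, 9, 29)),
    "Q4": (90, date(1999, 12, 29)),
}

# Calendar hierarchy: quarter -> half -> year (standard calendar assumed).
QUARTER_TO_HALF = {"Q1": "H1", "Q2": "H1", "Q3": "H2", "Q4": "H2"}

# Each measure carries its own aggregate function.
AGGREGATORS = {"Packages": sum, "Last": max}


def roll_up(groups):
    """Aggregate each measure with its own function over {parent: [children]}."""
    result = {}
    for parent, members in groups.items():
        packages = AGGREGATORS["Packages"](cells[m][0] for m in members)
        last = AGGREGATORS["Last"](cells[m][1] for m in members)
        result[parent] = (packages, last)
    return result


halves = roll_up({h: [q for q, p in QUARTER_TO_HALF.items() if p == h] for h in ("H1", "H2")})
year = roll_up({"1999": list(cells)})
print(halves)
print(year)
```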

Dimension hierarchies

Data cubes and their components

Star schema

The star schema (also called star-join schema, data cube, or multi-dimensional schema) is the simplest style of data warehouse schema. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is considered an important special case of the snowflake schema, and is more effective for handling simpler queries.
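A minimal star schema, sketched with `sqlite3` under assumed table and column names: one fact table holds foreign keys plus the measure, each dimension is a single denormalized table, and a typical query joins the fact table to each dimension once (the "star join").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables (denormalized: category sits directly on product).
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    -- Fact table: foreign keys to every dimension plus the numeric measure.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        quantity INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Tea', 'Drinks'), (2, 'Coffee', 'Drinks'), (3, 'Bread', 'Bakery');
    INSERT INTO dim_date VALUES (1, '2002-01', 2002), (2, '2002-02', 2002);
    INSERT INTO fact_sales VALUES (1, 1, 5), (2, 1, 3), (3, 1, 4), (1, 2, 2), (3, 2, 6);
    """
)

# Star join: the fact table joined once to each dimension, grouped by dimension attributes.
rows = conn.execute(
    """
    SELECT d.month, p.category, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
    """
).fetchall()
print(rows)
```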

Star schema example

Snowflake schema

Snowflake schema example
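In a snowflake schema the dimension tables are themselves normalized. The sketch below uses assumed, illustrative names: the product dimension is split so that the category lives in its own table, so a query grouping by category needs one extra join from fact to product to category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized dimension: category moved out of the product table.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity INTEGER
    );
    INSERT INTO dim_category VALUES (1, 'Drinks'), (2, 'Bakery');
    INSERT INTO dim_product VALUES (1, 'Tea', 1), (2, 'Coffee', 1), (3, 'Bread', 2);
    INSERT INTO fact_sales VALUES (1, 5), (2, 3), (3, 4), (1, 7), (3, 6);
    """
)

# Fact -> product -> category: each normalized level adds one join hop.
rows = conn.execute(
    """
    SELECT c.category, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category
    ORDER BY c.category
    """
).fetchall()
print(rows)
```

The normalized dimension stores each category name once, at the cost of the extra join at query time.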

Joining two snowflake schemas

Data warehouse building tools

Figure: tools in the data warehouse architecture. Components shown: operational data relational database; data cleansing, consolidation, and aggregation; data warehouse relational database; data warehouse (MOLAP); data retrieval and analysis; metadata repository; data modeling tools; data transfer tool (ETL); application development tool.

ETL process

ETL (Extract, Transform and Load) is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. ETL involves the following tasks:

- extracting the data from source systems (SAP, ERP, other operational systems): data from the different source systems is converted into one consolidated data warehouse format that is ready for transformation processing;

- transforming the data, which may involve the following tasks:
  1) applying business rules (so-called derivations, e.g., calculating new measures and dimensions),
  2) cleaning (e.g., mapping NULL to 0 or "Male" to "M" and "Female" to "F"),
  3) filtering (e.g., selecting only certain columns to load),
  4) splitting a column into multiple columns and vice versa,
  5) joining together data from multiple sources (e.g., lookup, merge),
  6) transposing rows and columns,
  7) applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty, rejecting the row from processing);

- loading the data into a data warehouse, data repository, or other reporting applications.
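The three steps can be sketched end to end with the Python standard library. Everything here is assumed for illustration: the CSV text stands in for a source-system extract, the cleaning and validation rules mirror the examples in the list (NULL to 0, "Male"/"Female" to "M"/"F", rejecting rows whose first three columns are empty), and an in-memory SQLite table plays the warehouse.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical source-system export (empty fields stand for NULL).
SOURCE_CSV = """customer_id,name,gender,purchases
1,Anna,Female,3
2,Janis,Male,
,,,5
3,Liga,Female,1
"""


def extract(text):
    """Read raw rows from the source extract."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Apply validation and cleaning rules; return (clean_rows, rejected_rows)."""
    gender_map = {"Male": "M", "Female": "F"}
    clean, rejected = [], []
    for row in rows:
        # Validation: reject rows whose first three columns are all empty.
        if not any(row[c] for c in ("customer_id", "name", "gender")):
            rejected.append(row)
            continue
        clean.append((
            int(row["customer_id"]),
            row["name"],
            gender_map.get(row["gender"], row["gender"]),  # cleaning: code values
            int(row["purchases"] or 0),                     # cleaning: NULL -> 0
        ))
    return clean, rejected


def load(conn, rows):
    """Load transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER, name TEXT, gender TEXT, purchases INTEGER)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract(SOURCE_CSV))
load(warehouse, clean_rows)
print(warehouse.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
print(f"rejected {len(rejected_rows)} row(s)")
```

Keeping the rejected rows rather than silently dropping them makes it possible to report or reprocess them later.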

ETL tools

List of the most popular ETL tools:

Informatica - PowerCenter
IBM - WebSphere DataStage (formerly known as Ascential DataStage)
SAP - BusinessObjects Data Integrator
IBM - Cognos Data Manager (formerly known as Cognos DecisionStream)
Microsoft - SQL Server Integration Services
Oracle - Data Integrator (formerly known as Sunopsis Data Conductor)
SAS - Data Integration Studio
Oracle - Warehouse Builder
Ab Initio
Information Builders - Data Migrator
Pentaho - Pentaho Data Integration
Embarcadero Technologies - DT/Studio
IKAN - ETL4ALL
IBM - DB2 Warehouse Edition
Pervasive - Data Integrator
ETL Solutions Ltd. - Transformation Manager
Group 1 Software (Sagent) - DataFlow
Sybase - Data Integrated Suite ETL
Talend - Talend Open Studio
Expressor Software - Expressor Semantic Data Integration System
Elixir - Elixir Repertoire
OpenSys - CloverETL

MS SQL Server Data Transformation Services (DTS)

Comparison of OLAP Servers

Server | Company | MOLAP ROLAP HOLAP
Essbase | Oracle | X X
icCube | Crazy Development | X
MS Analysis Services | MS | X X X
MicroStrategy OLAP Services | MicroStrategy | X X X
Mondrian OLAP Server | Pentaho | X
Oracle OLAP Option | Oracle | X X X
Palo | Jedox | X
SAS OLAP Server | SAS Institute | X X X
TM1 | IBM | X

Data retrieval: pivot tables

Business Intelligence tools

Oracle - Siebel Business Analytics Applications

SAS - Business Intelligence

SAP - BusinessObjects XI

IBM - Cognos 8 BI

Oracle - Hyperion System 9 BI+

Microsoft - Analysis Services

MicroStrategy - Dynamic Enterprise Dashboards

Pentaho - Open BI Suite

Information Builders - WebFOCUS Business Intelligence

QlikTech - QlikView

TIBCO Spotfire - Enterprise Analytics

Sybase - InfoMaker

KXEN - IOLAP

SPSS - ShowCase


Sybase data warehouse technologies

SAS company


OLAP in SQL Server 2005


Overall architecture of an Oracle data warehouse

Materialized views

Date (dd.mm.yy) | Customer | Product
12.01.02 | A | 1
12.01.02 | B | 2
12.01.02 | C | 3
12.01.02 | C | 4
14.01.02 | A | 3
14.01.02 | D | 3
14.01.02 | D | 3
14.01.02 | D | 2
14.01.02 | A | 1
14.01.02 | A | 4
14.01.02 | A | 3
14.01.02 | D | 2
01.02.02 | D | 2
01.02.02 | C | 3
01.02.02 | C | 1
01.02.02 | B | 2
01.02.02 | C | 4
02.02.02 | C | 3
02.02.02 | B | 3

Month (mm.yy) | Product | Sold
01.02 | 1 | 2
01.02 | 2 | 3
01.02 | 3 | 5
01.02 | 4 | 2
02.02 | 1 | 1
02.02 | 2 | 2
02.02 | 3 | 3
02.02 | 4 | 1
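The second table is the first one aggregated: the number of sales per product per month. SQLite has no materialized views, so this sketch emulates one with `CREATE TABLE ... AS SELECT` over the detail rows from the first table; the table and column names are assumptions. Answering a monthly question from the small summary table instead of the detail table is the idea that query rewrite automates in systems that do support materialized views.

```python
import sqlite3

# Detail rows from the first table: (date dd.mm.yy, customer, product).
SALES = [
    ("12.01.02", "A", 1), ("12.01.02", "B", 2), ("12.01.02", "C", 3), ("12.01.02", "C", 4),
    ("14.01.02", "A", 3), ("14.01.02", "D", 3), ("14.01.02", "D", 3), ("14.01.02", "D", 2),
    ("14.01.02", "A", 1), ("14.01.02", "A", 4), ("14.01.02", "A", 3), ("14.01.02", "D", 2),
    ("01.02.02", "D", 2), ("01.02.02", "C", 3), ("01.02.02", "C", 1), ("01.02.02", "B", 2),
    ("01.02.02", "C", 4), ("02.02.02", "C", 3), ("02.02.02", "B", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, customer TEXT, product INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", SALES)

# Emulated materialized view: count sales per month (mm.yy = characters 4-8) and product.
conn.execute(
    """
    CREATE TABLE mv_monthly_sales AS
    SELECT substr(sale_date, 4, 5) AS month, product, COUNT(*) AS sold
    FROM sales
    GROUP BY month, product
    """
)

summary = conn.execute(
    "SELECT month, product, sold FROM mv_monthly_sales ORDER BY month, product"
).fetchall()
for row in summary:
    print(row)
```

Unlike a real materialized view, this table is not refreshed automatically when `sales` changes; it would have to be rebuilt.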

Query Rewrite


Query Rewrite Subgraphs


Data extraction with SQL and MDX (MultiDimensional eXpressions)

Pivot Table Service (PTS)


Designing the data warehouse data structure

Data warehouse ER diagram

Permanent data warehouse structures

OLTP database

Permanent and virtual data warehouse data structures

Virtual data warehouse