Using database data
1. Queries and reports
2. From summary to detailed information (drill-down) and from detailed to summary information (roll-up analysis): data warehouse technologies
3. Use of "intelligent" data processing algorithms (data mining) for decision making
1
Data warehouse and data mart
A data warehouse (DW) is a database used for reporting.
A data warehouse is a database specifically structured for query and analysis. A data warehouse typically contains data representing the business history of an organization.
Data warehouses are enterprise business systems that serve as a central repository for the significant data an organization collects. A data warehouse is usually set up on an enterprise server. To support analytical processing and answer user queries, data from the various online transaction processing systems, as well as data from other sources, is selectively extracted and organized in the data warehouse database. A further development of this idea is known as the data mart.
A data mart is a data store that holds operational data and other data needed by a particular group of users. This data can come from the enterprise database, a data warehouse, or some other specific source. The main purpose of a data mart is to ensure that particular user groups receive this data in a convenient form and can perform the operations they need on it.
The concept of data warehousing has evolved out of the need for easy access to a structured store of quality data that can be used for decision making.
2
Data warehouse implementation layers
1. Building the common data set (staging): 1) data cleaning; 2) data transformation; 3) data cataloguing; 4) preparing the data for processing.
2. Data integration.
3. Providing data access.
3
The most significant developments that shaped data warehouse technology
1960s — General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
1970s — A. C. Nielsen and IRI provide dimensional data marts for retail sales.
1983 — Teradata introduces a database management system specifically designed for decision support.
1988 — Barry Devlin and Paul Murphy publish the article An architecture for a business and information systems in IBM Systems Journal where they introduce the term "business data warehouse".
1990 — Red Brick Systems introduces Red Brick Warehouse, a database management system specifically for data warehousing.
1991 — Prism Solutions introduces Prism Warehouse Manager, software for developing a data warehouse.
1991 — Bill Inmon publishes the book Building the Data Warehouse.
1995 — The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
1996 — Ralph Kimball publishes the book The Data Warehouse Toolkit.
2000 — Daniel Linstedt releases the Data Vault, enabling real time auditable Data Warehouses.
4
William H. Inmon (born 1945) is an American computer scientist, recognized by many as the father of the data warehouse. Bill Inmon wrote the first book, held the first conference (with Arnie Barnett), wrote the first column in a magazine and was the first to offer classes in data warehousing. Bill Inmon created the accepted definition of what a data warehouse is – a subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions. Compared with the approach of the other pioneering architect of data warehousing, Ralph Kimball, Inmon's approach is often characterized as a top-down approach.
He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.
A selection of Bill Inmon's publications:
1981. Effective Data Base Design. Prentice Hall.
1986. Information systems architecture: a system developer's primer. Prentice-Hall.
1986. The dynamics of data base. With Thomas J. Bird, Jr. Prentice-Hall.
1988. Information engineering for the practitioner: putting theory into practice. Prentice Hall.
1992. Rdb/VMS: Developing the Data Warehouse. With Chuck Kelley. QED.
1992. Building the Data Warehouse. 1st Edition. Wiley and Sons.
1998. Corporate Information Factory. With Claudia Imhoff and Ryan Sousa. John Wiley and Sons.
2000. Exploration Warehousing: Turning Business Information into Business Opportunity. With R. H. Terdeman. John Wiley and Sons.
2007. Business Metadata. With Bonnie Oneil and Lowell Fryman. Elsevier Press.
2007. Tapping Into Unstructured Data. With Tony Nesavich. Prentice Hall.
2008. DW 2.0 - Architecture for the Next Generation of Data Warehousing. With Derek Strauss and Genia Neushloss. Elsevier Press.
5
A data warehouse is a copy of transactional data specifically structured for querying and analysis.
Business intelligence refers to reporting and analysis of data stored in the warehouse.
The data warehouse is the foundation for business intelligence.
Data warehouse/business intelligence (DW/BI) refers to the complete end-to-end system.
6
Data warehouse systems market
It is estimated that the data-warehousing market will see a compound annual growth rate of 11.5% from 2009 through 2013 to reach a total of $13.2bn in revenues.
In 2011 the database market grew by 6.5%, with total revenue of $33.9 billion.
Four vendors dominate the data-warehouse market, with 93.6% of total revenue in 2010. These vendors are expected to retain their advantage and generate 92.2% of revenue in 2013. Main vendors:
1. Oracle
2. IBM
3. Microsoft
4. Teradata
5. EMC/Greenplum
6. SAP/Sybase
7
Data warehouse applications
1. Decision support
2. Trend analysis
3. Financial forecasting
4. Logistics and inventory management
5. Agriculture data analysis
6. Biological data analysis
7. Accounting intelligence. A specialist form of business intelligence, accounting intelligence is the general name for the set of technologies used to extract, analyze and present information from accounting and ERP applications such as JD Edwards, Oracle E-Business Suite or SAP.
8. Business intelligence (BI) is the ability of an organization to collect, maintain and organize knowledge. This produces large amounts of information that can help develop new opportunities. Identifying these opportunities, and implementing an effective strategy, can provide a competitive market advantage and long-term stability.
9. Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning and data mining that analyze current and historical facts to make predictions about future events.
10. Business analytics (BA) refers to the skills, technologies, applications and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning. Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods. In contrast, business intelligence traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods.
8
9
10
Business intelligence1
Business intelligence (BI) is a technology infrastructure for gaining maximum information from available data for the purpose of improving business processes. Typical BI infrastructure components are software solutions for gathering, cleansing, integrating, analyzing and sharing data.
The most common kinds of Business Intelligence systems are:
EIS - Executive Information Systems
DSS - Decision Support Systems
MIS - Management Information Systems
GIS - Geographic Information Systems
OLAP - Online Analytical Processing and multidimensional analysis
CRM - Customer Relationship Management
Business Intelligence systems based on Data Warehouse technology. A Data Warehouse (DW) gathers information from a wide range of a company's operational systems, and Business Intelligence systems are built on top of it.
1 http://datawarehouse4u.info/News.html
11
User information needs
12
Data warehouse, OLAP and business intelligence
Online analytical processing (OLAP) is an approach to swiftly answer multi-dimensional analytical (MDA) queries.
OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).
An expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.
13
Datawarehouse
OLAP
Business intelligence?
Multi-dimensional database
14
OLTP and Data warehouse features
Data warehouse | Operational system
Subject oriented | Transaction oriented
Large (hundreds of GB up to several TB) | Small (MB up to several GB)
Historic data | Current data
De-normalized table structure (few tables, many columns per table) | Normalized table structure (many tables, few columns per table)
Batch updates | Continuous updates
Usually very complex queries | Simple to complex queries
15
Basic data warehouse architecture
16
17
Data warehouse with staging tools
18
Data warehouse with data marts
Data marts
19
A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease-of-use. In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However, most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs and that a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that may derive from a data warehouse or not and that emphasizes ease of access and usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.
Integral architecture of a data warehouse
20
Online analytical processing (OLAP)
21
In computing, online analytical processing (OLAP) is an approach to swiftly answer multi-dimensional analytical queries.
The term OLAP was created as a slight modification of the traditional database term OLTP - Online Transaction Processing.
Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and hierarchical databases that are faster than relational databases.
The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the rows and columns of the matrix; the measures form the values.
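The matrix layout described above can be sketched in a few lines of Python. This is only an illustration: the fact rows, the `region` and `quarter` dimensions and the `sales` measure are invented for the example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, sales).
# Region and quarter are the dimensions; sales is the measure.
facts = [
    ("North", "Q1", 100),
    ("North", "Q2", 150),
    ("South", "Q1", 80),
    ("South", "Q2", 120),
    ("North", "Q1", 50),
]


def pivot(rows):
    """Sum the measure for every (row member, column member) pair."""
    cells = defaultdict(int)
    for row_member, col_member, measure in rows:
        cells[(row_member, col_member)] += measure
    row_members = sorted({r for r, _, _ in rows})
    col_members = sorted({c for _, c, _ in rows})
    matrix = [[cells[(r, c)] for c in col_members] for r in row_members]
    return row_members, col_members, matrix


row_members, col_members, matrix = pivot(facts)

# Column members run across the top, row members down the side,
# and the aggregated measure fills the cells.
print("      " + " ".join(f"{c:>5}" for c in col_members))
for member, values in zip(row_members, matrix):
    print(f"{member:<6}" + " ".join(f"{v:>5}" for v in values))
```

Swapping the first two fields of each fact row transposes the matrix, which is the pivoting operation the format is named after.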
22
OLAP system
23
Building multidimensional data structures
24
25
26
4- and 5-dimensional cubes
27
OLAP systems
Multidimensional OLAP (MOLAP)
MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP stores its data in optimized multi-dimensional array storage, rather than in a relational database. It therefore requires the pre-computation and storage of information in the cube, an operation known as processing.
Relational OLAP (ROLAP)
ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. ROLAP depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause to the SQL statement.
Hybrid OLAP (HOLAP)
A HOLAP database divides data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.
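The ROLAP remark that slicing and dicing amounts to adding a "WHERE" clause can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns and the `total_amount` helper are hypothetical names chosen for this example; a real ROLAP engine generates far more elaborate SQL.

```python
import sqlite3

# Hypothetical relational fact table with three dimensions and one measure.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, year INTEGER, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", 2010, "A", 10),
        ("North", 2011, "A", 20),
        ("South", 2010, "B", 30),
        ("South", 2011, "A", 40),
        ("North", 2011, "B", 50),
    ],
)

DIMENSIONS = ("region", "year", "product")


def total_amount(**filters):
    """Sum the measure; every fixed dimension adds one WHERE condition."""
    unknown = set(filters) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    sql = "SELECT COALESCE(SUM(amount), 0) FROM sales"
    if filters:
        # Column names are checked against DIMENSIONS; values are bound parameters.
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name in filters)
    return conn.execute(sql, tuple(filters.values())).fetchone()[0]


whole_cube = total_amount()                          # no restriction
slice_2011 = total_amount(year=2011)                 # slice: fix one dimension
dice_2011_a = total_amount(year=2011, product="A")   # dice: fix two dimensions

print(whole_cube, slice_2011, dice_2011_a)
```

Each extra keyword argument narrows the result with one more condition, which is all that a slice or dice does at the SQL level in this simplified picture.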
28
Other OLAP systems
WOLAP - Web-based OLAP
DOLAP - Desktop OLAP
RTOLAP - Real-Time OLAP
29
Dimensions, their hierarchies, facts and aggregates
Measures
The values within the cube cells represent the two measures, Packages and Last. The Packages measure represents the number of imported packages, and the Sum function is used to aggregate the facts. The Last measure represents the date of receipt, and the Max function is used to aggregate the facts.
Dimensions
The Route dimension represents the means by which the imports reach their destination. Members of this dimension include ground, nonground, air, sea, road, or rail. The Source dimension represents the locations where the imports are produced, such as Africa or Asia. The Time dimension represents the quarters and halves of a single year.
Aggregates
Business users of a cube can determine the value of any measure for each member of every dimension, regardless of the level of the member within the dimension, because Analysis Services aggregates values at upper levels as needed. For example, the measure values in the preceding illustration can be aggregated according to a standard calendar hierarchy by using the Calendar Time hierarchy in the Time dimension as illustrated in the following diagram.
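As a rough sketch of how the two measures roll up through the Time dimension, the Python below aggregates Packages with a sum and Last with a maximum at the quarter, half and year levels. The fact values, the dates and the `roll_up` helper are invented for illustration; this is not how Analysis Services is implemented.

```python
from datetime import date

# Hypothetical facts: (quarter, packages, date of receipt).
facts = [
    ("Q1", 10, date(1999, 2, 3)),
    ("Q1", 5, date(1999, 3, 19)),
    ("Q2", 7, date(1999, 6, 28)),
    ("Q3", 12, date(1999, 9, 30)),
    ("Q4", 4, date(1999, 12, 29)),
]

# Assumed calendar hierarchy: quarter -> half -> year.
QUARTER_TO_HALF = {"Q1": "H1", "Q2": "H1", "Q3": "H2", "Q4": "H2"}


def roll_up(rows, level):
    """Aggregate Packages with sum and Last with max at the given level."""
    result = {}
    for quarter, packages, received in rows:
        if level == "quarter":
            member = quarter
        elif level == "half":
            member = QUARTER_TO_HALF[quarter]
        elif level == "year":
            member = "Year"
        else:
            raise ValueError(f"unknown level: {level}")
        total, last = result.get(member, (0, date.min))
        result[member] = (total + packages, max(last, received))
    return result


by_quarter = roll_up(facts, "quarter")
by_half = roll_up(facts, "half")
by_year = roll_up(facts, "year")

for member, (packages, last) in by_half.items():
    print(member, packages, last.isoformat())
```

Because sum and max can both be combined from partial results, the half-year and year values could equally be computed from the quarter-level results instead of from the raw facts.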
30
Dimension hierarchies
31
32
33
Data cubes and their components
34
Star schema
The star schema (also called star-join schema, data cube, or multi-dimensional schema) is the simplest style of data warehouse schema. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is considered an important special case of the snowflake schema, and is more effective for handling simpler queries.
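A minimal star schema can be sketched with Python's built-in `sqlite3` module: one fact table whose rows reference two dimension tables. All table names, column names and rows here (`fact_sales`, `dim_date`, `dim_product`) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the members that facts refer to.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- The fact table holds the measure plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units INTEGER
    );

    INSERT INTO dim_date VALUES (1, 2011, 'Q1'), (2, 2011, 'Q2');
    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Office'), (2, 'Desk', 'Furniture'), (3, 'Pencil', 'Office');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 1), (2, 1, 7), (2, 3, 5), (2, 2, 2);
    """
)

# A typical star join: the fact table is joined to each dimension once,
# then grouped by dimension attributes.
units_by_category_and_quarter = conn.execute(
    """
    SELECT p.category, d.quarter, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
    """
).fetchall()

for row in units_by_category_and_quarter:
    print(row)
```

In a snowflake variant of this sketch, `category` would move out of `dim_product` into its own table, adding one more join to the query.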
Star schema example
35
Snowflake schema
36
37
Snowflake schema example
38
Joining two snowflake schemas
39
Data warehouse building tools
40
Diagram labels:
Relational database of operational data
Data cleaning, consolidation, aggregation
Relational database of the data warehouse
Data warehouse (MOLAP)
Data retrieval and analysis
Metadata repository
Data modeling tool (appears three times in the diagram)
Data transfer tool (ETL)
Application building tool
ETL process
ETL (Extract, Transform and Load) is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. ETL involves the following tasks:
- extracting the data from source systems (SAP, ERP, other operational systems); data from different source systems is converted into one consolidated data warehouse format which is ready for transformation processing.
- transforming the data, which may involve the following tasks: 1) applying business rules (so-called derivations, e.g., calculating new measures and dimensions), 2) cleaning (e.g., mapping NULL to 0 or "Male" to "M" and "Female" to "F" etc.), 3) filtering (e.g., selecting only certain columns to load), 4) splitting a column into multiple columns and vice versa, 5) joining together data from multiple sources (e.g., lookup, merge), 6) transposing rows and columns, 7) applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty then reject the row from processing).
- loading the data into a data warehouse, a data repository, or other reporting applications.
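A few of the transformation tasks listed above (cleaning, filtering and validation) can be sketched in plain Python. The source rows, the column names and the helper functions are hypothetical; real ETL tools express the same rules through their own mapping designers or scripting languages.

```python
# Hypothetical rows as they might arrive from a source system.
SOURCE_ROWS = [
    {"id": 1, "name": "Anna", "gender": "Female", "amount": None, "note": "x"},
    {"id": 2, "name": "Janis", "gender": "Male", "amount": 12.5, "note": "y"},
    {"id": None, "name": None, "gender": None, "amount": 3.0, "note": "z"},
]

GENDER_CODES = {"Male": "M", "Female": "F"}
LOAD_COLUMNS = ("id", "name", "gender", "amount")  # filtering: "note" is not loaded


def is_valid(row):
    """Validation: reject a row whose first three columns are all empty."""
    first_three = list(row.values())[:3]
    return any(value not in (None, "") for value in first_three)


def transform(row):
    """Cleaning and filtering for one row."""
    cleaned = dict(row)
    # Cleaning: map "Male"/"Female" to "M"/"F"; other values pass through.
    cleaned["gender"] = GENDER_CODES.get(cleaned["gender"], cleaned["gender"])
    # Cleaning: map NULL (None) to 0.
    if cleaned["amount"] is None:
        cleaned["amount"] = 0
    # Filtering: keep only the columns that are loaded.
    return {column: cleaned[column] for column in LOAD_COLUMNS}


def run_etl(rows):
    """Return the rows to load and the rows rejected by validation."""
    warehouse, rejected = [], []
    for row in rows:
        if is_valid(row):
            warehouse.append(transform(row))
        else:
            rejected.append(row)
    return warehouse, rejected


warehouse, rejected = run_etl(SOURCE_ROWS)
print(warehouse)
print(len(rejected), "row(s) rejected")
```

Keeping rejected rows in a separate list rather than dropping them silently makes it possible to report or reprocess them later, which is a common design choice in ETL jobs.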
41
ETL tools
List of the most popular ETL tools:
Informatica - Power Center
IBM - Websphere DataStage (formerly known as Ascential DataStage)
SAP - BusinessObjects Data Integrator
IBM - Cognos Data Manager (formerly known as Cognos DecisionStream)
Microsoft - SQL Server Integration Services
Oracle - Data Integrator (formerly known as Sunopsis Data Conductor)
SAS - Data Integration Studio
Oracle - Warehouse Builder
Ab Initio
Information Builders - Data Migrator
Pentaho - Pentaho Data Integration
Embarcadero Technologies - DT/Studio
IKAN - ETL4ALL
IBM - DB2 Warehouse Edition
Pervasive - Data Integrator
ETL Solutions Ltd. - Transformation Manager
Group 1 Software (Sagent) - DataFlow
Sybase - Data Integrated Suite ETL
Talend - Talend Open Studio
Expressor Software - Expressor Semantic Data Integration System
Elixir - Elixir Repertoire
OpenSys - CloverETL
42
MS SQL Server Data Transformation Service
43
MS SQL Server Data Transformation Service
44
MS SQL Server Data Transformation Service
45
Comparison of OLAP Servers
DBMS Vendor MOLAP ROLAP HOLAP
Essbase Oracle X X
icCube Crazy Development X
MS Analysis Services MS X X X
MicroStrategy OLAP Services MicroStrategy X X X
Mondrian OLAP Server Pentaho X
Oracle OLAP Option Oracle X X X
Palo Jedox X
SAS OLAP Server SAS Institute X X X
TM1 IBM X
46
Data retrieval: cross tables (pivot tables)
47
48
Business Intelligence tools
Oracle - Siebel Business Analytics Applications
SAS - Business Intelligence
SAP - BusinessObjects XI
IBM - Cognos 8 BI
Oracle - Hyperion System 9 BI+
Microsoft - Analysis Services
MicroStrategy - Dynamic Enterprise Dashboards
Pentaho - Open BI Suite
Information Builders - WebFOCUS Business Intelligence
QlikTech - QlikView
TIBCO Spotfire - Enterprise Analytics
Sybase - InfoMaker
KXEN - IOLAP
SPSS - ShowCase
49
Sybase data warehouse technologies
50
SAS company
51
OLAP in SQL Server 2005
52
53
Overall architecture of the Oracle data warehouse
54
55
Materialized views
Date Customer Product
12.01.02 A 1
12.01.02 B 2
12.01.02 C 3
12.01.02 C 4
14.01.02 A 3
14.01.02 D 3
14.01.02 D 3
14.01.02 D 2
14.01.02 A 1
14.01.02 A 4
14.01.02 A 3
14.01.02 D 2
01.02.02 D 2
01.02.02 C 3
01.02.02 C 1
01.02.02 B 2
01.02.02 C 4
02.02.02 C 3
02.02.02 B 3
56
Date Product Sold
01.02 1 2
01.02 2 3
01.02 3 5
01.02 4 2
02.02 1 1
02.02 2 2
02.02 3 3
02.02 4 1
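The second table appears to be the first one summarized as a count of sales per month and product, which is the kind of result a materialized view stores. The sketch below reproduces that summary with Python's built-in `sqlite3` module. SQLite has no materialized views, so an ordinary table created from the aggregate query stands in for one; the table and column names (`sales`, `mv_monthly_sales`, `sale_date`) are assumptions, and the month is taken as the `mm.yy` part of each `dd.mm.yy` date.

```python
import sqlite3

# Detail rows from the first table: (date as dd.mm.yy, customer, product).
DETAIL_ROWS = [
    ("12.01.02", "A", 1), ("12.01.02", "B", 2), ("12.01.02", "C", 3),
    ("12.01.02", "C", 4), ("14.01.02", "A", 3), ("14.01.02", "D", 3),
    ("14.01.02", "D", 3), ("14.01.02", "D", 2), ("14.01.02", "A", 1),
    ("14.01.02", "A", 4), ("14.01.02", "A", 3), ("14.01.02", "D", 2),
    ("01.02.02", "D", 2), ("01.02.02", "C", 3), ("01.02.02", "C", 1),
    ("01.02.02", "B", 2), ("01.02.02", "C", 4), ("02.02.02", "C", 3),
    ("02.02.02", "B", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, customer TEXT, product INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", DETAIL_ROWS)

# Aggregate query: number of sales per month (mm.yy) and product.
AGGREGATE_SQL = """
    SELECT substr(sale_date, 4) AS month, product, COUNT(*) AS sold
    FROM sales
    GROUP BY substr(sale_date, 4), product
    ORDER BY month, product
"""

# Stand-in for a materialized view: store the aggregate result as a table.
conn.execute("CREATE TABLE mv_monthly_sales AS " + AGGREGATE_SQL)

# The same answer can come from the detail table or from the stored summary.
from_detail = conn.execute(AGGREGATE_SQL).fetchall()
from_summary = conn.execute(
    "SELECT month, product, sold FROM mv_monthly_sales ORDER BY month, product"
).fetchall()

for row in from_summary:
    print(row)
```

Query rewrite, in databases that support it, is the optimizer making this substitution automatically: a query written against the detail table is answered from the smaller summary when the two are known to give the same result. Unlike a real materialized view, the stand-in table here is not refreshed when `sales` changes.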
Query Rewrite
57
Query Rewrite Subgraphs
58
Data extraction with SQL and MDX (Multidimensional Expressions)
Pivot Table Service (PTS)
59
Designing data warehouse data structures
60
Data warehouse ER diagram
61
Permanent data warehouse structures
62
OLTP database
63
Permanent and virtual data warehouse data structures
64
Virtual data warehouse
65
66