Session 8-9 Data Resource Management


HUANG Lihua, Fudan University

Session 8-9 Data Resource Management

PART Ⅱ Introduction to the Foundation of Information Technologies

Content

• Database Concepts & Technology
– Experiment: ACCESS
• Database Trends
• Data Trends of Application
– Data Warehouse
– OLAP
– Data Mining
• Creating Database Environment

DATA

• Streams of raw facts representing events, such as business transactions or simple observations of the state of the world

FILE ORGANIZATION

• A computer system organizes data in a hierarchy that begins with bits and proceeds to bytes, fields, records, files, and databases.

[Figure: the data hierarchy. A file is made up of records, a record of fields, a field of bytes, and a byte of bits.]

FILE ORGANIZATION

• BIT: Binary digit (0, 1; Y, N; On, Off)
• BYTE: Combination of BITS that represents a CHARACTER
• FIELD: A logical grouping of characters into a word, a group of words, or a complete number
• RECORD: Collection of FIELDS that reflect a TRANSACTION
• FILE: A collection of similar RECORDS
• DATABASE: An organization's electronic library of FILES

FILE ORGANIZATION

For example:

• Field: Student's name

• Record:

Number   Name       Course  Date    Grade
9525012  Zhang Yan  MIS     1998.9  A

• File:

Number   Name       Course  Date    Grade
9525012  Zhang Yan  MIS     1998.9  A
9525018  Jeff Yu    MIS     1998.9  A
9525027  He Hui     MIS     1998.9  B
…        …          …       …       …
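The hierarchy in the example above can be sketched in Python. This is only an illustration: the class name `GradeRecord` and the variable names are assumptions, while the rows come from the slide.

```python
from dataclasses import dataclass


@dataclass
class GradeRecord:
    """One record: a collection of related fields."""
    number: str  # each attribute below is one field
    name: str
    course: str
    date: str
    grade: str


# A file is a collection of similar records.
grade_file = [
    GradeRecord("9525012", "Zhang Yan", "MIS", "1998.9", "A"),
    GradeRecord("9525018", "Jeff Yu", "MIS", "1998.9", "A"),
    GradeRecord("9525027", "He Hui", "MIS", "1998.9", "B"),
]

# A field is built from characters; each ASCII character is one byte of 8 bits.
name_bytes = grade_file[0].name.encode("ascii")
first_char_bits = format(name_bytes[0], "08b")
print(len(grade_file), len(name_bytes), first_char_bits)
```

The last lines walk down the hierarchy: the file holds three records, the name field of the first record holds nine bytes, and the first byte is printed as eight bits.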

FILE ORGANIZATION: Another way of thinking about database components

• ENTITY: Person, place, thing, or event about which data must be kept (a record describes an entity)
• ATTRIBUTE: Description of a particular ENTITY (corresponds to a field)
• KEY FIELD: Field used to retrieve, update, or sort a RECORD

FILE ORGANIZATION

FDU NO.  HKU NO.  NAME           SEX  TEL(O)          TEL(H)
98HM001  93835    Xie Mingqiang  M    58702331
98HM002  93840    Yu Bing        F    65110968
98HM003  93841    Wang Pei       F    58711001-23306  63568504
98HM004  93842    Ge Ruijin      M    56938860        56873143
98HM005  93843    Wang Xintao    M    58611828        65352394
98HM006  93844    Fu Qiang       F    58666060-6007   58836304

[Figure labels: Attribute (a column), Record (a row), Key Field (marked twice), File (the whole table)]

KEY FIELD

• Field in each record
• Uniquely identifies THIS record
• Used for RETRIEVAL, UPDATING, and SORTING
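A minimal sketch of the three uses of a key field, assuming the student number is the key. The dictionary index `by_number` and the grade change are hypothetical and only serve the illustration.

```python
records = [
    {"number": "9525027", "name": "He Hui", "grade": "B"},
    {"number": "9525012", "name": "Zhang Yan", "grade": "A"},
    {"number": "9525018", "name": "Jeff Yu", "grade": "A"},
]

# Index the file by its key field; the key must be unique per record.
by_number = {}
for record in records:
    key = record["number"]
    if key in by_number:
        raise ValueError(f"duplicate key field value: {key}")
    by_number[key] = record

retrieved = by_number["9525018"]                        # retrieval
by_number["9525027"]["grade"] = "A"                     # updating (hypothetical change)
ordered = sorted(records, key=lambda r: r["number"])    # sorting
```

Because the key is unique, a lookup returns exactly one record, and sorting on it gives a stable, unambiguous order.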

Accessing Records from Computer Files: Sequential vs. Direct (Random) File Organization

• SEQUENTIAL: Data records must be retrieved in the same physical sequence in which they are stored (e.g. magnetic tape).
• DIRECT: Data can be accessed without regard to physical sequence (e.g. disk).

[Figure: sequential file organization compared with direct file organization]
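The difference can be sketched with fixed-length records in an in-memory file. The record size and function names below are assumptions made for illustration; real tape and disk systems are more involved.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record length in bytes


def make_file(names):
    """Pack each name into a fixed-length record, one after another."""
    buf = io.BytesIO()
    for name in names:
        buf.write(name.encode("ascii").ljust(RECORD_SIZE))
    return buf


def read_sequential(f, target_index):
    """Tape-style access: read every record from the start up to the target."""
    f.seek(0)
    reads = 0
    record = b""
    for _ in range(target_index + 1):
        record = f.read(RECORD_SIZE)
        reads += 1
    return record.decode("ascii").strip(), reads


def read_direct(f, target_index):
    """Disk-style access: compute the record's offset and jump straight to it."""
    f.seek(target_index * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").strip(), 1


students = make_file(["Zhang Yan", "Jeff Yu", "He Hui"])
print(read_sequential(students, 2))
print(read_direct(students, 2))
```

Both calls return the same record; the second element of each result counts how many records had to be read to reach it.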

Traditional File Processing & File Organization

[Figure: each department runs its own programs against its own copy of the class file]

Department    Programs           File
registration  Class programs     Class file
accounting    Accounts programs  Class file
athletics     Sports programs    Class file

Problems Arising from the File Organization

• Data redundancy: The same piece of information could be duplicated in several files.
• Data inconsistency: The duplicated copies of the same data may no longer agree.
• Data isolation: Data files are likely to be organized differently, stored in different formats, and often physically inaccessible to other applications.
• Data integrity: It is difficult to place data integrity constraints across multiple data files.
• Application and data dependence: In the file environment, the applications and their associated data files depend on each other.
• Poor security: Security is difficult to enforce in the file environment.
• Lack of data sharing & availability

[Figure label: flat file]

Problems Arising from the File Organization

• Data redundancy
• Data inconsistency
• Data isolation and data integrity problems
• Application and data dependence
• Security and data sharing problems

These problems led to the development of the DATABASE.

DATABASE

• A database is an organized logical grouping of related files.
• In a database, data are stored and managed in a convenient form, and integrated and related so that one set of software programs provides access to all the data.

DATABASE

• Collection of centralized data
• Controls redundant data
• Data stored so as to appear to users in one location
• Services multiple applications

DATABASE MANAGEMENT SYSTEM (DBMS)

• Software that creates and maintains data and enables business applications to extract data independently of specific computer programs.

[Figure: file environment. The registrar, accounting, and athletics departments each run their own programs (class, accounts, sports) against their own copy of the class file.]

Computer-based files of this type cause problems such as redundancy, inconsistency, and data isolation.

[Figure: database environment. The class, accounts, and sports programs of the registrar, accounting, and athletics departments all go through the DBMS to one database holding the class, accounts, and sports files.]

The DBMS provides access to all data in the database.

Database Environment

COMPONENTS OF DBMS:

• DATA DEFINITION LANGUAGE:
– Defines data elements in the database
• DATA MANIPULATION LANGUAGE:
– Manipulates data for applications
– For example, extracts data from the database (e.g. SQL)
• DATA DICTIONARY:
– Formal definitions of all variables in the database; controls variety of database contents
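The three components can be tried out with Python's built-in `sqlite3` module. This is a sketch, not the Access workflow used in the course: the table name `student_grade` is an assumption, and SQLite's `PRAGMA table_info` stands in for a data dictionary report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language (DDL): define the data elements.
conn.execute("""
    CREATE TABLE student_grade (
        number TEXT PRIMARY KEY,
        name   TEXT NOT NULL,
        course TEXT NOT NULL,
        date   TEXT,
        grade  TEXT
    )
""")

# Data manipulation language (DML): add data, then extract it.
conn.executemany(
    "INSERT INTO student_grade VALUES (?, ?, ?, ?, ?)",
    [
        ("9525012", "Zhang Yan", "MIS", "1998.9", "A"),
        ("9525018", "Jeff Yu", "MIS", "1998.9", "A"),
        ("9525027", "He Hui", "MIS", "1998.9", "B"),
    ],
)
a_students = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student_grade WHERE grade = 'A' ORDER BY number"
    )
]

# Data dictionary: SQLite reports each column's name and declared type.
dictionary = [
    (col[1], col[2]) for col in conn.execute("PRAGMA table_info(student_grade)")
]
print(a_students)
print(dictionary)
```

The application code never touches the storage format; it only states what to define, what to store, and what to retrieve.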

Sample data dictionary report

Fundamental Database Structures

HIERARCHICAL DATABASE

[Figure: hierarchical tree]
• ROOT: Employer
• FIRST CHILD: Compensation, Job Assignments, Benefits
• 2nd CHILD: Ratings and Salary (under Compensation); Pension, Insurance, and Health (under Benefits)

Types of RELATIONS

• ONE-TO-ONE: STUDENT and ID
• ONE-TO-MANY: one CLASS and many students (STUDENT A, STUDENT B, STUDENT C)
• MANY-TO-MANY: students (STUDENT A, STUDENT B, STUDENT C) and courses (COURSE 1, COURSE 2)

NETWORK DATA MODEL

• Variation of the hierarchical model
• Useful for many-to-many relationships

[Figure: STUDENT A, STUDENT B, and STUDENT C linked to COURSE 1 and COURSE 2]

Disadvantages of Hierarchical and Network DBMS

• Outdated

• Less flexible than an RDBMS

• Lack support for ad hoc and English-like queries

RELATIONAL DATA MODEL

• DATA IN TABLE FORMAT
• RELATION: TABLE
• Tuple: ROW (record) IN TABLE
• Field: COLUMN (attribute) IN TABLE

         HOURS  RATE    TOTAL
ABLE     40.50  $10.35  $419.18
BAXTER   38.00  $8.75   $332.50
CHEN     42.70  $9.25   $394.98
DENVER   35.90  $9.50   $341.05
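In the table above, each TOTAL is HOURS times RATE, rounded to the cent. A small sketch that recomputes the column; the use of `Decimal` with half-up rounding is an assumption about how the slide's figures were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# Each tuple is one row (record); each position is one column (field).
rows = [
    ("ABLE", "40.50", "10.35"),
    ("BAXTER", "38.00", "8.75"),
    ("CHEN", "42.70", "9.25"),
    ("DENVER", "35.90", "9.50"),
]


def total_pay(hours, rate):
    """Multiply hours by rate and round to cents, halves rounding up."""
    amount = Decimal(hours) * Decimal(rate)
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


totals = {name: total_pay(hours, rate) for name, hours, rate in rows}
for name, total in totals.items():
    print(name, total)
```

Under that rounding assumption the computed values agree with the TOTAL column on the slide.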

Example DB: Fortune 500 Companies

• company

Comp. name  sales    assets    netincome  empls   indcode  yr
allied      9115000  13271000  -279000    143800  37       85
boeing      9035000  7593000   292000     95700   37       82
...

• industry codes

indcode  indname
42       pharmaceuticals
44       computers
...

The Relational Data Model

Current DBMS: Relational Database

• DBMS Vendor
– MS: Access, SQL Server
– Oracle
– Sybase
– DB2
– Informix
– MySQL

The Relational Database Model

The relational model is based on the simple concept of tables in order to capitalize on the characteristics of rows and columns of data, which is consistent with real-world business situations.

One of the greatest advantages of the relational model is its conceptual simplicity and the ability to link records in a way that is not predefined.

The Relational Abstraction

• Information is in tables
– Also called (base) relations
• Columns define attributes
– Also called fields or domains
• Rows define records
– Also called tuples
• Cells contain values
– All cells in a column hold information of the same type, e.g. integer, floating point, text, date

Operations on Tables

• Add new rows (or sometimes columns)
– Modify existing rows
• Choose a subset of columns
• Choose a subset of rows
• Combine rows (e.g., sum values in a column)
• Combine columns
• Combine two tables (join)
• No operations to combine individual cells
– Unlike a spreadsheet

Three Basic Operations in a Relational Database

• Select:
– Creates a subset of rows that meet specific criteria
• Join:
– Combines relational tables to provide users with information
• Project:
– Enables users to create new tables containing only relevant information
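The three operations can be sketched over plain Python lists of rows. The helper names `select`, `project`, and `join` are assumptions for illustration, and the `examplecorp` row is hypothetical; the other values follow the Fortune 500 example.

```python
company = [
    {"name": "allied", "sales": 9115000, "indcode": 37},
    {"name": "boeing", "sales": 9035000, "indcode": 37},
    {"name": "examplecorp", "sales": 1200000, "indcode": 44},  # hypothetical row
]
industry = [
    {"indcode": 42, "indname": "pharmaceuticals"},
    {"indcode": 44, "indname": "computers"},
]


def select(table, predicate):
    """Select: keep only the rows that meet the criteria."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Project: build a new table containing only the named columns."""
    return [{c: row[c] for c in columns} for row in table]


def join(left, right, column):
    """Join: combine rows of two tables that share a value in one column."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]


big = select(company, lambda row: row["sales"] > 9000000)
names = project(big, ["name"])
with_industry = join(company, industry, "indcode")
print(names)
print(with_industry)
```

Each operation takes tables in and returns a table, which is why they can be chained freely.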

The three basic operations of a relational DBMS

Operating on Databases: SQL

• Every abstraction needs an interface through which users invoke abstract operations
– Graphical interface
– Language
• Structured Query Language
– SELECT (content) FROM (table) WHERE (condition)
• We'll focus only on queries
– Query = question
– Extract some data from one or more tables to answer a particular question
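As a sketch of the SELECT ... FROM ... WHERE pattern, the query below asks "which companies made a profit?" of a small table. It runs on Python's built-in `sqlite3` module rather than Access, and the table holds only the two rows from the Fortune 500 example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (name TEXT, sales INTEGER, netincome INTEGER);
    INSERT INTO company VALUES ('allied', 9115000, -279000);
    INSERT INTO company VALUES ('boeing', 9035000, 292000);
""")

# SELECT (content) FROM (table) WHERE (condition)
profitable = conn.execute(
    "SELECT name, netincome FROM company WHERE netincome > 0"
).fetchall()
print(profitable)
```

The SELECT list names the content wanted, FROM names the table, and WHERE states the condition each returned row must meet.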

Physical vs. Logical Data View

• A DBMS minimizes these problems by providing two "views" of the database data:
– The physical view deals with the actual, physical arrangement and location of data in the direct access storage devices (DASD).
– The logical view, or user's view, represents data in a format that is meaningful to a user and to the software programs that process that data.
• Entity-relationship diagram (ER diagram): Methodology for documenting databases that illustrates relationships between database entities
• Normalization: Process of creating small, stable data structures from complex groups of data

Entity-relationship diagram

Experiment: Microsoft Access

• Features:
– Create/modify databases
– Specify/run queries
– Design/print reports
– Design graphical user interfaces around databases (forms for entering and viewing data)
• Assignment: p. 136, Application Exercise 3; p. 169, Application Exercise 1


2. Database Trends (1)

• The evolution of database systems
• Data
– Simple data => Multimedia data, knowledge
• Model
– Relational model => OO model, object-relational model

Database Trends (2)

• Application
– OLTP => OLAP
• Data organization
– Database => Data warehouse, data marts
• Query language
– SQL => Deductive

Emerging Database Models

The most common emerging database models are:

• Deductive databases
• Object-oriented databases
• Multimedia and hypermedia databases
• Multidimensional databases

Object-Oriented Database Model

• Object-oriented (OO) databases store both data and the procedures acting on the data as objects.
• Encapsulation capability
– The OO database can be particularly helpful in multimedia environments, such as manufacturing sites using CAD/CAM.
– OO databases can be particularly useful in supporting temporal and spatial dimensions.
• Terminology in the OO model includes:
– Objects, attributes, classes, methods, and messages
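The OO terminology can be illustrated with a tiny class. This is plain Python, not an OO database product; the `Rectangle` class is a hypothetical stand-in for a CAD part.

```python
class Rectangle:
    """A class: the template for objects sharing attributes and methods."""

    def __init__(self, width, height):
        # Attributes: the data stored inside each object.
        self.width = width
        self.height = height

    def area(self):
        # Method: a procedure stored together with the data it acts on.
        return self.width * self.height


part = Rectangle(3, 4)   # an object (an instance of the class)
result = part.area()     # sending the "area" message to the object
print(result)
```

Encapsulation here means callers ask the object for its area rather than reading its width and height and computing the result themselves.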


Hypermedia Database Model

The hypermedia database model stores chunks of information in the form of nodes connected by links established by the user.

The nodes can contain text, graphics, sound, full-motion video, or executable computer programs.

Users can branch to related information in any kind of relationship.

A hypermedia database

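Multidimensional Database

• A variation of the relational model
• Uses multidimensional structures to organize data and express the relationships between data
• A dimension of the data: a side of a cube
• ① Multidimensional array
– (Beijing, 1999, color TV, 10000)
– (location, year, product type, sales)
• ② Dimension hierarchies
– e.g. year, quarter, month, day
– country, region, province, city
• ③ Classes of elements within a dimension
– e.g. products grouped by price into high-end, mid-range, and low-end
– grouping by the cost of raw materials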
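A multidimensional array can be sketched as a dictionary keyed by (location, year, product type) with sales as the cell value. Only the Beijing 1999 color TV cell comes from the slide; the other cells and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Key: (location, year, product type); value: sales.
cells = {
    ("Beijing", 1999, "color TV"): 10000,        # from the slide
    ("Beijing", 2000, "color TV"): 12000,        # hypothetical
    ("Shanghai", 1999, "color TV"): 8000,        # hypothetical
    ("Shanghai", 1999, "refrigerator"): 5000,    # hypothetical
}


def roll_up(cells, axis):
    """Sum sales along every dimension except the one at position `axis`."""
    totals = defaultdict(int)
    for coords, sales in cells.items():
        totals[coords[axis]] += sales
    return dict(totals)


by_location = roll_up(cells, 0)
by_year = roll_up(cells, 1)
print(by_location)
print(by_year)
```

Each position in the key is one dimension of the cube, so summarizing by a different dimension only changes which position is kept.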

Multidimensional data model

Specialized Databases

• There are many specialized databases, depending on the type or format of data stored.
– A geographical information database contains locational data for overlaying on maps or images.
– A knowledge database stores decision rules used to evaluate situations and help users make decisions like an expert.
– A multimedia database stores data on many media: sounds, video, images, graphic animation, and text.


3. Data Trends of Application

• Data Warehouse

• OLAP

• DATA Mining

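From Database to Data Warehousing

• With the widespread use of information technology in enterprises, enterprises have accumulated large amounts of data.
• The problem enterprises face is not simply how to process data but how to use it: a shift from operational processing to analytical processing.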

Operational vs. Decision Support Systems

• Operational processing in operational systems
– Support day-to-day transactions
– Contain current, "up to date" data
– Examples: customer orders, inventory levels, bank account balances
• Analytical processing in decision support systems
– Support strategic decision making
– Contain historical, "summarized" data
– Examples: performance summary, customer profitability, market segmentation

Why data warehouses?

• Decision support data
– Are found in many different databases, both within the company and outside the company
– In practical terms, locating and integrating all this information in real time is very difficult
• Solution:
– Create separate repositories of data for decision support
– => data warehouses

Data Warehousing

• Stores current and historical data
– Consolidates data for management analysis and decision making
– Supports reporting and query tools
• Definition given by W. H. Inmon, the "father of data warehousing": a data warehouse is a subject-oriented, integrated, non-updatable, time-variant collection of data used to better support the decision analysis of an enterprise or organization.

Characteristics of Data Warehouses

1) Organization. Data are organized by detailed subjects.

2) Consistency. Data in different operational databases may be encoded differently. In the warehouse they will be coded in a consistent manner.

3) Time variant. The data are kept for 5 to 10 years so they can be used for trends, forecasting, and comparisons over time.

4) Non-volatile. Once entered into the warehouse, data are not updated.

5) Relational. The data warehouse uses a relational structure.

6) Client/server. The data warehouse uses a client/server architecture to give end users easy access to its data.

Data Warehouse Suitability

Data warehousing is most appropriate for organizations in which some of the following apply:

• Large amounts of data need to be accessed by end users.
• The operational data are stored in different systems.
• An information-based approach to management is in use.
• There is a large, diverse customer base.
• The same data are represented differently in different systems.
• Data are stored in highly technical formats that are difficult to decipher.
• Extensive end-user computing is performed.

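Comparison with DB Systems

• DB Systems
– Database: operational data; insert, delete, and update operations are frequent
– Database engine: powerful, oriented toward OLTP (Online Transaction Processing) applications
– Database tools: mainly query tools
• DWS Systems
– Data warehouse: analytical data; update operations are rare
– Data warehouse management system: functionally simple, because update operations are rare
– Data warehouse tools: mainly analysis tools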

[Figure labels: OLTP (online transaction processing), Database, OLAP (online analytical processing), DSS (decision support system), Data Mining, Data Warehouse]

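Hierarchical Structure of the Data Warehouse

[Figure: levels of data in the warehouse]
• Levels: archival data, raw data, summarized data, highly summarized data
• Example labels: sales data, summarized sales data, weekly sales by product line, monthly sales by product line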

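Implementation of the Data Warehouse

[Figure: data are extracted from the operational databases (internal and external data sources), transformed, and stored in the data warehouse as subject data (Subject 1, Subject 2, Subject 3), which feed the Infomart and the Datamart.]

• Infomart (information mart): an application function (or the output produced by one) that can be run against the data warehouse repeatedly; it corresponds to the answer to a business question.
• Datamart (data mart): a subset of the data warehouse that corresponds to the information needs of end users; its data are more aggregated and summarized than the data in the warehouse.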

Data Marts

An alternative used by many firms is the creation of a lower-cost, scaled-down version of a data warehouse, called a data mart. Data marts are small warehouses that focus on specific aspects of a company, such as a strategic business unit (SBU) or a department.

Two major types of data marts:

1) Replicated (dependent) data marts. In such cases one can replicate functional subsets of the data warehouse in smaller databases.
2) Stand-alone data marts. A company can have one or more independent data marts without having a data warehouse.

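Presentation of the Data Warehouse

[Figure: the implementation figure extended with a presentation layer. Data are extracted from the operational databases, transformed, and stored as subject data (Subject 1, Subject 2, Subject 3); the Infomart and Datamart serve query reports and online analysis. Metadata: data about data.]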

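Data and Information Flow in the Data Warehouse

[Figure labels: operational databases (OLTP database, operational data definitions); data validation, data transformation, data integration, data summarization; data warehouse (Subject 1: subject tables, summary tables); data mart; data access and processing tools, online analysis, data mining; information; re-analysis, new presentation, new data sets]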

Extract, Transform, Load (ETL)

• Data cleansing process
• Resolving conflicts
• Efficiency considerations

• Extract
– Consolidate data from several sources
• Transform
– Filter out unwanted data, correct incorrect data, convert to new data elements, aggregate into new data subsets
• Load
– Load into the data warehouse
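The three steps can be sketched end to end with Python's built-in `sqlite3` module acting as the warehouse. The two source systems, their rows, and the table name `sales_summary` are all hypothetical.

```python
import sqlite3


def extract(sources):
    """Extract: consolidate rows from several sources into one list."""
    return [row for source in sources for row in source]


def transform(rows):
    """Transform: filter unusable rows, unify names, aggregate by product."""
    totals = {}
    for row in rows:
        if row.get("amount") is None:               # filter out unwanted data
            continue
        product = row["product"].strip().lower()    # resolve naming conflicts
        totals[product] = totals.get(product, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(conn, aggregated):
    """Load: write the aggregated rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_summary (product TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO sales_summary VALUES (?, ?)", aggregated)


# Hypothetical operational sources that encode the same product differently.
orders_system = [{"product": "TV ", "amount": "100"}, {"product": "tv", "amount": None}]
billing_system = [{"product": "Tv", "amount": 50}, {"product": "Radio", "amount": 20}]

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract([orders_system, billing_system])))
summary = warehouse.execute(
    "SELECT product, total FROM sales_summary ORDER BY product"
).fetchall()
print(summary)
```

The transform step is where the cleansing and conflict resolution listed above happen: the row with no amount is dropped, and "TV ", "tv", and "Tv" are treated as one product.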

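Example of DWS Application

1. Trade center requirements

The trade center sells cigarettes within the city through a three-tier wholesale structure:

• Channel: cigarette purchasing, trade center, 25 limited companies, retailers, consumers
• Prices: purchase price, transfer price, wholesale price, market price, network retail price

Sales analysis:

1. Sales analysis
2. Resource allocation analysis
3. Price analysis
4. Benefit forecasting for the limited companies
5. Network construction information analysis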

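2. Technical solution: SAS + DWA + IntrNet

2.1 Logical configuration of the data warehouse

The "in-city sales analysis" subject (Subject_insale) contains two levels:

(1) Detail data (DETAIL TABLE)
(2) Data groups (DATA GROUP). According to the requirements analysis, five data groups need to be built:
1) Sales volume analysis group (01_QuantyData)
2) Supply allocation analysis group (02_DistributeData)
3) Price analysis group (03_PriceData)
4) Benefit analysis group (04_BenefitData)
5) Network construction analysis group (05_SalenetData)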

Logical Structure of the Data Warehouse

[Figure labels, in the order they appear:]
• DW_tobacco_SH
• Tobacco_DW
• Global_subject: plan, prod, sale, store
• Subject_insale: Detail Logical Table ((Detail Table 1) … (Detail Table n)), 01_QuantyData, 02_DistributeData, 03_PriceData, 04_BenefitData, 05_SalenetData
• Subject_outsale: Detail Logical Table ((Detail Table 1) … (Detail Table n)), 01_MarketData, 02_DistributeData, 03_PriceData, 04_SubsaleData
• ODD_information_pub, ODD_insale, ODD_outsale

SAS/IntrNet configuration: SAS/IntrNet application publishing diagram

[Figure labels: general standard B/W platform; SAS/IntrNet; SAS/DWA; Browser; webserver; broker; SAS appsrver; WEBPGM (the part to be developed); DWA info]

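3. Functional analysis

[Screenshot: sales volume analysis screen]
• Menu bar: Sales volume analysis, Supply allocation, Price analysis, Benefit analysis, Network customer information analysis, System maintenance
• Selections: Limited company 1, Limited company 2; Limited company: all; Sales region; Time
• Measure: Sales volume; comparison options: Compare, Contrast, Year-over-year
• Date 1: April 1, 1999; Date 2: February 28, 2000; Statistical period: month
• Sales type: all; Brand code: 301101; Lookup table (origin/code)
• Buttons: OK, Reset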

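[Screenshot: comparison checkboxes and brand code lookup table]
• Lookup table columns: Brand origin, Brand, Specific brand code
• Code entry: Limited company, Sales type, Brands of interest
• Brand origin: Total, Shanghai-made cigarettes, Cigarettes from other provinces, Foreign cigarettes
• Brand: Total, Panda, Chunghwa, Double Happiness, ...
• Specific brand code: Total, Chunghwa flip-top, Chunghwa soft pack, Chunghwa gift pack, ...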

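Supply allocation

1. Allocation volume analysis for controlled brands (coefficient correlation)

[Screenshot: allocation volume analysis screen]
• Menu bar: Sales volume analysis, Supply allocation, Price analysis, Benefit analysis, Network customer information analysis, System maintenance
• Start date: September 1999; End date: February 2000
• Series: Sales volume, Market price, Inventory, Coefficient
• Controls: Brand selection, Correlation analysis, (analysis category)
• Time axis: 99/09, 99/10, 99/11, 99/12, 00/01, 00/02
• Buttons: OK, Reset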

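2. Allocation progress analysis

[Screenshot: allocation progress analysis screen]
• Menu bar: Sales volume analysis, Supply allocation, Price analysis, Benefit analysis, Network customer information analysis, System maintenance
• Series: Limited-company inventory, Allocation, Market price, Sales volume
• Start date: January 1, 2000; End date: February 1, 2000
• Controls: Brand code (with lookup table), Statistical period: week, Limited company: all, (analysis category)
• Time axis (month/week): 01/1, 01/2, 01/3, 01/4, 02/1, 02/2, 02/3, 02/4
• Buttons: OK, Reset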

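3.3 In-city sales price analysis

1. Price analysis

[Screenshot: price analysis screen]
• Menu bar: Sales volume analysis, Supply allocation, Price analysis, Benefit analysis, Network customer information analysis, System maintenance
• Series: Market price, Average market price, Network wholesale price
• Start date: September 1, 1999; End date: February 2000
• Controls: Brand code (with lookup table), Statistical period: month, (analysis category)
• Time axis: 99/09, 99/10, 99/11, 99/12, 00/01, 00/02
• Buttons: OK, Reset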

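2. Price-volume analysis

[Screenshot: price-volume analysis screen]
• Menu bar: Sales volume analysis, Supply allocation, Price analysis, Benefit analysis, Network customer information analysis, System maintenance
• Series: In-city sales volume, Market price, Price selected by condition
• Start date: September 1999; End date: February 2000
• Controls: Sales type, Limited company: all, Brand code (with lookup table), Statistical period: week, (analysis category)
• Time axis: 99/09, 99/10, 99/11, 99/12, 00/01, 00/02
• Buttons: OK, Reset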

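Fudan University: Data Management and Data Analysis Application

[Figure: layered architecture]
• Data presentation: ROLAP, ...
• Subject data analysis layer
• Subject data layer: subject database
• Shared data layer: shared database
• Operational data layer: personnel database (personnel subsystem), student affairs database (student affairs subsystem), academic affairs database (academic affairs subsystem), research database (research subsystem), ...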

OLAP, DM, KDD

• ON-LINE ANALYTICAL PROCESSING (OLAP): Tools for multidimensional data analysis

• DATA MINING (DM): Tools for finding hidden patterns and relationships, and for predicting trends

• Knowledge Discovery in Databases (KDD): Tools for extracting useful knowledge from volumes of data

Multidimensionality

• Modern data and information may have several dimensions.
– e.g. Management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store.
• It is important to provide the user with a technology that allows him or her to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation.
• The technology of slicing, dicing, drilling, and similar manipulations is called multidimensionality.
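Slicing, dicing, and drilling can be sketched over a small list of sales facts. The helper names and all sales numbers are hypothetical; only the dimension names (area, month, product) follow the course's OLAP example.

```python
facts = [
    {"area": "Asia", "month": "May 99", "product": "P1", "sales": 10},
    {"area": "Asia", "month": "Jun 99", "product": "P1", "sales": 12},
    {"area": "Asia", "month": "May 99", "product": "P2", "sales": 7},
    {"area": "Europe", "month": "May 99", "product": "P1", "sales": 9},
    {"area": "Europe", "month": "Jun 99", "product": "P2", "sales": 4},
]


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice(facts, **allowed):
    """Dice: keep a sub-cube by restricting several dimensions at once."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


def total_by(facts, *dims):
    """Summarize sales at the level of detail given by `dims`."""
    totals = {}
    for f in facts:
        key = tuple(f[d] for d in dims)
        totals[key] = totals.get(key, 0) + f["sales"]
    return totals


asia = slice_(facts, "area", "Asia")
p1_sub_cube = dice(facts, area={"Asia", "Europe"}, product={"P1"})
by_area = total_by(facts, "area")                    # summary level
by_area_month = total_by(facts, "area", "month")     # drill down one level
print(by_area)
print(by_area_month)
```

Drilling down is simply summarizing again with one more dimension in the key; drilling up removes one.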

Multidimensionality and OLAP

[Figure: a sales cube with three dimensions: time (May 99, Jun 99, Jul 99, Aug 99), product (P1 to P5), and area (Asia, Europe, Americas)]

Multidimensionality = Flexible Analysis

• Time series analysis
• Product comparison
• Area comparison
• Special analysis


Data Mining

• Exciting new set of tools for using data warehouses

• Combination of AI and statistical analysis to discover information that is “hidden” in the data

• OLAP versus Data Mining

Data Mining Analysis Method

• Associations
– e.g. linking purchase of pizza with beer
• Sequences
– e.g. tying events together: marriage and purchase of furniture
• Classifications
– e.g. recognizing patterns such as the attributes of customers that are most likely to quit
• Forecasting
– e.g. predicting buying habits of customers based on past patterns
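The association method can be sketched with two standard measures, support and confidence, computed over a handful of shopping baskets. The baskets are hypothetical, and real data mining tools use far larger data and more efficient algorithms.

```python
# Hypothetical purchase baskets.
baskets = [
    {"pizza", "beer"},
    {"pizza", "beer", "chips"},
    {"pizza", "soda"},
    {"beer"},
    {"bread", "milk"},
]


def count(itemset):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset):
    """Share of all baskets that contain the itemset."""
    return count(itemset) / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the share also containing `consequent`."""
    return count(antecedent | consequent) / count(antecedent)


pizza_beer_support = support({"pizza", "beer"})
pizza_to_beer = confidence({"pizza"}, {"beer"})
print(pizza_beer_support, pizza_to_beer)
```

A rule such as "pizza buyers also buy beer" is interesting when both numbers are high enough: the pattern occurs often, and it holds for most pizza buyers.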

Applications of Data Mining

Data mining is currently being used in the following areas:

• Retailing & sales
• Banking
• Manufacturing & production
• Brokerage & securities trading
• Computer hardware & software
• Insurance
• Police work
• Government & defense
• Airlines
• Health care
• Broadcasting
• Marketing

Knowledge Discovery in Databases (KDD)

• KDD is the process of extracting useful knowledge from volumes of data.
• It is the subject of extensive research.
• KDD's objective is to identify valid, novel, potentially useful, and ultimately understandable patterns in data.
• KDD is useful because it is supported by three technologies that are now sufficiently mature:
– Massive data collection
– Powerful multiprocessor computers
– Data mining algorithms

Evolution of KDD

Stages in the Evolution of Knowledge Discovery

• Data Collection (1960s)
– Business question: What was my total revenue in the last five years?
– Enabling technologies: Computers, tapes, disks
– Characteristics: Retrospective, static data delivery
• Data Access (1980s)
– Business question: What were unit sales in New England last March?
– Enabling technologies: Relational databases (RDBMS), structured query language (SQL)
– Characteristics: Retrospective, dynamic data delivery at record level
• Data Warehousing & Decision Support (early 1990s)
– Business question: Drill down to Boston?
– Enabling technologies: Online analytic processing (OLAP), multidimensional databases, data warehouses
– Characteristics: Retrospective, dynamic data delivery at multiple levels
• Intelligent Data Mining (late 1990s)
– Business question: What's likely to happen to Boston unit sales next month? Why?
– Enabling technologies: Advanced algorithms, multiprocessor computers, massive databases
– Characteristics: Prospective, proactive information delivery

Source: Courtesy of Accrue Software.


Creating a Database Environment

Key organizational elements in the database environment

[Figure: the elements are database management systems, data planning and modeling methodology, database technology and management, data administration, and users.]

Data Administration

• Data administration
– Develop information policy
– Define information requirements
– Plan for data
– Oversee logical database design and data dictionary development
– Monitor use of information
• Database administrator & database analyst


Management Requirements for Database Systems

• Data planning and modeling methodology
– Enterprise-wide planning for data
– Identify key entities, attributes, and relationships that constitute the organization's data
• Data planning process
– Data planning: develop a model of business processes
– Requirements specification: define the information needs of end users in a business process
– Conceptual design: express all information requirements in the form of a high-level model (ERM)
– Logical design: translate the conceptual model into the data model of a DBMS
– Physical design: determine the data storage structure and access method

Procurement Process

• Purchasing: When the purchasing department wrote a purchase order, it sent a copy to accounts payable.
• Vendor: The vendor sent an invoice to accounts payable when it delivered the goods.
• Receiving: When material control received the goods, it sent a copy of the receiving document to accounts payable.
• Accounts Payable: Thus, accounts payable received three documents from various senders: the purchase order, the invoice, and the receiving document. It was up to accounts payable to match the purchase order against the receiving document and the invoice. If they matched, the department issued payment. Otherwise, an accounts payable clerk would investigate the discrepancy, hold up payment, generate documents, and all in all gum up the works.
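The matching step performed by accounts payable can be sketched as a function that compares item quantities across the three documents. The function name, the document layout (item to quantity), and the sample data are assumptions; the slide does not say which fields were compared.

```python
def three_way_match(purchase_order, receiving_doc, invoice):
    """Compare item quantities on the three documents.

    Returns a (decision, discrepancies) pair, where each discrepancy is
    (item, ordered quantity, received quantity, invoiced quantity).
    """
    discrepancies = []
    items = set(purchase_order) | set(receiving_doc) | set(invoice)
    for item in sorted(items):
        quantities = (
            purchase_order.get(item, 0),
            receiving_doc.get(item, 0),
            invoice.get(item, 0),
        )
        if len(set(quantities)) != 1:   # the three documents disagree
            discrepancies.append((item,) + quantities)
    decision = "issue payment" if not discrepancies else "hold payment"
    return decision, discrepancies


# Hypothetical documents: item name -> quantity.
po = {"bolts": 100, "nuts": 200}
receiving = {"bolts": 100, "nuts": 180}
vendor_invoice = {"bolts": 100, "nuts": 200}

matched = three_way_match(po, {"bolts": 100, "nuts": 200}, vendor_invoice)
mismatched = three_way_match(po, receiving, vendor_invoice)
print(matched)
print(mismatched)
```

With the three documents stored as related tables in one database, this check becomes a query rather than a clerk's paper comparison.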

Entity-relationship diagram

Main data table in Procurement Process

Management Requirements for Database Systems

• Database technology, management, and users
– Databases require DBMS software and staff
– Database design group defines and organizes the structure and content of the database
– Database administration: establish the physical database, logical relations, and access rules

Wrap-up

• Database management systems
– Concept, importance
– Be able to use the simplest functions of ACCESS
• Major types of database
– Structure model: Hierarchical, Network, Relational, Multidimensional, OO
– Data type:
• Operational database ~ analytical database ~ deductive database
• Simple database ~ multimedia database ~ hypermedia database
– Store type: Centralized database ~ distributed database
• Database development and access
• Data warehouse
• OLAP, data mining

Assignment for Session 8-9

• Individual review for session 8
– Reading materials: Textbook, chapter 5
– P. 168: Review Quiz: write down in your book.
• Individual assignment:
– P. 136: Application exercise 3
– P. 169: Application exercise 1
– Submit to Vcampus in two weeks

Preparation for session 10

• Individual preparation for session 10
– Reading materials: Textbook, chapter 6
– P. 206: Review Quiz: write down in your book.