Session 8-9 Data Resource Management
-
Upload
tate-dennis -
Category
Documents
-
view
67 -
download
0
description
Transcript of Session 8-9 Data Resource Management
HUANG Lihua, Fudan University
Session 8-9 Data Resource Management
PART Ⅱ Introduction to the Foundation of Information
Technologies
HUANG Lihua, Fudan University
Content
• Database Concepts & Technology– Experiment: ACCESS
• Database Trends• Data Trends of Application
– Data Warehouse– OLAP– DATA Mining
• Creating Database Environment
HUANG Lihua, Fudan University
DATA
• Streams of raw facts representing events such as business transactions, simple observations of the state of the world
HUANG Lihua, Fudan University
FILE ORGANIZATION
• A computer system organizes data in a hierarchy that begins with bits, and proceeds to bytes, fields, records, files, and database.
File
Record Record
Field FiledField Field
Byte Byte Byte Byte
Bit BitBit Bit
HUANG Lihua, Fudan University
FILE ORGANIZATION
• BIT: Binary Digit (0,1;Y,N;On, Off)
• BYTE: Combination of BITS which
represent a CHARACTER
• FIELD: A logical grouping of characters into a word, a group of words, or a complete number.
• RECORD: Collection of FIELDS which reflect a TRANSACTION
• FILE: A Collection of Similar RECORDS
• DATABASE: An Organization’s Electronic Library of FILES
HUANG Lihua, Fudan University
FILE ORGANIZATION For Example,• Filed: Student’s name;
• Record:Number Name Course Date Grade
9525012 Zhang Yan
MIS 1998.9 A
A record
•File:Number Name Course Date Grade
9525012 Zhang Yan
MIS 1998.9 A
9525018 Jeff Yu MIS 1998.9 A
9525027 …
He Hui …
MIS …
1998.9 …
B …
HUANG Lihua, Fudan University
FILE ORGANIZATION : Another way of thinking about database components——
• ENTITY: Person, Place, Thing, Event about Which Data Must be Kept ( a record describes an entity )
• ATTRIBUTE: Description of a Particular ENTITY ( corresponds to fields)
• KEY FIELD: Field Used to Retrieve, Update, Sort RECORD
*
HUANG Lihua, Fudan University
FILE ORGANIZATION
FDU NO. HKU NO. NAME SEX TEL(O) TEL(H)
98HM001 93835 Xie Mingqiang M 58702331
98HM002 93840 Yu Bing F 65110968
98hm003 93841 Wang Pei F 58711001-23306
63568504
98HM004 93842 Ge Ruijin M 56938860 56873143
98HM005 93843 Wang Xintao M 58611828 65352394
98HM006 93844 Fu Qiang F 58666060-6007
58836304
Attribute
Record
Key Field Key Field
File
HUANG Lihua, Fudan University
KEY FIELD
• Field in Each Record
• Uniquely Identifies THIS Record
• For RETRIEVAL
UPDATING
SORTING
*
HUANG Lihua, Fudan University
Accessing Records from Computer Files:Sequential vs. Direct or random
file organization • SEQUENTIAL: Data records must be
retrieved in the same physical sequence in which they are stored. (Magnetic tape )
• DIRECT: Data can be accessed without regard to physical sequence. (Disk)
*
Direct fileorganization
Sequential fileorganization
HUANG Lihua, Fudan University
Traditional File Processing & File Organization
registration Class programs Class file
accounting Accounts programs Class file
athletics Class fileSports programs
HUANG Lihua, Fudan University
Problems Arising from the File Organization
• Data Redundancy: The same piece of information could be duplicated in several files. • Data Inconsistency:• Data Isolation: Data files are likely to be organized differently, stored in different formats,
and often physically inaccessible to other applications. • data integrity problem: It is difficult to place data integrity constraints across multiple data
files.• Application and Data Independence: In the file environment, the applications and their
associated data files are dependent on each other.• Poor security: is difficult to enforce in the file environment. • Lack of data sharing & availability
Flat Flat FileFile
HUANG Lihua, Fudan University
Problems Arising from the File Organization
• Data Redundancy;• Data Inconsistency;• Data Isolation, data integrity problem;• Application and Data Independence;• Security, data sharing problem.
These problems led to the development of DATABASE
HUANG Lihua, Fudan University
DATABASE• A Database is an organized logical
grouping of related files.
• In a Database, data are stored & managed in a convenient form, and integrated and related so that one set of software programs provides access to all the data.
HUANG Lihua, Fudan University
DATABASE
• Collection of centralized data
• Controls redundant data
• Data stored so as to appear to users in one location
• Services multiple application
HUANG Lihua, Fudan University
DATABASE MANAGEMENT SYSTEM ( 数据库管理系统 DBMS)
• Software to create & maintain DATA enables business applications to extract data independent of specific computer programs.
HUANG Lihua, Fudan University
registrar
accounting
athletics
Class programs Class file
Accounts programs
Class file
Class file
Sports programs
Computer based files of this type cause problems such as redundancy, inconsistency, and data isolation.
registrar
accounting
athletics
Class programs
Accounts programs
Sports programs
DBMSDatabaseClass file
Accounts fileSports file
DBMS provides access to all data in the database
HUANG Lihua, Fudan University
COMPONENTS OF DBMS:
• DATA DEFINITION LANGUAGE:– Defines Data Elements in Database
• DATA MANIPULATION LANGUAGE:– Manipulates Data for Applications– For Example: For extracting data from database, e.g.
SQL• DATA DICTIONARY:
– Formal Definitions of all Variables in Database; Controls Variety of Database Contents
*
HUANG Lihua, Fudan University
HIERARCHICAL DATABASE
ROOT
FIRST CHILD
2nd CHILD
RatingsRatings SalarySalary
CompensationCompensation JobJobAssignmentsAssignments
PensionPension InsuranceInsurance HealthHealth
BenefitsBenefits
EmployerEmployer
HUANG Lihua, Fudan University
Type of RELATIONS
ONE-TO-ONE: STUDENT ID
ONE-TO-MANY:CLASS
STUDENTA
STUDENTB
STUDENTC
MANY-TO-MANY:
STUDENTA
STUDENTB
STUDENTC
COURSE1
COURSE2
HUANG Lihua, Fudan University
NETWORK DATA MODEL
• Variation of Hierarchical Model
• Useful for many-to-many relationships
STUDENTA
STUDENTB
STUDENTC
COURSE1
COURSE2
HUANG Lihua, Fudan University
Disadvantages of Hierarchical and Network DBMS
• Outdated
• Less flexible compared to RDBMS
• Lack support for ad-hoc and English language-like queries
HUANG Lihua, Fudan University
RELATIONAL DATA MODEL
• DATA IN TABLE FORMAT
• RELATION: TABLE
• Tuple( 元组) : ROW (record 记录 ) IN TABLE
• Field: COLUMN (attribute 属性 ) IN TABLE
*HOURS RATE TOTALABLE 40.50$ 10.35$ 419.18$
BAXTER 38.00$ 8.75$ 332.50$ CHEN 42.70$ 9.25$ 394.98$
DENVER 35.90$ 9.50$ 341.05$
HUANG Lihua, Fudan University
Example DB: Fortune 500 Companies
• company
• industry codes
allied
boeing
...
Comp. name sales assets netincome empls indcode yr
9115000
9035000
13271000
7593000
-279000
292000
143800
95700
37
37
85
82
42
44
...
indcode indname
pharmaceuticals
computers
HUANG Lihua, Fudan University
Current DBMS: Relational Database
• DBMS Vendor– MS: Access, SQL Server– Oracle– Sybase– DB2– Informix– MySQL
HUANG Lihua, Fudan University
The Relational Database Model The relational model is based on a simple concept
of tables in order to capitalize on characteristics of rows and columns of data, which is consistent with real-world business situations.
One of the greatest advantages of the relational model is its conceptual simplicity and the ability to link records in a way that is not predefined.
HUANG Lihua, Fudan University
The Relational Abstraction
• Information is in tables– Also called (base) relations
• Columns define attributes( 属性、字段、数据项)
– Also called fields or domains
• Rows define records– Also called tuples (元组)
• Cells contain values– All cells in column have information of same type
• e.g., integer, floating point, text, date
HUANG Lihua, Fudan University
Operations on Tables
• Add new rows (or sometimes columns)– Modify existing rows
• Choose a subset of columns• Choose a subset of rows• Combine rows (e.g., sum values in a column)• Combine columns• Combine two tables (join)• No operations to combine individual cells
– Unlike spreadsheet
HUANG Lihua, Fudan University
Three Basic Operations in a Relational Database
• Select: – Creates subset of rows that meet specific criteria
• Join: – Combines relational tables to provide users with
information
• Project: – Enables users to create new tables containing only
relevant information
HUANG Lihua, Fudan University
Operating on Databases: SQL
• Every abstraction needs an interface through which users invoke abstract operations
– graphical interface– language
• Structured Query Language– Select …(content)…. From… (table)..Where (condition)
• We'll focus only on queries– Query = question– Extract some data from one or more tables to answer a
particular question
HUANG Lihua, Fudan University
Physical vs. Logical Data View• Minimizes these problems by providing two “view
s” of the database data: – The physical view deals with the actual, physical arrang
ement and location of data in the direct access storage devices (DASD).
– The logical view, or user’s view, represents data in a format that is meaningful to a user and to the software programs that process that data.
• Entity-relationship diagram (ER 图) : Methodology for documenting databases illustrating relationships between database entities
• Normalization (范式) : Process of creating small stable data structures from complex groups of data
HUANG Lihua, Fudan University
Experiment: Microsoft Access
• Features:– Create/Modify databases
– Specify/Run queries
– Design/Print reports
– Design graphical user interfaces around databases• Forms for entering, viewing data
• Assignment: P136 APP. Exer 3
P.169 App. Exer 1
HUANG Lihua, Fudan University
Content
• Database Concepts & Technology– Experiment: ACCESS
• Database Trends• Data Trends of Application
– Data Warehouse– OLAP– DATA Mining
• Creating Database Environment
HUANG Lihua, Fudan University
2. Database Trends(1)
• The evolution of Database System
• Data– Simple data => Multimedia data, Knowledge
• Model – Relational model => OO model
Object relational model
HUANG Lihua, Fudan University
Databases Trends (2)
• Application– OLTP => OLAP
• Data organization– Database => Data warehouse, Data Marts
• Query language– SQL => Deductive
HUANG Lihua, Fudan University
Emerging Database Models
The most common database models are: Multimedia database Deductive databases Object-oriented databases Multimedia and hypermedia databases
Multidimensional Database
HUANG Lihua, Fudan University
Object-Oriented Database Model
• Object-oriented (OO) databases store both data and procedures acting on the data, as objects.
• Encapsulation Capability– The OO database can be particularly helpful in multime
dia environments, such as in manufacturing sites using CAD/CAM.
– OO databases can be particularly useful in supporting temporal and spatial( 时空) dimensions.
• Terminology in the OO model includes:– objects, attributes, classes, methods, and messages.
HUANG Lihua, Fudan University
Hypermedia Database Model
The hypermedia database model stores chunks of information in the form of nodes connected by links established by the user.
The nodes can contain text, graphics, sound, full-motion video, or executable computer programs.
Users can branch to related information in any kind of relationship.
HUANG Lihua, Fudan University
Multidimensional Database
• A variation of the relational model• Use multidimensional structures to organize data and e
xpress the relationship between data.• A dimension of the data : a side of a cube.
• ① 多维数组– ( 北京, 1999 年,彩电, 10000)– ( 地理位置,年份,产品类型,销售额 )
• ② 维的层次– 例如:年、季度、月份、日期– 国家、地区、省、城市
• ③ 维内元素的类– 例如: 按产品的价格分成高、中、低档。– 按原材料的成本价格分类
HUANG Lihua, Fudan University
Specialized Databases
• There are many specialized databases, depending on the type or format of data stored. – A geographical information database contains locationa
l data for overlaying on maps or images.
– A knowledge database stores decision rules used to evaluate situations and help users make decisions like an expert.
– A multimedia database stores data on many media—sounds, video, images, graphic animation, and text.
HUANG Lihua, Fudan University
Content
• Database Concepts & Technology– Experiment: ACCESS
• Database Trends• Data Trends of Application
– Data Warehouse– OLAP– DATA Mining
• Creating Database Environment
HUANG Lihua, Fudan University
From Database to Data Warehousin
g• 随着信息技术在企业的广泛应用,企业积累了
大量数据• 企业所面对的问题不是简单地处理数据,而是
如何使用数据 ---- 从操作处理 (Operational Processing) 到
分析处理 (Analytical Processing)
HUANG Lihua, Fudan University
Operational vs. Decision Support Systems
• Operational Processing in Operational Systems– Support day to day transactions– Contain current, “up to date” data– Examples: customer orders, inventory levels, bank
account balances• Analytical Processing in Decision Support
Systems– Support strategic decision making– Contain historical, “summarized” data– Examples: performance summary, customer
profitability, market segmentation
HUANG Lihua, Fudan University
Why data warehouses?
• Decision Support Data– Are found in many different databases
• within the company• outside the company
– In practical terms, locating and integrating all this information in real time is very difficult
• Solution:– Create separate repositories of data for decision
support – => data warehouses
HUANG Lihua, Fudan University
Data Warehousing
• Stores current and historical data – Consolidates data for management analysis and d
ecision making– Supports reporting and query tools
• “ 数据仓库之父” W. H. Inmon 给的定义:数据仓库就是一个用以更好地支持企业或组织的决策分析处理的,面向主题的,集成的,不可更新的,随时间不断变化的数据集合。
HUANG Lihua, Fudan University
Characteristics of Data Warehouses1) Organization. Data are organized by detailed subjects.
2) Consistency. Data in different operational databases may be encoded differently. In the warehouse they will be coded in a consistent manner.
3) Time variant. The data are kept for 5 to 10 years so they can be used for trends, forecasting, and comparisons over time.
4) Non-volatile. Once entered into the warehouse, data are not updated.
5) Relational. The data warehouse uses a relational structure.
6) Client/server. The data warehouse uses the client/server to provide the end user an easy access to its data.
HUANG Lihua, Fudan University
Data Warehouse Suitability Data warehousing is most appropriate for
organizations in which some of the following apply.
Large amounts of data need to be accessed by end-users. The operational data are stored in different systems. An information-based approach to management is in use. There is a large, diverse customer base. The same data are represented differently in different
systems. Data are stored in highly technical formats that are difficult to
decipher. Extensive end-user computing is performed.
HUANG Lihua, Fudan University
Comparison with DB Systems
• DB Systems– 数据库:操作型数据 (O
perational Data) ,增、删、改操作频繁
– 数据库核心:功能强大,面向 OLTP (Online Transa
ction Processing) 应用– 数据库工具:以查询工
具为主
• DWS Systems– 数据仓库:分析型数
据 (Analytical Data) ,极少有更新操作
– 数据仓库管理系统:因极少有更新操作,故功能简单
– 数据仓库工具:以分析工具为主
HUANG Lihua, Fudan University
OLTP在线业务处理
Database
OLAP在线分析处理
DSS决策支持系统
Data Mining数据挖掘
DataWarehouse
HUANG Lihua, Fudan University
数据仓库的实现
取出数据
取出数据
数据转换
数据转换
主题数据
主题 1
主题数据
主题 2
主题数据
主题 3
Infomart
Datamart
Infomart( 信息超市) :
Datamart (数据超市) :
这是一个应用功能(或从应用功能产生的输出)可多次运行数据仓库,它是对应业务问题的答案。
这是数据仓库的一个数据子集,它对应最终用户的信息需求。它比之数据仓库中的数据要更加归纳、汇总一些。
Infomart
Datamart
Data WarehouseOperational Databases
Internal data source
External data source
HUANG Lihua, Fudan University
Data MartsData Marts are an alternative used by many other firms is crea
tion of a lower cost, scaled-down version of a data warehouse. They refer to small warehouses that focus on specific aspects of a company, such as for a strategic business unit (SBU) or a department.
Two major types of Data Marts:1) Replicated (dependent) Data Marts. In such cases one can replicate functional subsets of the data warehouse in smaller databases.2) Stand-Alone Data Marts. A company can have one or more independent data marts without having a data warehouse.
HUANG Lihua, Fudan University
数据仓库的展现
取出数据
取出数据
数据转换
数据转换
主题数据
主题 1
主题数据
主题 2
主题数据
主题 3
Infomart
Datamart
Metadata: 关于数据的数据
查询报表
查询报表
在线分析
在线分析
Operational Databases
HUANG Lihua, Fudan University
数据仓库的数据、信息流
主题表
汇总表
信息
信息
信息
数据确认
数据转换
数据整合
数据汇总
数 据 市
数据取用处理工具在线分析数据挖掘
场
再分析
新展示
新数据集
数据仓库主题 1
OLTP数据库
运行数据定义
Operational Databases
HUANG Lihua, Fudan University
提取、转换、加载 (ETL)(Extract, Transform, Load)
• 数据清理过程• 解决冲突 • 使用效率的考
虑
• Extract– Consolidate data from severa
l sources
• Transform– Filter out unwanted data, corr
ect incorrect data, convert to new data elements, aggregate into new data subsets
• Load– Load into data warehourse
HUANG Lihua, Fudan University
Example of DWS Application1 、贸易中心需求
贸易中心市内卷烟销售采用三级批的方式:
卷烟收购 贸易中心 25 个有限公司 零售商 价格 收购价 调拨价 批发价 集市价
网络零售价
消费者
销售分析
1. 销售分析2. 资源投放分析3. 价格分析4. 有限公司效益预测5. 网络建设信息分析
HUANG Lihua, Fudan University
2 、技术方案: SAS+DWA+IntrNet 的解决方案
2.1 数据仓库逻辑配置方案“ 市内销售分析”主题 (Subject_insale) 中包含两个层次(1) 详细数据( DETAIL TABLE ):(2) 数据集合( DATA GROUP ):1) 根据需求分析需建立五个数据集合:2) 销售量分析集合 (01_QuantyData) ,3) 货源投放分析集合 (02_DistributeData) ,4) 价格分析集合 (03_PriceData) ,5) 效益分析集合 (04_BenefitData) ,6) 网络建设分析集合 (05_SalenetData)
HUANG Lihua, Fudan University
数据仓库逻辑结构图
DW_tobacco_SH
Tobacco_DW
Global_subject
plan
prod
sale
store
Subject_insale
01_QuantyData
(Detail Table 1)
…. (Detail Table n)
Detail Logical Table
02_DistributeData 03_PriceData
04_BenefitData
05_SalenetData
Subject_outsale
01_MarketData
(Detail Table 1)
…
(Detail Table n)
Detail Logical Table
02_DistributeData
03_PriceData
04_SubsaleData
ODD_information_pub
ODD_insale
ODD_outsale
HUANG Lihua, Fudan University
SAS/IntrNet 配置方案: SAS/IntrNet 应用发布示意图
通 用 标 准 B/W 平台
SAS/IntrNet SAS/DWA
Browser
webserver broker
SAS appsrver
WEBPGM( 需开发部分 ) DWA info
HUANG Lihua, Fudan University
销量分析 货源投放 价格分析 效益分析 网建客户信息分析 系统维护
有限公司 1 有限公司 2
时间
销量销量 比较对比 同比
日期 1 1999 年 04 月 01 日日期 2 2000 年 02 月 28 日
统计周期 月
销售性质 所有 牌号 301101 对照表(产地 /代码)
确定 重选
有限公司 所有
销售区域
3 、功能分析
HUANG Lihua, Fudan University
比较复选框
牌号对照表
品牌产地 品牌号 具体牌号
代码输入:
有限公司
销售性质
关注品牌
总计
沪产烟
外地烟
外烟
总计
熊猫
中华
红双喜
。。。
总计
翻盖中华
中华软壳
中华礼包
。。。
HUANG Lihua, Fudan University
货源投放
1 控制牌号投放量分析(系数相关性)
销量分析 货源投放 价格分析 效益分析 网建客户相关信息分析 系统维护
投放量分析 起始日期 1999 年 09月
截止 截止日期 : 2000 年 02月
确定 重选
销量
集市价格
库存
系数
牌号选择
相关性分析
99/09 99/10 99/11 99/12 00/01 00/02
截止
(分析类别)
HUANG Lihua, Fudan University
2投放进度分析
销量分析 货源投放 价格分析 效益分析 网建客户相关信息分析 系统维护
投放进度分析
有限公司库存
投放
集市价格
起始日期 2000年 01月 01
日
截止 截止日期 2000年 02月 01日
确定 重选
牌号 (代码)
统计周期 周
01/1 01/2 01/3 01/4 02/1 02/2 02/3 02/4
截止
(分析类别)
有限公司 所有
销量
对照表
HUANG Lihua, Fudan University
3.3 市内销售价格分析 1 价格分析
销量分析 货源投放 价格分析 效益分析 网建客户相关信息分析 系统维护
价格分析
集市价
集市均价 起始日期 1999 年 09 月01日
截止 截止日期 2000年 02月
确定 重选
牌号
统计周期 月
(代码)
(分析类别)
99/09 99/10 99/11 99/12 00/01 00/02
截止
网络批发价
对照表
HUANG Lihua, Fudan University
2价量分析
销量分析 货源投放 价格分析 效益分析 网建客户相关信息分析 系统维护
价量分析
销售性质
统计周期 周 市内销量
集市价格 起始日期 1999 年 09月
截止 截止日期 2000 年 02月
确定 重选
牌号
(分析类别)
99/09 99/10 99/11 99/12 00/01 00/02
截止
有限公司 所有 条件选定的价格
(代码) 对照表
HUANG Lihua, Fudan University
复旦大学数据管理与数据分析应用
主题数据分析层
共享数据层
数据展现
ROLAP
●●●
共享数据库
人事数据库
人事子系统
学工数据库
学工子系统
教务数据库
教务子系统
科研数据库
科研子系统
主题数据层
●●●
业务数据层
主题数据库
HUANG Lihua, Fudan University
OLAP, DM, KDD
• ON-LINE ANALYTICAL PROCESSING (OLAP在线分析处理 ): Tools for multi- dimensional data analysis
• DATAMINING ( DM, 数据挖掘) : Tools for finding hidden patterns, relationships, for predicting trends
• Knowledge Discovery in Databases (KDD知识发现): Tools for extracting useful knowledge from volumes of data.
HUANG Lihua, Fudan University
Multidimensionality
• Modern data and information may have several dimensions. – e.g. Management may be interested in examining sales figures in a
certain city by product, by time period, by salesperson, and by store.
• It is important to provide the user with a technology that allows him or her to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation.
• The technology of slicing(切片) , dicing (切块) , Drilling (钻取) and similar manipulations is called Multidimensionality.
HUANG Lihua, Fudan University
time
Product
Area May 99 Jun 99 Jul 99 Aug 99
P1
P2
P3
P4
P5
AsiaEurope
Americas
Multidimensionality and OLAP
Sale
HUANG Lihua, Fudan University
Multidimensionality= Flexible Analysis
Time series analysis Product comparison
Area Comparsion Special Analysis
HUANG Lihua, Fudan University
Data Mining
• Exciting new set of tools for using data warehouses
• Combination of AI and statistical analysis to discover information that is “hidden” in the data
• OLAP versus Data Mining
HUANG Lihua, Fudan University
Data Mining Analysis Method
• associations – e.g. linking purchase of pizza with beer
• sequences – e.g. tying events together: marriage and purchase
of furniture• classifications
– e.g. recognizing patterns such as the attributes of customers that are most likely to quit
• Forecasting– e.g. predicting buying habits of customers based
on past patterns
HUANG Lihua, Fudan University
Applications of Data Mining
Retailing & Sales Banking
Manufacturing & Production
Brokerage & Securities trading
Computer hardware & software
Insurance Policework Government & Defense Airlines Health care Broadcasting Marketing
Data Mining is currently being used in the following areas;
HUANG Lihua, Fudan University
Knowledge Discovery in Databases (KDD)
• KDD is the process of extracting useful knowledge from volumes of data.
• It is the subject of extensive research.• KDD’s objective is to identify valid, novel, potentially
useful, and ultimately understandable patterns in data.• KDD is useful because it is supported by three
technologies that are now sufficiently mature: – Massive data collection– Powerful multiprocessor computers– Data mining algorithms
HUANG Lihua, Fudan University
Evolution of KDD
Stages in the Evolution of Knowledge Discovery Evolutionary Stage Business Question Enabling Technologies Characteristics
Data Collection (1960s)
What was my total revenue in the last five years?
Computer, tapes, disks. Retrospective, static data delivery
Data Access (1980s) What were unit sales in New England last March?
Relational databases (RDBMS), structured query language (SQL)
Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (early 1990s)
Drill down to Boston? Online analytic processing (OLAP), multidimensional databases, data warehouses
Retrospective, dynamic data delivery at multiple levels
Intelligent Data Mining (late 1990s)
What’s likely to happen to Boston unit sales next month? Why?
Advanced algorithms, multiprocessor computers, massive databases
Prospective, proactive information delivery
Source: Courtesy of Accrue Software.
HUANG Lihua, Fudan University
Content
• Database Concepts & Technology– Experiment: ACCESS
• Database Trends• Data Trends of Application
– Data Warehouse– OLAP– DATA Mining
• Creating Database Environment
HUANG Lihua, Fudan University
Key organizational elements in the database environment
Database Management
Systems
Data PlanningAnd ModelingMethodology
Database Technology and
management
Data Administration
Users
HUANG Lihua, Fudan University
Data Administration
• Data Administration– Develop information policy– Define information requirements– Plan for data– Oversee logical database design and database
dictionary development– Monitor use of information
• Database administrator & database analyst
HUANG Lihua, Fudan University
Key organizational elements in the database environment
Database Management
Systems
Data PlanningAnd ModelingMethodology
Database Technology and
management
Data Administration
Users
HUANG Lihua, Fudan University
Management Requirements for Database Systems
• Data Planning and Modeling Methodology– Enterprise-wide planning for data– Identify key entities, attributes, and relationships that constitute the
organization’s data
• Data Planning process– Data planning
• Develop a model of business processes
– Requirements Specification• Define information needs of end users in a business process
– Conceptual design• Expresses all information requirements in the form of high-level model (ERM)
– Logical design• Translates the conceptual model into the data model of a DBMS
– Physical Design• Determines the data storage structure and access method
HUANG Lihua, Fudan University
Procurement Process
PurchasingWhen purchasing department wrote a purchase order, it sent a copy to accounts payables.
VendorThe vendor sent an invoice to accounts payables when they delivered goods.
Receiving When material control received the goods, it sent a copy of the receiving document.
AccountPayables
Thus, the account payables received three documents from various senders : Purchasing order, invoice and receiving document. It was up to account payables to match the purchasing against the receiving document and the invoice. If they matched, the department issued payment. Otherwise, an accounts payable clerk would investigate the discrepancy, hold up payment, generate documents and all in all gum up the works.
HUANG Lihua, Fudan University
Key organizational elements in the database environment
Database Management
Systems
Data PlanningAnd ModelingMethodology
Database Technology and
management
Data Administration
Users
HUANG Lihua, Fudan University
Management Requirements for Database Systems
• Database Technology, Management, and Users– Databases require DBMS software and staff– Database design group defines and organizes
structure and content of database– Database administration: establish physical
database, logical relations, access rules
HUANG Lihua, Fudan University
Wrap-up
• Database Management Systems – Concept, importance– Be able to use simplest ACCESS
• Major types of database– Structure model:
• Hierchical, Network, Rational, Multidimensional, OO – Data type:
• Operational database ~ analytical database ~ Deductive Database• Simple database ~ Multimedia Database ~ Hypermedia Database
– Store type• Centralized Database ~ Distributed Database
• Database development and access• Data warehouse • OLAP, Data mining
HUANG Lihua, Fudan University
Assignment for Session 8-9
• Individual Review for session 8– Reading Materials: Textbook: chapter 5– P. 168: Review Quiz: write down in your book.
• Individual Assignment :– P.136: Application exercises 3– P. 169: Application exercises 1 – submit to Vcampus in two weeks