9/03 Data Mining – Introduction
G Dong (WSU) 1
CS499/699-10 Data Mining
Fall 2003 Professor Guozhu Dong
Computer Science & Engineering, WSU

Introduction
- Introduction to this Course
- Introduction to Data Mining

Introduction to the Course
- First, about you: why take this course?
  - Your background and strengths: AI, DBMS, Statistics, Biology, Business, …
  - Your interests and requests
- What is this course about?
  - Problem solving
  - Handling data: transform data to workable data
  - Mining data: turn data to knowledge
  - Validation and presentation of knowledge

This course
- What can you expect from this course?
  - Knowledge and experience about DM
  - Problem solving skills
- How is this course conducted?
  - Homework, projects, exams, classes
- Course Format
  - Individual projects: 30%
  - Exams and/or quizzes: 60%
  - Homework: 10%

Course Web Site
- cs.wright.edu/~gdong/mining03/WSUCS499DataMining.htm
- My office and office hours
  - RC 430
  - 4:30-5:30, T Th
- My email: [email protected]
- Slides and relevant information will be made available at the course web site

Any questions and suggestions?
- Your feedback is most welcome! I need it to adapt the course to your needs. Please feel free to provide yours anytime.
- Share your questions and concerns with the class; very likely others have the same ones.
- No pain, no gain: there is no magic in data mining.
  - The more you put in, the more you get.
  - Your grades are proportional to your efforts.

Introduction to Data Mining
- Definitions
- Motivations of DM
- Interdisciplinary Links of DM

What is DM?
- Or more precisely KDD (knowledge discovery from databases)?
- Many definitions
- An iterative process, not plug-and-play
  - raw data → transformed data → preprocessed data → data mining → post-processing → knowledge
- One definition is
  - A non-trivial process of identifying valid, novel, useful and ultimately understandable patterns in data
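
The chain from raw data to knowledge can be made concrete with a toy sketch. Everything in it is an assumption made for illustration, not course material: the stage functions (`transform`, `preprocess`, `mine`, `post_process`), the comma-separated input records, and the `min_support` threshold are all hypothetical. The "pattern" mined here is simply how often each item occurs.

```python
from collections import Counter

def transform(raw_rows):
    """Turn raw comma-separated strings into lists of trimmed, lower-cased items."""
    return [[item.strip().lower() for item in row.split(",")] for row in raw_rows]

def preprocess(rows):
    """Drop empty items, duplicate items within a record, and empty records."""
    cleaned = []
    for row in rows:
        items = sorted({item for item in row if item})
        if items:
            cleaned.append(items)
    return cleaned

def mine(records):
    """Count how many records contain each item (a very simple 'pattern')."""
    counts = Counter()
    for record in records:
        counts.update(record)
    return counts

def post_process(counts, n_records, min_support=0.5):
    """Keep only items present in at least min_support of the records."""
    return {item: c / n_records
            for item, c in counts.items()
            if c / n_records >= min_support}

# Hypothetical raw data with typical defects: mixed case, blanks, repeats.
raw = ["Milk, Bread", "bread,  ,eggs", "milk,bread,bread", ""]
records = preprocess(transform(raw))
knowledge = post_process(mine(records), len(records))
print(knowledge)
```

In practice the process is iterative: if the resulting patterns are not useful, one goes back and changes the transformation, the cleaning rules, or the threshold, then runs the chain again.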

Need for Data Mining
- Data accumulate and double every 9 months
- There is a big gap from stored data to knowledge; and the transition won’t occur automatically.
- Manual data analysis is not new but a bottleneck
- Fast developing Computer Science and Engineering generates new demands
- Seeking knowledge from massive data
- Any personal experience?
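
To get a feel for what "double every 9 months" implies, the short sketch below computes the growth factor after a given number of months, assuming steady exponential growth. The function name `growth_factor` and the chosen time spans are illustrative assumptions.

```python
def growth_factor(months, doubling_period=9.0):
    """Factor by which the data volume grows in `months`,
    assuming it doubles every `doubling_period` months."""
    return 2.0 ** (months / doubling_period)

for years in (1, 3, 5):
    print(f"after {years} year(s): x{growth_factor(12 * years):.1f}")
```

Under this assumption, three years (36 months) is exactly four doubling periods, so the volume grows by a factor of 16.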

When is DM useful?
- Data rich world
- Large data (dimensionality and size)
  - Image data (size)
  - Gene chip data (dimensionality)
- Little knowledge about data (exploratory data analysis)
- What if we have some knowledge?

DM perspectives
- KDD “goals”: prediction, description, explanation, optimization, and exploration
- Knowledge forms: patterns vs. models
- Understandability and representation of knowledge
- Some applications
  - Business intelligence (CRM)
  - Security (Info, Comp Systems, Networks, Data, Privacy)
  - Scientific discovery (bioinformatics, medicine)

Challenges
- Increasing data dimensionality and data size
- Various data forms
- New data types
  - Streaming data, multimedia data
- Efficient search and access to data/knowledge
- Intelligent update and integration
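
Streaming data is a challenge partly because the values cannot all be stored and revisited. As one illustration, the sketch below keeps a running mean and variance in a single pass using Welford's update, a standard technique that is swapped in here as an example and is not taken from the slides. The class name `RunningStats` and the sample values are assumptions.

```python
class RunningStats:
    """One-pass mean and sample variance (Welford's method).

    Only three numbers are stored, no matter how long the stream is.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 for fewer than two values."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for an unbounded stream
    stats.update(value)
print(stats.n, stats.mean, stats.variance)
```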

Interdisciplinary Links of DM
- Statistics
- Databases
- AI
- Machine Learning
- Visualization
- High Performance Computing
  - supercomputers, distributed/parallel/cluster computing

Statistics
- Discovery of structures or patterns in data sets
  - hypothesis testing, parameter estimation
- Optimal strategies for collecting data
  - efficient search of large databases
- Static data
  - constantly evolving data
- Models play a central role
  - algorithms are of a major concern
  - patterns are sought
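
As a small illustration of hypothesis testing, the sketch below runs a permutation test on the difference between two group means. The choice of test, the function name `permutation_test`, the number of permutations, and the two sample groups are all assumptions made for the example.

```python
import random
from statistics import mean

def permutation_test(a, b, n_permutations=2000, seed=0):
    """Estimate how often a random relabelling of the pooled values
    gives a mean difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return observed, hits / n_permutations

# Hypothetical measurements from two groups.
group_a = [10, 11, 12, 13, 14]
group_b = [20, 21, 22, 23, 24]
observed, p_value = permutation_test(group_a, group_b)
print(observed, p_value)
```

A small estimated p-value suggests the observed difference is unlikely under random labelling; the estimate itself varies with the seed and the number of permutations.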

Relational Databases
- A relational database can contain several tables
  - Tables and schemas
- The goal in data organization is to maintain data and quickly locate the requested data
  - Queries and index structures
- Query execution and optimization
  - Query optimization is to find the “best” possible evaluation method for a given query
- Providing fast, reliable access to data for data mining
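
Tables, schemas, queries, and indexes can be tried out with Python's built-in `sqlite3` module. The sketch below is only an illustration: the `customers` and `orders` tables, their columns, the index name, and the sample rows are invented for this example. `EXPLAIN QUERY PLAN` is SQLite's way of showing which evaluation method the optimizer chose; its output format differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Schema: two tables linked by customer_id.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ann", "Dayton"), (2, "Bob", "Columbus")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0)])

# Index structure to locate a customer's orders without scanning the table.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# Query: total order amount per customer.
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(totals)

# Ask the optimizer how it would evaluate a lookup by customer_id.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (1,)
).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step
```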

AI
- Intelligent agents
  - Perception-Action-Goal-Environment
- Search
  - Uniform cost and informed search algorithms
- Knowledge representation
  - FOL, production rules, frames with semantic networks
- Knowledge acquisition
- Knowledge maintenance and application
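
Uniform cost search expands the cheapest known path first, so the first time the goal is taken off the frontier its path is a cheapest one (given non-negative step costs). A minimal sketch follows; the graph, its node names, and its edge costs are made up for illustration, and the dict-of-dicts graph encoding is just one convenient choice.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a cheapest path from start to goal, or None.

    graph maps each node to a dict {neighbor: non-negative step cost}.
    """
    frontier = [(0, start, [start])]  # priority queue ordered by path cost
    best = {start: 0}                 # cheapest cost found so far per node
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale entry: a cheaper route to node was found later
        for neighbor, step in graph.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor, path + [neighbor]))
    return None

# Hypothetical weighted graph.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
result = uniform_cost_search(graph, "A", "D")
print(result)
```

An informed search such as A* uses the same loop but orders the frontier by path cost plus a heuristic estimate of the remaining cost.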

Machine Learning
- Focusing on complex representations, data-intensive problems, and search-based methods
- Flexibility with prior knowledge and collected data
- Generalization from data and empirical validation
  - statistical soundness and computational efficiency
  - constrained by finite computing & data resources
- Challenges from KDD
  - scaling up, cost info, auto data preprocessing, more knowledge types
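
Empirical validation means checking a learned model on data it has not seen. The sketch below does this with a holdout split and a one-nearest-neighbor rule on a single numeric feature. The classifier, the split fraction, the function names, and the synthetic "low"/"high" data are all assumptions chosen to keep the example small.

```python
import random

def nearest_neighbor_predict(train, x):
    """train: list of (feature, label) pairs.
    Returns the label of the training example whose feature is closest to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def holdout_accuracy(data, test_fraction=0.3, seed=0):
    """Shuffle, hold out a fraction for testing, and report test accuracy."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Synthetic, well-separated data: small values are "low", large are "high".
data = [(x, "low") for x in range(0, 10)] + [(x, "high") for x in range(20, 30)]
accuracy = holdout_accuracy(data)
print(accuracy)
```

On real data the two classes overlap, so the holdout accuracy is lower and depends on the particular split; repeating the split or using cross-validation gives a steadier estimate.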

Visualization
- Producing a visual display with insights into the structure of the data with interactive means
  - zoom in/out, rotating, displaying detailed info
- Various types of visualization methods
  - show summary properties and explore relationships between variables
  - investigate large DBs and convey lots of information
  - analyze data with geographic/spatial location
- A pre- and post-processing tool for KDD
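
Even a very plain display can show summary properties of a variable. The sketch below draws a text histogram using only the standard library; the function name `text_histogram`, the bin count, and the sample values are illustrative assumptions, and a real project would more likely use a plotting library.

```python
def text_histogram(values, n_bins=5, width=30):
    """Bin the values into n_bins equal-width bins and draw one bar per bin.

    Returns (counts, chart), where chart is a multi-line string.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it.
        idx = min(int((v - lo) / span * n_bins), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + span * i / n_bins
        bar = "#" * round(width * c / peak)
        lines.append(f"{left_edge:8.2f} | {bar} {c}")
    return counts, "\n".join(lines)

counts, chart = text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])
print(chart)
```

The empty bin and the isolated bar at the top of the range make the outlying value easy to spot, which is the kind of insight that is useful before mining (preprocessing) and after it (checking results).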

Bibliography
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. 2001. Morgan Kaufmann.
- D. Hand, H. Mannila, P. Smyth. Principles of Data Mining. 2001. MIT Press.
- W. Klösgen & J.M. Zytkow, edited, 2001, Handbook of Data Mining and Knowledge Discovery.