CSCE 5073 Section 001: Data Mining Spring 2016. Overview Class hour 12:30 – 1:45pm, Tuesday &...



CSCE 5073 Section 001: Data Mining

Spring 2016


Overview
  Class hour: 12:30 – 1:45pm, Tuesday & Thur, JBHT 239
  Office hour: 2:00 – 4:00pm, Tuesday & Thur, JBHT 516
  Instructor: Dr. Xintao Wu
    Email: [email protected]
    Office: JBHT 516
    Webpage: http://csce.uark.edu/~xintaowu/5073/5073.htm
  Textbook: Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, 2011. ISBN: 978-0-12-381479-1


Topic Description
  Introduction to data mining
  Know your data
  Data preprocessing
  Data warehousing and OLAP
  Frequent pattern mining, association and correlation
  Classification
  Cluster analysis
  Outlier detection
  Advanced topics
    Deep learning
    Big data analysis including MapReduce, Spark
    Social aware data mining


Course Prerequisites
  Data structures and algorithms
  Familiarity with programming in Java or C++ is assumed
    Matlab/R/Python/Scala is preferred
  Basic concepts of probability and statistics
  Knowledge of linear algebra is a big plus


Grading Composition
  Homework and quiz: 10%
  Project: 30%
  Midterm: 20%
  Final: 40%
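As an illustration of how the weights in the grading composition above combine, here is a minimal sketch that computes a weighted course score. It assumes each component is scored on a 0-100 scale; the names `WEIGHTS` and `course_score` and the sample scores are hypothetical and not part of the course materials.

```python
# Weights taken from the grading composition above (fractions of the total).
WEIGHTS = {
    "homework_and_quiz": 0.10,
    "project": 0.30,
    "midterm": 0.20,
    "final": 0.40,
}


def course_score(scores):
    """Return the weighted score, given component scores on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical component scores, for illustration only.
    example = {"homework_and_quiz": 90, "project": 85, "midterm": 70, "final": 80}
    print(round(course_score(example), 2))
```

With these sample scores the components contribute 9, 25.5, 14, and 32 points respectively, so the final exam alone accounts for the largest share of the total.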


Homework and Project Reports
  Late policy: late submissions are not accepted
  Hard copy is preferred
  Electronic submission (Word or PDF) is accepted


Project
  Data Analysis Project
    Each group consists of 2-3 students
    Develop/implement/apply data mining techniques on real, challenging data mining problems
  Individual Research Project
  More information: http://csce.uark.edu/~xintaowu/5073/proj.htm


Midterm & Final
  Open books/notes/internet
  No discussion
  No help from any entity, e.g., by posting/uploading your questions on the Web
  Cumulative
  No makeup
Class attendance is not required
  Bonus is expected


Textbook & Recommended Reference Books
  Textbook
    Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011.
  Recommended reference books
    C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
    S. Chakrabarti, Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data, Morgan Kaufmann, 2002.
    T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag, 2009.
    B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer, 2006.
    D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, Cambridge Univ. Press, 2010.
    M. Newman, Networks: An Introduction, Oxford Univ. Press, 2010.


Reference Papers
  Course research papers: Check Reading_List
  Major conference proceedings that will be used
    DM conferences: ACM SIGKDD (KDD), ICDM (IEEE, Int. Conf. Data Mining), SDM (SIAM Data Mining), PKDD (Principles KDD)/ECML, PAKDD (Pacific-Asia)
    DB conferences: ACM SIGMOD, VLDB, ICDE
    ML conferences: NIPS, ICML
    IR conferences: SIGIR, CIKM
    Web conferences: WWW, WSDM
  Other related conferences and journals
    IEEE TKDE, ACM TKDD, DMKD, ML
  Use course Web page, DBLP, Google Scholar, Citeseer
  CS591Han: Advanced Seminar on Data Mining


Research Frontiers in Data Mining
  Mining social and information networks
  Mining spatiotemporal data, moving object data & cyber-physical systems
  Mining multimedia, social media, text and Web
  Data software engineering and computer system data
  Multidimensional online analytical analysis
  Pattern mining, pattern usage, and pattern understanding
  Biological data mining
  Stream data mining