CSE 4334/5334
DATA MINING
CSE4334/5334 Data Mining, Fall 2014
Department of Computer Science and Engineering, University of Texas at Arlington
©Chengkai Li, 2014
Lecture 1: Introduction
Self Introduction
Naeemul Hassan
http://idir.uta.edu/~naeemul/
Research interests:
Database Systems
Data Mining
Computational Journalism
My Research
Research Overview
Skyline Group
Computational Journalism
Crowdsourcing
Now it’s your turn
o Name, program
o Prior courses/experiences related to this subject
o What made you decide to take this course?
o What will make you like/hate this course?
o Anything else
Course Page
http://idir.uta.edu/~naeemul/cse4334/
Syllabus, Schedule (lecture notes), Resources,
Accommodation based on disability.
Blackboard
Announcement (check it on a daily basis)
Basics
Lectures: Tue/Thu, 2-3:20pm, WH 308
Instructor: Naeemul Hassan
Office hours: Tue/Thu 10:00am-12:00pm, ERB 509
Contact: naeemul DOT hassan AT mavs DOT uta DOT edu, (817) 437-4518
(I do not check voicemails regularly.)
TA: TBD
Office hours: TBD
Email: TBD
Textbook
Required Textbook:
Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd ed. (2nd edition is also fine), Morgan Kaufmann Publishers, June 2011. ISBN 9780123814791.
Reference:
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2006. ISBN 0-321-32136-7.
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. (This book is available online at http://nlp.stanford.edu/IR-book/)
I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2nd ed., 2005.
T. M. Mitchell, Machine Learning, McGraw Hill, 1997.
The slides
The slides highlight the gist of the most important concepts and techniques.
But
They are not meant to be complete. Details may not be included.
They may be simplified for ease of explanation.
Studying only the slides is not enough.
You need to read the book and study the slides carefully.
Many lecture notes are adapted from:
Jiawei Han (Illinois)
Vipin Kumar (Minnesota)
Tentative Grading Scheme
Midterm 20%
Final 30%
Homework (HW) 20% (Must be done independently)
Course Project 30% (Must be done independently)
You are required to attend classes and actively participate in discussions.
Final Letter Grade:
No pre-defined cutoffs. Letter grades will be based on a bell curve of class performance.
Undergraduate and graduate students are compared in separate groups.
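As a rough illustration of how the tentative weights above combine, here is a minimal Python sketch. The function name `weighted_total` and the assumption that every component is scored on a 0-100 scale are not from the slides. Because letter grades are curved with no pre-defined cutoffs, this total does not by itself determine a letter grade.

```python
# Tentative weights from the grading scheme, in percent.
WEIGHTS = {"midterm": 20, "final": 30, "homework": 20, "project": 30}


def weighted_total(scores):
    """Combine component scores (assumed to be on a 0-100 scale) into one total.

    `scores` maps each component name in WEIGHTS to the score earned.
    This is a hypothetical helper for illustration only.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Each component contributes (weight / 100) * score to the total.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {"midterm": 80, "final": 90, "homework": 100, "project": 70}
    print(weighted_total(example))
```

For the example scores, the contributions are 16 + 27 + 20 + 21 points out of 100.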
Homework (HW)
Problem solving
Focus on the most important topics
HW1, HW2, HW3, HW4, 5% each
Course Project
2 Programming Assignments, 15% each
hands-on experience with big data, real application
Must design, implement (programming), and evaluate
open to novel solutions
Blackboard
Assignment instruction and files
Submission (we do not accept email or hard-copy submissions)
Grades
Questions, Discussion Group
Deadlines
Everything will be submitted through Blackboard.
Due time: 11:59pm
Late submission: 5-point deduction per hour until the score reaches 0. (The raw score of each assignment is 100, so there is no point in submitting more than 20 hours late.)
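The late-submission rule can be sketched in a few lines of Python. The slides do not say how a partial hour is counted or whether the deduction applies to the earned score, so this sketch assumes that every started hour counts as a full hour and that points come off the score the submission earned; the function name `score_after_late_penalty` is hypothetical.

```python
import math


def score_after_late_penalty(earned_score, hours_late, penalty_per_hour=5):
    """Apply the 5-points-per-hour late deduction, never going below 0.

    Assumptions (not stated in the slides): a partial hour is rounded up
    to a full hour, and the deduction is taken from the earned score.
    """
    if hours_late <= 0:
        return earned_score  # on time: no deduction
    hours_charged = math.ceil(hours_late)
    return max(0, earned_score - penalty_per_hour * hours_charged)


if __name__ == "__main__":
    for hours in (0, 2, 19, 20, 25):
        print(hours, score_after_late_penalty(100, hours))
```

With a perfect raw score of 100, the result reaches 0 at 20 hours late, which matches the note above.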
Regrading
Regrade requests must be submitted within 7 days after we post scores on Blackboard. The TA will handle regrade requests. Requests will not be considered after 7 days.
If you are not satisfied with the result, you have 7 days to request a regrade again. The instructor will handle it, and the decision is final.
Topics in Textbook
Part 1: Introduction
Data Preprocessing
Data Warehouse and OLAP Technology: An Introduction
Advanced Data Cube Technology and Data Generalization
Mining Frequent Patterns, Association and Correlations
Classification and Prediction
Cluster Analysis
Topics in Textbook
Part 2: Advanced Applications and Current Research
Mining data streams, time-series, and sequence data
Mining graphs, social networks and multi-relational data
Mining object, spatial, multimedia, text and Web data
    Mining complex data objects
    Spatial and spatiotemporal data mining
    Multimedia data mining
    Text mining
    Web mining
Applications and trends of data mining
    Mining business & biological data
    Visual data mining
    Data mining and society: Privacy-preserving data mining
Additional themes (prominent streak discovery, skyline group, significant fact finding)
Schedule
http://idir.uta.edu/~naeemul/cse4334/
Your Email
Make sure your MavMail works. We will only contact you by your MavMail.
Check it on a daily basis.
Academic Integrity
Cheating
Copying another's test or assignment
Communication with another during an exam or assignment (written, oral, or otherwise)
Giving or seeking aid from another when not permitted by the instructor
Possessing or using unauthorized materials during the test
Buying, using, stealing, transporting, or soliciting a test, draft of a test, or answer key
Academic Integrity
Plagiarism
Using someone else's work in your assignment without appropriate acknowledgement
Making slight variations in the language and then failing to give credit to the source
Academic Integrity
Collusion
Collaborating with another, without authorization, when preparing an assignment
Questions?