Machine Learning: Learning with data

André Lourenço, Instituto Superior de Engenharia de Lisboa, Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal. Machine Learning: Learning with Data, 10/11/2011 - ONE Talks. © 2005, IT - Instituto de Telecomunicações. All rights reserved.


Transcript of Machine Learning: Learning with data

1. 10/11/2011 - ONE Talks. Machine Learning: Learning with Data. André Lourenço, Instituto Superior de Engenharia de Lisboa, Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal. © 2005, IT - Instituto de Telecomunicações. All rights reserved.

2. Outline: Introduction; Examples; What does it mean to learn?; Supervised and Unsupervised Learning; Types of Learning; Classification Problem; Text Mining Example; Conclusions (and further reading).

3. Introduction

4. What is Machine Learning? A branch of artificial intelligence (AI). Arthur Samuel (1959): "Field of study that gives computers the ability to learn without being explicitly programmed." From: Andrew Ng, Stanford Machine Learning classes, http://www.youtube.com/watch?v=UzxYlbK2c7E

5. What is Machine Learning? Tom Mitchell (1998), well-posed learning problem: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." Mark Dredze: teaching a computer about the world.

6. What is Machine Learning? Goal: the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases. How to apply machine learning? Observe the world; develop models that match the observations; teach the computer to learn these models; the computer applies the learned model to the world.

7. Example 1: Prediction of House Prices. From: Andrew Ng, Stanford Machine Learning classes, http://www.youtube.com/watch?v=UzxYlbK2c7E

8. Example 2: Learning to automatically classify text documents. From: http://www.xcellerateit.com/

9. Example 3: Face Detection and Tracking. http://www.micc.unifi.it/projects/optimal-face-detection-and-tracking/

10. Example 4: Social Network Mining. Users' profiles, friendships, groups and network structure; hidden information to be inferred. From: "Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning", Mingzhen Mo and Irwin King, ICONIP 2010, Sydney, Australia.

11. Example 5: Biometric Systems. 1. Physical; 2. Behavioral.

12. WHAT DOES IT MEAN TO LEARN?

13. What does it mean to learn? Learn patterns in data: the observed signal z enters a decision system, which produces the estimated output.

14. Unsupervised Learning. Look for patterns in data; no training data (no examples of the output). Pro: no labeling of examples is needed. Con: cannot demonstrate specific types of output. Applications: data mining; finds interesting patterns in data. From: Mark Dredze, "Machine Learning - Finding Patterns in the World".

15. Supervised Learning. Learn patterns to simulate a given output. Pros: can learn complex patterns; good performance. Con: requires many labeled examples of the output. Applications: classification; sorts data into predefined groups. From: Mark Dredze, "Machine Learning - Finding Patterns in the World".

16. Types of Learning: Output. Classification: binary, multiclass, multilabel, hierarchical, etc.; e.g., classify email as spam; loss: accuracy. Ranking: order examples by preference; e.g., rank the results of a web search; loss: swapped pairs. Regression: real-valued output; e.g., predict tomorrow's stock price; loss: squared loss. Structured prediction: sequences, trees, segmentation; e.g., find faces in an image; loss: precision/recall of faces. From: Mark Dredze, "Machine Learning - Finding Patterns in the World".

17. Classification Problem: Classical Architecture. The observed signal z goes through feature extraction, producing a feature vector (pattern) y; the classification stage maps y to an estimated output (class) in S = {1, 2, ..., c}.

18. Classification Problem: Example with 1 feature. Problem: classify people as non-obese or obese by observing their weight (only 1 feature). Is it possible to classify without making any mistakes?

19. Classification Problem: Example with 2 features. The observed signal z goes through feature extraction, yielding y = {weight, height}; classification outputs the estimated class in S = {1: non-obese, 2: obese}.

20. Classification Problem: Example with 2 features. Problem: classify people as non-obese or obese by observing their weight and height. Now the decision looks much simpler!

21. Classification Problem: Example with 2 features. Decision regions: R1: non-obese; R2: obese.

22. Classification Problem: Decision Regions. Goal of the classifier: define a partition of the feature space into c disjoint regions, called decision regions: R1, R2, ..., Rc.
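To make the classical architecture and the weight/height example above concrete, here is a minimal sketch in Python (the data points and the nearest-class-mean rule are illustrative assumptions, not the classifier used in the talk):

    import numpy as np

    # Illustrative (made-up) training patterns y = [weight in kg, height in m]
    # with labels 1 = non-obese, 2 = obese.
    train_y = np.array([[62.0, 1.70], [70.0, 1.80], [85.0, 1.60], [95.0, 1.72]])
    train_labels = np.array([1, 1, 2, 2])

    # Nearest-class-mean classifier: each class mean induces a decision region
    # R_i, and together the regions partition the feature space.
    class_means = {c: train_y[train_labels == c].mean(axis=0) for c in (1, 2)}

    def classify(pattern):
        """Assign a pattern to the class whose mean is closest (Euclidean distance)."""
        return min(class_means, key=lambda c: np.linalg.norm(pattern - class_means[c]))

    print(classify(np.array([88.0, 1.70])))  # -> 2 (obese) for this toy data

In practice the two features would first be rescaled so that weight does not dominate the distance, and the classifier would be trained on real labeled data.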
23. TEXT MINING EXAMPLE

24. Text Mining Process. Adapted from: "Introduction to Text Mining", Yair Even-Zohar, University of Illinois.

25. Text Mining Process. Text preprocessing: syntactic/semantic text analysis. Feature generation: bag of words. Feature selection: simple counting, statistics. Text/data mining: classification (supervised learning), clustering (unsupervised learning). Analyzing the results.

26. Syntactic/Semantic Text Analysis. Part-of-speech (POS) tagging: find the corresponding POS for each word, e.g., John (noun) gave (verb) the (det) ball (noun). Word sense disambiguation: context based or proximity based. Parsing: generates a parse tree (graph) for each sentence; each sentence is a stand-alone graph.

27. Feature Generation: Bag of Words. A text document is represented by the words it contains (and their occurrences), e.g., "Lord of the rings" -> {the, Lord, rings, of}. Highly efficient; makes learning far simpler and easier; the order of words is not that important for certain applications. Stemming: identifies a word by its root, e.g., flying, flew -> fly; reduces dimensionality. Stop words: the most common words are unlikely to help text mining, e.g., the, a, an, you.

28. Example. Original text: "Hi, Here is your weekly update (that unfortunately hasn't gone out in about a month). Not much action here right now. 1) Due to the unwavering insistence of a member of the group, the ncsa.d2k.modules.core.datatype package is now completely independent of the d2k application. 2) Transformations are now handled differently in Tables. Previously, transformations were done using a TransformationModule. That module could then be added to a list that an ExampleTable kept. Now, there is an interface called Transformation and a sub-interface called ReversibleTransformation." The slide shows the same message after lower-casing and stop-word removal, and then after stemming: "hi week update unfortunate go out month much action here right now 1 due unwaver insistence member group ncsa d2k modules core datatype package now complete independence d2k application 2 transformation now handle different table previous transformation do use transformationmodule module add list exampletable keep now interface call transformation sub-interface call reversibletransformation".
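A minimal Python sketch of the feature-generation step just described (the tiny stop-word list and the crude suffix-stripping stemmer are illustrative stand-ins for real tools such as a Porter stemmer):

    import re
    from collections import Counter

    STOP_WORDS = {"the", "a", "an", "of", "to", "is", "in", "and", "you"}  # illustrative subset

    def crude_stem(word):
        # Rough stand-in for a real stemmer: strip a few common suffixes.
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[:-len(suffix)]
        return word

    def bag_of_words(text):
        """Tokenize, drop stop words, stem, and count word occurrences."""
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)

    print(bag_of_words("The Lord of the rings"))  # Counter({'lord': 1, 'ring': 1})
    print(bag_of_words("flying and flew"))        # Counter({'fly': 1, 'flew': 1})

A real stemmer would also map irregular forms such as "flew" to the same root as "flying".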
29. Feature Generation: Weighting. Term frequency (TF): bag-of-words counts of a term ti in a document dj (illustrated with a "Lorem ipsum..." example document and its term counts: Lorem 1, dolor 1, Praesent 1, iaculis 1, Vestibulum 1, ipsum 2, consectetuer 2). Inverse document frequency (IDF). TF-IDF.

30. Feature Generation: Vector Space Model. Documents as vectors.

31. Feature Selection. Reduce dimensionality: learners have difficulty addressing tasks with high dimensionality. Irrelevant features: not all features help! E.g., the existence of a noun in a news article is unlikely to help classify it as politics or sport. Stop-word removal.

32. Example. (The slide shows the word lists of the email example after feature generation and selection: hi, week, update, unfortunate, go, out, month, much, action, here, right, now, due, unwaver, insistence, member, group, ncsa, d2k, modules, core, datatype, package, complete, independence, application, transformation, handle, different, table, previous, use, transformationmodule, add, list, exampletable, keep, interface, call, sub-interface, reversibletransformation.)

33. Document Similarity. Dot product; cosine similarity.
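A minimal Python sketch of TF-IDF weighting and cosine similarity in the vector space model described in the last few slides (the three short documents are an illustrative toy corpus):

    import math
    from collections import Counter

    docs = [
        "lorem ipsum dolor sit amet",
        "ipsum dolor sit amet consectetuer adipiscing",
        "machine learning finds patterns in data",
    ]  # illustrative toy corpus

    tokenized = [d.split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(tokenized)

    # idf(t) = log(N / df(t)), where df(t) is the number of documents containing t.
    df = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    def tfidf_vector(doc_tokens):
        """Represent a document as a TF-IDF weighted vector over the vocabulary."""
        tf = Counter(doc_tokens)
        return [tf[t] * idf[t] for t in vocab]

    def cosine_similarity(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norms if norms else 0.0

    vectors = [tfidf_vector(doc) for doc in tokenized]
    print(cosine_similarity(vectors[0], vectors[1]))  # > 0: the documents share terms
    print(cosine_similarity(vectors[0], vectors[2]))  # 0.0: no terms in common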
34. Text Mining: Classification Definition. Given: a collection of labeled records (training set); each record contains a set of features (attributes) and the true class (label). Find: a model for the class as a function of the values of the features. Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model; usually the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

35. Text Mining: Clustering Definition. Given: a set of documents and a similarity measure between documents. Find: clusters such that documents in one cluster are more similar to one another, and documents in separate clusters are less similar to one another. Goal: finding a correct grouping of the documents.

36. Supervised vs. Unsupervised Learning. Supervised learning (classification): supervision means the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations; new data is classified based on the training set. Unsupervised learning (clustering): the class labels of the training data are unknown; given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

37. CONCLUDING REMARKS

38. Readings. Survey books in machine learning: "The Elements of Statistical Learning", Hastie, Tibshirani, Friedman; "Pattern Recognition and Machine Learning", Bishop; "Machine Learning", Mitchell. Questions?

39. ACKNOWLEDGEMENTS. ISEL DEETC: final-year and MSc supervised students (Tony Tam, ...); students of Digital Signal Processing; Artur Ferreira. Instituto de Telecomunicações (IT): David Coutinho, Hugo Silva, Ana Fred, Mário Figueiredo. Fundação para a Ciência e Tecnologia (FCT).

40. www.it.pt. Thank you for your attention! André Ribeiro Lourenço. Mail to: [email protected] [email protected]