scikit-learn: Machine Learning in Python
Data Tuesday - Feb. 26 2013 - Paris
Sunday 24 February 2013
• Library of Machine Learning models
• Simple fit / predict / transform API
• Python / NumPy / SciPy / Cython & wrappers for libsvm / liblinear
• Model Assessment, Selection & Ensembles
• Some support for multi-core
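As a rough illustration of the fit / predict / transform convention, model assessment, and multi-core support listed above, here is a minimal sketch. The bundled digits dataset, the choice of StandardScaler and SVC, and every parameter value are assumptions made for this example and do not come from the slides; it also assumes a recent scikit-learn release, where the splitting and cross-validation helpers live in sklearn.model_selection.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small bundled dataset, used here only as a stand-in (assumption).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Transformers expose fit() and transform().
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Predictors expose fit() and predict(); SVC is the libsvm wrapper.
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X_train_scaled, y_train)
y_pred = clf.predict(X_test_scaled)
test_accuracy = clf.score(X_test_scaled, y_test)

# Model assessment: 3-fold cross-validation, with the folds spread over
# two worker processes through the n_jobs parameter.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=3, n_jobs=2)

print("held-out accuracy: %0.3f" % test_accuracy)
print("cross-validation scores:", cv_scores)
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling statistics inside each cross-validation fold, so the validation folds do not influence the preprocessing.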
Possible Applications
• Text Classification / Sequence Tagging NLP
• Computer Vision / Robotics
• Learning To Rank - IR and advertisement
• Statistical Analysis of the Brain: fMRI / MEG
• Astronomy, Biology, Social Sciences...
Example: Training a Model for Face Recognition
Total dataset size:
n_samples: 1288, n_features: 1850, n_classes: 7
Extracting the top 150 eigenfaces from 966 faces
done in 0.466s
Projecting the input data on the eigenfaces orthonormal basis
done in 0.056s
Fitting the SVM classifier to the training set
done in 18.549s
Predicting people's names on the test set
done in 0.062s
                   precision    recall  f1-score   support

     Ariel Sharon       0.90      0.75      0.82        12
     Colin Powell       0.78      0.94      0.85        62
  Donald Rumsfeld       0.86      0.72      0.78        25
    George W Bush       0.89      0.96      0.92       141
Gerhard Schroeder       0.92      0.74      0.82        31
      Hugo Chavez       0.90      0.53      0.67        17
       Tony Blair       0.81      0.74      0.77        34

      avg / total       0.86      0.86      0.86       322
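The steps in the log above (extract principal components, project the data, fit an SVM, report per-class scores) can be sketched roughly as follows. This is not the code behind the slide: to stay runnable offline it substitutes the small bundled digits dataset for the faces dataset, and the number of components, the SVM parameters, and the 75/25 split (chosen to mirror 966 training faces out of 1288) are assumptions.

```python
from time import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digit images instead of face images.
digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Extract the top principal components from the training images only.
n_components = 16
t0 = time()
pca = PCA(n_components=n_components, whiten=True, random_state=0).fit(X_train)
print("PCA fitted in %0.3fs" % (time() - t0))

# Each component can be reshaped back to image form; with face data
# these reshaped components are the "eigenfaces".
eigen_images = pca.components_.reshape((n_components, 8, 8))

# Project both splits onto the orthonormal basis learned above.
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# Fit the SVM on the projected training data and predict the test labels.
clf = SVC(kernel="rbf", class_weight="balanced", C=10.0, gamma="scale")
clf.fit(X_train_pca, y_train)
y_pred = clf.predict(X_test_pca)

# Per-class precision, recall, f1-score and support, as in the table above.
report = classification_report(y_test, y_pred)
print(report)
```

Fitting the PCA on the training split alone keeps the test images out of the learned basis, so the reported scores reflect data the model has not seen.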
Learned Eigenfaces
Contributors
• GitHub-centric contribution workflow
• each pull request needs 2 x [+1] reviews
• code + tests + doc + example
• 92% test coverage / Continuous Integration
• 4 major releases per year + 4 bugfix releases
• 66 contributors for release 0.13
Users
• We support users on & ML
• 200+ questions tagged with [scikit-learn]
• Many competitors + benchmarks
• 500+ answers on ongoing user survey
• 60% academics / 40% from industry
• Some data-driven startups use sklearn
Thank you!
• http://scikit-learn.org - Main Project + doc
• @ogrisel on twitter
• http://ogrisel.com - ML Consultancy (soon)
Backup Slides
Caveat Emptor
• Domain specific tooling kept to a minimum
• Some feature extraction for Bag of Words Text Analysis
• Some functions for extracting image patches
• Domain integration is the responsibility of the user or 3rd party libraries
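The two domain-specific helpers mentioned in the list above might be used along these lines. This is a minimal sketch: the toy documents, the synthetic 6x6 "image", and the patch size are made up for illustration, while CountVectorizer and extract_patches_2d are the scikit-learn utilities for bag-of-words counts and image patch extraction.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.feature_extraction.text import CountVectorizer

# Bag of words: turn raw strings into a sparse document-term count matrix.
docs = [
    "machine learning in python",
    "learning to rank",
    "python for machine vision",
]
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)  # shape: (n_documents, n_terms)
vocabulary = sorted(vectorizer.vocabulary_)  # learned terms, alphabetical

# Image patches: slide a 3x3 window over every position of a 2D array.
image = np.arange(36, dtype=np.float64).reshape(6, 6)  # toy image (assumption)
patches = extract_patches_2d(image, (3, 3))  # shape: (n_patches, 3, 3)

print(vocabulary)
print(X_counts.toarray())
print(patches.shape)
```

Anything beyond this, such as language-specific tokenization or image loading, is left to the user or to third-party libraries, in line with the caveat above.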