Page 1: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

scikit-learn: Machine Learning in Python

Data Tuesday - Feb. 26 2013 - Paris

Sunday 24 February 13

Page 2: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

• Library of Machine Learning models

• Simple fit / predict / transform API

• Python / NumPy / SciPy / Cython & wrappers for libsvm / liblinear

• Model Assessment, Selection & Ensembles

• Some support for multi-core
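
The fit / predict / transform convention listed above can be sketched as follows. This is a minimal illustration and not code from the talk: the iris dataset, the `StandardScaler` and `LogisticRegression` estimators, and the helper name `evaluate` are assumptions chosen for brevity. The imports follow a recent scikit-learn release; module paths were different in the 0.13 release discussed in these slides. The `n_jobs` argument is one way the library exposes its multi-core support.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Transformers learn their parameters with fit() and apply them with transform().
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Predictors learn with fit() and produce outputs with predict().
clf = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)
y_pred = clf.predict(X_test_scaled)
accuracy = (y_pred == y_test).mean()


def evaluate(n_jobs=1):
    """Model assessment: 5-fold cross-validation of a scaler + classifier pipeline.

    `evaluate` is a hypothetical helper for this sketch; pass n_jobs=-1 to let
    scikit-learn spread the folds over all available cores.
    """
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=5, n_jobs=n_jobs)
```

Calling `evaluate()` returns one accuracy score per fold; every estimator in the pipeline is driven through the same `fit` / `transform` / `predict` methods shown above.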

Page 3: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Possible Applications

• Text Classification / Sequence Tagging NLP

• Computer Vision / Robotics

• Learning To Rank - IR and advertisement

• Statistical Analysis of the Brain: fMRI / MEG

• Astronomy, Biology, Social Sciences...

Page 7: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Example: Training a Model for Face Recognition

Page 8: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Total dataset size:
n_samples: 1288, n_features: 1850, n_classes: 7

Extracting the top 150 eigenfaces from 966 faces
done in 0.466s

Projecting the input data on the eigenfaces orthonormal basis
done in 0.056s

Fitting the SVM classifier to the training set
done in 18.549s

Predicting people's names on the test set
done in 0.062s

                   precision    recall  f1-score   support

     Ariel Sharon       0.90      0.75      0.82        12
     Colin Powell       0.78      0.94      0.85        62
  Donald Rumsfeld       0.86      0.72      0.78        25
    George W Bush       0.89      0.96      0.92       141
Gerhard Schroeder       0.92      0.74      0.82        31
      Hugo Chavez       0.90      0.53      0.67        17
       Tony Blair       0.81      0.74      0.77        34

      avg / total       0.86      0.86      0.86       322
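
The slide shows only the program output, not the code that produced it. The steps named in the log (extract principal components, project the data, fit an SVM, report per-class metrics) might look roughly like the sketch below. Several things here are assumptions: the function name `train_and_report`, the RBF kernel, the `class_weight="balanced"` setting, and the use of the small bundled digits dataset as a stand-in for the face images so that the example runs without a download. The 25% test fraction matches the 966 / 322 split reported above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_and_report(X, y, n_components):
    """Split the data, project onto principal components, fit an SVM, report metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )
    # "Extracting the top n eigenfaces": PCA is fitted on the training images only.
    pca = PCA(n_components=n_components, whiten=True, random_state=42).fit(X_train)
    # "Projecting the input data on the eigenfaces orthonormal basis".
    X_train_pca = pca.transform(X_train)
    X_test_pca = pca.transform(X_test)
    # "Fitting the SVM classifier to the training set" (kernel choice is an assumption).
    clf = SVC(kernel="rbf", class_weight="balanced").fit(X_train_pca, y_train)
    # "Predicting ... on the test set", then summarising precision / recall / f1.
    y_pred = clf.predict(X_test_pca)
    return pca, clf, classification_report(y_test, y_pred)


# Stand-in data: 8x8 digit images instead of face images.
digits = load_digits()
pca, clf, report = train_and_report(digits.data, digits.target, n_components=30)
print(report)
```

With face data, `X` would hold one flattened image per row (1850 pixel features in the log above) and `n_components` would be 150.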

Page 10: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Learned Eigen Faces
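
The original slide displays the eigenfaces as images, which are not part of this transcript. As a hedged illustration of where such pictures come from: each principal axis found by PCA lives in pixel space, so it can be reshaped back to the image dimensions and displayed. The sketch below assumes the bundled 8x8 digits images as a stand-in for faces and an arbitrary choice of 16 components.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
n_samples, h, w = digits.images.shape  # 8 x 8 pixel images

n_components = 16  # arbitrary choice for this sketch
pca = PCA(n_components=n_components).fit(digits.data)

# Each row of components_ is one principal axis in pixel space; reshaping it
# to (h, w) gives an image, analogous to one eigenface.
eigen_images = pca.components_.reshape((n_components, h, w))

# The axes form an orthonormal basis: their Gram matrix is the identity.
gram = pca.components_ @ pca.components_.T
is_orthonormal = np.allclose(gram, np.eye(n_components))
```

Each entry of `eigen_images` can then be passed to an image plotting function to produce a gallery like the one on the slide.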

Page 11: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Contributors

• GitHub-centric contribution workflow

• each pull request needs 2 x [+1] reviews

• code + tests + doc + example

• 92% test coverage / Continuous Integration

• 4 major releases per year + 4 bugfix releases

• 66 contributors for release 0.13

Page 12: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Users

• We support users on & ML

• 200+ questions tagged with [scikit-learn]

• Many competitors + benchmarks

• 500+ answers on ongoing user survey

• 60% academics / 40% from industry

• Some data-driven startups use sklearn

Page 13: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Thank you!

• http://scikit-learn.org - Main Project + doc

• @ogrisel on twitter

• http://ogrisel.com - ML Consultancy (soon)

Page 14: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Backup Slides

Page 15: 6 grisel-scikit-learn-introduction-130228102221-phpapp02

Caveat Emptor

• Domain specific tooling kept to a minimum

• Some feature extraction for Bag of Words Text Analysis

• Some functions for extracting image patches

• Domain integration is the responsibility of the user or 3rd party libraries
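
The two domain-specific helpers mentioned above can be illustrated briefly. This is a small sketch, not material from the talk: the three toy documents and the 6x6 synthetic image are made up for the example, and the imports follow a recent scikit-learn release.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.feature_extraction.text import CountVectorizer

# Bag of Words: each document becomes a row of token counts.
docs = [
    "machine learning in python",
    "learning to rank",
    "python for science",
]
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(docs)  # sparse matrix: (n_documents, n_vocabulary_terms)
python_column = vectorizer.vocabulary_["python"]

# Image patches: every 3x3 window of a 6x6 grayscale image.
image = np.arange(36, dtype=float).reshape(6, 6)
patches = extract_patches_2d(image, (3, 3))
```

Anything beyond this level, such as language-specific tokenization or image loading, is left to the user or to third-party libraries, as the slide notes.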
