Could You Be a Data Scientist? Quantify Data Scientist Profiles Using Machine Learning and LinkedIn API
![Page 1: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/1.jpg)
Could You Be a Data Scientist?
Carlo Torniai, Ph.D. (@carlotorniai)
![Page 2: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/2.jpg)
Goal
• Quantify the features of data scientist profiles
• Analyze aspiring data scientist profiles
• Provide useful feedback
![Page 3: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/3.jpg)
Why is this relevant?
• A quantitative characterization of data scientist profiles can help close the loop between job seekers and recruiters
Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg
![Page 4: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/4.jpg)
Data Collection · Data Analysis · Feature Extraction · Model Testing · Data Product
• LinkedIn API:
  – General information
  – Past work history
  – Education
• Web scraping:
  – Skills
• 1500 profiles:
  – Data Scientists
  – Software Engineers
  – Business Analysts
  – Mathematicians
  – Statisticians
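The slides do not show how the collected skills became model inputs. One common approach, assumed here purely for illustration, is to encode each profile as a binary vector over a skill vocabulary. The profiles, the helper names `build_vocabulary` and `encode`, and the lowercasing step are all hypothetical.

```python
# Minimal sketch (assumption): binary skill encoding of toy profiles.
# The profiles below are invented; they are not the talk's data.
profiles = [
    {"category": "Data Scientist",
     "skills": ["Python", "Machine Learning", "Statistics"]},
    {"category": "Software Engineer",
     "skills": ["Java", "Python", "Git"]},
    {"category": "Statistician",
     "skills": ["R", "Statistics"]},
]


def build_vocabulary(profiles):
    """Return the sorted list of distinct (lowercased) skills."""
    return sorted({s.lower() for p in profiles for s in p["skills"]})


def encode(profile, vocabulary):
    """Return a 0/1 vector marking which vocabulary skills the profile lists."""
    skills = {s.lower() for s in profile["skills"]}
    return [1 if term in skills else 0 for term in vocabulary]


vocabulary = build_vocabulary(profiles)
X = [encode(p, vocabulary) for p in profiles]   # feature matrix
y = [p["category"] for p in profiles]           # class labels

for label, row in zip(y, X):
    print(label, row)
```

Education features (for example, PhD topic) could be appended to the same vector in the same way.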
![Page 5: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/5.jpg)
[Figure panels: Business Analysts, Data Scientists, Software Engineers, Statisticians, Mathematicians]
![Page 6: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/6.jpg)
[Figure: Number of PhDs by topic and profile. Topics: Bioinformatics, Biology, Computer Science, Economics, Electronics, Astronomy, Math, Neuroscience, Other, Physics, Psychology, Stats, Engineering]
![Page 7: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/7.jpg)
For this project I trained the following models on skills and education features:
• Random Forest: classify the profile
• Naïve Bayes: multi-class probabilities to assess the background components of a profile
• K-means: suggest similar and relevant profiles
![Page 8: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/8.jpg)
For this project I trained the following models on skills and education features:

| Model | Training set | Purpose |
|---|---|---|
| Random Forest | All 5 categories | Classify the profile |
| Naïve Bayes | 4 classic categories: SE, BA, MT, ST | Assess profile background components with multi-class probabilities |
| K-means | All 5 categories | Identify similar profiles |
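The Random Forest step, classifying a profile into one of the five categories, might look roughly like the sketch below. It uses scikit-learn, which the talk lists among its technologies, but the skill columns, the toy rows, and the hyperparameters (`n_estimators=50`, `random_state=0`) are assumptions, not the author's setup.

```python
# Minimal sketch (assumption): Random Forest over all 5 categories,
# trained on invented one-skill-per-category toy data.
from sklearn.ensemble import RandomForestClassifier

# Hypothetical skill columns; column i is the signature skill of category i.
skills = ["machine learning", "java", "excel", "proofs", "r"]
categories = ["Data Scientist", "Software Engineer", "Business Analyst",
              "Mathematician", "Statistician"]

X, y = [], []
for idx, category in enumerate(categories):
    row = [0] * len(skills)
    row[idx] = 1                      # mark the signature skill
    for _ in range(5):                # five identical toy profiles per class
        X.append(list(row))
        y.append(category)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Classify a new (hypothetical) profile that lists only "java".
new_profile = [[0, 1, 0, 0, 0]]
predicted = clf.predict(new_profile)[0]
print(predicted)
```

Real profiles list many overlapping skills, so the feature matrix would be far wider and noisier than this toy one.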
![Page 9: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/9.jpg)
bit.ly/cybads
![Page 10: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/10.jpg)
Naïve Bayes: multi-class probabilities
Random Forest
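A possible reading of the Naïve Bayes step: train only on the four classic categories (SE, BA, MT, ST), then read the per-class probabilities for a profile as its "background components". The sketch below assumes scikit-learn's `BernoulliNB` on binary skill features; the talk does not say which Naïve Bayes variant was used, and the data here is invented.

```python
# Minimal sketch (assumption): Bernoulli Naive Bayes on 4 classic categories,
# with predict_proba used as a profile's background breakdown.
from sklearn.naive_bayes import BernoulliNB

# Hypothetical skill columns: java, excel, proofs, r
X = [
    [1, 0, 0, 0], [1, 0, 0, 0],   # SE: Software Engineers
    [0, 1, 0, 0], [0, 1, 0, 0],   # BA: Business Analysts
    [0, 0, 1, 0], [0, 0, 1, 0],   # MT: Mathematicians
    [0, 0, 0, 1], [0, 0, 0, 1],   # ST: Statisticians
]
y = ["SE", "SE", "BA", "BA", "MT", "MT", "ST", "ST"]

nb = BernoulliNB(alpha=1.0)       # Laplace smoothing
nb.fit(X, y)

# A toy profile listing both "java" and "r".
profile = [[1, 0, 0, 1]]
probabilities = nb.predict_proba(profile)[0]

# Map each class label to its probability.
components = dict(zip(nb.classes_, probabilities))
for label, p in sorted(components.items()):
    print(f"{label}: {p:.2f}")
```

On this toy input the probability mass splits evenly between SE and ST, which is the kind of mixed-background reading the slide describes.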
![Page 11: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/11.jpg)
K-means clustering
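One way K-means could be used to suggest similar profiles is to cluster all profiles and then return the members of the cluster a new profile falls into. The sketch below assumes scikit-learn's `KMeans`; the names, the six skill columns, and the choice of two clusters are hypothetical.

```python
# Minimal sketch (assumption): K-means clusters as a "similar profiles" lookup.
import numpy as np
from sklearn.cluster import KMeans

# Invented profiles over six hypothetical binary skill columns.
names = ["ana", "ben", "chen", "dee", "eli", "fay"]
X = np.array([
    [1, 1, 1, 0, 0, 0],   # ana
    [1, 1, 0, 0, 0, 0],   # ben
    [1, 0, 1, 0, 0, 0],   # chen
    [0, 0, 0, 1, 1, 1],   # dee
    [0, 0, 0, 1, 1, 0],   # eli
    [0, 0, 0, 1, 0, 1],   # fay
], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Assign a new profile to a cluster and list the profiles sharing it.
query = np.array([[1, 1, 0, 0, 0, 0]], dtype=float)
cluster = kmeans.predict(query)[0]
similar = [name for name, label in zip(names, labels) if label == cluster]
print(similar)
```

Within a cluster, the results could be ranked further by distance to the query profile.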
![Page 12: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/12.jpg)
Next Steps
• Get more data:
  – Other websites
  – Indeed
  – User input on web app
• Fine-grained parsing of education
• Experiment with additional features (industry, years of experience)
• Extend feature set and test more models
• Fuzzy C-means
• Add interactive data collection
• Personalized links for skills
• Explanation of similarity results
• Close the loop by analyzing job offers and suggesting matching profiles
![Page 13: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API.](https://reader036.fdocuments.in/reader036/viewer/2022062617/54c6fa3c4a7959051f8b45c1/html5/thumbnails/13.jpg)
Thank you!
Technologies
• Web app: Flask, jQuery, Vega, MongoDB
• Models (NMF, HC, RF, DT, NB, K-means): scikit-learn
• Visualizations: Vincent, Vega, NetworkX, Gephi
Acknowledgements
• yatish27: Ruby LinkedIn public profile web scraper
• ozgut: LinkedIn API Python wrapper