Classifying fake news using supervised learning with NLP · 2017-08-14 · Classifying fake news...

DataCamp Natural Language Processing Fundamentals in Python Classifying fake news using supervised learning with NLP NATURAL LANGUAGE PROCESSING FUNDAMENTALS IN PYTHON Katharine Jarmul Founder, kjamistan


Page 1: Classifying fake news using supervised learning with NLP

Natural Language Processing Fundamentals in Python

Katharine Jarmul, Founder, kjamistan

Page 2: What is supervised learning?

- Form of machine learning
- Problem has predefined training data
- This data has a label (or outcome) you want the model to learn
- Classification problem
- Goal: Make good hypotheses about the species based on geometric features

Sepal Length  Sepal Width  Petal Length  Petal Width  Species
5.1           3.5          1.4           0.2          I. setosa
7.0           3.2          4.77          1.4          I. versicolor
6.3           3.3          6.0           2.5          I. virginica
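To make "hypotheses about the species based on geometric features" concrete, here is a minimal sketch that treats the three table rows as training data and labels a new flower by its closest row. The 1-nearest-neighbour rule and the function name `predict_species` are illustrative assumptions, not something the slides prescribe.

```python
import math

# Training rows copied from the table above:
# (sepal length, sepal width, petal length, petal width) -> species label.
TRAINING_DATA = [
    ((5.1, 3.5, 1.4, 0.2), "I. setosa"),
    ((7.0, 3.2, 4.77, 1.4), "I. versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "I. virginica"),
]


def predict_species(features):
    """Return the label of the closest training row (1-nearest-neighbour)."""
    nearest = min(TRAINING_DATA, key=lambda row: math.dist(row[0], features))
    return nearest[1]


# Hypothetical unseen flower: its short petals put it nearest the setosa row.
print(predict_species((4.9, 3.0, 1.5, 0.2)))
```

With only three labelled rows this is a toy, but it shows the shape of the task: numeric features in, a predicted label out.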

Page 3: Supervised learning with NLP

- Need to use language instead of geometric features
- scikit-learn: Powerful open-source library
- How to create supervised learning data from text?
- Use bag-of-words models or tf-idf as features
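As a rough illustration of what "bag-of-words" and "tf-idf" features look like, the sketch below computes both by hand with the standard library. The three documents are hypothetical, and the tf-idf formula is the plain unsmoothed textbook form; scikit-learn's `TfidfVectorizer` uses a smoothed, normalised variant by default, so its numbers will differ.

```python
import math
from collections import Counter

# Hypothetical toy documents, already lowercased; not from the course data.
docs = [
    "the spaceship lands on an alien planet",
    "the swordsman arrives in the village",
    "an alien fleet attacks the planet",
]

# Bag of words: one word-count mapping per document.
bags = [Counter(doc.split()) for doc in docs]


def tf_idf(term, bag, all_bags):
    """Term frequency times inverse document frequency (plain, unsmoothed).

    Only call this with a term that occurs in at least one document.
    """
    tf = bag[term] / sum(bag.values())
    doc_freq = sum(1 for b in all_bags if term in b)
    return tf * math.log(len(all_bags) / doc_freq)


print(bags[0]["spaceship"])                 # raw count in the first document
print(tf_idf("the", bags[0], bags))         # occurs in every document: weight 0.0
print(tf_idf("spaceship", bags[0], bags))   # rare word: positive weight
```

The point of the weighting is visible in the last two lines: a word that appears everywhere carries no signal, while a rarer word is weighted up.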

Page 4: IMDB Movie Dataset

Plot                                               Sci-Fi  Action
In a post-apocalyptic world in human decay, a...   1       0
Mohei is a wandering swordsman. He arrives in...   0       1
#137 is a SCI/FI thriller about a girl, Marla,...  1       0

- Goal: Predict movie genre based on plot summary
- Categorical features generated using preprocessing

Page 5: Supervised learning steps

- Collect and preprocess our data
- Determine a label (Example: Movie genre)
- Split data into training and test sets
- Extract features from the text to help predict the label
  - Bag-of-words vector built into scikit-learn
- Evaluate trained model using the test set
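The "split data into training and test sets" step can be sketched without any library, which makes it clear what is happening: shuffle reproducibly, then hold some examples back. The helper `split_train_test`, the plots, and the labels below are all hypothetical; in practice scikit-learn's `train_test_split` does this job.

```python
import random


def split_train_test(samples, labels, test_size=0.33, seed=53):
    """Shuffle paired samples/labels reproducibly and hold out a test share."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Hypothetical plots and labels (1 = Sci-Fi, 0 = not Sci-Fi).
plots = ["aliens invade", "a lone swordsman", "robots rebel",
         "a heist goes wrong", "time travel paradox", "a boxing comeback"]
is_scifi = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = split_train_test(plots, is_scifi)
print(len(X_train), len(X_test))
```

Keeping the test examples out of training is what makes the final evaluation step meaningful: the model is scored on text it has never seen.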

Page 6: Let's practice!

Page 7: Building word count vectors with scikit-learn

Page 8: Predicting movie genre

- Dataset consisting of movie plots and corresponding genre
- Goal: Create bag-of-words vectors for the movie plots
- Can we predict genre based on the words used in the plot summary?

Page 9: CountVectorizer with Python

In [1]: import pandas as pd

In [2]: from sklearn.model_selection import train_test_split

In [3]: from sklearn.feature_extraction.text import CountVectorizer

In [4]: df = ... # Load data into DataFrame

In [5]: y = df['Sci-Fi']  # label column: 1 for Sci-Fi, 0 otherwise

In [6]: X_train, X_test, y_train, y_test = train_test_split(df['plot'], y, test_size=0.33, random_state=53)

In [7]: count_vectorizer = CountVectorizer(stop_words='english')  # drop common English words

In [8]: count_train = count_vectorizer.fit_transform(X_train.values)  # learn vocabulary from training plots

In [9]: count_test = count_vectorizer.transform(X_test.values)  # reuse that vocabulary on test plots
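To see what `fit_transform` and `transform` produce, the following sketch runs the same two calls on a couple of short, hypothetical plot summaries in place of the real DataFrame. It assumes scikit-learn is installed; the sentences are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins for the training and test plot summaries.
train_plots = ["The spaceship lands on the alien planet",
               "The swordsman fights the bandits"]
test_plots = ["An alien spaceship and a robot"]

count_vectorizer = CountVectorizer(stop_words="english")
count_train = count_vectorizer.fit_transform(train_plots)  # learns the vocabulary
count_test = count_vectorizer.transform(test_plots)        # reuses it unchanged

# Column order of the count matrices follows the sorted vocabulary.
vocabulary = sorted(count_vectorizer.vocabulary_)
print(vocabulary)
print(count_train.toarray())
print(count_test.toarray())  # "robot" was never seen in training, so it is ignored
```

Calling only `transform` on the test text is deliberate: the test set must be described with the vocabulary learned from the training set, otherwise the columns would not line up with what the classifier was trained on.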

Page 10: Let's practice!

Page 11: Training and testing a classification model with scikit-learn

Page 12: Naive Bayes classifier

- Naive Bayes Model
  - Commonly used for testing NLP classification problems
  - Basis in probability
- Given a particular piece of data, how likely is a particular outcome?
- Examples:
  - If the plot has a spaceship, how likely is it to be sci-fi?
  - Given a spaceship and an alien, how likely now is it sci-fi?
- Each word from CountVectorizer acts as a feature
- Naive Bayes: Simple and effective
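The spaceship and alien questions above can be worked through numerically. The sketch below is a hand-rolled multinomial Naive Bayes with add-one smoothing; the word counts and class priors are invented for illustration, and the helper names `word_likelihood` and `posterior` are hypothetical.

```python
from collections import Counter

# Hypothetical word counts per class; the numbers are invented for illustration.
word_counts = {
    "sci-fi": Counter({"spaceship": 8, "alien": 6, "sword": 1, "love": 5}),
    "other": Counter({"spaceship": 1, "alien": 1, "sword": 7, "love": 11}),
}
class_priors = {"sci-fi": 0.3, "other": 0.7}  # assumed share of each genre
vocabulary = {word for counts in word_counts.values() for word in counts}


def word_likelihood(word, label, alpha=1.0):
    """P(word | label) with add-alpha (Laplace) smoothing."""
    counts = word_counts[label]
    return (counts[word] + alpha) / (sum(counts.values()) + alpha * len(vocabulary))


def posterior(words):
    """P(label | words), treating the words as independent given the label."""
    scores = {}
    for label, prior in class_priors.items():
        score = prior
        for word in words:
            score *= word_likelihood(word, label)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


print(posterior(["spaceship"])["sci-fi"])
print(posterior(["spaceship", "alien"])["sci-fi"])  # more evidence, higher probability
```

The "naive" part is the independence assumption inside `posterior`: each word's likelihood is simply multiplied in, as if words never influenced one another.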

Page 13: Naive Bayes with scikit-learn

In [10]: from sklearn.naive_bayes import MultinomialNB

In [11]: from sklearn import metrics

In [12]: nb_classifier = MultinomialNB()

In [13]: nb_classifier.fit(count_train, y_train)

In [14]: pred = nb_classifier.predict(count_test)

In [15]: metrics.accuracy_score(y_test, pred)
Out[15]: 0.85841849389820424

Page 14: Confusion Matrix

Rows are the actual labels; columns are the predicted labels.

        Action  Sci-Fi
Action  6410    563
Sci-Fi  864     2242

In [16]: metrics.confusion_matrix(y_test, pred, labels=[0, 1])
Out[16]:
array([[6410,  563],
       [ 864, 2242]])
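The confusion matrix contains everything needed to recompute the accuracy reported in Out[15], plus per-class measures. A short sketch using only the four counts from Out[16] (the variable names are illustrative):

```python
# Counts copied from Out[16]; rows are actual labels, columns are predictions,
# in the order given by labels=[0, 1] (0 = Action, 1 = Sci-Fi).
confusion = [[6410, 563],
             [864, 2242]]

true_action, false_scifi = confusion[0]   # actual Action row
false_action, true_scifi = confusion[1]   # actual Sci-Fi row
total = sum(sum(row) for row in confusion)

accuracy = (true_action + true_scifi) / total
scifi_precision = true_scifi / (true_scifi + false_scifi)
scifi_recall = true_scifi / (true_scifi + false_action)

print(round(accuracy, 4))         # 0.8584, matching Out[15]
print(round(scifi_precision, 4))
print(round(scifi_recall, 4))
```

Accuracy alone hides the imbalance between the two kinds of error: the model misses more actual Sci-Fi plots (864) than it wrongly labels Action plots as Sci-Fi (563), which the recall figure brings out.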

Page 15: Let's practice!

Page 16: Simple NLP, Complex Problems

Page 17: Translation

(source: https://twitter.com/Lupintweets/status/865533182455685121)

Page 18: Sentiment Analysis

(source: https://nlp.stanford.edu/projects/socialsent/)

Page 19: Language Biases

(related talk: https://www.youtube.com/watch?v=j7FwpZB1hWc)

Page 20: Let's practice!