DSR 09 Demo day presentation - Karthick Perumal
What do you call a fish with no eyes?
Instilling a Sense of Humour in Computers
Dr. Karthick Perumal
Karthick Perumal | Instilling a Sense of Humour in Computers | April 7th, 2017 | Page 4
Outline
● Language modeling
● Choice of algorithm
● Creating a Joke dataset
● Training model
● Generated output
Language Modeling
Probability distribution over sequences of words
P( Data science is the future )
P(Data science is the future) = P(Data) x P(science | Data) x P(is | Data science) x P(the | Data science is) x P(future | Data science is the)
> P( Data science is the Berlin )
> P( Data science ist die Zukunft)
> P(Data Science is the Zukunft)
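The chain-rule factorization on this slide can be sketched in a few lines of code. This is a minimal illustration only, assuming a tiny made-up corpus and estimating each conditional probability from prefix counts; the corpus and the names `prefix_count` and `sentence_probability` are hypothetical and not taken from the talk.

```python
from typing import List

# Toy corpus (an assumption for illustration only, not the joke dataset).
CORPUS: List[List[str]] = [
    "data science is the future".split(),
    "data science is fun".split(),
    "data science is the future".split(),
    "data engineering is the future".split(),
]


def prefix_count(prefix: List[str]) -> int:
    """Count corpus sentences that start with the given word prefix."""
    n = len(prefix)
    return sum(1 for sent in CORPUS if sent[:n] == prefix)


def sentence_probability(words: List[str]) -> float:
    """Multiply P(w_i | w_1 ... w_{i-1}) over all positions (chain rule).

    No end-of-sentence token is modeled, so this is the probability that
    a sentence starts with `words`.
    """
    prob = 1.0
    for i in range(len(words)):
        history = prefix_count(words[:i])         # count of w_1 ... w_{i-1}
        extended = prefix_count(words[: i + 1])   # count of w_1 ... w_i
        if history == 0 or extended == 0:
            return 0.0
        prob *= extended / history
    return prob


likely = sentence_probability("data science is the future".split())
unlikely = sentence_probability("data science is the berlin".split())
```

In this toy setting the unseen word sequence gets probability zero, which is one reason practical models smooth their estimates or, as in this talk, learn them with a neural network.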
Why Language Modeling?
How to wreck a nice beach or How to recognize speech
Why Language Modeling?
● Speech Recognition
P(How to recognize speech) > P(How to wreck a nice beach)
● Spelling correction/prediction
P(win a contest) > P(win a context)
● Machine Translation
P(give a high five) > P(give a large five)
● Text summarization, question answering, etc.
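The comparisons above all come down to scoring candidate word sequences and keeping the more probable one. As a hedged sketch, the snippet below uses an add-one smoothed bigram model, which is a simpler stand-in and not the LSTM used in this project; the training sentences and the names `bigram_score` and `pick_best` are assumptions made up for the example.

```python
from collections import Counter
from typing import List

# Tiny made-up training text (an assumption for illustration only).
TRAINING = [
    "i want to win a contest",
    "she hopes to win a contest",
    "read the word in context",
    "he gave me a high five",
]

unigrams: Counter = Counter()
bigrams: Counter = Counter()
for line in TRAINING:
    tokens = ["<s>"] + line.split()          # "<s>" marks the sentence start
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

VOCAB_SIZE = len(unigrams)


def bigram_score(sentence: str) -> float:
    """Approximate P(sentence) with add-one smoothed bigram probabilities."""
    tokens = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE)
    return prob


def pick_best(candidates: List[str]) -> str:
    """Return the candidate the model considers most probable."""
    return max(candidates, key=bigram_score)


best = pick_best(["win a contest", "win a context"])
```

A spelling corrector or speech recognizer would generate the candidates from its own error or acoustic model and use a score like this to rank them.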
Choice of Algorithm?
● Recurrent neural networks have feedback loops that let the network use information from previous passes, which acts as memory
● We specifically use LSTM (long short-term memory), which mitigates the vanishing gradient problem
● Highly effective for language modeling and time-series analysis
● Computationally expensive and longer training times
Creating a Joke Dataset
● Extracted only jokes with good ratings
● Many redundant jokes across various websites
Interesting information about the dataset
● 310,967 jokes before cleaning, including duplicates and inappropriate words
● 219,873 jokes after cleaning
Interesting information about the dataset
● Found some redundant jokes after cleaning
● What do you call a fish with no eye? Fsh.
● What do you call a fish with no eyes? A fsh.
● What do you call a fish with no eyes? A fsh. What do you call a fish
with four eyes? NEEEERRRRD
● Meaningless text was also scraped and remains in the dataset
● "Hey whatcha eating ? "A pluot" Wtf is a pluot ? "A cross between
a plum & an apricot" That 's really stupid. rides off on a liger"
● Alfijnbahkfnbsbbakrbbjdnebzk hzueonyvag macarena yrvixndvwhkga
ndhwkdbcbe hayvektoubabrjnahor HEYYYY MACARENA
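Near-duplicates like the fish jokes above survive exact-match deduplication because they differ by a word or a character. One possible way to catch them, shown here only as a sketch and not as the cleaning procedure used for this dataset, is to normalize the text and compare it with a string-similarity ratio; the helper names and the 0.9 threshold are assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def drop_near_duplicates(jokes: List[str], threshold: float = 0.9) -> List[str]:
    """Keep a joke only if it is not too similar to one already kept."""
    kept: List[str] = []
    kept_norm: List[str] = []
    for joke in jokes:
        norm = normalize(joke)
        if any(SequenceMatcher(None, norm, k).ratio() >= threshold
               for k in kept_norm):
            continue  # treat as a redundant variant of an earlier joke
        kept.append(joke)
        kept_norm.append(norm)
    return kept


jokes = [
    "What do you call a fish with no eye? Fsh.",
    "What do you call a fish with no eyes? A fsh.",
    "I was going to make a joke about the movie Titanic, "
    "but I didn't want to go on.",
]
unique = drop_near_duplicates(jokes)
```

The pairwise comparison is quadratic in the number of jokes, so for a dataset of this size it would likely need blocking or hashing first; it would also not flag the longer variant that appends a second joke to the first.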
Hyperparameter Tuning
Summary
● What do you call a cow with no eyes?
● I have a problem with my mom. It's gonna be so great
● What do you call a Mexican who runs for Christmas? A secret enemy.
● Why did the blonde stare at her windows for hours? First she liked it.
● A zombie walks into a bar. the bartender says, “Hey, we don't serve food in here”.
● I was going to make a joke about the movie Titanic, but I didn't want to go on.
● How many hipsters does it take to change a light bulb? Only one but I have no idea how they got in there
Disappointment
Thanks
Any Questions?
Model pipeline
State of NLP
Language Modeling
Probability distribution over sequences of words
P( Data science is the future )
P(w_1 w_2 ... w_n) = ∏_i P(w_i | w_1 w_2 ... w_{i-1})
P(A, B, C, D) = P(A) x P(B | A) x P(C | A, B) x P(D | A, B, C)
> P( Data science is the Berlin )
> P( Data science ist die Zukunft )
> P(Data Science is the Zukunft)
Long Short Term Memory
1) Forget gate layer
2) Input gate layer
3) Tanh layer to update new candidate value
4) Output information relevant to the subject
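The four steps above correspond to one time step of a standard LSTM cell. The following is a minimal pure-Python sketch of that step, assuming the common formulation with sigmoid gates and a tanh candidate; the parameter names (`Wf`, `Wi`, `Wc`, `Wo` and matching biases) and the zero-weight example are illustrative assumptions, not the configuration trained in this project.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute W @ v + b for small dense lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, list]) -> Tuple[Vector, Vector]:
    """Run one LSTM time step and return (hidden state, cell state)."""
    z = h_prev + x  # concatenate previous hidden state and current input
    f = [sigmoid(a) for a in affine(params["Wf"], params["bf"], z)]    # 1) forget gate
    i = [sigmoid(a) for a in affine(params["Wi"], params["bi"], z)]    # 2) input gate
    g = [math.tanh(a) for a in affine(params["Wc"], params["bc"], z)]  # 3) candidate values
    o = [sigmoid(a) for a in affine(params["Wo"], params["bo"], z)]    # 4) output gate
    # New cell state: keep part of the old memory, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Example with hidden size 2 and input size 1; all weights are zero,
# so every gate evaluates to 0.5 and the candidate to 0.
zeros_w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zeros_b = [0.0, 0.0]
zero_params = {"Wf": zeros_w, "bf": zeros_b, "Wi": zeros_w, "bi": zeros_b,
               "Wc": zeros_w, "bc": zeros_b, "Wo": zeros_w, "bo": zeros_b}
h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], zero_params)
```

With zero weights the forget gate halves the previous cell state, which makes the additive memory update easy to see; a trained model would learn the weights so that the gates depend on the input and the previous hidden state.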