NLP & Bigdata: Motivation and Action

NLP & Bigdata: Motivation and Action. Sarath P R [email protected], IIIT-MK, Thiruvananthapuram. November 09, 2013

Page 1: NLP& Bigdata. Motivation and Action

NLP & Bigdata: Motivation and Action

Sarath P R [email protected]

IIIT-MK, Thiruvananthapuram

November 09, 2013

Sarath P R [email protected] NLP & Bigdata Motivation and Action

Page 2: NLP& Bigdata. Motivation and Action

About me

Working as Technical Lead - Bigdata

Like to develop software applications for good reasons

Independent Data Journalist at DScribe.IN

Holds a Master's degree in Computer Science

Like to travel and meet people

Page 3: NLP& Bigdata. Motivation and Action

Agenda

Introduction

Full text Search and Index

Document Clustering

Representing Data

Stanford NLP

R and Weka

Social Media and Sentiment Analysis

Introduction to Bigdata

Current Trends

Conclusion

Page 4: NLP& Bigdata. Motivation and Action

Introduction

Sorry! No definitions of NLP copied here!

If you need a definition, tell me. Otherwise we will now 'see' what NLP is!

Page 6: NLP& Bigdata. Motivation and Action

Introduction - 2-Minute Targit Video

Watch the Targit video here: http://youtu.be/32KE0rbGZ9c

Page 7: NLP& Bigdata. Motivation and Action

So What is He (Targit CTO) Saying?

“Calling your system, and getting delivered an analysis is right around the corner”

Go to Targit’s website, http://targit.com. You will see a lion standing on the front page

They say “Targit is a courage Company”

That was all about motivation. No hidden agenda to promote Targit!

Page 12: NLP& Bigdata. Motivation and Action

Introduction - Innovation

What we just saw is one aspect of NLP

What is it?

It is Speech Recognition and Analytics

And what did they do?

It is Innovation!

Page 17: NLP& Bigdata. Motivation and Action

Introduction - Search Engines & Information Retrieval

Tell me your opinion. The question follows

Is Google an NLP company?

Yes, they are. The biggest one!

So, how does Google work? I mean the search engine!

Where do the search results come from?

The answer is three things: crawler, index and algorithms

Now we will go through a few NLP, Machine Learning and Analytics related topics in detail

Page 24: NLP& Bigdata. Motivation and Action

Full text Search and Inverted Index

In information retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database

When the number of documents to search is potentially large, or the quantity of search queries to perform is substantial, the problem of full-text search is often divided into two tasks: indexing and searching

The indexing stage scans the text of all the documents and builds a list of search terms, called an index

In the search stage, when performing a specific query, only the index is referenced, rather than the text of the original documents
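
The two stages can be sketched in a few lines of Python. This is a minimal illustration, not how a production search engine does it: the `tokenize`, `build_index` and `search` helpers and the three sample documents are made up for this sketch, and tokenisation is simply lowercasing and splitting on non-alphanumeric characters.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Indexing stage: scan every document once and record, per term,
    the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, term):
    """Search stage: answer from the index alone, without rereading the documents."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical three-document collection, made up for this sketch.
documents = {
    "d1": "Full-text search scans documents",
    "d2": "An index speeds up search",
    "d3": "Clustering groups similar documents",
}
index = build_index(documents)
print(search(index, "search"))     # ['d1', 'd2']
print(search(index, "documents"))  # ['d1', 'd3']
```

Note that `search` never touches `documents`; everything it needs is already in the index.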

Page 27: NLP& Bigdata. Motivation and Action

Inverted index

It is the most popular data structure used in document retrieval systems

Similar to the index at the back of a book

Used on a large scale, for example in search engines

Page 28: NLP& Bigdata. Motivation and Action

Inverted index

Reference: http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html

Page 29: NLP& Bigdata. Motivation and Action

Index vs Inverted Index

Index

A forward index (or just index) is the list of documents, and which words appear in them

Inverted Index

The inverted index is the list of words, and the documents in which they appear
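
As a rough sketch of the difference, both structures can be held in Python dictionaries, and inverting one gives the other. The `invert` helper and the two sample documents are assumptions made for illustration only.

```python
from collections import defaultdict

def invert(forward_index):
    """Turn {document: [words]} into {word: [documents]}."""
    inverted = defaultdict(list)
    for doc, words in forward_index.items():
        for word in words:
            if doc not in inverted[word]:   # skip repeated words in the same document
                inverted[word].append(doc)
    return dict(inverted)

# Hypothetical forward index, made up for this sketch.
forward_index = {
    "Doc A": ["search", "engine"],
    "Doc B": ["search", "index"],
}
inverted_index = invert(forward_index)
print(inverted_index)
# {'search': ['Doc A', 'Doc B'], 'engine': ['Doc A'], 'index': ['Doc B']}
```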

Page 31: NLP& Bigdata. Motivation and Action

Exercise

Have a look at the table below

Document   Words
Doc 1      talk, iiitmk, campus, nlp
Doc 2      algorithm, bigdata, nlp
Doc 3      researchers, talk

What kind of index is it?

Create an inverted index from this forward index

Page 35: NLP& Bigdata. Motivation and Action

Answer

Inverted Index

Words         Document
talk          Doc 1, Doc 3
iiitmk        Doc 1
campus        Doc 1
nlp           Doc 1, Doc 2
algorithm     Doc 2
bigdata       Doc 2
researchers   Doc 3

Search

A search query like 'nlp talk' would deliver what results?

Result
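
What the query returns depends on how the two terms are combined, which the slide does not say. A small sketch of both readings, using the inverted index from the answer (the `search_all` and `search_any` helpers are hypothetical names chosen for this example):

```python
# Inverted index from the exercise answer.
inverted_index = {
    "talk": {"Doc 1", "Doc 3"},
    "iiitmk": {"Doc 1"},
    "campus": {"Doc 1"},
    "nlp": {"Doc 1", "Doc 2"},
    "algorithm": {"Doc 2"},
    "bigdata": {"Doc 2"},
    "researchers": {"Doc 3"},
}

def search_all(index, query):
    """AND semantics: documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

def search_any(index, query):
    """OR semantics: documents that contain at least one query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return sorted(set.union(*postings)) if postings else []

print(search_all(inverted_index, "nlp talk"))  # ['Doc 1']
print(search_any(inverted_index, "nlp talk"))  # ['Doc 1', 'Doc 2', 'Doc 3']
```

Requiring both terms leaves only Doc 1; accepting either term returns all three documents.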

Page 38: NLP& Bigdata. Motivation and Action

Apache Lucene Demo

Which tool to try for indexing and searching?

Apache Lucene is a full-featured text search engine library

Written entirely in Java

Open Source

Scalable and High Performance Indexing

Powerful, Accurate and Efficient Search Algorithms

Interesting Features of Lucene Core

Allows Simultaneous update and searching

Powerful query types like phrase queries, wildcard queries, range queries etc.

Fielded searching (e.g. title, author, contents)

Page 40: NLP& Bigdata. Motivation and Action

Document Clustering

Definition

The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.

A cluster is a collection of data objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters.

Clustering is applicable in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.

Clustering is an example of unsupervised learning in Machine Learning

Cluster Analysis can be achieved by various algorithms

Page 42: NLP& Bigdata. Motivation and Action

The Library Example

Reference

I found this example in the book Mahout in Action by Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman

Inside the Library

A library has thousands of books

The books are not arranged in any particular order

Brainstorm!

Will you enjoy finding a book you want there?

If not, give me some solutions

Page 47: NLP& Bigdata. Motivation and Action

Solutions

What about sorting the books alphabetically by title?

Yes, for readers searching for a book by title, that will help.

What if someone is looking for books on a general subject, for example health?

Grouping books by topic will be more useful in this case

But how would you even begin this grouping? You will start reading the books one by one and group them! Good work :-)

Page 53: NLP& Bigdata. Motivation and Action

Steps in Clustering

Clustering involves the following

An algorithm, the method used to group the books together.

A notion of both similarity and dissimilarity. In the library example we relied on our assessment of which books belonged in an existing stack and which should start a new one.

A stopping condition. In the library example, this might have been the point beyond which books can’t be stacked anymore, or when the stacks are already quite dissimilar.
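
One possible way to turn these three ingredients into code is sketched below. It is only an illustration under loud assumptions, not the method from Mahout in Action: each book is reduced to a hypothetical set of keywords, similarity is the Jaccard overlap of keyword sets, the algorithm is a single greedy pass, and the 0.25 threshold is arbitrary.

```python
def jaccard(a, b):
    """Similarity of two keyword sets: shared keywords / all keywords."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def stack_books(books, threshold=0.25):
    """Greedy one-pass grouping: put each book on the most similar stack,
    or start a new stack when no existing stack is similar enough."""
    stacks = []  # each stack: {"titles": [...], "keywords": set of all keywords seen}
    for title, keywords in books.items():
        best = max(stacks, key=lambda s: jaccard(keywords, s["keywords"]), default=None)
        if best is not None and jaccard(keywords, best["keywords"]) >= threshold:
            best["titles"].append(title)
            best["keywords"] |= keywords
        else:
            stacks.append({"titles": [title], "keywords": set(keywords)})
    # Stopping condition of this sketch: every book has been placed on a stack.
    return [s["titles"] for s in stacks]

# Hypothetical books and keywords, invented for this sketch.
books = {
    "Healthy Eating": {"health", "food", "diet"},
    "Java Basics": {"programming", "java"},
    "Yoga at Home": {"health", "exercise"},
    "Python Basics": {"programming", "python"},
}
print(stack_books(books))
# [['Healthy Eating', 'Yoga at Home'], ['Java Basics', 'Python Basics']]
```

Raising the threshold makes the stacks stricter (more, smaller stacks); lowering it merges more books into the same stack.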

Page 57: NLP& Bigdata. Motivation and Action

K-Means Algorithm

Let’s look at an algorithm first, and after that at how to automate the grouping of books in the library example.

K-Means

k-means clustering aims to partition n observations into k clusters.

It takes the input parameter, k, and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low.

Cluster similarity is measured in regard to the mean value of the objects in a cluster, which can be viewed as the cluster’s centroid
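
In the usual textbook formulation (a standard statement of the objective, not taken from the slides), with Euclidean distance k-means looks for clusters S_1, ..., S_k that minimise the within-cluster sum of squared distances to each cluster's mean:

```latex
\[
\underset{S_1,\dots,S_k}{\arg\min}\; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2},
\qquad
\mu_i = \frac{1}{|S_i|} \sum_{x \in S_i} x
\]
```

Here \mu_i is the centroid of cluster S_i, so "high intracluster similarity" corresponds to small distances between objects and their own centroid.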

Page 59: NLP& Bigdata. Motivation and Action

K-Means Example

Reference: Teknomo, Kardi. K-Means Clustering Tutorials. http://people.revoledu.com/kardi/tutorial/kMean

Data

Object       Attribute 1 (X): weight index   Attribute 2 (Y): pH
Medicine A   1                               1
Medicine B   2                               1
Medicine C   4                               3
Medicine D   5                               4

Problem

We have 4 objects, each having 2 attributes

We also know beforehand that these objects belong to two groups of medicine (cluster 1 and cluster 2)

The problem now is to determine which medicines belong to cluster 1 and which medicines belong to the other cluster

Page 61: NLP& Bigdata. Motivation and Action

Steps in K-means

Iterate until stable (i.e. no object changes group):

1. Determine the centroid coordinates

2. Determine the distance of each object to the centroids

3. Group the objects based on minimum distance (find the closest centroid)

Each medicine represents one point with two features (X, Y). We can represent it as a coordinate in a feature space
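
The loop can be sketched in Python as follows. This is a minimal version that takes the initial centroids as given; the `kmeans` function name and the rule that an empty group keeps its old centroid are assumptions of this sketch, and `math.dist` needs Python 3.8 or later.

```python
import math

def kmeans(points, centroids):
    """Repeat the three steps until no object changes group.

    points:    list of (x, y) tuples
    centroids: list of k initial (x, y) centroids
    Returns (assignment, centroids); assignment[i] is the group index of points[i].
    """
    assignment = None
    while True:
        # Steps 2 and 3: distance of each object to every centroid, then pick the closest.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:   # stable: no object moved
            return assignment, centroids
        assignment = new_assignment
        # Step 1 for the next pass: each centroid becomes the mean of its group's members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, g in zip(points, assignment) if g == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)        # assumption: an empty group keeps its centroid
        centroids = updated

# Medicines A, B, C, D from the example, with A and B as the initial centroids.
medicines = [(1, 1), (2, 1), (4, 3), (5, 4)]
groups, final_centroids = kmeans(medicines, [(1, 1), (2, 1)])
print(groups)           # [0, 0, 1, 1]
print(final_centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

Group indices are zero-based here, so 0 and 1 correspond to group 1 and group 2 in the slides.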

Page 63: NLP& Bigdata. Motivation and Action

Euclidean distance

Every clustering problem is based on some measure of distance between points

Euclidean distance is the most commonly used distance measure

Mathematically, the Euclidean distance between points with coordinates (x1, y1) and (x2, y2) is

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
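
In Python the distance is a one-liner. `euclidean` is a hypothetical helper name used for this sketch; the standard library's `math.dist` gives the same value.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as (x, y) tuples."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

print(euclidean((0, 0), (3, 4)))  # 5.0
```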

Page 65: NLP& Bigdata. Motivation and Action

Iteration 0

Initial Value of Centroids

Take medicine A and medicine B as the first centroids.

Let c1 and c2 denote the coordinates of the centroids; then c1 = (1,1) and c2 = (2,1)

Objects-Centroids Distance

Calculate the distance from each cluster centroid to each object.

Distance matrix using Euclidean Distance at iteration 0 is
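
Worked out from the four medicines A = (1, 1), B = (2, 1), C = (4, 3), D = (5, 4) and the two initial centroids, the distances come to the matrix below. This is a reconstruction from the example's data, rounded to two decimals; columns are in the order A, B, C, D, the rows belong to c1 and c2, and writing the iteration number as a superscript is a notation assumed here.

```latex
\[
D^{0} =
\begin{bmatrix}
0 & 1 & 3.61 & 5 \\
1 & 0 & 2.83 & 4.24
\end{bmatrix}
\]
```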

Page 67: NLP& Bigdata. Motivation and Action

Iteration 0

Each column in the distance matrix corresponds to one object

The first row of the distance matrix corresponds to the distance of each object to the first centroid and the second row is the distance of each object to the second centroid

For example, the distance from medicine C = (4, 3) to the first centroid c1 = (1,1) is $\sqrt{(4 - 1)^2 + (3 - 1)^2} = \sqrt{13} \approx 3.61$

Similarly, the distance to the second centroid c2 = (2,1) is $\sqrt{(4 - 2)^2 + (3 - 1)^2} = \sqrt{8} \approx 2.83$

Page 68: NLP& Bigdata. Motivation and Action

Iteration 0

Objects clustering

We assign each object based on the minimum distance

Thus, medicine A is assigned to group 1, medicine B to group 2 and so on

Group Matrix

An element of the group matrix is 1 if and only if the object is assigned to that group.
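
With the initial centroids c1 = (1, 1) and c2 = (2, 1), medicine A is closest to c1 while B, C and D are closest to c2. Written as a group matrix (reconstructed here from the example's data; rows are group 1 and group 2, columns are A, B, C, D, and the superscript for the iteration number is an assumed notation):

```latex
\[
G^{0} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1
\end{bmatrix}
\]
```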

Page 70: NLP& Bigdata. Motivation and Action

Iteration 1

Determine new centroids

Compute the new centroid of each group based on the new members

Group 1 has only one member, thus the centroid remains c1 = (1,1)

Group 2 now has three members, thus the centroid is the average coordinate among the three members
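
With medicines B = (2, 1), C = (4, 3) and D = (5, 4) in group 2, the average works out as follows (a worked step added from the example's data):

```latex
\[
c_2 = \left( \frac{2 + 4 + 5}{3},\ \frac{1 + 3 + 4}{3} \right)
    = \left( \frac{11}{3},\ \frac{8}{3} \right)
    \approx (3.67,\ 2.67)
\]
```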

Page 72: NLP& Bigdata. Motivation and Action

Iteration 1

Objects-Centroids Distance

Compute the distance of all objects to the new centroids

Distance matrix at iteration 1 is
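
Recomputing the distances with c1 = (1, 1) and c2 = (11/3, 8/3), the average of B, C and D, gives the matrix below. It is a reconstruction from the example's data, rounded to two decimals, with columns in the order A, B, C, D and rows for c1 and c2; the superscript for the iteration number is an assumed notation.

```latex
\[
D^{1} =
\begin{bmatrix}
0 & 1 & 3.61 & 5 \\
3.14 & 2.36 & 0.47 & 1.89
\end{bmatrix}
\]
```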

Page 73: NLP& Bigdata. Motivation and Action

Iteration 1

Objects clustering

Again we assign each object based on the minimum distance

Based on the new distance matrix, we move medicine B to group 1 while all the other objects remain.

Group Matrix

Group matrix at Iteration 1
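
After moving B, group 1 holds A and B and group 2 holds C and D. As a matrix (reconstructed from the example's data; rows are group 1 and group 2, columns are A, B, C, D, superscript notation assumed):

```latex
\[
G^{1} =
\begin{bmatrix}
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1
\end{bmatrix}
\]
```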

Page 75: NLP& Bigdata. Motivation and Action

Iteration 2

Determine new centroids

Compute the new centroid of each group based on the new members.

Group 1 and group 2 now both have two members, thus the new centroids are

Page 77: NLP& Bigdata. Motivation and Action

Iteration 2

Objects-Centroids Distance

Distance matrix at iteration 2 is

Page 78: NLP& Bigdata. Motivation and Action

Iteration 2

Objects clustering

Again we assign each object based on the minimum distance

Group Matrix

Group matrix at Iteration 2

Page 80: NLP& Bigdata. Motivation and Action

Results

We obtain the result that G2 = G1.

Comparing the grouping of the last iteration with this iteration reveals that the objects no longer move between groups.

Thus, the k-means clustering computation has reached stability and no more iterations are needed.

We take the final grouping as the result.
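The whole procedure, including the stopping rule "the grouping no longer changes", can be sketched as one loop. This is a minimal illustration, not the tutorial's own code; the four points and the two starting centroids are assumed from Teknomo's example, and `kmeans` is a hypothetical helper name.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, centroids):
    """Repeat assign/update until the assignment stops changing."""
    previous = None
    updates = 0
    while True:
        # Assign every point to the index of its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            for p in points
        ]
        if assignment == previous:
            return assignment, centroids, updates
        previous = assignment
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for i in range(len(centroids)):
            members = [p for p, g in zip(points, assignment) if g == i]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty group's centroid
        centroids = new_centroids
        updates += 1


points = [(1, 1), (2, 1), (4, 3), (5, 4)]  # assumed medicines A, B, C, D
final_groups, final_centroids, n_updates = kmeans(points, [(1, 1), (2, 1)])
print(final_groups, final_centroids, n_updates)
```

On the assumed data the loop stops after two centroid updates with A and B in one group and C and D in the other.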

Page 81: NLP& Bigdata. Motivation and Action

Document Representations

X-Y Plane Example

In the previous example, the measure of similarity (or similarity metric) for the points was the Euclidean distance between two points.

And that was in the X-Y plane.

Library Example

The library example had no such clear, mathematical measure.

And we relied entirely on our own judgment to assess book similarity.

Page 83: NLP& Bigdata. Motivation and Action

Document Representations

Brainstorm !

We need a metric that can be implemented on a computer.

One possible metric could be based on the number of words common to two books’ titles.

So “Harry Potter: The Philosopher’s Stone” and “Harry Potter: The Prisoner of Azkaban” have three words in common: “Harry”, “Potter” and “The”.

But even though the book “The Lord of the Rings: The Two Towers” is similar to the Harry Potter series, this measure of similarity doesn’t capture that.
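The title-overlap metric is easy to put on a computer, which also makes its weakness easy to see. A minimal sketch (the tokenization rule and the function names are illustrative assumptions):

```python
import re


def title_words(title):
    """Lower-case the title and split it into a set of words."""
    return set(re.findall(r"[a-z']+", title.lower()))


def common_words(title_a, title_b):
    """Words shared by two titles."""
    return title_words(title_a) & title_words(title_b)


hp1 = "Harry Potter: The Philosopher's Stone"
hp3 = "Harry Potter: The Prisoner of Azkaban"
lotr = "The Lord of the Rings: The Two Towers"

print(sorted(common_words(hp1, hp3)))   # shared words of the two Harry Potter titles
print(sorted(common_words(hp1, lotr)))  # shared words with the Tolkien title
```

The two Harry Potter titles share three words, while the Harry Potter and Lord of the Rings titles share only "the", even though a reader would call the books similar.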

Page 88: NLP& Bigdata. Motivation and Action

Document Representations

Another Solution

We could assemble word counts for each book and, when the counts are close for many words, judge the books similar.

But words like “a”, “an”, and “the” cannot contribute much to the similarity, because they occur frequently in both books.

We could use numeric weights in the computation, and apply low weights to these words to reduce their effect on the similarity value.

Once we give a weight value to each word in a book, we can easily find out the similarity of two books.
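A small sketch of the weighting idea: count the words, scale the counts of very frequent words down, and compare the weighted counts. The word list, the weight 0.1, and the difference measure are all assumptions chosen for illustration, not values from the talk.

```python
from collections import Counter

# Assumed list of frequent words that should count for little.
LOW_WEIGHT_WORDS = {"a", "an", "the"}


def weighted_counts(text, low_weight=0.1):
    """Word counts, with frequent words scaled down by low_weight."""
    counts = Counter(text.lower().split())
    return {
        word: count * (low_weight if word in LOW_WEIGHT_WORDS else 1.0)
        for word, count in counts.items()
    }


def weighted_difference(text_a, text_b):
    """Sum of absolute differences of weighted counts; smaller means more similar."""
    a, b = weighted_counts(text_a), weighted_counts(text_b)
    return sum(abs(a.get(w, 0.0) - b.get(w, 0.0)) for w in set(a) | set(b))


diff = weighted_difference("the wizard and the dragon", "a wizard and a dragon")
print(diff)
```

The two sample sentences differ only in "the" versus "a", so their weighted difference stays small; with all weights equal to 1 the same pair would differ by 4.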

Page 94: NLP& Bigdata. Motivation and Action

Document Representations

What if one book is 300 pages long and the other 1000 pages long?

We have to ensure that the weight of a word is relative to the length of the text.

We will see a method called TF-IDF shortly.
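One simple way to make counts independent of length is to divide each count by the total number of words. A minimal sketch with made-up texts (the sample strings and function name are illustrative):

```python
from collections import Counter


def relative_frequencies(text):
    """Word counts divided by the total number of words in the text."""
    words = text.lower().split()
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


short_book = "magic wand magic"
long_book = " ".join([short_book] * 100)  # same content, 100 times longer

short_tf = relative_frequencies(short_book)
long_tf = relative_frequencies(long_book)
print(short_tf["magic"], long_tf["magic"])
```

Raw counts for "magic" differ by a factor of 100 between the two texts, but the relative frequencies are the same.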

Page 97: NLP& Bigdata. Motivation and Action

Document Representations

Task !

Explore the following distance measures:

1. Squared Euclidean distance measure

2. Manhattan distance measure

3. Cosine distance measure
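As a starting point for the task, the three measures can be written in a few lines each. This is a sketch using the standard textbook definitions; the sample points are arbitrary.

```python
import math


def squared_euclidean(p, q):
    """Sum of squared coordinate differences (Euclidean distance without the root)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    """1 minus the cosine of the angle between p and q (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


p, q = (1, 1), (4, 3)
print(squared_euclidean(p, q), manhattan(p, q), cosine_distance(p, q))
```

Note that cosine distance depends only on the direction of the vectors, not on their length, which is why it is often preferred for documents of different sizes.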

Page 98: NLP& Bigdata. Motivation and Action

Document Representations

Representing Data as Vectors

In mathematics, a vector is simply a point in space.

We found how books can be clustered together based on their similarity in words.

In reality, clustering could be applied to any kind of object provided we can distinguish similar and dissimilar items.

Clustering of anything via algorithms starts with representing the object in a way that can be read by computers.

It is quite practical to think of objects in terms of their measurable features or attributes.

Page 99: NLP& Bigdata. Motivation and Action

Document Representations

Say we want to cluster a bunch of apples. (Figure taken from Mahout in Action.)

Page 100: NLP& Bigdata. Motivation and Action

Document Representations

A small, round, red apple is more similar to a small, round, green one than to a large, ovoid, green one.

The process of vectorization starts with assigning each feature to a dimension.

Let’s say weight is feature (dimension) 0, color is 1, and size is 2.

So the vector of a small, round, red apple looks like [0: 100 gram, 1: red, 2: small].
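To feed such a vector to a clustering algorithm, every dimension has to become a number. The sketch below uses hypothetical numeric codes for color and size; the slide only names the three features, so the code tables are assumptions made for illustration.

```python
# Hypothetical numeric encodings; the slide does not specify any.
COLOR_CODES = {"red": 0.0, "green": 1.0}
SIZE_CODES = {"small": 1.0, "medium": 2.0, "large": 3.0}


def apple_to_vector(weight_grams, color, size):
    """Dimension 0 is weight, dimension 1 is color, dimension 2 is size."""
    return [float(weight_grams), COLOR_CODES[color], SIZE_CODES[size]]


small_red = apple_to_vector(100, "red", "small")
small_green = apple_to_vector(100, "green", "small")
large_green = apple_to_vector(250, "green", "large")
print(small_red, small_green, large_green)
```

With these codes the weight dimension has a much larger numeric range than the other two, so in practice the features would usually be scaled before distances are computed.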

Page 104: NLP& Bigdata. Motivation and Action

Document Representations

A set of apples of different weights, sizes, and colors converted to vectors. (Figure taken from Mahout in Action.)

Page 105: NLP& Bigdata. Motivation and Action

Document Representations

Improving weighting with TF-IDF

Term Frequency - Inverse Document Frequency (TF-IDF) weighting is a widely used improvement on simple term frequency weighting.

We found how books can be clustered together based on their similarity in words.

Instead of simply using term frequency as the values in the vector, this value is multiplied by the inverse of the term’s document frequency:

IDF = log(N / n)
N = total number of documents
n = number of documents that contain the term
TF-IDF = TF * IDF
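The formula can be tried out on a toy corpus. In this sketch TF is taken to be the relative frequency of a word in its document and the logarithm is the natural one; both are assumptions, since the slide leaves them open, and the three sample sentences are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document, using TF * log(N / n)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)  # N in the formula
    doc_freq = Counter()     # n for every word
    for words in tokenized:
        doc_freq.update(set(words))
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append({
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


docs = [
    "the wizard raised the wand",
    "the ring went to the mountain",
    "the hobbit found the ring",
]
vectors = tf_idf(docs)
print(vectors[0])
```

Because "the" occurs in every document, its IDF is log(1) = 0 and it drops out of the vectors entirely, while a word such as "wizard" that occurs in a single document gets the highest weight.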

Page 106: NLP& Bigdata. Motivation and Action

Stanford NLP

NLP Toolkit

The Stanford NLP group provides NLP toolkits for various major computational linguistics problems.

Written in Java.

Open Source

Page 107: NLP& Bigdata. Motivation and Action

Stanford NLP

Stanford Named Entity Recognizer

Named-entity recognition (NER) techniques locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, etc.

Consider the following text

Hello Jona, I am in Indian Institute at Trivandrum

What are the entities in this?

NER Demo

Stanford NER is also known as CRFClassifier.

Conditional Random Field (CRF) sequence models are used for structured predictions.

Page 112: NLP& Bigdata. Motivation and Action

Social Media and Sentiment Analysis

Twitter

Twitter Streaming Demo

Sentiment Analysis

Sentiment analysis is one of the hottest research areas in computer science today.

A basic task in sentiment analysis is to classify the polarity of a given text at the document, sentence, or aspect level.

That is, to decide whether the opinion expressed in a document, a sentence, or an entity feature or aspect is positive, negative, or neutral.

Page 114: NLP& Bigdata. Motivation and Action

Social Media and Sentiment Analysis

Movie Review

Let’s see a tweet on a recently released movie

“Wow #Krish3 looks more exciting than Superman n Spider-Man for sure ! The Roshans have made a truly world class super hero film, again!”

These snippets of text are a gold mine for companies and individuals that want to monitor their reputation and get timely feedback about their products and actions.

Page 117: NLP& Bigdata. Motivation and Action

Social Media and Sentiment Analysis

Document-Level Sentiment Analysis

The main approach for document-level sentiment analysis is supervised learning.

The system learns a classification model from the training data.

Common classification algorithms such as SVM, Naive Bayes, Logistic Regression, etc. can be used.

New documents are then tagged with their sentiment classes.
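The train-then-tag workflow can be illustrated with a very small multinomial Naive Bayes classifier written from scratch. This is a sketch under stated assumptions: the four training sentences are invented, whitespace tokenization is used, and add-one smoothing is one common choice among several. A real system would use a library implementation and far more data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior of the class
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training data, for illustration only.
train_texts = [
    "truly exciting world class film",
    "wow what an exciting hero",
    "boring plot and weak acting",
    "a dull and boring film",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("exciting super hero film"))
print(model.predict("boring and dull"))
```

The first test sentence is tagged positive and the second negative, because the words they contain were seen more often in the corresponding training class.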

Page 118: NLP& Bigdata. Motivation and Action

Bigdata

Introduction to Bigdata

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

Page 119: NLP& Bigdata. Motivation and Action

Bigdata

3 Vs of Bigdata

Volume: Ever-growing data of all types

Velocity: For time-sensitive processes such as catching fraud, intrusion detection, etc., the speed at which data arrives is a characteristic of Bigdata

Variety: Any type of data, structured and unstructured, such as text, sensor data, audio, video, click streams, log files and more

Page 120: NLP& Bigdata. Motivation and Action

Bigdata

Tools and Technologies

Hadoop

NoSQL

Spark

D3

Page 121: NLP& Bigdata. Motivation and Action

Bigdata

A Few Interesting Areas

Internet of Things

Data Journalism

Page 122: NLP& Bigdata. Motivation and Action

Conclusion

Questions ?

Page 123: NLP& Bigdata. Motivation and Action

References

Sean Owen, Robin Anil, Ted Dunning, Ellen Friedman, Mahout in Action, Manning Publications

Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques

Kardi Teknomo, K-Means Clustering Tutorials

A first take at building an inverted index, http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html

Page 124: NLP& Bigdata. Motivation and Action

Thanks
