An Intro to Text Analytics on Big Data with a use case
Uploaded by: raul-chong
Category: Technology
#TOSMAC
Toronto SMAC Meetup: Welcome!
Toronto SMAC Team
© 2014 IBM Corporation
Lucas Silva, Felipe Mosquetta, Marcos de Mello
Twitter's numbers
As you know:
- 500 million Tweets are sent per day.
- Twitter supports 35+ languages.
- Twitter has 255 million monthly active users.
That is a huge amount of data!
Overview
Section1 Section2 Section3 Section4 Section5
Let’s get started!
Input data
Section2
Demo
Next section
Extractor: used to extract structured information from unstructured and semi-structured data.
AQL: Annotation Query Language, a rule language with a familiar SQL-like syntax.
Profiler: used for troubleshooting performance problems.
Main concepts
Types of extraction specifications:
- Dictionaries
- Regular expressions
- Part of speech
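The first two specification types can be illustrated outside AQL. The following is a minimal Python sketch, not AQL and not the presenters' demo code: the dictionary entries, the number pattern, and the sample sentence are all assumptions made for illustration. Part-of-speech extraction needs a tagger and is left out here.

```python
import re

# Hypothetical dictionary of company names (illustrative entries only).
COMPANY_DICTIONARY = {"ibm", "twitter"}

# Hypothetical regular expression for integers and decimals, e.g. 13 or 7.54.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def match_dictionary(text, dictionary):
    """Return (start, end, token) for each word that appears in the dictionary."""
    matches = []
    for m in re.finditer(r"\w+", text):
        if m.group().lower() in dictionary:
            matches.append((m.start(), m.end(), m.group()))
    return matches


def match_regex(text, pattern):
    """Return (start, end, matched_text) for each regular-expression match."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


# Made-up sample text, not taken from the talk's input data.
sample = "IBM shares rose 7.54 points after 13 analysts upgraded it."
company_spans = match_dictionary(sample, COMPANY_DICTIONARY)
number_spans = match_regex(sample, NUMBER_PATTERN)
print(company_spans)
print(number_spans)
```

Each match keeps its character offsets, which is what lets later steps combine or compare annotations by position.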
Example numbers: 7.54, 13
Demo
AQL Guidelines
Basic feature AQL statements: develop the core building blocks of the extractor.
Candidate generation AQL statements: combine the basic feature AQL statements.
Example candidates:
- $7.5 million
- $4 thousand
- $ 7.5 million
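One way to picture candidate generation is as a join over the spans produced by basic features. The sketch below is Python rather than AQL and assumes three hypothetical basic features (a currency symbol, a number, and a unit word) plus a made-up sample sentence; it only illustrates the idea of combining adjacent building blocks into larger candidates.

```python
import re


def find(pattern, text):
    """Basic feature: return (start, end) spans for every match of pattern."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


def is_gap(text, left_end, right_start, min_len, max_len):
    """True if only whitespace of an allowed length separates two spans."""
    if right_start < left_end:
        return False
    gap = text[left_end:right_start]
    return min_len <= len(gap) <= max_len and gap.strip() == ""


def generate_candidates(text):
    """Combine symbol, number, and unit spans into amount candidates."""
    symbols = find(r"\$", text)                            # assumed feature
    numbers = find(r"\d+(?:\.\d+)?", text)                 # assumed feature
    units = find(r"\b(?:thousand|million|billion)\b", text)  # assumed feature
    candidates = []
    for s_start, s_end in symbols:
        for n_start, n_end in numbers:
            # The number may follow the symbol directly or after one space.
            if not is_gap(text, s_end, n_start, 0, 1):
                continue
            for u_start, u_end in units:
                # The unit must follow the number after exactly one space.
                if is_gap(text, n_end, u_start, 1, 1):
                    candidates.append((s_start, u_end))
    return candidates


sample = "Deals of $7.5 million, $4 thousand and $ 7.5 million were announced."
candidates = generate_candidates(sample)
candidate_texts = [sample[start:end] for start, end in candidates]
print(candidate_texts)
```

Keeping the basic features separate from the combination step means each building block can be tested and reused on its own.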
Filter and consolidate AQL statements:
- Refine results
- Remove invalid annotations
- Resolve overlap between annotations
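The filter and consolidate step can be sketched in Python as follows. This is an illustration under stated assumptions, not AQL and not the demo's logic: the validity rule (an amount must start with "$"), the containment-based overlap policy, and the candidate list are all hypothetical.

```python
def span_of(text, phrase):
    """Helper for the example: locate the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase))


def is_valid(text, span):
    """Hypothetical filter rule: a valid amount must start with '$'."""
    start, end = span
    return text[start:end].startswith("$")


def consolidate(spans):
    """Resolve overlap by dropping spans contained within a kept span."""
    # Sort by start, then longest first, so containers are seen before
    # the spans they contain.
    ordered = sorted(set(spans), key=lambda s: (s[0], -(s[1] - s[0])))
    kept = []
    for start, end in ordered:
        if any(k_start <= start and end <= k_end for k_start, k_end in kept):
            continue
        kept.append((start, end))
    return kept


sample = "Profit of $7.5 million beat the $4 thousand forecast."
# Assumed output of an earlier candidate generation step, with overlaps.
candidates = [
    span_of(sample, phrase)
    for phrase in ("$7.5", "7.5 million", "$7.5 million", "$4 thousand")
]
filtered = [span for span in candidates if is_valid(sample, span)]
final = consolidate(filtered)
final_texts = [sample[start:end] for start, end in final]
print(final_texts)
```

Here the filter removes "7.5 million" because it lacks the symbol, and consolidation removes "$7.5" because the longer "$7.5 million" covers it.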
Demo
Conclusion
Checkpoint
What we have done
Section1 Section2 Section3
What are we going to do?
Section4 Section5
Demo
Also using R
1.75 0.32
What are we going to do?
Demo
So what?
Companies
Exporting to you
Thank you! Let's network!