An Intro to Text Analytics on Big Data with a use case

#TOSMAC Toronto SMAC Meetup – Welcome! An Intro to Text Analytics on Big Data with a use case

description

An introduction to performing text analytics on input from Twitter, using the Emmys as the example use case.

Transcript of An Intro to Text Analytics on Big Data with a use case

Page 1: An Intro to Text Analytics on Big Data with a use case

#TOSMAC

Toronto SMAC Meetup – Welcome!

An Intro to Text Analytics on Big Data with a use case

Page 2: An Intro to Text Analytics on Big Data with a use case

Toronto SMAC Team

Lucas Silva, Felipe Mosquetta, Marcos de Mello

© 2014 IBM Corporation

Page 3: An Intro to Text Analytics on Big Data with a use case

Twitter's numbers

As you know:

- 500 million Tweets are sent per day.

- Twitter supports 35+ languages.

- 255 million monthly active users.

Huge amount of data!

Page 4: An Intro to Text Analytics on Big Data with a use case

Overview

Section1 Section2 Section3 Section4 Section5

Page 5: An Intro to Text Analytics on Big Data with a use case

Section1 Section2 Section3 Section4 Section5

Overview

Page 6: An Intro to Text Analytics on Big Data with a use case

Section1 Section2 Section3 Section4 Section5

Overview

Page 7: An Intro to Text Analytics on Big Data with a use case

Let’s get started!

Page 8: An Intro to Text Analytics on Big Data with a use case

Input data

Page 9: An Intro to Text Analytics on Big Data with a use case

Section2

Page 10: An Intro to Text Analytics on Big Data with a use case

Demo

Page 11: An Intro to Text Analytics on Big Data with a use case

Section1 Section2 Section3 Section4 Section5

Next section

Page 12: An Intro to Text Analytics on Big Data with a use case

Section1 Section2 Section3 Section4 Section5

Next section

Extractor: used to extract structured information from unstructured and semi-structured data.

AQL: Annotation Query Language. Rule language with familiar SQL-like syntax.

Page 13: An Intro to Text Analytics on Big Data with a use case

Section1 Section2 Section3 Section4 Section5

Next section

Profiler: troubleshooting performance problems.

Page 14: An Intro to Text Analytics on Big Data with a use case

Main concepts

Types of extraction specifications:

- Dictionaries

- Regular expressions

- Part of speech
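The first two specification types can be illustrated with a minimal Python sketch. It is not AQL and not the tooling used in the talk; the `Span` type, the helper names, the sample tweet, and the dictionary entries are all assumptions made for illustration. Part-of-speech extraction is left out because it needs a tagger from an NLP library.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A matched region of text (hypothetical helper type)."""
    begin: int
    end: int
    text: str
    kind: str


def extract_dictionary(text, entries, kind):
    """Return a span for every case-insensitive whole-word dictionary hit."""
    spans = []
    for entry in entries:
        pattern = r"\b" + re.escape(entry) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(Span(m.start(), m.end(), m.group(), kind))
    return sorted(spans)


def extract_regex(text, pattern, kind):
    """Return a span for every regular-expression match."""
    return [Span(m.start(), m.end(), m.group(), kind)
            for m in re.finditer(pattern, text)]


# Made-up sample input, not data from the talk.
tweet = "Watching the Emmys with 13 friends, rated it 7.54 out of 10"
shows = extract_dictionary(tweet, ["Emmys", "Oscars"], "Show")
numbers = extract_regex(tweet, r"\d+(?:\.\d+)?", "Number")
```

Each span keeps its character offsets, so later steps can combine or compare matches by position rather than by text alone.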

Page 15: An Intro to Text Analytics on Big Data with a use case

Main concepts

Page 16: An Intro to Text Analytics on Big Data with a use case

Main concepts

Page 17: An Intro to Text Analytics on Big Data with a use case

Page 18: An Intro to Text Analytics on Big Data with a use case

Main concepts

Page 19: An Intro to Text Analytics on Big Data with a use case

Main concepts

Page 20: An Intro to Text Analytics on Big Data with a use case

Main concepts

Types of extraction specifications:

- Dictionaries

- Regular expressions

- Part of speech

Example, numbers: 7.54, 13

Page 21: An Intro to Text Analytics on Big Data with a use case

Demo

Page 22: An Intro to Text Analytics on Big Data with a use case

Main concepts

Types of extraction specifications:

- Dictionaries

- Regular expressions

- Part of speech

Page 23: An Intro to Text Analytics on Big Data with a use case

Main concepts

Page 24: An Intro to Text Analytics on Big Data with a use case

Page 25: An Intro to Text Analytics on Big Data with a use case

AQL Guidelines

Basic feature AQL statements

- Develop the core building blocks of the extractor.

Page 26: An Intro to Text Analytics on Big Data with a use case

AQL Guidelines

Candidate generation AQL statements

- Combine basic feature AQL statements.

Page 27: An Intro to Text Analytics on Big Data with a use case

Candidate generation AQL statements

$7.5 million

$4 thousand

$ 7.5 million

Page 28: An Intro to Text Analytics on Big Data with a use case

Candidate generation AQL statements

$7.5 million

$4 thousand

$ 7.5 million

$7.5 million
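Candidate generation on amounts like these can be sketched in Python as follows. This is an illustration under stated assumptions, not the AQL shown in the demo: the three basic features (currency symbol, number, unit word), the `follows` rule allowing at most one whitespace character between parts, and the sample sentence are all hypothetical.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """Character offsets of one basic-feature match (hypothetical type)."""
    begin: int
    end: int


def find(pattern, text):
    """Basic feature: every match of a pattern, as offsets."""
    return [Span(m.start(), m.end()) for m in re.finditer(pattern, text)]


def follows(left, right, text, max_gap=1):
    """True when right starts after left with only whitespace in between."""
    gap = text[left.end:right.begin]
    return right.begin >= left.end and len(gap) <= max_gap and gap.strip() == ""


def amount_candidates(text):
    """Combine symbol + number + unit matches into amount candidates."""
    symbols = find(r"\$", text)
    numbers = find(r"\d+(?:\.\d+)?", text)
    units = find(r"\b(?:thousand|million)\b", text)
    candidates = []
    for s in symbols:
        for n in numbers:
            if not follows(s, n, text):
                continue
            for u in units:
                if follows(n, u, text):
                    candidates.append(text[s.begin:u.end])
    return candidates


# Made-up sentence built from the amounts on the slide.
sample = "Budget was $7.5 million, prize $4 thousand, fee $ 7.5 million"
amounts = amount_candidates(sample)
```

The nested loops are quadratic in the number of matches, which is fine for a sketch on short tweets; a real extractor would rely on the engine to join spans efficiently.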

Page 29: An Intro to Text Analytics on Big Data with a use case

AQL Guidelines

Filter and consolidate AQL statements

- Refine results

- Remove invalid annotations

- Resolve overlap between annotations.
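The filter and consolidate steps can be sketched in Python as below. The containment-based policy (keep a span only if no other span fully contains it), the validity rule, and all names are assumptions chosen for illustration; they are not taken from the talk's AQL.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One annotation: offsets plus the covered text (hypothetical type)."""
    begin: int
    end: int
    text: str


def contains(outer, inner):
    """True when outer fully covers inner and the two are not identical."""
    return (outer.begin <= inner.begin
            and inner.end <= outer.end
            and outer != inner)


def filter_invalid(spans, is_valid):
    """Filter step: drop annotations that fail a validity predicate."""
    return [s for s in spans if is_valid(s)]


def consolidate(spans):
    """Consolidate step: drop duplicates and spans contained in another span."""
    unique = sorted(set(spans))
    return [s for s in unique if not any(contains(o, s) for o in unique)]


# Overlapping candidates over the text "$7.5 million" (made-up input).
candidates = [
    Span(0, 4, "$7.5"),
    Span(0, 12, "$7.5 million"),
    Span(1, 4, "7.5"),
    Span(0, 12, "$7.5 million"),
]
# Assumed validity rule for this sketch: an amount must start with "$".
valid = filter_invalid(candidates, lambda s: s.text.startswith("$"))
final = consolidate(valid)
```

Keeping the longest covering span is only one possible policy; the opposite choice, or a left-to-right rule, may suit other extractors better.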

Page 30: An Intro to Text Analytics on Big Data with a use case

Demo

Page 31: An Intro to Text Analytics on Big Data with a use case

Conclusion

Page 32: An Intro to Text Analytics on Big Data with a use case

Check point

Page 33: An Intro to Text Analytics on Big Data with a use case

What we have done

Section1 Section2 Section3

Page 34: An Intro to Text Analytics on Big Data with a use case

What are we going to do?

Section4 Section5

Page 35: An Intro to Text Analytics on Big Data with a use case

Demo

Page 36: An Intro to Text Analytics on Big Data with a use case

Also using R

Page 37: An Intro to Text Analytics on Big Data with a use case

What are we going to do?

Page 38: An Intro to Text Analytics on Big Data with a use case

Demo

Page 39: An Intro to Text Analytics on Big Data with a use case

So what?

Page 40: An Intro to Text Analytics on Big Data with a use case

Companies

Page 41: An Intro to Text Analytics on Big Data with a use case

Exporting to you

Page 42: An Intro to Text Analytics on Big Data with a use case

Thank you! Let's network!