© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

SEIZE THE DATA. 2015

Sentiment Analysis Using HP Vertica Pulse

Herb Collins, Vertica Training

August 10, 2015

About Sentiment Analysis

What Sentiment Analysis Is (1 of 2)

Refers to a broad area of natural language processing, computational linguistics and text mining.

Aims to determine the attitude of a communicator with respect to some topic.

• The communication could originally be either written text or spoken word.

The attitude may be:

• Judgment or evaluation of something or someone.

• Emotional state when writing or speaking.

• Emotional effect the communicator wishes to have on the reader.

What Sentiment Analysis Is (2 of 2)

Usually automated and tunable to meet specific objectives of the analyst.

• Using dictionaries or other configuration tools.

Why You Should Care (1 of 2)

Fast-growing mountain of data that can reveal the collective consciousness of your clients and users.

A kind of virtual currency that can make or break a product in the marketplace.

• Canary in a coal mine.

Can identify most influential opinion holders.

Why You Should Care (2 of 2)

Can pinpoint the effect of specific issues on customer perceptions.

• Helps to respond with appropriate marketing and public relations strategies.

It pays to listen to your consumers.

• Your business may lose its loyal consumers to competitors who are willing to address their concerns.

Tuning For Your Context

“Your new sales process is gnarly. It makes purchasing products and services so groovy.”

“The interface in the new XYZ upgrade is a kludge.”

“Your customer care representatives are dynamite!”

“The installation instructions make me dizzy!”

Some Sources Of Sentiment Data

Social media

• Twitter

• Yammer

• Facebook

• Blogs

• LinkedIn

• Smartphone Apps

Business communications

• Textual help desk chats

• Emails

• Recorded audio (converted to text using IDOL)

• Open-text responses to surveys

• Instant messaging

• Comment fields in applications

How HP Vertica Pulse Can Help

What HP Vertica Pulse Is

Provides a suite of functions that allow you to analyze and extract the sentiment from English and Spanish language text directly from your HP Vertica database.

Can analyze each text row (for example, a tweet) in the language passed as an argument for that row, the language the user specifies as a parameter, or the default language.

A “Pulse Cookbook” is provided to help you get the most out of sentiment analysis.

HP Vertica Pulse Key Features (1 of 2)

Scoring of the sentiment of attributes in a sentence.

• Typically scores sentiment on a range from -1 (negative sentiment) to +1 (positive sentiment).

• A sentiment of 0 is considered neutral.

Tunable to your environment.

• Example: You can analyze comments containing only the name of your product or company.

Can configure how sentiment is scored.

• Includes user-modifiable dictionaries of words that are used to help score sentiment.

HP Vertica Pulse Key Features (2 of 2)

Can filter out attributes you are not interested in.

• Pulse supports a special user-dictionary to indicate attributes that should not be analyzed.

• Alternatively, you can choose to score sentiment only on specified attributes.
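
The two filtering modes above can be pictured with a small Python sketch. It is illustrative only and is not how Pulse is implemented: the function name `select_attributes`, its parameters, and the sample attribute names are all hypothetical, and attributes are assumed to have been discovered already.

```python
def select_attributes(candidates, stop_words=frozenset(), only=None):
    """Return the candidate attributes that should be scored.

    stop_words: attributes that must never be scored (exclusion mode).
    only:       if given, score nothing except these attributes (inclusion mode).
    Both arguments are hypothetical stand-ins for the idea of a user dictionary.
    """
    stop = {w.lower() for w in stop_words}
    allowed = None if only is None else {w.lower() for w in only}
    kept = []
    for attr in candidates:
        key = attr.lower()
        if key in stop:
            continue  # explicitly excluded attribute
        if allowed is not None and key not in allowed:
            continue  # inclusion mode: not on the requested list
        kept.append(attr)
    return kept


# Hypothetical attributes found in a batch of comments.
candidates = ["interface", "upgrade", "weather", "support"]

# Exclusion mode: drop attributes you are not interested in.
print(select_attributes(candidates, stop_words={"weather"}))

# Inclusion mode: score only the attributes you name.
print(select_attributes(candidates, only={"interface", "support"}))
```

Exclusion suits broad monitoring where only a few attributes are noise; inclusion suits a focused study of a known product or feature.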

Pulse Dictionaries (1 of 2)

white_list_en

• Words that are always marked as an attribute. This list augments the built-in Pulse attribute discovery process. Add words that you always want scored to the white_list user dictionary.

stop_words_en

• Words that are never marked as an attribute. Add words that you do not want scored to the stop_words user dictionary.

pos_words_en

• Positive words that can be any type of word or phrase. Words in this list are more likely to carry a positive polarity in general.

Pulse Dictionaries (2 of 2)

neg_words_en

• Negative words, which can be any type of word or phrase with a negative connotation. Words in this list are deemed more likely to carry a negative polarity in general.

neutral_words_en

• Words that indicate a neutral connotation. Words in this list are scored with a sentiment of 0, meaning not positive or negative.
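
To see why editing these dictionaries changes the result, consider a deliberately simplified word-counting scorer in Python. This is an assumption-laden toy, not the Pulse algorithm: Pulse scores attributes within sentences, whereas this sketch scores a whole string, and the function `score_text` and the starting word lists are hypothetical.

```python
import re


def score_text(text, pos_words, neg_words, neutral_words=frozenset()):
    """Toy score in [-1, 1]: (positive hits - negative hits) / total hits.

    Words listed as neutral contribute nothing, even if they also appear
    in another list. Text with no dictionary hits scores 0.0 (neutral).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for t in tokens if t in pos_words and t not in neutral_words)
    neg = sum(1 for t in tokens if t in neg_words and t not in neutral_words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


# Hypothetical starting dictionaries with no slang in them.
pos_words = {"good", "great"}
neg_words = {"bad", "broken"}

comment = ("Your new sales process is gnarly. "
           "It makes purchasing products and services so groovy.")

# Before tuning, the slang is unknown, so the comment looks neutral.
print(score_text(comment, pos_words, neg_words))

# After tuning for the audience's vocabulary, the same comment is positive.
tuned_pos = pos_words | {"gnarly", "groovy", "dynamite"}
tuned_neg = neg_words | {"kludge"}
print(score_text(comment, tuned_pos, tuned_neg))
print(score_text("The interface in the new XYZ upgrade is a kludge.",
                 tuned_pos, tuned_neg))
```

The point of the sketch is the before-and-after contrast: the scoring logic never changes, only the dictionaries do.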

The Pulse Cookbook

A special guide provided by HP Vertica that offers guidance on a variety of topics, including:

• Batch Analyzing Data as It Is Loaded

• Analyzing Comments for a Company or Product

• Determining Popular Topics

• Determining Prolific Authors

• Analyzing the Sentiment of Specific Authors

• Finding Associated Attributes

• Using Pulse as an Aid in Competitive Analysis

Some Technical Information

Installation Highlights (1 of 2)

1. You must install a Java Virtual Machine (JVM) on every host in your HP Vertica cluster in order to run Pulse.

• Pulse requires a 64-bit Java Standard Edition 6 or 7 (Java version 1.6 or 1.7) runtime.

• You can choose to install either the Java Runtime Environment (JRE) or Java Development Kit (JDK), since the JDK also includes the JRE.

2. Install the Pulse Package on a single node.

• Install/Update using the separate RPM or DEB package for Pulse.

• Run the included SQL scripts to install/update the Pulse functions and create the user dictionaries.

Installation Highlights (2 of 2)

3. The JavaBinaryForUDx configuration parameter must be set.

• Tells HP Vertica where to look for the JRE to execute Java UDFs.

• You need to set this parameter to the absolute path of the Java executable.

4. You must modify the jvm resource pool so that HP Vertica Pulse has adequate resources to perform sentiment analysis queries.

• If a cluster does not have sufficient resources to run an HP Vertica Pulse query, then such a query can fail with an Out Of Memory (OOM) exception.

Schema and Privileges

• Installation script automatically creates a pulse schema.

• Contains the user-dictionary, mapping lists and dictionary update tables used by Pulse.

• Initially only administrators can read or edit tables in the pulse schema.

• Installation script also creates a 'pulse_users' role, which has all privileges for the pulse schema.

• To give non-administrator database users access to the pulse schema, grant the 'pulse_users' role to the user:

GRANT pulse_users TO [username];

Multilingual

• Pulse can analyze text in different languages.

• Currently English and Spanish are supported.

• You can specify the language that is analyzed in three ways:

• Provide the language as an argument: if the document record specifies a language, pass that value as an argument so the text is analyzed in that language.

• Provide the language as a parameter: if the document record has no language value, Pulse uses the value of the language parameter given in the query.

• Do not provide an argument or parameter and use the default language. Pulse defaults to English unless you have changed the default language.
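
The order of precedence described above (per-record argument, then query parameter, then default) can be sketched in Python as follows. This only illustrates the fallback order; the function `resolve_language`, its parameter names, and the lowercase language labels are hypothetical and are not Pulse identifiers.

```python
def resolve_language(row_language=None, query_language=None,
                     default_language="english"):
    """Pick the analysis language for one text row.

    Precedence, as described in the slide:
      1. the language stored with the row (passed as an argument),
      2. the language given as a parameter of the query,
      3. the default language (English unless changed).
    Empty strings and None are both treated as "not specified".
    """
    for candidate in (row_language, query_language, default_language):
        if candidate and candidate.strip():
            return candidate.strip().lower()
    raise ValueError("no language specified and no default configured")


# The row carries its own language, so the query parameter is ignored.
print(resolve_language(row_language="Spanish", query_language="english"))

# The row has no language, so the query parameter is used.
print(resolve_language(row_language=None, query_language="spanish"))

# Nothing specified anywhere, so the default applies.
print(resolve_language())
```

Storing a language per record is the most flexible option for mixed-language data, since one query can then analyze English and Spanish rows together.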

Demonstration

QUESTIONS?

Please attend our Q&A with HP Big Data experts today

Marina Ballroom, Lobby level

10:15 am - 10:30 am

12:00 pm - 1:00 pm

2:30 pm - 3:00 pm

4:30 pm - 5:00 pm
