I Can Do Text Analytics! Designing Development Tools for Novice Developers

I Can Do Text Analytics! Designing Development Tools for Novice Developers Huahai Yang* Daina Pupons-Wickham** Laura Chiticariu* Yunyao Li* Benjamin Nguyen** Arnaldo Carreno-fuentes* *IBM Research - Almaden **IBM Software - Silicon Valley


Talk given at CHI 2013 in Paris http://chi2013.acm.org/program/by-day/tuesday/#SND.


Page 1: I Can Do Text Analytics! Designing Development Tools for Novice Developers

I Can Do Text Analytics! Designing Development Tools for Novice Developers

Huahai Yang* Daina Pupons-Wickham** Laura Chiticariu* Yunyao Li* Benjamin Nguyen** Arnaldo Carreno-fuentes*

*IBM Research - Almaden **IBM Software - Silicon Valley

Page 2: I Can Do Text Analytics! Designing Development Tools for Novice Developers

OUTLINE

• Problem motivation
  – Text analytics
  – User population and needs

• Formative design iterations
  – Expert interviews
  – User studies in lab and field

• Current design and evaluation
  – Workflow Guide and Extraction Plan
  – Evaluation by competition

Page 3: I Can Do Text Analytics! Designing Development Tools for Novice Developers

TEXT ANALYTICS

[Diagram: text sources, text analytics, and applications]

Text sources: Public Text, Web Text, Private Text (Social media, News, SEC, USPTO, Internal Data, Subscription Data)

Text Analytics

Applications: Marketing, Financial investment, Drug discovery, Law enforcement, …

Page 4: I Can Do Text Analytics! Designing Development Tools for Novice Developers

HIDDEN VALUES IN TEXT

Page 5: I Can Do Text Analytics! Designing Development Tools for Novice Developers

DREAM

Page 6: I Can Do Text Analytics! Designing Development Tools for Novice Developers

REALITY

Page 7: I Can Do Text Analytics! Designing Development Tools for Novice Developers

TEXT ANALYTICS IS HARD

Page 8: I Can Do Text Analytics! Designing Development Tools for Novice Developers

ML DOES NOT SAVE THE DAY

Wagstaff, K. Machine Learning that Matters. In ICML (2012)

Page 9: I Can Do Text Analytics! Designing Development Tools for Novice Developers

ANNOTATION QUERY LANGUAGE (AQL)

• A declarative language for developing text analytics extractors [Chiticariu et al., 2010]

• Very expressive• Runs very fast

Page 10: I Can Do Text Analytics! Designing Development Tools for Novice Developers

SIMPLE EXAMPLE: OPINION ON A MOVIE

Input

Mission Impossible has an entertaining plot, but terrible acting.

Annotated spans: Movie (Mission Impossible); Opinion (entertaining, terrible); Aspect (plot, acting)

Desired Output

(Movie Name, Aspect, Opinion)

(Mission Impossible, plot, positive)

(Mission Impossible, acting, negative)

Page 11: I Can Do Text Analytics! Designing Development Tools for Novice Developers

SAMPLE AQL FOR OPINION ON A MOVIE

Pattern: <Movie> (0-15 tokens) <Opinion> (0 tokens) <Aspect>

-- Join movie, opinion, and aspect spans by token distance
create view MovieReviewSnippet as
  select M.name as name, O.value as value, A.aspect as aspect,
         CombineSpans(M.name, A.aspect) as review
  from Movie M, Opinion O, Aspect A
  where FollowsTok(M.name, O.value, 0, 15)
    and FollowsTok(O.value, A.aspect, 0, 0);

-- Opinion words come from a dictionary file
create view Opinion as
  extract dictionary 'opinion.dict' on D.text as value
  from Document D;

-- Aspect words come from a dictionary file
create view Aspect as
  extract dictionary 'aspect.dict' on D.text as aspect
  from Document D;
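For readers who do not know AQL, the pattern in the sample above can be approximated in Python. This is only a rough sketch, not how AQL executes: the `MOVIES`, `OPINIONS`, and `ASPECTS` tables are hypothetical stand-ins for the Movie view and the two dictionary files, and the tokenizer drops punctuation, which a real AQL tokenizer would count as tokens.

```python
import re

# Hypothetical stand-ins for the Movie view, opinion.dict and aspect.dict.
MOVIES = {"mission impossible"}
OPINIONS = {"entertaining": "positive", "terrible": "negative"}
ASPECTS = {"plot", "acting"}


def tokenize(text):
    """Split text into lowercase word tokens (punctuation is dropped)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_phrase(tokens, phrase):
    """Return (start, end) token offsets, end exclusive, for each match."""
    words = phrase.split()
    n = len(words)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == words]


def extract_opinions(text, max_gap=15):
    """Return (movie, aspect, polarity) triples found in text."""
    tokens = tokenize(text)
    results = []
    for movie in sorted(MOVIES):
        for _, movie_end in find_phrase(tokens, movie):
            for i, tok in enumerate(tokens):
                if tok not in OPINIONS:
                    continue
                # Analogue of FollowsTok(M.name, O.value, 0, 15):
                # the opinion starts 0 to max_gap tokens after the movie.
                if not 0 <= i - movie_end <= max_gap:
                    continue
                # Analogue of FollowsTok(O.value, A.aspect, 0, 0):
                # the aspect is the token right after the opinion.
                if i + 1 < len(tokens) and tokens[i + 1] in ASPECTS:
                    results.append((movie, tokens[i + 1], OPINIONS[tok]))
    return results


if __name__ == "__main__":
    sentence = "Mission Impossible has an entertaining plot, but terrible acting."
    for triple in extract_opinions(sentence):
        print(triple)
```

The nested loops make the join explicit; in AQL the same constraints are stated declaratively in the `where` clause and the engine decides how to evaluate them.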

Page 12: I Can Do Text Analytics! Designing Development Tools for Novice Developers

SKILLED PROGRAMMER, BUT NOVICE DEVELOPER IN TEXT ANALYTICS

Page 14: I Can Do Text Analytics! Designing Development Tools for Novice Developers

CAN NOVICE DEVELOPERS BE PRODUCTIVE?

EXPIRED

Page 15: I Can Do Text Analytics! Designing Development Tools for Novice Developers

WHAT IS MISSING HERE?

Page 16: I Can Do Text Analytics! Designing Development Tools for Novice Developers

BRING TEXT BACK TO TEXT ANALYTICS

Page 17: I Can Do Text Analytics! Designing Development Tools for Novice Developers

WHAT DO EXPERT DEVELOPERS KNOW?

Page 18: I Can Do Text Analytics! Designing Development Tools for Novice Developers

WHAT DO EXPERT DEVELOPERS KNOW?

Page 19: I Can Do Text Analytics! Designing Development Tools for Novice Developers

WHAT DO EXPERT DEVELOPERS KNOW?

We designed tools to embody the best practices

Page 20: I Can Do Text Analytics! Designing Development Tools for Novice Developers

FORMATIVE LAB STUDY

• 14 novice developers
• First given a tutorial on AQL
• Task: extract revenue by division from a company annual report
• Without the tool, none completed the task
• With the tool, all completed it within 90 minutes

Page 21: I Can Do Text Analytics! Designing Development Tools for Novice Developers

FORMATIVE FIELD STUDY

• 12 weeks, 10 project members, 4 doing text analytics (4 or 5 hours per week)

• Built profiles for pharmaceutical companies
• Interviews
  – Participants reported that the tool was easy to use
  – Participants made many suggestions for UI enhancements

Page 22: I Can Do Text Analytics! Designing Development Tools for Novice Developers

MAIN FEATURE: WORKFLOW GUIDE

Page 23: I Can Do Text Analytics! Designing Development Tools for Novice Developers

MAIN FEATURE: EXTRACTION PLAN

Page 24: I Can Do Text Analytics! Designing Development Tools for Novice Developers

CODE TEMPLATE FROM EXTRACTION PLAN

Page 25: I Can Do Text Analytics! Designing Development Tools for Novice Developers

EVALUATION BY COMPETITION

• Task: buzz identification - identifying tweets mentioning the top 10 Billboard songs in the week of May 5, 2012

• Participants: summer interns, 6 registered, 4 submitted answers

• Prize: $500 for the winner

• Setup:
  – Participants were given labeled training data (159 tweets)
  – Participants wrote extractors independently with our tool
  – Extractor quality was measured on unseen test data (100 tweets)

Pre-competition Briefing

Page 26: I Can Do Text Analytics! Designing Development Tools for Novice Developers

TASK HARDER THAN IT LOOKS

• RT @ardanradio #NowPlaying FUN feat Janelle Monae - We Are Young | #RIAUW

• RT @arieladriane: @1DirectionIndo what makes you beautiful - one direction cover by glee. http://t.co/t4BmvZbM

• @Cimorelliband @LisaCim @LaurenCimorelli payphone was amazing can you guys please do we are young by fun!! Thanks.

• RT @Jadore1Dx: Dear Mothers & fathers of 1D - as The Wanted would say, im glad you came.

• RT @Melisaaa11: My boyfriend knows hes jealous of my relationship with Justin Bieber

• Now u just somebody that I used to know!
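One way to see the difficulty is to try the most obvious extractor, a case-insensitive title lookup. The sketch below is an illustration only, not the method used in the competition: the `TITLES` list is an assumption read off the example tweets, and the slide does not say which of these tweets were labeled as buzz.

```python
# Assumed song titles, taken from the phrases visible in the example tweets.
TITLES = [
    "we are young",
    "what makes you beautiful",
    "payphone",
    "glad you came",
    "boyfriend",
    "somebody that i used to know",
]

TWEETS = [
    "RT @ardanradio #NowPlaying FUN feat Janelle Monae - We Are Young | #RIAUW",
    "RT @arieladriane: @1DirectionIndo what makes you beautiful - one direction cover by glee. http://t.co/t4BmvZbM",
    "@Cimorelliband @LisaCim @LaurenCimorelli payphone was amazing can you guys please do we are young by fun!! Thanks.",
    "RT @Jadore1Dx: Dear Mothers & fathers of 1D - as The Wanted would say, im glad you came.",
    "RT @Melisaaa11: My boyfriend knows hes jealous of my relationship with Justin Bieber",
    "Now u just somebody that I used to know!",
]


def naive_buzz(tweet, titles=TITLES):
    """Return every title that occurs as a substring of the lowercased tweet."""
    text = tweet.lower()
    return [title for title in titles if title in text]


if __name__ == "__main__":
    for tweet in TWEETS:
        print(naive_buzz(tweet), "<-", tweet[:50])
```

Every tweet matches at least one title, including tweets where the words may be ordinary speech rather than a song mention, so a pure substring lookup cannot separate the two cases on its own.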

Page 27: I Can Do Text Analytics! Designing Development Tools for Novice Developers

PERFORMANCE MEASURE

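• Precision
  – Proportion of identified buzz that is real: |identified ∩ real| / |identified|

• Recall
  – Proportion of real buzz identified: |identified ∩ real| / |real|

• F1
  – Combining precision and recall: 2 × precision × recall / (precision + recall)

[Venn diagram: within all test tweets, the set of tweets identified as buzz overlaps the set of real buzz]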
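The three measures can be computed directly from the two sets in the diagram. The following is a minimal sketch using the standard definitions; the function name and the convention of returning 0.0 for an empty denominator are choices made here, not something stated in the talk.

```python
def precision_recall_f1(identified, real):
    """Compute precision, recall, and F1 from two collections of tweet IDs.

    identified: tweets the extractor flagged as buzz
    real:       tweets labeled as buzz in the test data
    """
    identified, real = set(identified), set(real)
    true_positives = len(identified & real)

    # Guard against empty sets; 0.0 is a convention chosen for this sketch.
    precision = true_positives / len(identified) if identified else 0.0
    recall = true_positives / len(real) if real else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy IDs for illustration only; these are not the competition data.
    flagged = {1, 2, 3, 4}
    labeled = {3, 4, 5, 6, 7, 8}
    p, r, f1 = precision_recall_f1(flagged, labeled)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of the two, so an extractor cannot score well by flagging everything (high recall, low precision) or by flagging almost nothing (high precision, low recall).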

Page 28: I Can Do Text Analytics! Designing Development Tools for Novice Developers

EVALUATION RESULTS

• State of the art F1 is around 80% for similar tasks [Ritter et al. EMNLP’11; Liu et al. ACL’12]

Page 29: I Can Do Text Analytics! Designing Development Tools for Novice Developers

INTERVIEW

• Interviewed before announcing winners
• All worked only the day before the deadline
• The winner worked only 5 hours

“Because the process is very clear, the wizard is very easy to follow”

“is quite helpful to analyze the sample data and define basic concepts. I used it extensively to create my dictionaries”

“I did not face any problems using the tool”

Page 30: I Can Do Text Analytics! Designing Development Tools for Novice Developers

LOWER BARRIER TO COMPLEX DOMAIN

Page 31: I Can Do Text Analytics! Designing Development Tools for Novice Developers

CONTRIBUTIONS

• Summarized the best practice of text analytics via expert interviews

• Built UI features to support the text analytics best practice

• Lowered barrier and raised ceiling for text analytics

Page 32: I Can Do Text Analytics! Designing Development Tools for Novice Developers

FUTURE WORK

• Enable non-programmers to build text extractors with power similar to AQL

• Collaborative text analytics

Page 33: I Can Do Text Analytics! Designing Development Tools for Novice Developers

Q & A

More Info

Huahai Yang IBM Research - Almaden

[email protected]

• IBM InfoSphere BigInsights Text Analytics YouTube videos: http://bit.ly/10pfDgY

• Online classes: http://BigDataUniversity.com