Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Transcript of Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Page 1: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

freshintegral.com

Challenges for the LT Industry

Hashtag Platform: machine-assisted multilingual curation

Mathew Lowry, Fresh Integral Communications

Page 2: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Outline

• Who, Why & What:
  – Online Communities
  – Multilingual content curation

• User experience: how would it look

• How: Hashtag Engine & Platform
  – Engine: machine-assisted multilingual curation
  – The Platform: crowdsourcing Engine training

Page 3: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

THE USERS

Hashtag Platform

Page 4: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Communities of Interest or Practice

• Informal network: people with a common interest
  – Online and/or offline

• Apply collective knowledge -> each Member’s problems
  – Share problems, solutions, ideas & stories
  – Learn, teach, network, influence, visibility, establish mindshare & expertise …

• “A community that learns”, collectively raising each other’s knowledge

Page 5: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Can’t find what – or who – you’re looking for?

Page 6: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Useful Knowledge: A Community’s Killer App

• What: Library of Useful Resources
  – Extensive (lots of content)
  – Fine-grained (categorised with detailed taxonomy)
  – Multiple languages: multilingual taxonomies, translated abstracts, machine & community translations

• How: Created by Community

• Why
  – Improve community’s Collective Intelligence
  – Improve Content Discovery ... & People Discovery

• Problem: Vicious circle / Chicken-Egg
  – nobody’s submitting anything because no one’s here
  – no one’s here because nobody’s submitting anything

Page 7: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

“Extensive AND Fine-Grained” Resource Library

• “Extensive” means many records, so:
  – > Make it easy for members to submit

• “Fine-grained” = very specific, accurate classification, so:
  – > Large effort for members to submit
  – > Many mistakes -> quality control -> high overheads or low quality

• Hence machine-assisted human curation

Page 8: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

HOW WOULD A GOOD RESOURCE LIBRARY LOOK?

User experience

Page 9: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Refine interface: Home Page (manually highlighted articles)

Page 10: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Refine interface: Quality = All, Theme = ENV, Time = This Week

Page 11: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

now let’s dig deeper ...

Page 12: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Widen the search: Time = This Year

Page 13: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Now refine again ...

• From millions of records ...

• ... to the 5 you need

• Time: <1 min
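
The refine steps on these slides (Quality, Theme, Time) amount to combining facet filters over tagged records. A minimal sketch of that idea follows. The `Record` fields, the sample library and the facet values are assumptions made for illustration; the slides name the facets but do not describe a data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical fields: the slides name the facets but not a data model.
    title: str
    theme: str
    quality: str
    published: date


def refine(records: Iterable[Record],
           theme: Optional[str] = None,
           quality: Optional[str] = None,
           since: Optional[date] = None) -> List[Record]:
    """Keep records that match every facet that is set; None means "All"."""
    kept = []
    for record in records:
        if theme is not None and record.theme != theme:
            continue
        if quality is not None and record.quality != quality:
            continue
        if since is not None and record.published < since:
            continue
        kept.append(record)
    return kept


TODAY = date(2016, 3, 1)  # fixed date so the example is reproducible
LIBRARY = [
    Record("Air quality rules", "ENV", "normal", TODAY - timedelta(days=2)),
    Record("Water directive", "ENV", "highlighted", TODAY - timedelta(days=90)),
    Record("Trade talks", "ECON", "normal", TODAY - timedelta(days=1)),
]

# Quality = All, Theme = ENV, Time = This Week
this_week = refine(LIBRARY, theme="ENV", since=TODAY - timedelta(days=7))
# Widen the search: Time = This Year
this_year = refine(LIBRARY, theme="ENV", since=TODAY - timedelta(days=365))
# Now refine again: only highlighted records
highlighted = refine(this_year, quality="highlighted")
```

Each call narrows or widens one facet while leaving the others untouched, which is what lets a user move from a large library to a handful of records in a few steps.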

Page 14: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Select Posts & Apply Added Value Services

Page 15: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Added value services

• Machine translation
  – unless you filtered by ‘original language’, your results list will be in multiple languages
  – “Translate these 11 articles’ Titles & Intro paras into XX so I can judge which is worth my while”

• Auto-summarise: “Give me a 1 page summary of these 5 resources”

• Sentiment analysis / Opinion Mining: “What’s the overall mood of these 16 articles?”

• Etc.

Page 16: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

FROM HUMAN TO MACHINE-ASSISTED HUMAN CURATION

How does the content get there?

Page 17: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

BloggingPortal.eu: human curation

[Diagram: Source -> Curate flow]

• Source: Blogs (original content)

• Bloggingportal.eu: Posts (title, intro)

• Curate: BP Editor (volunteer), Tag & Highlight (manual)

• Site / User: Followers, Subscribers, Best Ofs

• Discover posts
  – Browse by tags
  – Search

Page 18: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

And then the humans left…

[Diagram: the BloggingPortal.eu flow (Blogs -> Bloggingportal.eu Posts (title, intro) -> BP Editor (volunteer), Tag & Highlight (manual) -> Site / User: Followers, Subscribers, Best Ofs, Discover posts by browsing tags or searching), annotated with what went wrong]

• Blogs = Limited Scope

• No Tagging -> No Finding!

• Volunteers finished Uni

• Search never worked, No promotion

Page 19: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

BloggingPortal, 2009-2013

• 1116 blogs tracked – incl. inactives

• 317676 posts curated

• 21 languages

“EU blogging by the numbers”, October 2013

Page 20: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

HashTag Europe: machine-assisted human curation

[Diagram: a HashTag Platform Community]

• Source: Sources (original content)

• Source Curation: Semantic Analysis Engine, Autotag

• All Content, + by Theme

• Editors (Optional)
  – Highlight
  – Validate/correct tags

• Site / User: Best Ofs

• Discover content
  – Browse by tags
  – Faceted search: (combine tags)
  – Search
  – Highlight/UpVote (optional)

• Community-training in multiple languages
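
The flow on this slide (the engine autotags each source item, then editors optionally validate or correct the tags) can be sketched as below. Plain keyword matching stands in for the Semantic Analysis Engine, whose method the slides do not describe. `TAXONOMY`, `autotag` and `curate` are hypothetical names, and the trigger words are invented for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical multilingual taxonomy: tag -> trigger words in several languages.
TAXONOMY: Dict[str, Set[str]] = {
    "ENV": {"climate", "klima", "climat", "emissions"},
    "ECON": {"trade", "handel", "commerce", "budget"},
}


def autotag(text: str, taxonomy: Dict[str, Set[str]] = TAXONOMY) -> Set[str]:
    """Suggest tags by keyword matching (a stand-in for semantic analysis)."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    return {tag for tag, triggers in taxonomy.items() if words & triggers}


def curate(text: str, editor_tags: Optional[Set[str]] = None) -> dict:
    """Machine suggests tags; an editor may optionally override them."""
    suggested = autotag(text)
    final = suggested if editor_tags is None else set(editor_tags)
    return {
        "suggested": suggested,
        "final": final,
        # A correction is the signal that could later be used for training.
        "corrected": final != suggested,
    }


unreviewed = curate("Klima und Handel in der EU")
reviewed = curate("EU budget talks", editor_tags={"ECON", "ENV"})
```

Because the editor step is optional, every item gets tags even when no volunteer is available, and any editor override is recorded rather than lost.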

Page 21: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Hashtag Platform: crowdsourced semantic analysis training

• One platform

• Hosting many communities

• All using the Engine

• All training the Engine:
  – Semantic analysis
  – Machine translation
  – Sentiment analysis
  – Auto-text summary

[Diagram: Hashtag Platform and HashTag Engine]

• (Consumer-Facing) Hashtag Platform: several SubSites, each a Community of Community Members, each with Curated Sources and Articles

• API between the Platform and the HashTag Engine

• HashTag Engine: Semantic Analysis, Machine Translation, Sentiment Analysis, Auto Text Summary

• Classification Service (free)

• Human Corrections (crowdsourced from users)

• Learning Module: Train algorithm
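
The training loop on this slide (communities receive a free classification service, and their human corrections train the algorithm) can be sketched as follows. This is an illustration under stated assumptions, not the Engine's design: the word-count model, the `LearningModule` class and the `tokenize` helper are hypothetical, chosen only to show corrections from several communities improving one shared classifier.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase words of 4+ letters; a crude, assumed way to skip stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if len(w) >= 4]


class LearningModule:
    """Hypothetical shared model trained by human corrections."""

    def __init__(self) -> None:
        # tag -> how often each word appeared in items humans gave that tag
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, text: str, correct_tags: Set[str]) -> None:
        """Record one human-validated (text, tags) pair."""
        words = tokenize(text)
        for tag in correct_tags:
            self.word_counts[tag].update(words)

    def classify(self, text: str, min_hits: int = 1) -> Set[str]:
        """Suggest every tag that shares at least min_hits words with the text."""
        words = tokenize(text)
        return {
            tag
            for tag, counts in self.word_counts.items()
            if sum(1 for w in words if counts[w] > 0) >= min_hits
        }


engine = LearningModule()
before = engine.classify("New emissions targets")

# Corrections arrive from two different communities on the same platform.
engine.train("Emissions trading reform", {"ENV"})
engine.train("Budget deficit rules", {"ECON"})

after = engine.classify("New emissions targets")
both = engine.classify("Budget and emissions")
```

The design point is that all communities call the same engine instance, so a correction made in one community changes the suggestions every other community receives.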

Page 22: Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Thanks for your time

Mathew Lowry
Fresh Integral Communications
www.freshintegral.com
Connecting: [email protected] | @mathewlowry
Curating: mathewlowry.tumblr.com
Writing: mathew.blogactiv.eu | medium.com/@mathewlowry | LinkedIn