Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016

Post on 13-Jan-2017

freshintegral.com

Challenges for the LT Industry

Hashtag Platform: machine-assisted multilingual curation

Mathew Lowry, Fresh Integral Communications

Outline

• Who, Why & What:
– Online communities
– Multilingual content curation
• User experience: how would it look
• How: Hashtag Engine & Platform
– Engine: machine-assisted multilingual curation
– The Platform: crowdsourcing Engine training

THE USERS

Hashtag Platform

Communities of Interest or Practice

• Informal network: people with a common interest
– Online and/or offline
• Apply collective knowledge to each member’s problems
– Share problems, solutions, ideas & stories
– Learn, teach, network, influence, gain visibility, establish mindshare & expertise …
• “A community that learns”, collectively raising each other’s knowledge

Can’t find what – or who – you’re looking for?

Useful Knowledge: A Community’s Killer App

• What: Library of Useful Resources
– Extensive (lots of content)
– Fine-grained (categorised with a detailed taxonomy)
– Multiple languages: multilingual taxonomies, translated abstracts, machine & community translations
• How: Created by the Community
• Why:
– Improve the community’s Collective Intelligence
– Improve Content Discovery ... & People Discovery
• Problem: Vicious circle / Chicken-and-Egg
– Nobody’s submitting anything because no one’s here
– No one’s here because nobody’s submitting anything

“Extensive AND Fine-Grained” Resource Library

• “Extensive” means many records, so:
– Make it easy for members to submit
• “Fine-grained” = very specific, accurate classification, so:
– Large effort for members to submit
– Many mistakes -> quality control -> high overheads or low quality
• Hence machine-assisted human curation

HOW WOULD A GOOD RESOURCE LIBRARY LOOK?

User experience

Refine interface: Home Page (manually highlighted articles)

Refine interface: Quality = All, Theme = ENV, Time = This Week

Now let’s dig deeper ...

Widen the search: Time = This Year

Now refine again ...

• From millions of records ...

• ... to the 5 you need

• Time: <1 min
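The refinement steps above amount to a faceted search, where every facet that is set narrows the result list. A minimal Python sketch follows, under stated assumptions: the `Post` fields, the `refine` function, the mapping of "Quality" to a `highlighted` flag and the reading of "This Week" as the last 7 days are all hypothetical, since the slides do not specify a schema or query API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Post:
    # Hypothetical record shape; the slides do not define one.
    title: str
    theme: str          # e.g. "ENV"
    highlighted: bool   # manually highlighted by an editor
    published: date


def refine(posts, theme: Optional[str] = None,
           highlighted_only: bool = False,
           since: Optional[date] = None):
    """Return the posts matching every facet that is set (facets are ANDed)."""
    result = []
    for post in posts:
        if theme is not None and post.theme != theme:
            continue
        if highlighted_only and not post.highlighted:
            continue
        if since is not None and post.published < since:
            continue
        result.append(post)
    return result


today = date(2016, 6, 1)  # fixed date so the example is reproducible
posts = [
    Post("Air quality rules", "ENV", True, today - timedelta(days=2)),
    Post("Water directive", "ENV", False, today - timedelta(days=90)),
    Post("Trade talks", "ECON", True, today - timedelta(days=1)),
]

# Quality = All, Theme = ENV, Time = This Week (assumed: last 7 days)
this_week = refine(posts, theme="ENV", since=today - timedelta(days=7))

# Widen the search: Time = This Year (assumed: last 365 days)
this_year = refine(posts, theme="ENV", since=today - timedelta(days=365))
```

A real library holding millions of records would push these filters into an index or database query rather than loop in memory; the point here is only that facets combine with AND.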

Select Posts & Apply Added Value Services

Added value services

• Machine translation
– Unless you filtered by ‘original language’, your results list will be in multiple languages
– “Translate these 11 articles’ titles & intro paras into XX so I can judge which is worth my while”
• Auto-summarise: “Give me a 1 page summary of these 5 resources”
• Sentiment analysis / Opinion Mining: “What’s the overall mood of these 16 articles?”
• Etc.

FROM HUMAN TO MACHINE-ASSISTED HUMAN CURATION

How does the content get there?

BloggingPortal.eu: human curation

[Diagram: human curation flow]
• Source: blogs (original content) feed Bloggingportal.eu as posts (title, intro)
• Curate: BP Editor (volunteer) tags & highlights posts manually
• Site user: discovers posts (browse by tags, search); followers, subscribers, Best Ofs

And then the humans left…

[Diagram: the same BloggingPortal.eu human curation flow (blogs -> posts -> volunteer editor tagging & highlighting -> site users), annotated with what went wrong]
• Blogs = limited scope
• Volunteers finished uni
• No tagging -> no finding!
• Search never worked, no promotion

BloggingPortal, 2009-2013

• 1,116 blogs tracked (including inactive ones)
• 317,676 posts curated
• 21 languages

Source: “EU blogging by the numbers”, October 2013

HashTag Europe: machine-assisted human curation

[Diagram: a HashTag Platform Community]
• Source: many sources of original content feed “All Content”
• Curation: the Semantic Analysis Engine autotags content; Editors (optional) highlight and validate/correct tags; community training in multiple languages
• Site user: discovers content (browse by tags, faceted search that combines tags, search, optional highlight/upvote); Best Ofs, plus by theme
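The curation flow in this diagram (engine autotags, editors optionally validate or correct, corrections feed training) can be sketched in Python. Everything named here is an assumption for illustration: `autotag` is a naive keyword lookup standing in for the Semantic Analysis Engine, and `CuratedItem`, `CurationQueue` and the keyword table are hypothetical, not part of the platform described in the slides.

```python
from dataclasses import dataclass, field


def autotag(text, keyword_tags):
    """Stand-in for the Semantic Analysis Engine: naive keyword lookup."""
    words = set(text.lower().split())
    return {tag for keyword, tag in keyword_tags.items() if keyword in words}


@dataclass
class CuratedItem:
    text: str
    machine_tags: set   # what the engine proposed
    final_tags: set     # what is published after optional human review
    highlighted: bool = False


@dataclass
class CurationQueue:
    keyword_tags: dict
    training_examples: list = field(default_factory=list)

    def ingest(self, text):
        """Machine step: every incoming item is autotagged."""
        tags = autotag(text, self.keyword_tags)
        return CuratedItem(text, machine_tags=tags, final_tags=set(tags))

    def review(self, item, corrected_tags=None, highlight=False):
        """Optional editor step: validate or correct tags, maybe highlight."""
        if corrected_tags is not None:
            item.final_tags = set(corrected_tags)
        item.highlighted = highlight
        # Each reviewed item becomes a labelled example for later training.
        self.training_examples.append((item.text, frozenset(item.final_tags)))
        return item


queue = CurationQueue({"emissions": "ENV", "budget": "ECON"})
item = queue.ingest("New emissions targets strain the budget")
queue.review(item, corrected_tags={"ENV"}, highlight=True)
```

Items nobody reviews keep their machine tags, so the library still grows without volunteers; reviewed items both improve the published tags and accumulate as training data.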

Hashtag Platform: crowdsourced semantic analysis training

• One platform
• Hosting many communities
• All using the Engine
• All training the Engine:
– Semantic analysis
– Machine translation
– Sentiment analysis
– Auto-text summary

[Diagram: the (consumer-facing) Hashtag Platform hosts several SubSites, each with its Community Members and Curated Sources of Articles. The Platform reaches the HashTag Engine through an API. The Engine (Semantic Analysis, Machine Translation, Sentiment Analysis, Auto Text Summary) provides a free Classification Service; human corrections, crowdsourced from users, train the algorithm through the Learning Module.]
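The idea that many communities use and train one shared Engine can be sketched as follows. This is a toy under loud assumptions: `ToyEngine` is a simple word/tag co-occurrence counter swapped in for the real semantic analysis, and the `Community` class with its `submit` and `correct` methods is hypothetical; the slides do not describe the Engine's algorithm or API.

```python
from collections import Counter, defaultdict


class ToyEngine:
    """Toy stand-in for the HashTag Engine: counts word/tag co-occurrences."""

    def __init__(self):
        self.word_tag_counts = defaultdict(Counter)

    def train(self, text, tags):
        """Learning step: remember which tags appeared with which words."""
        for word in set(text.lower().split()):
            for tag in tags:
                self.word_tag_counts[word][tag] += 1

    def classify(self, text, min_votes=1):
        """Classification step: propose tags voted for by known words."""
        votes = Counter()
        for word in set(text.lower().split()):
            votes.update(self.word_tag_counts.get(word, {}))
        return {tag for tag, n in votes.items() if n >= min_votes}


class Community:
    """A SubSite on the platform; all communities share one engine."""

    def __init__(self, name, engine):
        self.name = name
        self.engine = engine

    def submit(self, text):
        # Members get classification from the shared engine.
        return self.engine.classify(text)

    def correct(self, text, tags):
        # A member's correction is fed back as training data.
        self.engine.train(text, tags)


engine = ToyEngine()
environment = Community("environment", engine)
energy = Community("energy", engine)

before = energy.submit("offshore wind auction")        # engine is untrained
environment.correct("wind farm permits", {"ENV"})      # one community trains
after = energy.submit("offshore wind auction")         # another one benefits
```

Because the engine instance is shared, a correction made in one community immediately changes the tags proposed in another, which is the crowdsourcing effect the slide describes.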

Thanks for your time

Mathew Lowry
Fresh Integral Communications
www.freshintegral.com
Connecting: mathew.lowry@gmail.com | @mathewlowry
Curating: mathewlowry.tumblr.com
Writing: mathew.blogactiv.eu | medium.com/@mathewlowry | LinkedIn