Hashtag Engine: A Technology Challenge for Language Technology Innovate 2016
ʄreshintegral.com
Challenges for the LT Industry
Hashtag Platform: machine-assisted multilingual curation
Mathew Lowry, Fresh Integral Communications
Outline
• Who, Why & What:
– Online Communities
– multilingual content curation
• User experience: how would it look
• How: Hashtag Engine & Platform
– Engine: machine-assisted multilingual curation
– The Platform: crowdsourcing Engine training
THE USERS
Hashtag Platform
Communities of Interest or Practice
• Informal network: people with common interest
– Online and/or Offline
• Apply collective knowledge -> each Member’s problems
– Share problems, solutions, ideas & stories
– Learn, teach, network, influence, visibility, establish mindshare & expertise …
• “A community that learns”, collectively raising each other’s knowledge
Can’t find what – or who – you’re looking for?
Useful Knowledge: A Community’s Killer App
• What: Library of Useful Resources
– Extensive (lots of content)
– Fine-grained (categorised with detailed taxonomy)
– Multiple languages: multilingual taxonomies, translated abstracts, machine & community translations
• How: Created by Community
• Why
– Improve community’s Collective Intelligence
– Improve Content Discovery ... & People Discovery
• Problem: Vicious circle / Chicken-Egg
– nobody’s submitting anything because no one’s here
– no one’s here because nobody’s submitting anything
“Extensive AND Fine-Grained” Resource Library
• “Extensive” means many records, so:
– > Make it easy for members to submit
• “Fine-grained” = very specific, accurate classification, so:
– > Large effort for members to submit
– > Many mistakes -> quality control -> high overheads or low quality
• Hence machine-assisted human curation
HOW WOULD A GOOD RESOURCE LIBRARY LOOK?
User experience
Refine interface: Home Page (manually highlighted articles)
Refine interface: Quality = All, Theme = ENV, Time = This Week
Now let’s dig deeper ...
Widen the search: Time = This Year
Now refine again ...
• From millions of records ...
• ... to the 5 you need
• Time: <1 min
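The refine steps on these slides (Quality, Theme, Time) amount to faceted filtering: each facet that is set narrows the result list, and a facet left unset means "All". The sketch below is only an illustration of that idea. The `Record` fields, the `refine` function, the sample records and the reference date are all hypothetical and do not come from the Hashtag Platform itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape; the slides name only the facets, not a schema.
    title: str
    theme: str
    quality: str
    language: str
    published: date


def refine(records, *, theme=None, quality=None, language=None, since=None):
    """Return the records matching every facet that is set (None means 'All')."""
    result = []
    for r in records:
        if theme is not None and r.theme != theme:
            continue
        if quality is not None and r.quality != quality:
            continue
        if language is not None and r.language != language:
            continue
        if since is not None and r.published < since:
            continue
        result.append(r)
    return result


# Made-up sample data, for illustration only.
today = date(2016, 6, 1)
records = [
    Record("Air quality rules", "ENV", "highlighted", "en", date(2016, 5, 30)),
    Record("Politique climatique", "ENV", "standard", "fr", date(2016, 2, 10)),
    Record("Trade talks", "TRADE", "standard", "en", date(2016, 5, 29)),
]

# Quality = All, Theme = ENV, Time = This Week
this_week = refine(records, theme="ENV", since=today - timedelta(days=7))
# Widen the search: Time = This Year
this_year = refine(records, theme="ENV", since=date(today.year, 1, 1))
```

Widening the time facet grows the list and adding a facet shrinks it, which is the "millions of records to the 5 you need" interaction described above.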
Select Posts & Apply Added Value Services
Added value services
• Machine translation
– unless you filtered by ‘original language’, your results list will be in multiple languages
– “Translate these 11 articles’ Titles & Intro paras into XX so I can judge which is worth my while”
• Auto-summarise: “Give me a 1 page summary of these 5 resources”
• Sentiment analysis / Opinion Mining: “What’s the overall mood of these 16 articles?”
• Etc.
FROM HUMAN TO MACHINE-ASSISTED HUMAN CURATION
How does the content get there?
BloggingPortal.eu: human curation
[Diagram]
• Source: Blogs (original content) -> Bloggingportal.eu Posts (title, intro)
• Curate: BP Editor (volunteer): Tag & Highlight (manual)
• Site User: Discover posts (Browse by tags, Search)
• Also shown: Followers, Subscribers, Best Ofs
And then the humans left…
[Diagram: the BloggingPortal.eu flow again (Source: Blogs -> Posts (title, intro); Curate: BP Editor (volunteer), Tag & Highlight (manual); Site User: Discover posts), annotated with what went wrong]
• Blogs = Limited Scope
• No Tagging -> No Finding!
• Volunteers finished Uni
• Search never worked, No promotion
BloggingPortal, 2009-2013
• 1116 blogs tracked – incl. inactives
• 317676 posts curated
• 21 languages
“EU blogging by the numbers”, October 2013
HashTag Europe: machine-assisted human curation
[Diagram]
• A HashTag Platform Community
• Source: Sources (original content) -> All Content
• Semantic Analysis Engine: Autotag
• Curation: Editors (Optional)
– Highlight
– Validate/correct tags
• Site User: Discover content
– Browse by tags
– Faceted search: (combine tags)
– Search
– Highlight/UpVote (optional)
• Other labels: Best Ofs; + by Theme
• Community-training in multiple languages
Hashtag Platform: crowdsourced semantic analysis training
• One platform
• Hosting many communities
• All using the Engine
• All training the Engine:
– Semantic analysis
– Machine translation
– Sentiment analysis
– Auto-text summary
[Diagram]
• (Consumer-Facing) Hashtag Platform: SubSites, each with its Community Members
• HashTag Engine: API, Learning Module, Semantic Analysis, Machine Translation, Sentiment Analysis, AutoText Summary
• Classification Service (free): Curated Source Articles
• Human Corrections (crowdsourced from users) -> Train algorithm
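The loop in this slide (free classification service, human corrections crowdsourced from users, corrections used to train the algorithm) can be sketched in a few lines. This is a toy under stated assumptions, not the Engine's actual design: the class `HashtagEngineSketch`, the function `curate`, and the word-overlap scoring are all hypothetical stand-ins for whatever semantic analysis the real Engine would use.

```python
from collections import Counter, defaultdict


class HashtagEngineSketch:
    """Toy stand-in for the Engine: per-tag word counts learned from corrections."""

    def __init__(self):
        # tag -> frequencies of words seen in texts that humans gave this tag
        self.word_counts = defaultdict(Counter)

    def train(self, text, tags):
        """Learning step: record the words of a text under each confirmed tag."""
        for tag in tags:
            self.word_counts[tag].update(text.lower().split())

    def autotag(self, text, min_overlap=1):
        """Classification step: suggest tags sharing at least min_overlap words."""
        words = set(text.lower().split())
        scores = {
            tag: sum(1 for w in words if w in counts)
            for tag, counts in self.word_counts.items()
        }
        return sorted(tag for tag, score in scores.items() if score >= min_overlap)


def curate(engine, text, corrected_tags=None):
    """Autotag a text; if a member supplies corrected tags, use them and retrain."""
    suggested = engine.autotag(text)
    if corrected_tags is None:
        return suggested
    final = sorted(corrected_tags)
    engine.train(text, final)  # human corrections feed the learning step
    return final


engine = HashtagEngineSketch()
# A community member corrects the (empty) suggestion for the first article.
first = curate(engine, "new emissions targets for cars", corrected_tags={"ENV"})
# The next article is tagged by the machine alone, using what it just learned.
second = curate(engine, "emissions rules debated")
```

Because every community on the platform would call the same engine object, a correction made in one community improves suggestions for all of them, which is the crowdsourcing argument the slide makes.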
Thanks for your time
Mathew Lowry
Fresh Integral Communications
www.freshintegral.com
Connecting: [email protected] | @mathewlowry
Curating: mathewlowry.tumblr.com
Writing: mathew.blogactiv.eu | medium.com/@mathewlowry | LinkedIn