Human Language Technology in a Big Data World

Post on 15-Aug-2020


Human Language Technology in a Big Data World

@chris_biow #HLTCon (2016)

Big Data Universe

Sooo Big!

Tooo Big!

Taming Big

Taming Too Big

Exponential Hyperbole!!

"Yer gonna die." (Standard mountaineering warning)

●  Data is exploding without limit
●  I can draw a curve on a semi-log-scale graph
●  Even if that almost never happens in reality
●  Buy my vision or drown in data

Wgsimon / Wikimedia Commons / Creative Commons Attribution-Share Alike 3.0 Unported

Exponential Reality

Qef / Wikimedia Commons / Public Domain
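
The contrast between the "Exponential Hyperbole" and "Exponential Reality" slides can be illustrated numerically. The sketch below assumes a logistic curve as the model of bounded growth; the slides do not name a model, and the function names, rate, and capacity are hypothetical values chosen for illustration. On a semi-log scale the exponential is a straight line, while the logistic curve tracks it early on and then flattens at its ceiling.

```python
import math

def exponential(t, x0=1.0, rate=1.0):
    """Unbounded exponential growth: x0 * e^(rate * t)."""
    return x0 * math.exp(rate * t)

def logistic(t, x0=1.0, rate=1.0, capacity=1000.0):
    """Logistic growth starting at x0 and saturating at `capacity`.

    `capacity` is a hypothetical ceiling chosen for illustration.
    """
    return capacity / (1.0 + (capacity / x0 - 1.0) * math.exp(-rate * t))

def log10_series(growth, times):
    """Values as they would be plotted on a semi-log (log10) y-axis."""
    return [math.log10(growth(t)) for t in times]

if __name__ == "__main__":
    times = range(0, 13, 2)
    exp_logs = log10_series(exponential, times)
    log_logs = log10_series(logistic, times)
    print(" t  log10(exponential)  log10(logistic)")
    for t, e, l in zip(times, exp_logs, log_logs):
        print(f"{t:2d}  {e:18.2f}  {l:15.2f}")
```

The two columns stay close for small `t` and diverge as the logistic curve approaches its capacity, which is why an early stretch of data cannot by itself tell the two models apart.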

Human Language World

Exponential Sobriety

"Most growth is exponential." Chris Lindblad, MarkLogic Founder

Measure     10^   2^   Example
Kilobyte      3   10   12 lines of 80 characters
Megabyte      6   20   500 pages, 48 hours typing
Gigabyte      9   30   30 minutes Twitter text feed
Terabyte     12   40   2 weeks Twitter text feed
Petabyte     15   50   Humanity typing for 8 hours
Exabyte      18   60   Humanity typing for 1 year
Zettabyte    21   70   Global IP traffic 2016 [Cisco 2013]
Yottabyte    24   80   (break glass in case of need)
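
The two exponent columns in the table above are close but not identical. As a minimal sketch (the function names are hypothetical, and one byte per character is an assumption), the following computes both sizes for each unit, shows how far the binary value exceeds the decimal one, and checks the first example row.

```python
# Unit names in the order they appear in the table.
UNITS = ["Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]

def unit_sizes():
    """Return (name, decimal_bytes, binary_bytes) for each unit.

    The n-th unit is 10^(3n) bytes in decimal and 2^(10n) bytes in binary.
    """
    return [(name, 10 ** (3 * n), 2 ** (10 * n))
            for n, name in enumerate(UNITS, start=1)]

def binary_excess_percent(decimal_bytes, binary_bytes):
    """How much larger the binary size is than the decimal size, in percent."""
    return 100.0 * (binary_bytes - decimal_bytes) / decimal_bytes

def text_bytes(lines, chars_per_line):
    """Size of plain text, assuming one byte per character."""
    return lines * chars_per_line

if __name__ == "__main__":
    for name, dec, binary in unit_sizes():
        print(f"{name:<10} {binary_excess_percent(dec, binary):6.2f}% larger in binary")
    # First example row: 12 lines of 80 characters is 960 bytes, just under 1 KB.
    print(text_bytes(12, 80), "bytes")
```

The gap grows with each step, from 2.4% at the kilobyte to roughly 21% at the yottabyte, so the choice of convention matters more as sizes increase.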

Distinguishing Big Data

Follow the money.

Volume     Bounded
Variety    Text and voice
Velocity   Latency
Value      Fixed % of all
Veracity   Not necessarily required

Big Data Tech

I shall not today attempt further to define [it], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it…

Justice Potter Stewart, 1964 (emphasis added)

Defining Big Data

Data whose volume, velocity, and variety determines your choice of software and infrastructure.

Achieving Big Data

Year  Company    Customer     Project                 Quantity (M)  Size (GB)  Project Cost ($M)
2003  Verity     TRW, DIA     WISE                              40        200                 10
2006  Veronomy   Bloomberg    News                             200      1,000                 30
2009  MarkLogic  Gov & Comm.  OSINT                          2,000    200,000                100
2014  MongoDB    AWS          ReInvent goo.gl/xZVgdl         7,000  1,000,000              0.003
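
The trend in the table above is easier to see as cost per gigabyte. The sketch below simply divides the Project Cost column by the Size column; the figures are copied from the table, while the variable and function names are hypothetical.

```python
# (year, company, size in GB, project cost in millions of dollars),
# copied from the table.
PROJECTS = [
    (2003, "Verity", 200, 10),
    (2006, "Veronomy", 1_000, 30),
    (2009, "MarkLogic", 200_000, 100),
    (2014, "MongoDB", 1_000_000, 0.003),
]

def cost_per_gb(size_gb, cost_millions):
    """Project cost in dollars divided by data size in gigabytes."""
    return cost_millions * 1_000_000 / size_gb

if __name__ == "__main__":
    for year, company, size_gb, cost_millions in PROJECTS:
        dollars = cost_per_gb(size_gb, cost_millions)
        print(f"{year}  {company:<10} ${dollars:,.3f} per GB")
```

By this measure the cost falls from about $50,000 per GB in 2003 to $30,000 in 2006, $500 in 2009, and about $0.003 in 2014, while the data size grows by a factor of 5,000.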

Features & Functions

Text-Ready Tech

State of the Mission in Text Analytics

Entity Extraction

Text Translation

Relationship Extraction

Name Translation

Search

Database

Language ID

Sentiment Analysis

Rare, new Languages

Name Translation

Alerting

Voice of the X

Partial Parse

Gap Solved

What language? bú

ana raye7 el gam3a el sa3a 3 el 3asr. el gaw 3amel eh elnaharda f eskendereya?
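
The sample above is Arabic written in Latin characters, with digits standing in for letters that have no Latin equivalent. As a toy illustration of why such text is hard for language identification, the sketch below flags tokens that mix letters with those digits. It is a heuristic written for this example, not a real language identifier: the digit set follows commonly described chat conventions, and the function name is hypothetical.

```python
import re

# Digits commonly described as letter substitutes in romanized Arabic chat
# text (for example 3 and 7). This set is an assumption for illustration.
ARABIZI_DIGITS = set("23579")

TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+")

def arabizi_token_ratio(text):
    """Fraction of tokens that mix ASCII letters with Arabizi-style digits."""
    tokens = TOKEN_PATTERN.findall(text)
    if not tokens:
        return 0.0
    hits = sum(
        1 for token in tokens
        if any(ch.isalpha() for ch in token)
        and any(ch in ARABIZI_DIGITS for ch in token)
    )
    return hits / len(tokens)

if __name__ == "__main__":
    sample = ("ana raye7 el gam3a el sa3a 3 el 3asr. "
              "el gaw 3amel eh elnaharda f eskendereya?")
    print(arabizi_token_ratio(sample))
    print(arabizi_token_ratio("The meeting is at 3 pm in room 7."))
```

Five of the sixteen tokens in the sample mix letters and digits, while the English sentence has none. Tokens such as "mp3" would also be flagged, which is one reason production systems rely on trained models instead of a rule like this.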

Lessons Learned

•  Requirements are wrong

•  Every power of 4 will invalidate some requirements and solutions

•  Agile processes fit Big HLT

•  Measure against cost and mission at each increment

•  Express requirements exponentially

•  Expect competence and confidence with Big Data

•  Progress exponentially (powers of 4)

•  Adjust requirements as you learn how they meet the mission
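
One way to read "progress exponentially (powers of 4)" is as a schedule of scale milestones, each four times the last, with requirements revisited at every step. The sketch below is a minimal illustration under that reading; the function name and the 1-to-1,000 range are hypothetical and the sizes carry no particular unit.

```python
def scale_milestones(start, target, factor=4):
    """Return sizes start, start*factor, ... ending at the first one >= target."""
    if start <= 0:
        raise ValueError("start must be positive")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    sizes = [start]
    while sizes[-1] < target:
        sizes.append(sizes[-1] * factor)
    return sizes

if __name__ == "__main__":
    # Growing by a factor of 1,000 takes five steps of 4x each.
    for step, size in enumerate(scale_milestones(1, 1000)):
        print(f"increment {step}: {size}")
```

Each increment is a point to measure cost and mission fit before committing to the next one.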