A Lightning Introduction To Clouds & HLT - Human Language Technology Conference
![Page 1: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/1.jpg)
Basis Technology – Human Language Technology Conference 2012 1
Clouds, Search or HLT: The 'forecast'?
Benson Margulies, Executive Vice President and Chief Technology Officer
![Page 3: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/3.jpg)
Meteorology - or - Why Clouds
• Lie on the grass and look up at the clouds
• Everyone sees something different
• Computerized Clouds are no different
• Applications Always Available
• Data Always Available
• Tools for Processing Big Data
![Page 4: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/4.jpg)
Big Data and Clouds =~ Hadoop
• It's not just a matter of size
• Hadoop ...
  o Takes in structured data sets
  o Optimizes stateless, batch processes
  o Moves computation to data
• All of which is great if that's what you have
• The world is more complicated than that
![Page 5: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/5.jpg)
What it Doesn't Do So Easily
• On-the-fly (non-batch) processing
• Stateful, non-local processing
• For example, consider a search engine
  o All about online: a document arrives, users want to find it.
  o All about global state: relevancy involves global data across the whole index.
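To make the "global state" point concrete, here is a minimal sketch, assuming relevance uses an inverse document frequency (IDF) term. The slides do not name a scoring formula, so the smoothed IDF variant and the per-shard numbers below are illustrative assumptions only.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """One common smoothed inverse document frequency variant (assumed here)."""
    return math.log((1 + num_docs) / (1 + doc_freq)) + 1.0


# Hypothetical statistics for one term on two partitions of the index:
# (documents containing the term, documents in the partition)
shard_stats = [(90, 100), (1, 100)]

# Each partition scoring on its own sees a very different term weight.
local_idfs = [idf(df, n) for df, n in shard_stats]

# A consistent weight needs counts summed across the whole index.
global_idf = idf(
    sum(df for df, _ in shard_stats),
    sum(n for _, n in shard_stats),
)
```

The same term looks common on one partition and rare on the other, so a score computed locally is not comparable across partitions unless the global counts are shared.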
![Page 6: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/6.jpg)
More on Search-in-a-Cloud
• Good News: 'conventional' technologies scale to very large indices.
  o Solr
  o SolrCloud
  o Elasticsearch
  o ...
• How? Shards.
  o 'hash' to split docs
  o queries go everywhere
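The two sub-bullets on sharding can be sketched as a toy in-memory index. This is not how Solr, SolrCloud, or Elasticsearch implement routing; the class names, the CRC32 hash, the shard count, and the term-count scoring are all assumptions chosen for illustration.

```python
import heapq
import zlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """'hash' to split docs: a stable hash of the id picks the shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class Shard:
    """Toy shard holding a term -> {doc_id: term count} inverted index."""

    def __init__(self):
        self.index = {}

    def add(self, doc_id, text):
        for term in text.lower().split():
            postings = self.index.setdefault(term, {})
            postings[doc_id] = postings.get(doc_id, 0) + 1

    def search(self, term, k):
        postings = self.index.get(term.lower(), {})
        hits = ((count, doc_id) for doc_id, count in postings.items())
        return heapq.nlargest(k, hits)


class ShardedIndex:
    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Each document lives on exactly one shard.
        self.shards[shard_for(doc_id, len(self.shards))].add(doc_id, text)

    def search(self, term, k=3):
        # Queries go everywhere: ask every shard, then merge the partial top-k lists.
        partials = [hit for shard in self.shards for hit in shard.search(term, k)]
        return [doc_id for _, doc_id in heapq.nlargest(k, partials)]
```

Writes touch one shard, but every query fans out to all of them, which is why query cost grows with the number of shards in this style of design.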
![Page 7: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/7.jpg)
Search-in-a-Cloud: less good news
• Alternatives are still:
  o Limited
  o Research
  o or both
• Solandra
  o Scaling via Cassandra
  o 'just another sharded solution'
  o Just the thing if you like Cassandra
• or Accumulo
  o So far, very basic inverted index
  o better things coming
![Page 8: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/8.jpg)
Other HLT tasks ...
• 'Extraction' is 'straightforward'
• Text comes in, entities or relationships come out.
• Results end up in graph DB or bigtable or ...
• Scale via Hadoop or whatever
• The Challenge of Mixing and Matching
• But ... what if you want a feedback loop?
![Page 9: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/9.jpg)
Interoperation
• Lots of focus on applications
  o e.g. Ozone Widgets
• Not so much on backend processes
• What good is 'data everywhere' if:
  o you can't deploy processing to exploit it?
  o you can't fit together pieces of the puzzle?
• A stovepipe in a cloud is still ...
• A stovepipe
![Page 10: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/10.jpg)
Harder Unstructured Problems
• Imagine you wanted to cluster ...
• New items show up
• Need to find 'best' existing cluster
  o It could be 'anywhere'
• Need to update to reflect each new item
• (If you're wondering what we're clustering ...)
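The steps on this slide (a new item arrives, find the best existing cluster, update it) can be sketched as a single-pass clusterer. The slides do not say which algorithm is meant; the cosine similarity, the 0.8 threshold, the running-mean centroid, and the class name are assumptions for illustration.

```python
import math


class OnlineClusterer:
    """Single-pass clustering sketch: each item joins its best cluster or starts one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed minimum similarity to join a cluster
        self.centroids = []         # running mean vector per cluster
        self.counts = []            # number of items per cluster

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def add(self, vector):
        # The best cluster could be 'anywhere', so every centroid is compared.
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = self._cosine(vector, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # Nothing close enough: the item starts a new cluster.
            self.centroids.append(list(vector))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Update the chosen cluster to reflect the new item (incremental mean).
        n = self.counts[best] + 1
        self.centroids[best] = [
            c + (x - c) / n for c, x in zip(self.centroids[best], vector)
        ]
        self.counts[best] = n
        return best
```

Both the lookup and the update touch shared, changing state for every arriving item, which is the kind of online, stateful, non-local work that does not map cleanly onto stateless batch jobs.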
![Page 11: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/11.jpg)
Rosette Concrete Examples
• Straight Search
  o Rosette Solr Plugins work all the same
  o SolrCloud hashes/shards
  o Rosette runs on the target node
• Extraction and similar processes
  o Same story, using Update Request Processor
![Page 12: A Lightning Introduction To Clouds & HLT - Human Language Technology Conference](https://reader033.fdocuments.in/reader033/viewer/2022060117/55865586d8b42a5c128b46f3/html5/thumbnails/12.jpg)
Rosette and Hadoop
• Stateless APIs lead to simple implementation
• Non-code resources lead to some issues
• Stateful processes (e.g. RNI) ... back to Solr
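Why a stateless API makes the map/reduce side simple can be sketched in a few lines. This is not Rosette's API: `extract_entities` is a hypothetical toy stand-in (capitalized words only), and `map_record` and `reduce_counts` are illustrative names for the two phases.

```python
import re
from collections import Counter


def extract_entities(text):
    """Hypothetical toy extractor: treats capitalized words as entities."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


def map_record(line):
    """Stateless map step: output depends only on the input line."""
    return [(entity, 1) for entity in extract_entities(line)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each entity."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)
```

Because `map_record` keeps nothing between calls, any node can process any record in any order. A stateful process needs shared, up-to-date data on every call, which is what breaks this pattern.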