Cognitive computing with big data, high tech and low tech approaches
-
Upload
ted-dunning -
Category
Technology
-
view
1.295 -
download
1
description
Transcript of Cognitive computing with big data, high tech and low tech approaches
![Page 1: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/1.jpg)
© 2014 MapR Technologies 1© 2014 MapR Technologies
Cognitive Computing on Hadoop
Low Tech and High Tech Approaches
Ted Dunning
![Page 2: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/2.jpg)
© 2014 MapR Technologies 2© 2014 MapR Technologies
Cognitive Computing on Hadoop with Data
Low Tech and High Tech Approaches
Ted Dunning
![Page 3: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/3.jpg)
© 2014 MapR Technologies 3
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email [email protected] [email protected]
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
![Page 4: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/4.jpg)
© 2014 MapR Technologies 4
The outline
• The first open source, big data project
• Another big data project
• Conclusions
![Page 5: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/5.jpg)
© 2014 MapR Technologies 5
First:
An apology for going off-script
![Page 6: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/6.jpg)
© 2014 MapR Technologies 6
Now, the story
![Page 7: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/7.jpg)
© 2014 MapR Technologies 7
![Page 8: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/8.jpg)
© 2014 MapR Technologies 8
In 1866, the top finishers in the tea race reached London in 99 days, within 2 hours of each other
![Page 9: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/9.jpg)
© 2014 MapR Technologies 9
![Page 10: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/10.jpg)
© 2014 MapR Technologies 10
But in 1851, the record had been set at 89 days by the Flying Cloud
![Page 11: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/11.jpg)
© 2014 MapR Technologies 11
The difference was due (in part) to big data
![Page 12: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/12.jpg)
© 2014 MapR Technologies 12
![Page 13: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/13.jpg)
© 2014 MapR Technologies 13
![Page 14: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/14.jpg)
© 2014 MapR Technologies 14
![Page 15: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/15.jpg)
© 2014 MapR Technologies 15
These charts were free …
If you donated your data
![Page 16: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/16.jpg)
© 2014 MapR Technologies 16
But how does this apply today?
![Page 17: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/17.jpg)
© 2014 MapR Technologies 17
Key Points of Maury’s Work
• Give to get– Give the Abstract Log to captains, get data
• Data consortium wins– Merging data gives pictures nobody else can see
• Give back– Them that gives, also gets
• But this is just what every data driven web site does!– Just 150 years before everybody else
![Page 18: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/18.jpg)
© 2014 MapR Technologies 18
The Real News in Behavioral Analysis
• Everybody knows that:
• You need ensembles of many models to do recommendations
• You need to use factorization models
• You predict what you observe
• (You should predict ratings)
![Page 19: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/19.jpg)
© 2014 MapR Technologies 19
But …
none of this is really true
![Page 20: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/20.jpg)
© 2014 MapR Technologies 20
In fact,
• Fancy models are rarely useful expenditures of time• Factorization can be good, but not much better (if at all)• Ratings are disastrously bad data• Cross-recommendation and multi-modal recommendations are
much more interesting– Multiple kinds of input are far better than multiple models
• The UI has a far larger impact than the models• The best algorithms combine simplicity with accuracy
– So simple you can embed them in a search engine
![Page 21: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/21.jpg)
© 2014 MapR Technologies 21
Here’s how
![Page 22: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/22.jpg)
© 2014 MapR Technologies 22
Cooccurrence Analysis
![Page 23: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/23.jpg)
© 2014 MapR Technologies 23
How Often Do Items Co-occur
![Page 24: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/24.jpg)
© 2014 MapR Technologies 24
Which Co-occurrences are Interesting?
Each row of indicators becomes a field in a search engine document
![Page 25: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/25.jpg)
© 2014 MapR Technologies 25
Recommendations
Alice got an apple and a puppyAlice
![Page 26: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/26.jpg)
© 2014 MapR Technologies 26
Recommendations
Alice got an apple and a puppyAlice
Charles got a bicycleCharles
![Page 27: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/27.jpg)
© 2014 MapR Technologies 27
Recommendations
Alice got an apple and a puppyAlice
Charles got a bicycleCharles
Bob Bob got an apple
![Page 28: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/28.jpg)
© 2014 MapR Technologies 28
Recommendations
Alice got an apple and a puppyAlice
Charles got a bicycleCharles
Bob What else would Bob like?
![Page 29: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/29.jpg)
© 2014 MapR Technologies 29
Recommendations
Alice got an apple and a puppyAlice
Charles got a bicycleCharles
Bob A puppy!
![Page 30: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/30.jpg)
© 2014 MapR Technologies 30
You get the idea of how recommenders can work…
![Page 31: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/31.jpg)
© 2014 MapR Technologies 31
By the way, like me, Bob also wants a pony…
![Page 32: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/32.jpg)
© 2014 MapR Technologies 32
Recommendations
?
Alice
Bob
Charles
Amelia
What if everybody gets a pony?
What else would you recommend for new user Amelia?
![Page 33: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/33.jpg)
© 2014 MapR Technologies 33
Recommendations
?
Alice
Bob
Charles
Amelia
If everybody gets a pony, it’s not a very good indicator of what to else predict...
![Page 34: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/34.jpg)
© 2014 MapR Technologies 34
Problems with Raw Co-occurrence
• Very popular items co-occur with everything or why it’s not very helpful to know that everybody wants a pony… – Examples: Welcome document; Elevator music
• Very widespread occurrence is not interesting to generate indicators for recommendation– Unless you want to offer an item that is constantly desired, such as
razor blades (or ponies)• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendation
![Page 35: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/35.jpg)
© 2014 MapR Technologies 35
Overview: Get Useful Indicators from Behaviors
1. Use log files to build history matrix of users x items– Remember: this history of interactions will be sparse compared to all
potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix– Log Likelihood Ratio (LLR) can be helpful to judge which co-
occurrences can with confidence be used as indicators of preference– ItemSimilarityJob in Apache Mahout uses LLR
![Page 36: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/36.jpg)
© 2014 MapR Technologies 36
Which one is the anomalous co-occurrence?
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
A not A
B 1 0
not B 0 2
![Page 37: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/37.jpg)
© 2014 MapR Technologies 37
Which one is the anomalous co-occurrence?
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
A not A
B 1 0
not B 0 20.90 1.95
4.52 14.3
![Page 38: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/38.jpg)
© 2014 MapR Technologies 38
Collection of Documents: Insert Meta-Data
Search Technology
Item meta-data
Document for “puppy” id: t4
title: puppydesc: The sweetest little puppy ever.keywords: puppy, dog, pet
Ingest easily via NFS
![Page 39: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/39.jpg)
© 2014 MapR Technologies 39
From Indicator Matrix to New Indicator Field
✔
id: t4title: puppydesc: The sweetest little puppy ever.keywords: puppy, dog, pet
indicators: (t1)
Solr document for “puppy”
Note: data for the indicator field is added directly to meta-data for a document in Apache Solr or Elastic Search index. You don’t need to create a separate index for the indicators.
![Page 40: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/40.jpg)
© 2014 MapR Technologies 40
Going Further: Multi-Modal Recommendation
![Page 41: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/41.jpg)
© 2014 MapR Technologies 41
Going Further: Multi-Modal Recommendation
![Page 42: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/42.jpg)
© 2014 MapR Technologies 42
For example
• Users enter queries (A)– (actor = user, item=query)
• Users view videos (B)– (actor = user, item=video)
• ATA gives query recommendation– “did you mean to ask for”
• BTB gives video recommendation– “you might like these videos”
![Page 43: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/43.jpg)
© 2014 MapR Technologies 43
The punch-line
• BTA recommends videos in response to a query– (isn’t that a search engine?)– (not quite, it doesn’t look at content or meta-data)
![Page 44: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/44.jpg)
© 2014 MapR Technologies 44
Real-life example
• Query: “Paco de Lucia”• Conventional meta-data search results:
– “hombres de paco” times 400– not much else
• Recommendation based search:– Flamenco guitar and dancers– Spanish and classical guitar– Van Halen doing a classical/flamenco riff
![Page 45: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/45.jpg)
© 2014 MapR Technologies 45
Real-life example
![Page 46: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/46.jpg)
© 2014 MapR Technologies 46
Hypothetical Example
• Want a navigational ontology?• Just put labels on a web page with traffic
– This gives A = users x label clicks
• Remember viewing history– This gives B = users x items
• Cross recommend– B’A = label to item mapping
• After several users click, results are whatever users think they should be
![Page 47: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/47.jpg)
© 2014 MapR Technologies 47
More Details Available
available for free at
http://www.mapr.com/practical-machine-learning
![Page 48: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/48.jpg)
© 2014 MapR Technologies 48
Who I am
Ted Dunning, Chief Applications Architect, MapR Technologies
Email [email protected] [email protected]
Twitter @Ted_Dunning
Apache Mahout https://mahout.apache.org/
Twitter @ApacheMahout
![Page 49: Cognitive computing with big data, high tech and low tech approaches](https://reader034.fdocuments.in/reader034/viewer/2022051818/54b6cbe74a7959772b8b45d3/html5/thumbnails/49.jpg)
© 2014 MapR Technologies 49
Q & A
@mapr maprtech
Engage with us!
MapR
maprtech
mapr-technologies