Data Mining and Machine Learning in a Nutshell | Arizona State University, Data Mining and Machine Learning Lab | Collective Intelligence
DATA MINING AND MACHINE LEARNING IN A NUTSHELL
COLLECTIVE INTELLIGENCE, PART I
Mohammad-Ali Abbasi
http://www.public.asu.edu/~mabbasi2/
SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING
ARIZONA STATE UNIVERSITY
http://dmml.asu.edu/
About Collective Intelligence
• Definition of collective intelligence
  – Examples happening around us
• What constitutes collective intelligence
  – Groups, number of members, variety, etc.
• How can one improve collective intelligence
  – What are the necessary conditions to achieve CI?
  – A case in data mining and machine learning?
• What can one do with collective intelligence in the age of social media
  – Opportunities for data mining
Definitions of Collective Intelligence
• Wikipedia
  – Collective intelligence is a shared or group intelligence that emerges from the collaboration and competition of many individuals
• MIT Center for CI
  – Groups of individuals doing things collectively that seem intelligent
• Toby Segaran in Programming CI
  – Combining the behavior, preferences, or ideas of a group of people to create novel insights
• Unknown
  – Collective intelligence is any intelligence that arises from, or is a capacity or characteristic of, groups and other collective living systems
Examples of collective intelligence - Wikipedia
• Wikipedia
• Thousands of contributors from across the world have collectively created the world’s largest encyclopedia with almost no centralized control
Examples of collective intelligence - PageRank
• PageRank Algorithm used by Google
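The slide only names PageRank, so the following is a minimal sketch of the general idea rather than Google's implementation: every link acts as a vote, and a page is important if important pages link to it. The function name, the damping factor of 0.85, the iteration count, and the toy link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank sketch.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not values from the slides.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C collects links from A, B, and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph the page that collects the most "votes" ends up with the highest score, which is the collective-intelligence aspect: the ranking emerges from many independent linking decisions.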
Examples of collective intelligence - CAPTCHA
• CAPTCHA
  – Completely Automated Public Turing test to tell Computers and Humans Apart
  – A reverse Turing test: a machine, rather than a human, judges whether the respondent is human
• reCAPTCHA: a service that helps to digitize books, newspapers, and old-time radio shows
  – About 200 million CAPTCHAs are solved by humans around the world every day
  – More than 150,000 hours of work each day
Vark.com
1. Send a question
2. Aardvark finds the perfect person to answer
3. Get their response
Kasparov vs. the World
• Kasparov vs. the World was a chess match held in 1999, in which world champion Garry Kasparov played against “the World,” with the World’s moves determined by a majority vote, held over the Internet, of anyone who wanted to participate.
• Kasparov eventually won, but he said it was the hardest game he had ever played.
Examples of collective intelligence - Threadless
• Threadless.com
• At Threadless, anyone can design a T-shirt, submit that design to a weekly contest, and vote for their favorite designs.
• The company harnesses the collective intelligence of a community of over 500,000 people to design and select T-shirts.
Examples of collective intelligence – Google Image Labeler
• It is a feature, in the form of a game, of Google Image Search that allows the user to label random images to help improve the quality of Google's image search results
Examples of collective intelligence – Ant Societies
• If we measure intelligence in terms of technology, ant societies exhibit more intelligence than any other animal except humans. Ant societies practice several different forms of agriculture. Some keep livestock: for example, some ants keep and care for aphids for “milking,” and leafcutter ants cultivate fungi and carry leaves to feed them.
Examples of collective intelligence - Games
• Games such as WorldCraft, The Sims, Halo or Second Life are designed to be more non-linear and depend on collective intelligence for expansion.
• This way of sharing is gradually evolving and influencing the mindset of the current and future generations.
Principles of Collective Intelligence
• Collective intelligence is a form of mass collaboration. For it to emerge, four principles must be in place to promote creativity:
  – Openness
  – Peering
  – Sharing
  – Acting globally
Openness
• Traditionally, people and companies have been reluctant to share ideas and intellectual property because these resources provide an edge over competitors.
• Over time, however, people and companies have begun to loosen their hold over these resources because they reap more benefits by doing so.
• Openness enables products to gain significant improvement and scrutiny through transparent collaboration.
Peering
• A form of horizontal organization with the capacity to create information technology and physical products.
• One example is the “opening up” of Linux, where users are free to modify and develop it provided that they make their changes available to others.
• Participants in this form of collective intelligence may have different motivations for contributing, but the results achieved are for the improvement of a product or service.
• “Peering succeeds because it leverages self-organization – a style of production that works more effectively than hierarchical management for certain tasks.”
Sharing
• Research has shown that more and more companies have started to share some of their intellectual property while maintaining a degree of control over the rest, such as potential and critical patent rights.
• Companies have realized that by locking down all of their intellectual property, they shut out all possible opportunities.
• Sharing some of it has allowed them to expand their markets and bring out products faster.
Acting Globally
• Advances in communication technology have prompted the rise of global companies and of e-commerce, which allows individuals to set up businesses at low to almost no overhead cost.
• Because the influence of the Internet is widespread, a globally integrated company has no geographical boundaries but does have global connections, which give it access to new markets, ideas, and technology.
• Firms therefore need to stay up to date and remain globally competitive, or they will face a declining number of clients.
Types of Collective Intelligence
Elements of Collective Intelligence
• Staffing – Who is performing the task?
• Incentives – Why are they doing it?
• Goal – What is being accomplished?
• Structure, process – How is it being done?
Elements of Collective Intelligence
• Who?
  – Hierarchy
  – Crowd
• Why?
  – Money
  – Love
  – Glory
• What?
  – Create
  – Decide
• How?
  – Collection
  – Collaboration
Mapping the collective intelligence elements for Wikipedia
Issues with Crowd Wisdom
• Questions
  – Why can the crowd be smarter than any individual in the crowd?
  – Is it guaranteed? If not, what are the conditions under which the crowd can make the best decisions?
  – How can one gauge the reliability of crowd wisdom? Is crowd wisdom valid, trustworthy, and verifiable?
  – How can one find a crowd and its leader, influencer, or average opinion?
  – How is each member influenced by others?
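One classical way to approach the first two questions is a majority-vote model in the style of Condorcet's jury theorem. The sketch below is only an illustration under strong assumptions that the slides do not state: every member votes independently, every member has the same probability of being right, and the crowd decides by strict majority. The function name and the numbers are hypothetical.

```python
from math import comb


def majority_accuracy(n_voters, p_correct):
    """Probability that a strict majority of n independent voters is right.

    Assumes every voter is correct with the same probability p_correct
    and that votes are independent (an idealized model of a crowd).
    """
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# One voter who is right 60% of the time versus 101 such voters voting together.
single = majority_accuracy(1, 0.6)
crowd = majority_accuracy(101, 0.6)
# The same crowd when each member is right only 40% of the time.
weak_crowd = majority_accuracy(101, 0.4)
```

Under these assumptions the crowd beats the individual only when members are individually better than chance; when they are worse than chance, aggregation amplifies the error. This is one reason crowd wisdom is not guaranteed, and it says nothing about crowds whose members influence one another.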
Collective Intelligence and Societies
• Society is the foundation of every kind of CI
• CI in traditional societies
  – Families, companies, countries, and armies are all groups of individuals doing things collectively that, at least sometimes, seem intelligent
• CI in Web-based societies (social networking sites)
  – The Internet, and especially Web 2.0 applications, provide a platform for communicating and building societies
Collective Intelligence and the Internet
• Web 2.0
• Social Computing
Web Impacts on CI
• The ability of new media to easily store and retrieve information, predominantly through databases and the Internet, allows it to be shared without difficulty.
• Thus, through interaction with new media, knowledge passes easily between sources, resulting in another form of collective intelligence.
WEB 2.0 and Many Variants
Elements of WEB 2.0
Web 2.0: Evolution Towards a Read/Write Platform
Web 1.0 (1993-2003): Pretty much HTML pages viewed through a browser
Web 2.0 (2003 and beyond): Web pages, plus a lot of other “content” shared over the web, with more interactivity; more like an application than a “page”

Web 1.0            Attribute                  Web 2.0
“Read”             Mode                       “Write” & Contribute
“Page”             Primary unit of content    “Post / record”
“static”           State                      “dynamic”
Web browser        Viewed through…            Browsers, RSS readers, anything
“Client Server”    Architecture               “Web Services”
Web coders         Content created by…        Everyone
“geeks”            Domain of…                 “mass amateurization”
CI in Social Media
• Crowd members assign different weights to individual inputs on the basis of their relationship with the people who provided them and then make individual decisions
  – Blogosphere
  – Facebook
  – YouTube
  – Epinions.com
  – Amazon
  – eBay
  – Digg
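The weighting idea above can be sketched as a trust-weighted average. This is only an illustration: the function name, the 1-to-5 rating scale, and the trust values are hypothetical and do not describe how any of the listed sites actually combine opinions.

```python
def weighted_opinion(ratings, trust):
    """Combine other people's ratings, weighting each by how much we trust its source.

    ratings maps a person to the rating they gave an item.
    trust maps a person to a non-negative weight; unknown people get weight 0.
    Returns None when no rating comes from a trusted source.
    """
    total_weight = sum(trust.get(person, 0.0) for person in ratings)
    if total_weight == 0:
        return None
    weighted_sum = sum(
        rating * trust.get(person, 0.0) for person, rating in ratings.items()
    )
    return weighted_sum / total_weight


# Hypothetical ratings of one product on a 1-5 scale.
ratings = {"close_friend": 5, "coworker": 3, "stranger": 1}
# Hypothetical trust weights reflecting the strength of each relationship.
trust = {"close_friend": 1.0, "coworker": 0.5, "stranger": 0.1}

personal_score = weighted_opinion(ratings, trust)
plain_average = sum(ratings.values()) / len(ratings)
```

Because the close friend's rating counts most, the personal score lands well above the plain average, so two crowd members who see the same inputs can reach different individual decisions.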
Blogging is the Most Recognized Example of Web 2.0
Wikipedia is a Collaborative Encyclopedia Being Edited in Real Time by Anyone
Alive At ASU
WEB 2.0 Technologies
• APIs
• RSS (Really Simple Syndication)
  – Content syndication
• Web Services
  – Open data
• AJAX (Asynchronous JavaScript and XML)
• CSS (Cascading Style Sheets)
  – Content with style
WEB 2.0, Summing Up
• Web 2.0 is hard to define, but it is very far from just hype
  – Culmination of a number of Web trends
• Importance of open data
  – Allows communities to assemble unique, tailored applications
• Importance of users
  – Seek and create network effects
• Browser as application platform
  – Huge potential for new kinds of Web applications
Programming Collective Intelligence
Crawl the web
Spiders (Robots/Bots/Crawlers)
• Start the search from a comprehensive set of root URLs.
• Follow all links on these pages recursively to find additional pages.
• Index all newly found pages in an inverted index as they are encountered.
• May allow users to directly submit pages to be indexed (and crawled from).
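The inverted index mentioned above maps each term to the set of pages that contain it, so a query can be answered without rescanning every page. The following is a minimal sketch; the tokenizer (lowercase letters and digits only), the function names, and the example pages are assumptions for illustration, and a real search engine would also store positions, frequencies, and ranking signals.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each term to the set of URLs whose text contains it.

    pages maps a URL to the plain text of that page.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical crawled pages.
pages = {
    "http://example.com/ants": "Ant societies practice agriculture",
    "http://example.com/wiki": "Wikipedia is a collaborative encyclopedia",
    "http://example.com/ci": "Collective intelligence in ant societies and Wikipedia",
}
index = build_inverted_index(pages)
hits = search(index, "ant societies")
```

Indexing pages as they are encountered, as the slide describes, amounts to calling the inner loop of `build_inverted_index` once per downloaded page.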
Search Strategies
Breadth-first Search
Search Strategies (cont)
Depth-first Search
Search Strategy Trade-Offs
• Breadth-first explores uniformly outward from the root page but requires memory of all nodes on the previous level (exponential in depth). It is the standard spidering method.
• Depth-first requires memory of only depth times branching factor (linear in depth) but gets “lost” pursuing a single thread.
• Both strategies can be implemented with a queue of links (URLs): adding new links to the end of the queue gives breadth-first search, and adding them to the front gives depth-first search.
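The two strategies differ only in where newly found links enter the queue, which the following sketch illustrates on a small in-memory link graph. The graph, the function name, and the `strategy` parameter are hypothetical; links are marked as seen when they are queued, which is a simplification that is exact for tree-shaped graphs like this one.

```python
from collections import deque


def crawl_order(graph, root, strategy="bfs"):
    """Return the order in which pages are visited, starting from root.

    graph maps each page to the list of pages it links to.
    strategy is "bfs" (new links go to the end of the queue)
    or "dfs" (new links go to the front of the queue).
    """
    frontier = deque([root])
    seen = {root}
    order = []
    while frontier:
        page = frontier.popleft()
        order.append(page)
        new_links = []
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                new_links.append(link)
        if strategy == "bfs":
            frontier.extend(new_links)
        else:
            # extendleft reverses its argument, so reverse first to keep link order.
            frontier.extendleft(reversed(new_links))
    return order


# Hypothetical site: the root links to two sections, each with its own pages.
site = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}

bfs_order = crawl_order(site, "root", "bfs")
dfs_order = crawl_order(site, "root", "dfs")
```

Breadth-first visits both sections before any of their subpages, while depth-first finishes everything under the first section before it reaches the second.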
Spidering Algorithm
• Initialize queue (Q) with initial set of known URLs.
• Until Q is empty, or the page or time limit is exhausted:
  – Pop URL, L, from front of Q.
  – If L does not point to an HTML page (.gif, .jpeg, .ps, .pdf, .ppt…), continue loop.
  – If L has already been visited, continue loop.
  – Download page, P, for L.
  – If P cannot be downloaded (e.g. 404 error, robot excluded), continue loop.
  – Index P (e.g. add to inverted index or store cached copy).
  – Parse P to obtain list of new links N.
  – Append N to the end of Q.
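The steps above can be sketched in Python as follows. To keep the example runnable without network access, downloading is delegated to a caller-supplied `fetch` function that returns the page's HTML, or `None` when the page cannot be downloaded; that function, the `page_limit` parameter, the in-memory `fake_web`, and the choice to "index" by storing a cached copy are all assumptions made for illustration. A real spider would also need to honor robots.txt and a time limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# File types the slide lists as non-HTML; the spider skips them.
SKIPPED_SUFFIXES = (".gif", ".jpeg", ".ps", ".pdf", ".ppt")


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(seed_urls, fetch, page_limit=100):
    """Breadth-first spider: returns a dict of URL -> cached HTML."""
    queue = deque(seed_urls)           # Initialize Q with the known URLs.
    visited = set()
    cache = {}
    while queue and len(cache) < page_limit:
        url = queue.popleft()          # Pop L from the front of Q.
        if url.lower().endswith(SKIPPED_SUFFIXES):
            continue                   # Not an HTML page.
        if url in visited:
            continue                   # Already visited.
        visited.add(url)
        html = fetch(url)              # Download page P for L.
        if html is None:
            continue                   # Could not download P.
        cache[url] = html              # "Index" P by storing a cached copy.
        parser = LinkParser()
        parser.feed(html)              # Parse P to obtain new links N.
        queue.extend(urljoin(url, href) for href in parser.links)  # Append N to Q.
    return cache


# Hypothetical two-page site used in place of real HTTP requests.
fake_web = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/logo.gif">logo</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="/missing.html">gone</a>',
}
crawled = spider(["http://example.com/"], fetch=fake_web.get)
```

Appending new links to the end of the queue makes this a breadth-first crawl, matching the last step of the algorithm.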
Keeping Spidered Pages Up to Date
• Web is very dynamic: many new pages, updated pages, deleted pages, etc.
• Periodically check spidered pages for updates and deletions:
  – Just look at header info (e.g. META tags on last update) to determine if the page has changed; only reload the entire page if needed.
• Track how often each page is updated and preferentially return to pages which are historically more dynamic.
• Preferentially update pages that are accessed more often to optimize freshness of more popular pages.
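The last two bullets suggest ordering re-crawls by how often a page changes and how often it is accessed. One possible way to combine the two signals is sketched below; the scoring formula (observed change rate plus share of accesses, weighted equally), the function name, and the statistics are all assumptions, since the slides do not specify how the preferences are combined.

```python
def refresh_priority(times_changed, times_checked, accesses, total_accesses):
    """Score a page for re-crawling: higher means refresh sooner.

    change rate  = fraction of past checks in which the page had changed
    popularity   = this page's share of all recorded accesses
    The equal weighting of the two terms is an illustrative assumption.
    """
    # A page that has never been checked is treated as maximally dynamic.
    change_rate = times_changed / times_checked if times_checked else 1.0
    popularity = accesses / total_accesses if total_accesses else 0.0
    return change_rate + popularity


# Hypothetical per-page statistics: (times changed, times checked, accesses).
stats = {
    "news": (9, 10, 500),
    "about": (1, 10, 50),
    "blog": (5, 10, 450),
}
total = sum(accesses for _, _, accesses in stats.values())

refresh_order = sorted(
    stats,
    key=lambda page: refresh_priority(*stats[page], total),
    reverse=True,
)
```

A spider would revisit pages in `refresh_order`, checking header information first and reloading a page only if it has changed.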
Mohammad-Ali Abbasi (Ali) is a Ph.D. student at the Data Mining and Machine Learning Lab, Arizona State University. His research interests include data mining, machine learning, social computing, and social media behavior analysis.
http://www.public.asu.edu/~mabbasi2/