
Page 1

CROWDSEARCHING (AND BEYOND)

Stefano Ceri
Politecnico di Milano
Dipartimento di Elettronica, Informazione e BioIngegneria

Crowdsearcher

Page 2

Crowd-based Applications

• Emerging crowd-based applications:
  • opinion mining
  • localized information gathering
  • marketing campaigns
  • expert response gathering
• General structure:
  • the requestor poses some questions
  • a wide set of responders (typically unknown to the requestor) are in charge of providing answers
  • the system organizes a response collection campaign
• Include crowdsourcing and crowdsearching

Page 3

The “system” is a wide concept

• Crowd-based applications may use social networks and Q&A websites in addition to crowdsourcing platforms
• Our approach: a coordination engine which keeps overall control of the application deployment and execution

[Figure labels: CrowdSearcher; API Access]

Page 4

A simple example of crowdsearching

Page 5

Example: Find your job (social invitation)

Page 6

Example: Find your job (social invitation)

• Selected data items can be transferred to the crowd question

Page 7

Find your job (response submission)

Page 8

Crowdsearcher results (in the loop)

Page 9

Deployment alternatives

• Multi-platform deployment

[Figure labels: Embedded application; External application; Standalone application; Social / Crowd platform; Native behaviours; API; Embedding; Community / Crowd; Generated query template; Native]

Page 10

Deployment: search on a social network

• Multi-platform deployment

Page 11

Deployment: search on the social network

• Multi-platform deployment

Page 12

Deployment: search on the social network

• Multi-platform deployment

Page 13

Deployment: search on the social network

• Multi-platform deployment

Page 14

THE MODEL AND THE PROCESS

Page 15

CrowdSearcher

• Combines a conceptual framework, a specification paradigm and a reactive execution control environment
• Supports designing, deploying, and monitoring applications on top of crowd-based systems
  • Design is top-down, platform-independent
  • Deployment turns declarative specifications into platform-specific implementations which include social networks and crowdsourcing platforms
  • Monitoring provides reactive control, which guarantees applications’ adaptation and interoperability
• Developed in the context of Search Computing (SeCo, ERC Advanced Grant, 2008-2013)

Page 16

The Design Process

• A simple task design and deployment process, based on specific data structures
  • created using model-driven transformations
  • driven by the task specification

[Figure labels: Task Specification; Task Planning; Control Specification]

• Task Specification: task operations, objects, and performers
• Task Planning: work distribution
• Control Specification: task control policies

Page 17

DEMO!

Page 18

Valuable ideas: 1. Operation types

• In a Task, performers are required to execute logical operations on input objects
  • e.g. Locate the faces of the people appearing in the following 5 images
• CrowdSearcher offers pre-defined operation types:
  • Like: Ask a performer to express a preference (true/false)
    • e.g. Do you like this picture?
  • Comment: Ask a performer to write a description / summary / evaluation
    • e.g. Can you summarize the following text using your own words?
  • Tag: Ask a performer to annotate an object with a set of tags
    • e.g. How would you label the following image?
  • Classify: Ask a performer to classify an object within a closed set of alternatives
    • e.g. Would you classify this tweet as pro-right, pro-left, or neutral?
  • Add: Ask a performer to add a new object conforming to the specified schema
    • e.g. Can you list the name and address of good restaurants near Politecnico di Milano?
  • Modify: Ask a performer to verify/modify the content of one or more input objects
    • e.g. Is this wine from Cinque Terre? If not, where does it come from?
  • Order: Ask a performer to order the input objects
    • e.g. Order the following books according to your taste
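The operation types above can be sketched as a small data model. This is only an illustration: the `Operation` class, its fields, and the `validate` checks are assumptions made for this example, not the CrowdSearcher API. Only the seven type names come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class OperationType(Enum):
    """The seven pre-defined operation types named on the slide."""
    LIKE = "like"
    COMMENT = "comment"
    TAG = "tag"
    CLASSIFY = "classify"
    ADD = "add"
    MODIFY = "modify"
    ORDER = "order"


@dataclass
class Operation:
    """Hypothetical shape of one operation a performer executes on input objects."""
    op_type: OperationType
    prompt: str
    choices: List[str] = field(default_factory=list)  # used only by CLASSIFY here

    def validate(self, answer) -> bool:
        """Return True if `answer` has the shape this operation type expects."""
        if self.op_type is OperationType.LIKE:
            return isinstance(answer, bool)
        if self.op_type is OperationType.CLASSIFY:
            return answer in self.choices
        if self.op_type in (OperationType.TAG, OperationType.ORDER):
            return isinstance(answer, list)
        # COMMENT, ADD and MODIFY are left unconstrained in this sketch.
        return answer is not None


classify = Operation(
    OperationType.CLASSIFY,
    "Would you classify this tweet as pro-right, pro-left, or neutral?",
    choices=["pro-right", "pro-left", "neutral"],
)
```

Keeping the closed set of alternatives on the operation itself means a Classify answer can be checked without any platform-specific code, which fits the platform-independent design goal stated earlier.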

Page 19

2. Platform-independent Meta-Model

Page 20

3. Reactive Crowdsourcing

• A conceptual framework for controlling the execution of crowd-based computations. Based on:
  • Control Marts
  • Active Rules
• Classical forms of control:
  • Majority control (to close object computations)
  • Quality control (to check that quality constraints are met)
  • Spam detection (to detect / eliminate some performers)
  • Multi-platform adaptation (to change the deployment platform)
  • Social adaptation (to change the community of performers)

Page 21

Why Active Rules?

• Ease of Use: control is easily expressible
  • Simple formalism, simple computation
• Power: arbitrarily complex controls are supported
  • Extensibility mechanisms
• Automation: active rules can be system-generated
  • Well-defined semantics
• Flexibility: localized impact of changes on the rule set
  • Control isolation
• Known formal properties descending from known theory
  • Termination, confluence

Page 22

4. Control Mart

• Data structure for controlling application execution, inspired by data marts (for data warehousing); content is automatically built from task specification & planning
• Central entity: MicroTask Object Execution
• Dimensions: Task / Operations, Performer, Object
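A minimal sketch of the control mart as a star-like structure, with one central fact record and three dimension records. All class and field names below are assumptions chosen for illustration; the slide only names the central entity and the three dimensions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    """Dimension: task / operations (fields are hypothetical)."""
    task_id: str
    operation: str


@dataclass(frozen=True)
class Performer:
    """Dimension: performer (fields are hypothetical)."""
    performer_id: str
    platform: str


@dataclass(frozen=True)
class CrowdObject:
    """Dimension: object being evaluated (fields are hypothetical)."""
    object_id: str
    description: str


@dataclass
class MicroTaskObjectExecution:
    """Central entity: one performer's execution of a micro-task on one object."""
    task_id: str
    performer_id: str
    object_id: str
    answer: Optional[str] = None  # None until the performer responds


def executions_for_object(
    executions: List[MicroTaskObjectExecution], object_id: str
) -> List[MicroTaskObjectExecution]:
    """Slice the central table along the Object dimension."""
    return [e for e in executions if e.object_id == object_id]
```

Slicing the central table by object, performer, or task is the kind of access that the control rules discussed next would rely on.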

Page 23

Auxiliary Structures

• Object: tracking object responses
• Performer: tracking performer behavior (e.g. spammers)
• Task: tracking task status


Page 27

Active Rules Language

• Active rules are expressed on the previous data structures
• Event-Condition-Action paradigm
  • Events: data updates / timer
    • ROW-level granularity
    • OLD: before state of a row
    • NEW: after state of a row
  • Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes)
  • Actions: updates on data structures (e.g. change attribute value, create new instances), special functions (e.g. replan)

e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1

Page 28

Rule Example 1

e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1
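To make the event-condition-action reading of this rule concrete, here is a rough Python sketch of a row-level dispatcher. The `Rule` and `RuleEngine` classes are hypothetical and are not the CrowdSearcher engine. The `object_control` dictionary stands in for the ObjectControl table, and its `"Eval"` key plays the role of `#Eval`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class Rule:
    table: str                                       # table whose updates fire the rule
    attribute: str                                   # attribute watched by the event
    condition: Callable[[Optional[Row], Row], bool]  # predicate over (OLD, NEW)
    action: Callable[[Optional[Row], Row], None]     # side effect over (OLD, NEW)


class RuleEngine:
    """Hypothetical row-level ECA dispatcher (illustration only)."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def update(self, table: str, old: Optional[Row], new: Row) -> None:
        """Signal a row update; fire every rule whose event and condition match."""
        for rule in self.rules:
            if rule.table != table:
                continue
            changed = old is None or old.get(rule.attribute) != new.get(rule.attribute)
            if changed and rule.condition(old, new):
                rule.action(old, new)


# Stand-in for the ObjectControl table, keyed by oID.
object_control: Dict[str, Dict[str, int]] = {"o1": {"Eval": 0}}


def is_republican(old: Optional[Row], new: Row) -> bool:
    # c: NEW.ClassifiedParty == 'Republican'
    return new["ClassifiedParty"] == "Republican"


def increment_eval(old: Optional[Row], new: Row) -> None:
    # a: SET ObjectControl[oID == NEW.oID].#Eval += 1
    object_control[str(new["oID"])]["Eval"] += 1


engine = RuleEngine()
engine.register(
    Rule("MicroTaskObjectExecution", "ClassifiedParty", is_republican, increment_eval)
)

# e: UPDATE FOR MicroTaskObjectExecution[ClassifiedParty]
engine.update(
    "MicroTaskObjectExecution",
    old={"oID": "o1", "ClassifiedParty": None},
    new={"oID": "o1", "ClassifiedParty": "Republican"},
)
```

After the update above, the counter for object `o1` has been incremented once. An update whose new value is not 'Republican' matches the event but fails the condition, so the counter is left alone.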


Page 31

5. Rule Programming Best Practices

• We define three classes of rules


Page 34

Rule Programming Best Practice

• We define three classes of rules
  • Control rules: modifying the control tables
  • Result rules: modifying the dimension tables (object, performer, task)
• Top-to-bottom, left-to-right evaluation
• Guaranteed termination

Page 35

Rule Programming Best Practice

• We define three classes of rules
  • Control rules: modifying the control tables
  • Result rules: modifying the dimension tables (object, performer, task)
  • Execution rules: modifying the execution table, either directly or through re-planning
• Termination must be proven (rule precedence graph has cycles)
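One way to see why adding execution rules changes the termination picture is to build a triggering graph and look for cycles. In active-database theory, an acyclic triggering graph is a standard sufficient condition for termination. The sketch below is an illustration under stated assumptions: the tables each rule class writes to follow the slide, but the tables each class listens on are assumed here and are not given in the source.

```python
from typing import Dict, Set, Tuple

# rule name -> (tables it listens on, tables it writes to)
RuleSpec = Dict[str, Tuple[Set[str], Set[str]]]


def triggering_graph(rules: RuleSpec) -> Dict[str, Set[str]]:
    """Add an edge a -> b when a writes to a table that b listens on."""
    return {
        a: {b for b, (listens, _) in rules.items() if rules[a][1] & listens}
        for a in rules
    }


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search with three colours; a grey-to-grey edge is a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# Assumed wiring: control rules react to executions, result rules to control tables.
control_and_result: RuleSpec = {
    "control": ({"execution"}, {"control"}),
    "result": ({"control"}, {"dimension"}),
}

# Assumed wiring: execution rules react to dimension tables and write executions.
with_execution: RuleSpec = dict(
    control_and_result, execution=({"dimension"}, {"execution"})
)
```

Under these assumptions, the first rule set yields an acyclic graph, matching the guaranteed-termination case, while the second closes a loop through the execution table, which is why termination then has to be argued separately.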

Page 36

6. Dealing with interoperability

• Adaptation is any change of allocation of the application to crowd-based systems or to their performers.
• Migration is the moving of the application from a given system to a different one. (Migration is a special case of adaptation.)
• Cross-Platform Interoperability: applications change the underlying social network or crowdsourcing platform, e.g., from Facebook to Twitter or to AMT.
• Cross-Community Interoperability: applications change the performers' community, e.g., from the students to the professors of a university.

Page 37

Adaptation options

Adaptation may require:
• Re-planning: the process of generating new micro-tasks.
• Re-invitation: the process of generating new invitation messages for existing or re-planned micro-tasks, with the aim of getting new performers for them.

Adaptation occurs at different levels of granularity:
• Task granularity: re-planning or re-invitation occurs for the whole task
• Object granularity: re-planning or re-invitation is focused on one (or a few) objects (for instance, objects on which it is harder to achieve an agreement among performers with a majority-based decision mechanism).

Page 38

EXPERIMENTS

Page 39

Politician Affiliation

• Given the picture and name of a politician, specify his/her political affiliation
  • No time limit
  • Performers are encouraged to look up online
• 2 sets of rules
  • Majority Evaluation
  • Spammer Detection

Page 40

Movie Scenes

• Users can select the screenshot timeframe and whether it is a spoiler or not
• 20 still images each from 16 popular movies
• Each micro-task consists of evaluating one image
• Results are accepted, and the corresponding request is closed, when an agreement between 5 performers is reached on both the temporal category and the spoiler option, independently of the number of executions.

Page 41

Professors’ images

• 16 professors within two research groups in our department (DB and AI groups)
• The top 50 images returned by the Google Image API for each query
• Each microtask consisted of evaluating 5 images regarding a professor.
• Results are accepted (and thus the corresponding object is closed) when enough agreement on the class of the image is reached
• Closed objects are removed from new executions.

Page 42

SINGLE PLATFORM

Page 43

Query Type

• Engagement depends on the difficulty of the task
• Like vs. Add tasks:

Page 44

Comparison of Execution Platforms

• Facebook vs. Doodle

Page 45

Posting Time

• Facebook vs. Doodle

Page 46

Majority Evaluation (1/3)

• 30 objects; object redundancy = 9
• Final object classification as simple majority after 7 evaluations

Page 47

Majority Evaluation (2/3)

• Final object classification as total majority after 3 evaluations
• Otherwise, re-plan of 4 additional evaluations. Then simple majority at 7

Page 48

Majority Evaluation (3/3)

• Final object classification as total majority after 3 evaluations
• Otherwise, simple majority at 5 or at 7 (with replan)
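The second policy (total majority at 3, otherwise re-plan and take simple majority at 7) can be sketched as a small decision function. Two points are assumptions of this sketch: "total majority" is read as unanimity, and "simple majority" is read as strictly more than half of the votes. The function name and the status strings are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def decide(
    answers: List[str], first_check: int = 3, final_check: int = 7
) -> Tuple[str, Optional[str]]:
    """Return (status, label) for one object given the answers collected so far.

    status is one of "wait", "replan", "closed", "undecided".
    """
    n = len(answers)
    if n < first_check:
        return ("wait", None)

    label, votes = Counter(answers).most_common(1)[0]

    if n == first_check:
        if votes == n:                 # total majority: everyone agrees
            return ("closed", label)
        return ("replan", None)        # ask for final_check - first_check more

    if n < final_check:
        return ("wait", None)          # re-planned evaluations still arriving

    if votes > n / 2:                  # simple majority at the final check
        return ("closed", label)
    return ("undecided", None)
```

For example, three identical answers close the object at once, while a 2-to-1 split triggers a re-plan and the object stays open until seven answers are in. The third policy on the slide would add an intermediate check at 5 evaluations, which this sketch does not implement.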

Page 49

Spammer Detection (1/2)

• New rule for spammer detection without ground truth
• Performer correctness on final majority. Spammer if > 50% wrong classifications

Page 50

Spammer Detection (2/2)

• New rule for spammer detection without ground truth
• Performer correctness on current majority. Spammer if > 50% wrong classifications
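A minimal sketch of the "current majority" variant follows: with no ground truth, each answer is compared against the most frequent label its object has received so far, and a performer is flagged when more than half of their answers disagree with it. The function names and the tuple layout are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Execution = Tuple[str, str, str]  # (performer_id, object_id, label)


def current_majority(executions: List[Execution]) -> Dict[str, str]:
    """Most frequent label per object, computed from the answers seen so far."""
    labels_by_object: Dict[str, List[str]] = defaultdict(list)
    for _, obj, label in executions:
        labels_by_object[obj].append(label)
    return {
        obj: Counter(labels).most_common(1)[0][0]
        for obj, labels in labels_by_object.items()
    }


def spammers(executions: List[Execution], threshold: float = 0.5) -> Set[str]:
    """Performers whose share of answers against the majority exceeds threshold."""
    majority = current_majority(executions)
    total: Counter = Counter()
    wrong: Counter = Counter()
    for performer, obj, label in executions:
        total[performer] += 1
        if label != majority[obj]:
            wrong[performer] += 1
    return {p for p in total if wrong[p] / total[p] > threshold}


sample: List[Execution] = [
    ("a", "o1", "R"), ("b", "o1", "R"), ("s", "o1", "D"),
    ("a", "o2", "D"), ("b", "o2", "D"), ("s", "o2", "R"),
]
```

In the sample, performer `s` disagrees with the majority on both objects and is the only one flagged. The "final majority" variant from the previous slide would use the same comparison, restricted to objects that are already closed.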

Page 51

MULTI-PLATFORM & MULTICOMMUNITY

Page 52

Number of Executions per Platform

• Immediate engagement and then plateau.
• Higher engagement on AMT (paid) than SN (unpaid and limited by # of contacts of inviter)

Page 53

Precision of Performers per Platform

• AMT significantly lower in precision

Page 54

Precision on Closed Objects

• Precision decreases on crowdsourcing platforms
• Agreement increases precision wrt single performers

Page 55

Number of performers per community

Page 56

Precision for different engagement strategies

• Precision decreases with less expert communities
• Inside-out strategy (from expert to generic users) outperforms outside-in strategy (from generic to expert)

Page 57

EXPERT FINDING IN CROWDSEARCHER

Page 58

Problem

• Ranking the members of a social group according to the level of knowledge that they have about a given topic
• Application: crowd selection (for Crowd Searching or Sourcing)
• Available data
  • User profile
  • Behavioral trace that users leave behind them through their social activities

Page 59

Most interesting aspect: Feature Organization Meta-Model

Page 60

Main Results

• Profiles are less effective than level-1 resources
  • Resources produced by others help in describing each individual’s expertise
• Twitter is the most effective social network for expertise matching; sometimes it outperforms the other social networks
  • Twitter most effective in Computer Engineering, Science, Technology & Games, Sport
  • Facebook effective in Locations, Sport, Movies & TV, Music
  • LinkedIn never very helpful in locating expertise

Page 61

CONCLUSIONS

Page 62

Summary

• Results
  • An integrated framework for crowdsourcing task design and control
  • Well-structured control rules with guarantees of termination
  • Support for cross-platform crowd interoperability
  • A working prototype: crowdsearcher.search-computing.org
• Forthcoming
  • Publication of Web Interface + API
  • Support of declarative options for automatic rule generation
  • Integration with more social networks and human computation platforms
  • Providing vertical solutions for specific markets
  • More applications and experiments (e.g. in Expo 2015)

Page 63

APPENDIX

Page 64

Current «other» interest: Genomic Computing

• NGS changes biology & medicine
  • Massive DNA testing becoming available
  • DNA-based personalized medicine approaching
• NGS data management looks to me like the biggest and most important big-data problem, but:
  • No high-level view of genome data supporting high-level query and search
  • No scalable method for NGS data analysis

-> Data management research for NGS data has very good potential for impact!

Page 66

Single-Gene Disease Mutations

Page 67

Our results so far:

1. Intuition: genomic data management requires a «genometric space data processing system»
2. Data model for profiling NGS data
3. System Architecture for managing genometric queries
4. Genometric Data Model (GDM) and Genometric Query Language (GQL) + mapping to PIG LATIN (Query Language for Hadoop).

Long-Term Goal: INTERNET OF GENOMES

Page 68

QUESTIONS?