CROWDSEARCHING (And Beyond)

Post on 24-Feb-2016


Transcript of CROWDSEARCHING (And Beyond)

1

CROWDSEARCHING (AND BEYOND)

Stefano Ceri
Politecnico di Milano

Dipartimento di Elettronica, Informazione e BioIngegneria

Crowdsearcher

2

Crowd-based Applications
• Emerging crowd-based applications:
  • opinion mining
  • localized information gathering
  • marketing campaigns
  • expert response gathering
• General structure:
  • the requestor poses some questions
  • a wide set of responders are in charge of providing answers (typically unknown to the requestor)
  • the system organizes a response collection campaign
• These include crowdsourcing and crowdsearching

3

The “system” is a wide concept
• Crowd-based applications may use social networks and Q&A websites in addition to crowdsourcing platforms
• Our approach: a coordination engine which keeps overall control of the application deployment and execution

[Figure: the CrowdSearcher engine, with API access to the underlying platforms]

4

A simple example of crowdsearching


5

Example: Find your job (social invitation)


6

Example: Find your job (social invitation)

Selected data items can be transferred to the crowd question


7

Find your job (response submission)


8

Crowdsearcher results (in the loop)

9

Deployment alternatives
• Multi-platform deployment

[Figure: deployment options — embedded application (native behaviours on the social/crowd platform), external application (via API), standalone application (embedding / generated query templates), each reaching a community / crowd]

10

Deployment: search on a social network
• Multi-platform deployment


14

THE MODEL AND THE PROCESS


15

CrowdSearcher
• Combines a conceptual framework, a specification paradigm, and a reactive execution control environment
• Supports designing, deploying, and monitoring applications on top of crowd-based systems
• Design is top-down, platform-independent
• Deployment turns declarative specifications into platform-specific implementations, which include social networks and crowdsourcing platforms
• Monitoring provides reactive control, which guarantees applications’ adaptation and interoperability
• Developed in the context of Search Computing (SeCo, ERC Advanced Grant, 2008-2013)

16

The Design Process
• A simple task design and deployment process, based on specific data structures
  • created using model-driven transformations
  • driven by the task specification

Task Specification → Task Planning → Control Specification

• Task Specification: task operations, objects, and performers
• Task Planning: work distribution
• Control Specification: task control policies

17

DEMO!


18

Valuable ideas: 1. Operation types
• In a Task, performers are required to execute logical operations on input objects
  • e.g. Locate the faces of the people appearing in the following 5 images
• CrowdSearcher offers pre-defined operation types:
  • Like: ask a performer to express a preference (true/false)
    • e.g. Do you like this picture?
  • Comment: ask a performer to write a description / summary / evaluation
    • e.g. Can you summarize the following text using your own words?
  • Tag: ask a performer to annotate an object with a set of tags
    • e.g. How would you label the following image?
  • Classify: ask a performer to classify an object within a closed set of alternatives
    • e.g. Would you classify this tweet as pro-right, pro-left, or neutral?
  • Add: ask a performer to add a new object conforming to the specified schema
    • e.g. Can you list the name and address of good restaurants near Politecnico di Milano?
  • Modify: ask a performer to verify/modify the content of one or more input objects
    • e.g. Is this wine from Cinque Terre? If not, where does it come from?
  • Order: ask a performer to order the input objects
    • e.g. Order the following books according to your taste

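To make the operation types concrete, here is a minimal Python sketch of how a task combining them might be specified. The function and field names are hypothetical illustrations, not the actual CrowdSearcher API.

```python
# Hypothetical sketch of a declarative task specification combining
# pre-defined operation types. Names are illustrative, not the real
# CrowdSearcher API.

def make_task(title, objects, operations):
    """Bundle input objects and logical operations into a task description."""
    return {"title": title, "objects": objects, "operations": operations}

task = make_task(
    title="Politician affiliation",
    objects=[{"id": 1, "name": "Jane Doe", "photo": "jane.jpg"}],
    operations=[
        # Classify: choose within a closed set of alternatives
        {"type": "Classify", "field": "party",
         "values": ["Republican", "Democrat", "Neutral"]},
        # Like: express a true/false preference
        {"type": "Like", "question": "Do you recognize this person?"},
    ],
)

print(task["operations"][0]["type"])  # Classify
```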

19

2. Platform-independent Meta-Model


20

3. Reactive Crowdsourcing
• A conceptual framework for controlling the execution of crowd-based computations. Based on:
  • Control Marts
  • Active Rules
• Classical forms of control:
  • Majority control (to close object computations)
  • Quality control (to check that quality constraints are met)
  • Spam detection (to detect / eliminate some performers)
  • Multi-platform adaptation (to change the deployment platform)
  • Social adaptation (to change the community of performers)

21

Why Active Rules?
• Ease of use: control is easily expressible
  • Simple formalism, simple computation
• Power: arbitrarily complex control is supported
  • Extensibility mechanisms
• Automation: active rules can be system-generated
  • Well-defined semantics
• Flexibility: localized impact of changes on the rule set
  • Control isolation
• Known formal properties descending from known theory
  • Termination, confluence

22

4. Control Mart
• Data structure for controlling application execution, inspired by data marts (for data warehousing); content is automatically built from task specification & planning
• Central entity: MicroTask Object Execution
• Dimensions: Task / Operations, Performer, Object

Task Specification → Task Planning → Control Specification
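A minimal sketch of the control mart's star schema as Python dataclasses: a central execution fact entity plus control-side tables. Field names are inferred from the slides and are illustrative, not the actual CrowdSearcher schema.

```python
# Illustrative star-schema sketch of a control mart: a central
# execution fact (one performer's answer for one object) plus
# per-object and per-performer control tables. Field names are
# assumptions inferred from the slides.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectControl:
    oID: int
    evaluations: int = 0          # the '#Eval' counter updated by control rules
    closed: bool = False
    answer: Optional[str] = None  # final classification once the object closes

@dataclass
class PerformerControl:
    pID: int
    executions: int = 0
    wrong: int = 0                # used by spammer-detection rules
    spammer: bool = False

@dataclass
class MicroTaskObjectExecution:
    """Central fact entity: one performer's answer for one object."""
    oID: int
    pID: int
    ClassifiedParty: Optional[str] = None

ex = MicroTaskObjectExecution(oID=1, pID=7, ClassifiedParty="Republican")
```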

23

Auxiliary Structures
• Object: tracking object responses
• Performer: tracking performer behavior (e.g. spammers)
• Task: tracking task status


27

Active Rules Language
• Active rules are expressed on the previous data structures
• Event-Condition-Action paradigm
• Events: data updates / timer
  • ROW-level granularity
    • OLD: before state of a row
    • NEW: after state of a row
• Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes)
• Actions: updates on data structures (e.g. change attribute value, create new instances), special functions (e.g. replan)

e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == ’Republican’
a: SET ObjectControl[oID == NEW.oID].#Eval += 1

28

Rule Example 1

e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == ’Republican’
a: SET ObjectControl[oID == NEW.oID].#Eval += 1
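The Event-Condition-Action semantics of Rule Example 1 can be sketched in a few lines of Python; the dictionary below is a hypothetical stand-in for the ObjectControl table, not the engine's actual data layout.

```python
# Executable sketch of Rule Example 1's Event-Condition-Action
# semantics: on an UPDATE of ClassifiedParty, if the NEW value is
# 'Republican', increment the object's evaluation counter.

object_control = {42: {"eval": 0}}   # stand-in for the ObjectControl table

def on_update(new_row):
    """e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]"""
    # c: NEW.ClassifiedParty == 'Republican'
    if new_row["ClassifiedParty"] == "Republican":
        # a: SET ObjectControl[oID == NEW.oID].#Eval += 1
        object_control[new_row["oID"]]["eval"] += 1

on_update({"oID": 42, "ClassifiedParty": "Republican"})
on_update({"oID": 42, "ClassifiedParty": "Democrat"})
print(object_control[42]["eval"])  # 1
```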


31

5. Rule Programming Best Practices
• We define three classes of rules


34

Rule Programming Best Practice
• We define three classes of rules:
  • Control rules: modify the control tables
  • Result rules: modify the dimension tables (object, performer, task)
• Top-to-bottom, left-to-right evaluation
• Guaranteed termination

35

Rule Programming Best Practice
• We define three classes of rules:
  • Control rules: modify the control tables
  • Result rules: modify the dimension tables (object, performer, task)
  • Execution rules: modify the execution table, either directly or through re-planning
• With execution rules, termination must be proven (the rule precedence graph has cycles)

36

6. Dealing with interoperability
• Adaptation is any change of allocation of the application to crowd-based systems or to their performers.
• Migration is the moving of the application from a given system to a different one. (Migration is a special case of adaptation.)
• Cross-Platform Interoperability: applications change the underlying social network or crowdsourcing platform, e.g., from Facebook to Twitter or to AMT.
• Cross-Community Interoperability: applications change the performers' community, e.g., from the students to the professors of a university.

37

Adaptation options
Adaptation may require:
• Re-planning: the process of generating new micro-tasks.
• Re-invitation: the process of generating new invitation messages for existing or re-planned micro-tasks, with the aim of getting new performers for them.

Adaptation occurs at different levels of granularity:
• Task granularity: re-planning or re-invitation occurs for the whole task
• Object granularity: re-planning or re-invitation is focused on one (or a few) objects (for instance, objects on which it is harder to reach agreement among performers with a majority-based decision mechanism)
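Object-granularity re-planning can be sketched as follows: objects whose answers have not reached agreement get extra micro-tasks. The function names, the majority threshold, and the number of extra executions are illustrative assumptions, not the deck's actual values.

```python
# Illustrative sketch of object-granularity re-planning: contested
# objects (no answer has reached the required majority) are assigned
# additional micro-tasks. Thresholds are assumptions.
from collections import Counter

def needs_replan(answers, required_majority):
    """True when no answer has reached the required majority yet."""
    counts = Counter(answers)
    return not counts or counts.most_common(1)[0][1] < required_majority

def replan(object_ids, answers_by_object, required_majority=5, extra=4):
    """Return (oID, number-of-new-microtasks) pairs for contested objects."""
    return [(oid, extra) for oid in object_ids
            if needs_replan(answers_by_object.get(oid, []), required_majority)]

answers = {1: ["spoiler"] * 5, 2: ["spoiler", "no-spoiler", "spoiler"]}
print(replan([1, 2], answers))  # [(2, 4)]
```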

38

EXPERIMENTS


39

Politician Affiliation
• Given the picture and name of a politician, specify his/her political affiliation
• No time limit
• Performers are encouraged to look up online
• 2 sets of rules:
  • Majority Evaluation
  • Spammer Detection

40

Movie Scenes
• Users can select the screenshot timeframe and whether it is a spoiler or not
• 20 still images each from 16 popular movies
• Each micro-task consists of evaluating one image
• Results are accepted, and the corresponding request is closed, when agreement among 5 performers is reached both on the temporal category and the spoiler option, independently of the number of executions

41

Professors’ images
• 16 professors within two research groups in our department (DB and AI groups)
• The top 50 images returned by the Google Image API for each query
• Each microtask consisted of evaluating 5 images regarding a professor
• Results are accepted (and thus the corresponding object is closed) when enough agreement on the class of the image is reached
• Closed objects are removed from new executions

42

SINGLE PLATFORM


43

Query Type
• Engagement depends on the difficulty of the task
• Like vs. Add tasks:

44

Comparison of Execution Platforms
• Facebook vs. Doodle

45

Posting Time
• Facebook vs. Doodle

46

Majority Evaluation_1/3

30 objects; object redundancy = 9; final object classification by simple majority after 7 evaluations

47

Majority Evaluation_2/3

Final object classification by total majority after 3 evaluations. Otherwise, re-plan of 4 additional evaluations; then simple majority at 7

48

Majority Evaluation_3/3

Final object classification by total majority after 3 evaluations. Otherwise, simple majority at 5 or at 7 (with replan)
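The closing criteria above can be sketched as a single function. This is an illustrative reconstruction of the third rule set (total majority at 3 evaluations, otherwise simple majority at 5 or 7), not the deck's actual rules.

```python
# Sketch of the closing criteria: total majority (unanimity) after 3
# evaluations, otherwise simple majority when 5 or 7 evaluations have
# accumulated. Illustrative reconstruction, assumed semantics.
from collections import Counter

def close_object(answers):
    """Return the winning class if the object can be closed, else None."""
    n = len(answers)
    counts = Counter(answers)
    top, top_count = counts.most_common(1)[0]
    if n >= 3 and top_count == n:          # total majority after 3
        return top
    if n in (5, 7) and top_count > n / 2:  # simple majority at 5 or 7
        return top
    return None

print(close_object(["R", "R", "R"]))            # R (total majority)
print(close_object(["R", "D", "R", "D", "R"]))  # R (simple majority at 5)
print(close_object(["R", "D", "R"]))            # None (keep collecting)
```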

49

Spammer Detection_1/2

New rule for spammer detection without ground truth. Performer correctness is measured against the final majority; a performer is flagged as a spammer if > 50% of his/her classifications are wrong

50

Spammer Detection_2/2

New rule for spammer detection without ground truth. Performer correctness is measured against the current majority; a performer is flagged as a spammer if > 50% of his/her classifications are wrong
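The ground-truth-free spammer check can be sketched as follows. The > 50% threshold comes from the slide; the function names and data layout are illustrative assumptions.

```python
# Sketch of ground-truth-free spammer detection: compare a performer's
# answers against the per-object majority answer and flag the performer
# when more than 50% of the answers disagree with it.
from collections import Counter

def majority(answers):
    """Most frequent answer for one object."""
    return Counter(answers).most_common(1)[0][0]

def is_spammer(performer_answers, all_answers_by_object):
    """performer_answers: {oID: answer}; True if >50% differ from majority."""
    wrong = sum(1 for oid, ans in performer_answers.items()
                if ans != majority(all_answers_by_object[oid]))
    return wrong > len(performer_answers) / 2

all_answers = {1: ["R", "R", "D"], 2: ["D", "D", "R"], 3: ["R", "R", "R"]}
print(is_spammer({1: "D", 2: "R", 3: "R"}, all_answers))  # True (2/3 wrong)
print(is_spammer({1: "R", 2: "D", 3: "R"}, all_answers))  # False
```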

51

MULTI-PLATFORM & MULTI-COMMUNITY


52

Number of Executions per Platform
• Immediate engagement and then plateau
• Higher engagement on AMT (paid) than on social networks (unpaid and limited by the number of contacts of the inviter)

53

Precision of Performers per Platform
• AMT significantly lower in precision

54

Precision on Closed Objects
• Precision decreases on crowdsourcing platforms
• Agreement increases precision with respect to single performers

55

Number of performers per community


58

Precision for different engagement strategies
• Precision decreases with less expert communities
• Inside-out strategy (from expert to generic users) outperforms outside-in strategy (from generic to expert)

59

EXPERT FINDING IN CROWDSEARCHER


60

Problem
• Ranking the members of a social group according to the level of knowledge that they have about a given topic
• Application: crowd selection (for crowdsearching or crowdsourcing)
• Available data:
  • user profile
  • behavioral trace that users leave behind them through their social activities

61

Most interesting aspect: Feature Organization Meta-Model

62

Main Results
• Profiles are less effective than level-1 resources
  • Resources produced by others help in describing each individual’s expertise
• Twitter is the most effective social network for expertise matching – sometimes it outperforms the other social networks
  • Twitter most effective in Computer Engineering, Science, Technology & Games, Sport
  • Facebook effective in Locations, Sport, Movies & TV, Music
  • LinkedIn never very helpful in locating expertise

63

CONCLUSIONS


64

Summary
• Results:
  • An integrated framework for crowdsourcing task design and control
  • Well-structured control rules with guarantees of termination
  • Support for cross-platform crowd interoperability
  • A working prototype: crowdsearcher.search-computing.org
• Forthcoming:
  • Publication of Web Interface + API
  • Support of declarative options for automatic rule generation
  • Integration with more social networks and human computation platforms
  • Vertical solutions for specific markets
  • More applications and experiments (e.g. in Expo 2015)

65

APPENDIX


66

Current «other» interest: Genomic Computing
• NGS (next-generation sequencing) changes biology & medicine
  • Massive DNA testing becoming available
  • DNA-based personalized medicine approaching
• NGS data management looks to me like the biggest and most important big-data problem, but:
  • No high-level view of genome data supporting high-level query and search
  • No scalable method for NGS data analysis
→ Data management research for NGS data has very good potential for impact!

Single-Gene Disease Mutations

69

Our results so far:

1. Intuition: genomic data management requires a «genometric space data processing system»

2. Data model for profiling NGS data

3. System architecture for managing genometric queries

4. Genometric Data Model (GQM) and Genometric Query Language (GQL) + mapping to Pig Latin (a query language for Hadoop)

Long-Term Goal: INTERNET OF GENOMES

70

QUESTIONS?
