Data annotation with Amazon Mechanical Turk
Alexander Sorokin, David Forsyth
University of Illinois at Urbana-Champaign

X = $5,000

Motivation
- Unlabeled data is free (47M Creative Commons-licensed images on Flickr).
- Labels are useful.
- We need large volumes of labeled data.
- Different labeling needs:
  - Is there X in the image?
  - Outline X.
  - Where is part Y of X?
  - Of these 500 images, which belong to category X?
  - ... and many more.

Task
Amazon Mechanical Turk: Is this a dog? o Yes o No
Workers' answer: Yes
Task: Dog? Pay: $0.01. Broker: $0.01.

X = $1,000

Annotation protocols
- Type keywords
- Select relevant images
- Click on landmarks
- Outline something
- Detect features
- ... anything else

Type keywords: $0.01
Select examples: $0.02 (requester: mtlabel). Joint work with Tamara and Alex Berg.
Click on landmarks: $0.01
Outline something: $0.01 (data from Ramanan, NIPS06)
Detect features: measuring molecules, ?? $0.1. Joint work with Rebecca Schulman (Caltech).

Ideal task properties
- Easy cognitive task
  Good: Where is the car? (bounding box)
  Good: How many cars are there? (3)
  Bad: How many cars are there? (132)
- Well-defined task
  Good: Locate the corners of the eyes.
  Bad: Label joint locations (low-resolution or close-up images).
- Concise definition
  Good: 1-2 paragraphs, fixed for all tasks
  Good: 1-2 unique sentences per task
  Bad: a 300-page annotation manual
- Low amount of input
  Good: a few clicks or a couple of words
  Bad: detailed outlines of all objects (100s of control points)

Ideal task properties (continued)
- High volume
  Good: 2-100K tasks
  Bad: 1000 images per day
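The per-task prices on these slides support a quick budget estimate for a batch of annotation tasks. The Python sketch below is a minimal illustration under a loudly stated assumption: it treats the $0.01 broker amount from the "Is this a dog?" example as a flat per-task fee that is added to the worker's pay. The talk does not describe the real fee structure. The names `PROTOCOL_PAY`, `BROKER_FEE` and `batch_cost` are hypothetical and are not part of any Mechanical Turk API.

```python
from decimal import Decimal

# Per-task worker pay for each annotation protocol, copied from the slides.
# These are the talk's example prices, not current Mechanical Turk rates.
# ("Detect features" appears on its slide as "?? $0.1".)
PROTOCOL_PAY = {
    "type_keywords": Decimal("0.01"),
    "select_examples": Decimal("0.02"),
    "click_landmarks": Decimal("0.01"),
    "outline_something": Decimal("0.01"),
    "detect_features": Decimal("0.10"),
}

# ASSUMPTION (hypothetical): the broker charges a flat fee per task,
# equal to the $0.01 shown in the "Is this a dog?" example.
BROKER_FEE = Decimal("0.01")


def batch_cost(protocol: str, n_tasks: int, broker_fee: Decimal = BROKER_FEE) -> Decimal:
    """Return the requester's total cost: n_tasks * (worker pay + broker fee)."""
    if n_tasks < 0:
        raise ValueError("n_tasks must be non-negative")
    per_task = PROTOCOL_PAY[protocol] + broker_fee
    return per_task * n_tasks


if __name__ == "__main__":
    # Illustrative batch size, chosen only for round numbers.
    n = 10_000
    for protocol in PROTOCOL_PAY:
        print(f"{protocol:>17}: {n} tasks -> ${batch_cost(protocol, n)}")
```

Under these assumptions, 10,000 "type keywords" tasks would cost $200.00. You can model a different fee assumption by passing another `broker_fee`. The sketch uses `Decimal` instead of floats so that cent amounts add up exactly.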