Using Amazon Mechanical Turk for Data Collection in Parts of Speech Tag Correction for Patent Claims
David Cinciruk
June 24, 2015
Table of Contents
Overview of Our Problem
Using Amazon Mechanical Turk
    Worker Views
    Requester Views
    Testing HITs
    Difficulties of Obtaining Results
Automatic Corrector of Patent Claim POS Tags
Conclusion
Our Research
- Work is in patent processing
  - Mapping patents to other technical documents
  - Automated classification
  - Patent retrieval
  - Patent valuation
- Need to find a way to represent patents. NLP offers a solution with dependencies
Dependency Modeling
- Dependencies help to represent words by their relationships
- Formed by traversing the parse tree of a sentence or segment and noting two words and how they are linked together
NLP Parsers Do Not Work Well With Patent Claims!
Obvious Mislabeling of Words
Odd Language of Patent Claims
The holder for use with a razor blade according to claim 1, said recess being semicircular in shape and revealing opposite faces of the razor blade which may be grasped with the fingers of the user.
- Patents are the intersection of technical and legal speech
- Legally, each patent claim must be one very long run-on sentence
- Claims feature particular language whose meanings differ from standard English usage
- Because of these, NLP software struggles to correctly tag patent claims
Trying to Correct the NLP
blade: VBP → NN
said: VBD → JJ
semicircular: VBN → JJ
- Forcing incorrect tags to their corrected tags would force NLP software to reconstruct the parse tree and create dependencies more similar to true speech
- We need a large collection of hand-corrected patent claims in order to train a system to automatically correct patent claims
- Amazon Mechanical Turk allows us to gather this collection via crowdsourcing
Artificial Artificial Intelligence
A crowdsourcing Internet marketplace that enables individuals and organizations to coordinate the use of human intelligence to perform tasks that computers are unable to do.
The Turk
The name comes from a supposed master chess-playing automaton built in 1770 by Wolfgang von Kempelen
In reality, a chess master hidden in the cabinet played the game from underneath
Workers and Requesters
- Worker
  - Also known as Turkers
  - Performs tasks set up by requesters
  - Works their own hours and chooses their own tasks
- Requester
  - Creates tasks for workers to do
  - Can set qualifications on tasks to disallow unproven people from performing them
  - Can set up tests to determine whether people are qualified to work on particular HITs (Human Intelligence Tasks)
Our Use of Mechanical Turk
- We want to develop the most accurate dependencies for use in our future systems
- We have patent claims that are automatically segmented at semicolons and colons and then POS tagged
- Tags may be incorrect; we eventually want to be able to correct them automatically
- Need to gather lists of corrected tags
- Decided to use Mechanical Turk to gather corrected tags
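The segmentation step mentioned above can be sketched in Python. This is a minimal illustration, not the code used in the project: the function name `segment_claim` and the sample claim text are assumptions made for the example, and only the rule of splitting at semicolons and colons comes from the slide.

```python
import re


def segment_claim(claim):
    """Split a claim into segments at semicolons and colons."""
    pieces = re.split(r"[;:]", claim)
    # Trim whitespace and drop any empty pieces left behind by the split.
    return [piece.strip() for piece in pieces if piece.strip()]


# Hypothetical claim text, used only to illustrate the split.
claim = "A holder comprising: a handle; and a recess for a razor blade."
for segment in segment_claim(claim):
    print(segment)
```

Each resulting segment would then be passed to the POS tagger on its own.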
- Want to create an easy-to-perform HIT that has a high impact on our tagging
- The most common problem with a high impact on the dependencies is words incorrectly tagged as verbs
- Chose the task of checking whether words initially tagged as verbs are actually nouns or adjectives
- Turkers label each such word as a noun, adjective, or verb and assume all other tags are correct
The Setup
- Initial:
  The/DT holder/NN for/IN use/NN with/IN a/DT razor/NN blade/VBP according/VBG to/TO claim/NN 1/CD ,/, said/VBD recess/NN being/VBG semicircular/VBN in/IN shape/NN and/CC revealing/VBG opposite/JJ faces/NNS of/IN the/DT razor/NN blade/NN which/WDT may/MD be/VB grasped/VBN with/IN the/DT fingers/NNS of/IN the/DT user/NN ./.
- Corrected:
  The/DT holder/NN for/IN use/NN with/IN a/DT razor/NN blade/NN according/VBG to/TO claim/NN 1/CD ,/, said/JJ recess/NN being/VBG semicircular/JJ in/IN shape/NN and/CC revealing/VBG opposite/JJ faces/NNS of/IN the/DT razor/NN blade/NN which/WDT may/MD be/VB grasped/VBN with/IN the/DT fingers/NNS of/IN the/DT user/NN ./.
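To make the word/TAG notation concrete, here is a small Python sketch that parses such strings, lists the words tagged as verbs (the ones a Turker would be asked to check), and reports where the initial and corrected taggings differ. The function names are illustrative assumptions, and the input is an excerpt of the segment shown above rather than the full claim.

```python
def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def verb_candidates(tokens):
    """Words whose tag starts with VB, i.e. those initially tagged as verbs."""
    return [word for word, tag in tokens if tag.startswith("VB")]


def tag_changes(initial, corrected):
    """List (word, old_tag, new_tag) wherever the two taggings disagree."""
    return [
        (word, old, new)
        for (word, old), (_, new) in zip(initial, corrected)
        if old != new
    ]


# Excerpt of the segment from the slide.
initial = parse_tagged(
    "a/DT razor/NN blade/VBP according/VBG to/TO claim/NN 1/CD ,/, "
    "said/VBD recess/NN being/VBG semicircular/VBN in/IN shape/NN"
)
corrected = parse_tagged(
    "a/DT razor/NN blade/NN according/VBG to/TO claim/NN 1/CD ,/, "
    "said/JJ recess/NN being/VBG semicircular/JJ in/IN shape/NN"
)

print(verb_candidates(initial))
print(tag_changes(initial, corrected))
```

On this excerpt the reported changes are exactly the three corrections listed earlier: blade, said, and semicircular.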
Worker Homepage
HIT Search
Our HIT
Requester Homepage
Tasks for Requesters
- Editing HITs
- Publishing Batches
- Reviewing Batches
Editing HITs
Editing HITs - Describing the HIT
Editing HITs - Setting Up Properties
Editing HITs - Setting Worker Requirements
Editing HITs - Designing the HIT
Editing HITs - Previewing the HIT
Publishing HITs
Publishing HITs - Developing the Initial CSV file
- Takes a folder of initial POS-tagged patent claims that have not been checked and a folder of correctly labeled claim segments
- Makes a CSV file pairing one segment (containing verbs) from the initial patent claim files with one of the already-correct segments
- Also outputs a file listing all the correct segments
- From there we can extend it to a 10-segment version by hand
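The CSV-building step above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the original script: the column names `segment` and `test_segment`, the function names, and the in-memory lists standing in for the two folders are all hypothetical.

```python
import csv
import io
import random


def has_verb(segment):
    """True if any token in a 'word/TAG' segment carries a VB* tag."""
    return any(tok.rsplit("/", 1)[-1].startswith("VB") for tok in segment.split())


def build_rows(unchecked, gold, rng):
    """Pair each unchecked segment containing a verb with one known-correct segment."""
    return [
        {"segment": segment, "test_segment": rng.choice(gold)}
        for segment in unchecked
        if has_verb(segment)
    ]


def write_csv(rows, handle):
    """Write the rows with a header line (column names are hypothetical)."""
    writer = csv.DictWriter(handle, fieldnames=["segment", "test_segment"])
    writer.writeheader()
    writer.writerows(rows)


# Stand-ins for the contents of the two folders.
unchecked = ["said/VBD recess/NN being/VBG semicircular/VBN", "the/DT user/NN"]
gold = ["a/DT razor/NN blade/NN"]

buffer = io.StringIO()
write_csv(build_rows(unchecked, gold, random.Random(0)), buffer)
print(buffer.getvalue())
```

The second unchecked segment is skipped because it contains no word tagged as a verb, so there would be nothing for a Worker to check.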
Publishing HITs - Developing a Larger CSV file
- VBA code that takes a 10-segment version and makes a 20-segment version
- Interleaves the segments of each HIT (including the test segments)
- Randomizes the locations of the two test segments and saves their locations as a hidden variable
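The original tool was written in VBA; the idea can be sketched in Python as follows. This is an assumption-laden illustration: it supposes each 10-segment HIT consists of nine unchecked segments plus one test segment, and the function name `merge_hits` and the placeholder segment labels are hypothetical.

```python
import random


def merge_hits(hit_a, hit_b, test_a, test_b, rng):
    """Interleave two HITs and drop the two test segments at random positions.

    Returns the merged segment list and the positions of the test segments,
    which would be stored in a hidden field of the HIT.
    """
    # Interleave the unchecked segments: a0, b0, a1, b1, ...
    merged = [segment for pair in zip(hit_a, hit_b) for segment in pair]
    # Pick two distinct final positions, then insert in ascending order so
    # the first insertion does not shift the second position.
    positions = sorted(rng.sample(range(len(merged) + 2), 2))
    for position, test in zip(positions, (test_a, test_b)):
        merged.insert(position, test)
    return merged, positions


hit_a = [f"a{i}" for i in range(9)]  # placeholder segment labels
hit_b = [f"b{i}" for i in range(9)]
merged, positions = merge_hits(hit_a, hit_b, "test_a", "test_b", random.Random(1))
print(len(merged), positions)
```

Keeping the positions separate from the visible segments is what lets the answers be checked later without revealing to Workers which segments are tests.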
Publishing HITs - Uploading the CSV file
Publishing HITs - Previewing the HITs
Publishing HITs - Confirming the Batch
Reviewing Batches
Reviewing Batches - Summary of Results
Reviewing Batches - Reviewing Results
Reviewing Batches - CSV File
Reviewing Batches - Checking Answers
- Need to approve or reject HITs, since answers may or may not be correct
- May opt to give the same HIT to multiple Workers and then pay for the majority answer
  - Better for subjective HITs
  - More expensive, since multiple “correct” HITs must be paid for
- May instead put test questions with known answers inside HITs and check those answers
  - Better for objective HITs where accuracy is key
  - Method we used for our HITs
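The majority-answer option can be illustrated with a short Python sketch. The worker IDs and labels are made up, and requiring a strict majority (more than half of the answers) is an assumption of this example, not something the slide specifies.

```python
from collections import Counter


def majority_answer(answers):
    """Return the answer given by more than half of the workers, else None."""
    if not answers:
        return None
    answer, votes = Counter(answers.values()).most_common(1)[0]
    return answer if votes > len(answers) / 2 else None


def workers_to_pay(answers):
    """Workers whose answer agrees with the majority answer."""
    winner = majority_answer(answers)
    if winner is None:
        return []
    return sorted(worker for worker, answer in answers.items() if answer == winner)


# Hypothetical answers from three workers for the same word.
answers = {"worker_a": "NN", "worker_b": "NN", "worker_c": "JJ"}
print(majority_answer(answers), workers_to_pay(answers))
```

The cost trade-off is visible here: three assignments were paid for to obtain a single label.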
Reviewing Batches - Automatic Matlab HIT Checker
1. Load CSV File and Solutions
2. Determine Location of Test Questions
3. Remove non-alphabetical symbols from CSV file and solutions
4. Compare the test questions for a match in the solutions
5. Write whether to accept or reject in the CSV File
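The original checker was written in Matlab; a Python sketch of the same flow is given below for illustration. The function names, the data layout (a list of answer strings plus a list of test positions), and the returned decision strings are assumptions; loading and writing the actual CSV file is left out.

```python
import re


def normalize(text):
    """Remove every non-alphabetical character before comparing."""
    return re.sub(r"[^A-Za-z]", "", text)


def check_hit(answers, test_positions, solutions):
    """Approve only if every test answer matches one of the known solutions."""
    known = {normalize(solution) for solution in solutions}
    passed = all(normalize(answers[pos]) in known for pos in test_positions)
    return "approve" if passed else "reject"


# Hypothetical HIT with one test segment at position 1.
solutions = ["a/DT razor/NN blade/NN"]
good = ["said/JJ recess/NN", "a/DT  razor/NN blade/NN "]
bad = ["said/JJ recess/NN", "a/DT razor/NN blade/VBP"]
print(check_hit(good, [1], solutions), check_hit(bad, [1], solutions))
```

Stripping non-alphabetical symbols makes the comparison tolerant of stray spaces or punctuation introduced when answers pass through the web form and the CSV file.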
Mechanical Turk Sandbox
- You may not know how to develop your HIT or how your results will look
- Sandbox mode allows Requesters to develop HITs without having to pay anyone
- Requesters can make Sandbox Worker accounts and perform their own HITs to test answering them
- The layout for Workers and Requesters is the same as on regular Mechanical Turk
Like Monopoly Money
You have fake money on the Sandbox that you can give out to Sandbox Workers
Worker and Requester Sandbox
Fishing for Results
Workers need to be enticed to do HITs. There are many potential ways to do that, and one lure must be weighed against another.
Our Experiences
Initial unsuccessful HIT paid 5 cents for 2 questions: after 19 days, only 86 HITs were completed
Second, successful HIT paid 45 cents for 10 questions: after 27 days, 657 HITs were completed
Very successful third HIT paid $1.20 for 20 questions: after 11 days, 878 HITs were completed
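The figures above can be put on a common footing with a little arithmetic. The Python snippet below uses only the numbers reported on the slide; the derived quantities are pay per question and completion rate per day. Pay was not the only thing that changed between batches, so this comparison is descriptive rather than a controlled result.

```python
# (label, pay in cents, questions per HIT, days listed, HITs completed)
batches = [
    ("first", 5, 2, 19, 86),
    ("second", 45, 10, 27, 657),
    ("third", 120, 20, 11, 878),
]


def summarize(pay_cents, questions, days, completed):
    """Derive per-question pay and daily throughput for one batch."""
    return {
        "cents_per_question": pay_cents / questions,
        "hits_per_day": completed / days,
        "questions_per_day": completed * questions / days,
    }


for label, pay, questions, days, completed in batches:
    stats = summarize(pay, questions, days, completed)
    print(label, {key: round(value, 1) for key, value in stats.items()})
```

Pay per question rises from 2.5 to 4.5 to 6 cents, while the completion rate rises from roughly 4.5 to 24.3 to 79.8 HITs per day.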
- Higher-paying HITs entice more people
- More available HITs entice more people
- Reducing qualifications allows more people to participate
The Next Step
- The next step after gathering all the data is to run a system that automatically corrects patent claim POS tags
- We needed Amazon Mechanical Turk to gather corrected tags from patent segments to serve as ground truth for developing our system
Training Stage
Original POS Tagging
Corrected POS Tagging
Rule-based Changing
Feature Extraction
SVM Training
Sorting Features into Classes
Rule-Based Corrector
- Some words are almost always mislabeled as verbs and can thus be automatically corrected from the very beginning via a rule-like system
- List is ever expanding when new words that are always nouns or adjectives are discovered
- Will also force words in the list that have been correctly tagged to keep their tags
Rule-Based Corrector Examples
- said → JJ (“said recess being semicircular”)
- claim → NN (“as recited in claim 5”)
  - sole exception is at the start of a patent (“In this patent, we claim”)
- means → NN (“including means for continuously conveying”)
- wherein → WRB (“The system of claim 13 wherein”)
- nitride → NN (“silver nitride”)
- boride → NN (“cobalt boride”)
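A rule list like the one above maps naturally onto a lookup table. The Python sketch below is illustrative only: the word-to-tag pairs come from the slide, but the function name, the case-insensitive lookup, and the omission of the "we claim" exception are simplifying assumptions.

```python
# Word-to-tag rules taken from the examples above.
RULES = {
    "said": "JJ",
    "claim": "NN",
    "means": "NN",
    "wherein": "WRB",
    "nitride": "NN",
    "boride": "NN",
}


def apply_rules(tokens, rules=RULES):
    """Force listed words to their rule tag; leave every other token unchanged.

    Note: the 'we claim' exception at the start of a patent is not handled.
    """
    return [(word, rules.get(word.lower(), tag)) for word, tag in tokens]


tokens = [("said", "VBD"), ("recess", "NN"), ("being", "VBG"), ("semicircular", "VBN")]
print(apply_rules(tokens))
```

Because the rule tag is applied unconditionally, a listed word that was already tagged correctly simply keeps its tag, which matches the behavior described for the corrector.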
Gathering Trigrams and Dependencies
- Developed Matlab code to gather all the “verbs” with their trigrams and dependencies and sort them by whether each is actually an adjective, a noun, or still a verb
- Each trigram and set of dependencies focuses on just one “verb”. The other words are represented only by their POS tags
- The open question is how to represent the “verb” while preventing overfitting and allowing similar words to be grouped together
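The trigram part of this feature gathering can be sketched in Python (the original code was in Matlab). The function name and the boundary markers `<s>` and `</s>` are assumptions, and keeping the raw word for the "verb" is only a placeholder, since how to represent that word is exactly the open question raised above. Dependencies are not covered here.

```python
def verb_trigrams(tokens):
    """For each VB*-tagged word, return (previous tag, word, next tag)."""
    # Pad the tag sequence so the first and last words also have neighbors.
    tags = ["<s>"] + [tag for _, tag in tokens] + ["</s>"]
    return [
        (tags[i], word, tags[i + 2])
        for i, (word, tag) in enumerate(tokens)
        if tag.startswith("VB")
    ]


tokens = [
    (",", ","), ("said", "VBD"), ("recess", "NN"),
    ("being", "VBG"), ("semicircular", "VBN"), ("in", "IN"),
]
for trigram in verb_trigrams(tokens):
    print(trigram)
```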
Word Vectors
- Representation of words as a vector
- Trained on an input database (for us, patent claims)
- Preserves similar relationships between similar groups of words
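One common way to compare word vectors is cosine similarity. The Python sketch below shows the computation on made-up three-dimensional vectors; the numbers are invented for illustration and are not trained values, and real word vectors typically have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, NOT trained on any corpus.
vectors = {
    "nitride": [0.9, 0.1, 0.0],
    "boride": [0.8, 0.2, 0.1],
    "said": [0.0, 0.1, 0.9],
}

print(cosine(vectors["nitride"], vectors["boride"]))
print(cosine(vectors["nitride"], vectors["said"]))
```

With vectors like these, words that behave alike score close to 1 while unrelated words score near 0, which is the property that would let similar "verbs" be grouped without memorizing each word.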
Testing Stage
Original POS Tagging
Rule-based Changing
SVM Decision Boundary
Feature Extraction
SVM Test
New Labels for Segments
Repeat
Multiple Rounds of Testing?
After 4 Iterations: A Truly Corrected Parse Tree
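The repeat step of the testing stage can be sketched as a loop that re-applies a correction pass until nothing changes. This Python illustration rests on assumptions: stopping when the tags no longer change (or after a round limit) is a guess at the stopping rule, and `toy_corrector` is a stand-in for the real re-parse, feature extraction, and SVM test.

```python
def correct_until_stable(tags, correct_once, max_rounds=10):
    """Re-apply correct_once until the tags stop changing.

    Returns the final tags and the number of rounds that changed something.
    """
    for rounds in range(1, max_rounds + 1):
        new_tags = correct_once(tags)
        if new_tags == tags:
            return tags, rounds - 1
        tags = new_tags
    return tags, max_rounds


def toy_corrector(tags):
    """Stand-in corrector: fix only the first remaining 'WRONG' tag per pass."""
    tags = list(tags)
    if "WRONG" in tags:
        tags[tags.index("WRONG")] = "FIXED"
    return tags


final_tags, rounds = correct_until_stable(["WRONG", "NN", "WRONG"], toy_corrector)
print(final_tags, rounds)
```

Multiple rounds can help because each corrected tag changes the parse, and with it the features seen by the classifier for the remaining words.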
Conclusion
- Patent claims, due to their structure, are notoriously difficult for a computer to parse properly
- An automated system can be built to correct incorrect POS tags so that the claims can be used in more advanced patent processing systems
- A large hand-corrected dataset is needed in order to learn how to properly change POS tags
- Amazon Mechanical Turk provides the tools needed to crowdsource this hand labeling
Further Work
- Take the approved Amazon Mechanical Turk results, determine how correct they actually are, and build a large database of the original tagging and the corrected tagging
- Learn how to make a corpus for use in word2vec, then run word2vec on an all-patent-claim database
- Put together all the pieces of code already developed for training and testing our automatic corrector and begin experimenting with it