Inventor Disambiguation Workshop EVALUATION OUTCOMES.

Inventor Disambiguation Workshop

EVALUATION OUTCOMES

7 Participant Teams• U Mass Amherst IESL (USA)• Centre for European Economic Research (Mannheim, DE)• KU Leuven (Belgium)• Penn State University (USA)• InnovationPulse (USA)• Centre for Transformative Innovation (CTI) at Swinburne

University of Technology (Australia)• Institute of Scientific and Technical Information of China

(China)

Timeline• June 2 – Training datasets posted online• July 15 – ‘Intent to Participate’ deadline• August 30 – Initial submission deadline

– Output dataset + documentation + runtime

• September 1-4 – Phase 1 evaluation• September 5 - Notification of progression to Phase 2• September 18 – Phase 2 submission

– 2 output datasets + final documentation + AWS runtime

• September 18 – Phase 2 evaluation• September 21 – Judges identify successful team

Training/test datasets• Labeled datasets: human-validated inventor clusters (inventors

a, b, c… on patents d, e, f… are definitely the same people)• 5 datasets generously provided by:

– Pierre Azoulay: de-identified academic life sciences (Azoulay 2007 and 2012)

– Ivan Png LinkedIN inventors (Chunmiam et al., 2015) – Erica Fuchs, Sam Ventura: de-identified optoelectronics patents

(Akinsanmi et al., 2014)– Manuel Trajtenburg: Israeli inventors (Trajtenberg and Shiff, 2008)– Francesco Lissoni, University of Bordeaux, EPO benchmark datasets

* Documentation available at www.dev.patentsview.org/workshop

Dataset Patent-Inventor Records

Unique Inventors

Reference

OE 98,762 824 Akinsanmi et al, 2014

ALS 42,376 4,801 Azoulay, 2007; Azoulay, 2012

IS 9,156 3,845 Trajtenberg and Shiff, 2008

E&S 96,104 14,293 Chunmian et al., 2015

EPO 1,922; 1,088 424; 312 Lissoni et al., 2010

Documentation available at http://www.dev.patentsview.org/workshop/data/README.pdf

Phase 1 Evaluation• 5 teams successfully submitted a full output dataset of inventor clusters• Output datasets were evaluated using 4 withheld labeled datasets:

– Azoulay (full dataset, as it was deidentified in the training version)– Png (~20% of the data)– Trajtenberg (~20% of the data)– Azoulay – common names (a subset of the larger dataset containing only common

names)

• Metrics calculated:– Precision, lumping, recall, splitting, F1 score

• Algorithm runtime also reported but with different environments, this is not a valid metric

• Judges reviewed the results and identified 3 teams to proceed to Phase 2 evaluation

Team Test Data Precision Recall F1 Average F1CEER als 0.999401347 0.891268353 0.942242627CEER ens 1 0.778727154 0.875600456CEER is 0.996245978 0.922224061 0.957806995CEER als_common 0.999989575 0.77409957 0.872663589 0.912078417Innovation Pulse als 0.991784517 0.655106334 0.789031427Innovation Pulse ens 0.997609642 0.658972609 0.793679189Innovation Pulse is 0.998310811 0.637434664 0.778064712Innovation Pulse als_common 0.99363578 0.638165016 0.777182601 0.784489482ISTIC als 0.996649248 0.954090692 0.974905728ISTIC ens 0.99947885 0.921752133 0.959043207ISTIC is 0.996926117 0.751365216 0.856900212ISTIC als_common 0.984901199 0.934397521 0.958984893 0.93745851PSU als 0.999297 0.642464098 0.782102155PSU ens 1 0.661001562 0.795907213PSU is 0.999578059 0.588035744 0.740466764PSU als_common 0.999986296 0.588888979 0.741255047 0.764932795UMass als 0.99888066 0.976346914 0.987485253UMass ens 1 0.966156229 0.982786835UMass is 0.998875335 0.955320205 0.976612392UMass als_common 0.996885177 0.963393671 0.979853322 0.98168445Current PatentsView Fleming/Li als 0.999043089 0.885710148 0.938969182Fleming/Li ens 1 0.812357315 0.896464851Fleming/Li is 0.998781859 0.881929505 0.936725547Fleming/Li als_common 0.998039234 0.883168029 0.93709647 0.927314013

Phase 2 Evaluation• 3 teams progressed to Phase 2:

– CEER, UMass Amherst, ISTIC/China– CEER withdrew

• Set up identical AWS evaluation environments• 2 training datasets provided

– Random sample from all Phase 1 evaluation datasets– Subset of Phase 1 evaluation datasets with similar characteristics to full

USPTO dataset• Inventors/patents, % missing assignees,

• Algorithms trained and run on full USPTO dataset (twice)• Algorithm documentation provided (user manual and manuscript)• Output datasets evaluated with Phase 1 metrics• Judges’ final determination of successful algorithm based on:

– F1 score– Run time– Usability (based on documentation)

Team A Team B FLEMING/Li

Results when trained on random mixture dataset

Results when trained on common characteristics dataset



No training

Precision 0.999709 0.999719 0.998488 0.991932 0.999941418

Splitting 0.033936 0.033358 0.116845 0.103866 0.184882461

Recall 0.966064 0.966642 0.883155 0.896134 0.815117539

Lumping 0.000281 0.000271 0.001337 0.007289 4.78E-05

F score 0.982599 0.982903 0.937287 0.941603 0.898119352

True Positives 384367 384597 351380 356544 324310

False Negatives 13502 13272 46489 41325 73559

False Positives 112 108 532 2900 19

Runtime7 hours on c3.8xlarge AWS instance 7 hours on c3.8xlarge AWS instance N/A

(CPU usage topped at 69%) (CPU usage topped at 11.85%)

test data precision recall Fscore

Team B


eval_als 0.998171124 0.949989322 0.973484408

eval_als_common 0.993201838 0.920831551 0.955648522

eval_ens 0.999463175 0.894823972 0.944253473

eval_is 0.995252994 0.763279828 0.863966284


eval_als 0.996064686 0.961459702 0.978456322

eval_als_common 0.982322588 0.963280689 0.972708456

eval_ens 0.998730234 0.903074643 0.948496831

eval_is 0.995673975 0.837911633 0.910005841

Team A


eval_als 0.998816547 0.975242586 0.986888808

eval_als_common 0.996589319 0.959737881 0.977816513

eval_ens 1 0.968265624 0.983876985

eval_is 0.998879793 0.959126262 0.978599468


eval_als 0.998895036 0.977163092 0.987909564

eval_als_common 0.997035976 0.966411918 0.981485124

eval_ens 1 0.967544691 0.983504665

eval_is 0.998879117 0.958547079 0.978297585

Inventor Disambiguation Workshop EVALUATION OUTCOMES.

Documents

Transcript of Inventor Disambiguation Workshop EVALUATION OUTCOMES.