David Baehrens: Large-Scale Patent Classification at the European Patent Office
Transcript of David Baehrens: Large-Scale Patent Classification at the European Patent Office
![Page 1: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/1.jpg)
David Baehrens
Large-Scale Patent Classification
at the European Patent Office
![Page 2: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/2.jpg)
ABOUT AVERBIS
Founded: 2007
Location: Freiburg im Breisgau
Team: Domain & IT-Experts
Focus: Leverage structured & unstructured information
Current Sectors: Pharma, Health, Automotive, Publishers & Libraries
![Page 3: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/3.jpg)
PORTFOLIO
Solutions: Libraries, Pharma, Patents, Healthcare, Social Media, Automotive
Terminology Management, Text Mining
Search & Analytics, NoSQL
Categorization & Clustering
![Page 4: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/4.jpg)
TERMINOLOGY MANAGEMENT
Terminology management software
Provision of terminologies
Mappings between terminologies
Building terminology-based applications
![Page 5: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/5.jpg)
TEXT MINING
Term Detection
Synonyms: dimethyl sulfoxide, dimethylsulfoxide, Domoso, Infiltrina
Hierarchies: cancer, carcinoma, melanoma, lymphoma, glioblastoma…
Regular Expressions
Patterns: dates, citations, mail addresses…
Rule Engine
Rule-based extraction of all different kinds of complex information
Named Entities
Persons, Locations, Genes, …
Relations
Co-occurrences, Typed Relations, e.g. Genes / Diseases / Modification Type
Syntax Detection
Sentences, Tokens, POS-Tags, Chunks, Paragraphs, Sections, Stemming, Decompounding…
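As a rough illustration of the term-detection and pattern layers named on this slide, the following minimal sketch maps the listed synonyms onto one concept and finds dates with a regular expression. It is not the product's implementation: the concept ID `CHEM:DMSO`, the ISO-style date pattern, and the function names are assumptions made for this example.

```python
import re

# Synonym group taken from the slide; the concept ID is made up for illustration.
SYNONYMS = {
    "CHEM:DMSO": ["dimethyl sulfoxide", "dimethylsulfoxide", "Domoso", "Infiltrina"],
}

# A simple ISO-style date pattern, standing in for the "Patterns" layer (assumption).
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def build_term_patterns(synonyms):
    """Compile one case-insensitive regex per concept, longest synonym first."""
    patterns = {}
    for concept, terms in synonyms.items():
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[concept] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns


def annotate(text, patterns):
    """Return (start, end, label, covered_text) tuples sorted by position."""
    hits = []
    for concept, pattern in patterns.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), concept, match.group(0)))
    for match in DATE_PATTERN.finditer(text):
        hits.append((match.start(), match.end(), "DATE", match.group(0)))
    return sorted(hits)


PATTERNS = build_term_patterns(SYNONYMS)
example = "Domoso (dimethyl sulfoxide) was approved on 2014-03-01."
annotations = annotate(example, PATTERNS)
```

Both surface forms in the example sentence resolve to the same concept label, which is the point of maintaining synonyms in a terminology rather than in the text-mining code.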
![Page 6: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/6.jpg)
RULE ENGINE
1. NAME OF THE MEDICINAL PRODUCT
Desloratadine ratiopharm 5 mg film-coated tablets
Labels: Text, Rule, Result

| Primary Field Name | Secondary Field Name | Field Value |
| --- | --- | --- |
| MedicalProductName | coveredText | Desloratadine ratiopharm 5 mg film-coated tablets |
| | inventedPartName | DESLORATADINE |
| | strengthPart | 5 mg |
| | pharmaceuticalDoseFormPart | FILM-COATED TABLET |
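The rule itself is not reproduced in the transcript. As a sketch of how a rule of this kind could turn the product name into the fields shown on the slide, here is a plain regular-expression version. The pattern, the tiny dose-form vocabulary, and the function name `extract_product_name` are assumptions for illustration, not the rule engine's actual syntax.

```python
import re

# Tiny illustrative dose-form vocabulary (assumption): surface form -> normalized value.
DOSE_FORMS = {
    "film-coated tablets": "FILM-COATED TABLET",
    "film-coated tablet": "FILM-COATED TABLET",
}

# Hypothetical rule: <invented name> [<other words>] <strength> <dose form>.
PRODUCT_RULE = re.compile(
    r"^(?P<invented>[A-Za-z]+)"                        # e.g. Desloratadine
    r"(?:\s+[A-Za-z]+)*?"                              # optional words such as the company name
    r"\s+(?P<strength>\d+(?:[.,]\d+)?\s*(?:mg|g|ml))"  # e.g. 5 mg
    r"\s+(?P<form>" + "|".join(re.escape(f) for f in DOSE_FORMS) + r")$",
    re.IGNORECASE,
)


def extract_product_name(text):
    """Return the structured fields for a product-name line, or None if the rule fails."""
    match = PRODUCT_RULE.match(text.strip())
    if match is None:
        return None
    return {
        "coveredText": match.group(0),
        "inventedPartName": match.group("invented").upper(),
        "strengthPart": match.group("strength"),
        "pharmaceuticalDoseFormPart": DOSE_FORMS[match.group("form").lower()],
    }


result = extract_product_name("Desloratadine ratiopharm 5 mg film-coated tablets")
```

A production rule engine would draw the dose forms and substance names from managed terminologies instead of a hard-coded dictionary.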
![Page 7: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/7.jpg)
SEARCH & NOSQL
Free text + concept-based search
Text mining integration
Guided navigation / facets
NoSQL functionalities
Multi- & cross-lingual search
Related documents
Based on Apache Solr
• Extended Query Syntax
• JSON-API
• Scalability
…
![Page 8: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/8.jpg)
DOCUMENT CLASSIFICATION
Hotel Reviews
Patents
![Page 9: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/9.jpg)
SEARCH & NOSQL
![Page 10: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/10.jpg)
INFORMATION DISCOVERY
Terminology Management, Text Mining
Search & Analytics, NoSQL
Categorization & Clustering
Delivery / Deployment / Runtime Environment
Integration Tests / Continuous Integration
Extensive Documentation
Common Architecture / Application Design
User & Role Management, Security
Communication Bus
Project Management
![Page 11: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/11.jpg)
PATENT CLASSIFICATION AT EPO
Tender No. 1585
1) Pre-classification of unpublished patents into departments
2) Re-classification of published patents when the category system changes
![Page 12: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/12.jpg)
ABOUT EPO
• The European Patent Office (EPO) grants European patents for the Contracting States to the European Patent Convention
• Second largest intergovernmental institution in Europe
• Not an EU institution
• Self-financing, i.e. revenue from fees covers operating and capital expenditure
![Page 13: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/13.jpg)
NUMBER OF STAFF
Status: December 2008
![Page 14: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/14.jpg)
PATENT APPLICATIONS
![Page 15: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/15.jpg)
http://www.epo.org/about-us/annual-reports-statistics/annual-report/2014.html
![Page 16: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/16.jpg)
COOPERATIVE PATENT CLASSIFICATION
• Patent Classification System based on ECLA / IPC
• Jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO)
• Used by both the EPO and USPTO since 1 January 2013
• Currently contains about 250,000 classes
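To make the size of the scheme more concrete, the sketch below splits a CPC symbol into its coarse levels (section, class, subclass, main group, subgroup). The regular expression reflects the general shape of CPC symbols; the names `CpcSymbol` and `parse_cpc` are invented for this example. Note that the finer hierarchy among subgroups is defined by the scheme itself and cannot be derived from the digits alone.

```python
import re
from dataclasses import dataclass

# Section letter, two-digit class, subclass letter, main group "/" subgroup.
CPC_SYMBOL = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


@dataclass(frozen=True)
class CpcSymbol:
    section: str
    class_code: str
    subclass: str
    main_group: str
    subgroup: str

    def levels(self):
        """Return the coarse hierarchy from section down to the full symbol."""
        head = f"{self.section}{self.class_code}{self.subclass}"
        out = [
            self.section,
            f"{self.section}{self.class_code}",
            head,
            f"{head} {self.main_group}/00",  # main groups are written with subgroup 00
        ]
        if self.subgroup != "00":
            out.append(f"{head} {self.main_group}/{self.subgroup}")
        return out


def parse_cpc(symbol):
    """Parse a symbol such as 'H04L 9/32'; raise ValueError if it is malformed."""
    match = CPC_SYMBOL.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


levels = parse_cpc("H04L 9/32").levels()
```

Coarse levels like these are what make it possible to classify top-down instead of choosing among all classes at once.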
![Page 17: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/17.jpg)
EXAMPLE CPC CLASS
![Page 18: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/18.jpg)
GRANTED PATENT
![Page 19: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/19.jpg)
EARLY PATENT
![Page 20: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/20.jpg)
EARLY PATENT
![Page 21: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/21.jpg)
EARLY PATENT
![Page 22: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/22.jpg)
PATENT CLASSIFICATION AT EPO
Tender No. 1585
1) Pre-classification of unpublished patents into departments
Our Motivation:
• Great Classification Use-Case
– Big Data (80 million patents available)
– Large-scale category system: > 250,000 CPC codes
– Tough constraints on classification quality and response time
• Text Mining Success Story
![Page 23: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/23.jpg)
OLD CLASSIFICATION PROCESS
PATENTS CLASSIFICATION DEPARTMENTS
![Page 24: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/24.jpg)
CLASSIFICATION COMPLEXITY
~250,000 CPC Codes
~1,500 Ranges
250 Departments
![Page 25: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/25.jpg)
CLASSIFICATION PROCESS
PATENTS CLASSIFICATION DEPARTMENTS
![Page 26: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/26.jpg)
NEW CLASSIFICATION PROCESS
PATENTS CLASSIFICATION DEPARTMENTS
![Page 27: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/27.jpg)
SOME FACTS
• About 650k training documents from 2005-2013
• Supervised learning: lightweight and fast linear support vector machine
• Training time (16 cores, 128 GB RAM)
– Feature extraction: ~1 hour
– Training of classifiers: ~1 hour
– 90/10 tests with a look-ahead of 3 levels, reporting the 3 best candidates: ~1 hour
• Prediction: 5 docs in 5 sec
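The slides do not show the classifier itself. As a toy sketch of the general idea, the code below trains one linear SVM per label by subgradient descent on the regularized hinge loss over bag-of-words features, then reports the three best candidates and a top-3 hit rate on held-out documents. Everything here is an assumption for illustration: the four department labels, the example texts, the hyperparameters, and the reading of "3 best candidates" as a top-k hit rate. The hierarchical look-ahead mentioned on the slide is not modelled.

```python
from collections import defaultdict


def tokenize(text):
    return text.lower().split()


def train_one_vs_rest(docs, labels, epochs=20, lr=0.1, reg=0.01):
    """Train one linear SVM per label with subgradient steps on the hinge loss."""
    weights = {label: defaultdict(float) for label in sorted(set(labels))}
    for target, w in weights.items():
        for _ in range(epochs):
            for text, label in zip(docs, labels):
                y = 1.0 if label == target else -1.0
                tokens = tokenize(text)
                margin = y * sum(w[t] for t in tokens)
                for t in list(w):          # L2 regularization: shrink all weights
                    w[t] *= 1.0 - lr * reg
                if margin < 1.0:           # hinge loss: only margin violators update
                    for t in tokens:
                        w[t] += lr * y
    return weights


def top_k(weights, text, k=3):
    """Return the k labels with the highest linear score for the text."""
    tokens = tokenize(text)
    scores = {c: sum(w.get(t, 0.0) for t in tokens) for c, w in weights.items()}
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


def hit_rate_at_k(weights, docs, labels, k=3):
    """Share of documents whose true label is among the k best candidates."""
    hits = sum(label in top_k(weights, text, k) for text, label in zip(docs, labels))
    return hits / len(docs)


# Made-up training data with hypothetical department labels.
TRAIN = [
    ("piston cylinder combustion engine", "engines"),
    ("engine crankshaft piston valve", "engines"),
    ("tablet dosage compound coating", "pharma"),
    ("compound dosage capsule tablet", "pharma"),
    ("encryption key signature protocol", "crypto"),
    ("protocol key cipher encryption", "crypto"),
    ("lens mirror laser beam", "optics"),
    ("laser lens fibre beam", "optics"),
]
HELD_OUT = [("valve and piston wear", "engines"), ("coating of a tablet", "pharma")]

model = train_one_vs_rest([t for t, _ in TRAIN], [c for _, c in TRAIN])
candidates = top_k(model, "a new piston and valve for a combustion engine")
score = hit_rate_at_k(model, [t for t, _ in HELD_OUT], [c for _, c in HELD_OUT])
```

Because each per-label model is a single weight vector, prediction is one sparse dot product per label, which is consistent with the emphasis on a lightweight, fast linear model.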
![Page 28: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/28.jpg)
HIERARCHICAL CLASSIFICATION
![Page 29: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/29.jpg)
STATUS & OUTLOOK
• Range-specific quality evaluation
• Going live with best ranges
• Continuous optimization
![Page 30: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/30.jpg)
PATENT CLASSIFICATION AT EPO
Tender No. 1585
2) Re-classification of published patents when the category system changes
Challenges and Facts:
– 250,000 CPC codes, regular changes/refinements
– Several re-classification projects at any one time, great variation in size, a class is split into 5-20(?) subclasses
– No training material available
![Page 31: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/31.jpg)
NEW RE-CLASSIFICATION PROCESS
Training Data
• A human annotator starts by labeling about 20% of the documents with the new subclasses
Statistical Models
• are generated on the fly, and
• cross-validation tests are carried out
Threshold
• If cross-validation reaches a certain threshold (e.g. 90%), the remaining documents are classified fully automatically without further review
• Otherwise, more training data is generated
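The process on this slide can be sketched as a simple loop. The version below is one possible reading, not the deployed workflow: the callbacks `ask_annotator`, `cross_validate` and `classify` are hypothetical placeholders for the human labeling step, the on-the-fly model evaluation, and the automatic classifier, and the 10% size of each additional labeling batch is an assumption (the slide only gives the 20% start and the example threshold of 90%).

```python
def reclassify(doc_ids, ask_annotator, cross_validate, classify,
               initial_fraction=0.2, step_fraction=0.1, threshold=0.9):
    """Label documents in batches until cross-validation passes, then automate the rest.

    Returns (manually_labeled, automatically_labeled) as two dicts keyed by doc id.
    """
    doc_ids = list(doc_ids)
    labeled = {}
    remaining = doc_ids
    batch = max(1, round(initial_fraction * len(doc_ids)))
    while remaining:
        # Human annotator labels the next batch with the new subclasses.
        for doc_id in remaining[:batch]:
            labeled[doc_id] = ask_annotator(doc_id)
        remaining = remaining[batch:]
        # Models are rebuilt and cross-validated on everything labeled so far.
        if remaining and cross_validate(labeled) >= threshold:
            automatic = {doc_id: classify(labeled, doc_id) for doc_id in remaining}
            return labeled, automatic
        # Below threshold: request more training data (batch size is an assumption).
        batch = max(1, round(step_fraction * len(doc_ids)))
    return labeled, {}


# Stubbed usage: accuracy is faked to grow with the amount of labeled data.
docs = list(range(100))
truth = lambda doc_id: "sub_a" if doc_id % 2 == 0 else "sub_b"
fake_cv = lambda labeled: min(1.0, len(labeled) / 40)
fake_classify = lambda labeled, doc_id: truth(doc_id)

manual, automatic = reclassify(docs, truth, fake_cv, fake_classify)
```

With these stubs the annotator labels 20, then 30, then 40 documents before the faked cross-validation score clears the threshold, and the remaining 60 are classified automatically.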
![Page 32: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/32.jpg)
STATUS & OUTLOOK
• Currently in evaluation phase
• Going live in the next weeks
![Page 33: David Baehrens: Large-Scale Patent Classification at the European Patent Office](https://reader031.fdocuments.in/reader031/viewer/2022021922/587f20281a28ab350c8b6d2d/html5/thumbnails/33.jpg)
…NOT ONLY PATENTS
Solutions: Libraries, Pharma, Patents, Healthcare, Social Media, Automotive
Terminology Management, Text Mining
Search & Analytics, NoSQL
Categorization & Clustering