© Jouve 2013 - Strictly confidential document
J-Class: a hybrid patent classification system
IPC Workshop
25/02/2013
Jean-Pierre Raysz, 26/02/2013
Outline
Jouve
J-Class
Pre-processing
Similarity method
Semantic method
Combined method
Evaluations
Naïve Questions?
Jouve
The Jouve Group provides customers with cross-media solutions for designing, enriching, showcasing and distributing content. By offering innovative turnkey solutions for publishing, digitization, business process outsourcing, IT and printing, we help our customers develop flexible strategies to gain the competitive edge in the digital market.
3,000 employees
25 locations, 15 in France
Around €150,000,000 turnover
25% of sales are exports
Pre-processing
Classic linguistic pre-processing: sentence segmentation, tokenization, POS-tagging, lemmatization
Patent-specific pre-processing: "key-phrase" tagging
Key-phrase = part of the description that concisely describes the topic of the patent document
Detection of language inconsistencies: a small number of documents were ignored
Semantic method
1. Construction of semantic models
  term extraction;
  semantic relation verification;
  different filtering methods: representative terms, polysemy reduction, etc.
  Language-specific methods used (linguistic pre-processing, term extraction, lexical resources)
2. Training
  annotation of patent documents with extracted terms;
  value calculation according to frequency and position;
  feed an SVM classifier.
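The "value calculation according to frequency and position" step can be pictured with a minimal sketch. The slides do not give the actual formula, so the section names, the weights in `SECTION_WEIGHTS` and the function name `term_values` are all assumptions made for illustration. The resulting values would play the role of the feature vector fed to the SVM classifier.

```python
from collections import defaultdict

# Hypothetical position weights (not from the slides): a term found in the
# title counts more than the same term found in the description.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "claims": 1.5, "description": 1.0}


def term_values(sections, terms):
    """Return {term: value}, where value is the sum, over all sections, of
    (occurrences of the term in the section) * (weight of the section)."""
    values = defaultdict(float)
    for name, tokens in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)
        for token in tokens:
            if token in terms:
                values[token] += weight
    return dict(values)


# Toy document, already tokenized and lemmatized.
doc = {
    "title": ["tanning", "aid"],
    "description": ["polymer", "tanning", "aid", "polymer"],
}
features = term_values(doc, terms={"tanning", "polymer"})
```

Here "tanning" scores higher than "polymer" even though both occur twice, because one of its occurrences is in the title.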
Learning
[Diagram of the learning stage. Boxes: Classified Documents, International Patent Classification, Terms Extraction, Selected Terms, Semantic Network (WordNet), Semantic Consolidation, Relevant Terms, Sub Semantic Network]
Run
[Diagram of the run stage. Boxes: Documents to be classified, Semantic Annotation, Annotated Documents, Classifier, Classified Documents. Resources: Relevant Terms, Relevant Patterns, Relevant Concepts]
Similarity method
Indexing – retrieval method using the Lemur System
Principle:
1. Build an index using the target data
2. Query using the test data: retrieval of the most similar patents
3. Calculate the query patent class, using the classes of the indexed documents
One index per language; language-specific stop-word lists.
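The three steps of the principle can be sketched in a few lines. This is not the Lemur system: a plain TF-IDF index with cosine similarity stands in for it, and the similarity-weighted vote over the `k` nearest patents is an assumption about how the class is derived from the retrieved documents. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Step 1: build the index. docs is a list of token lists.
    Returns one {term: tf-idf} dict per document, plus the idf table."""
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def classify(query_tokens, index_vectors, idf, labels, k=3):
    """Steps 2 and 3: retrieve the k most similar indexed patents, then let
    them vote for their class, each vote weighted by its similarity."""
    query = {t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf}
    ranked = sorted(
        ((cosine(query, vec), label) for vec, label in zip(index_vectors, labels)),
        reverse=True,
    )
    votes = Counter()
    for score, label in ranked[:k]:
        votes[label] += score
    return [label for label, _ in votes.most_common()]


# Toy index (hypothetical documents and classes).
indexed = [
    ["water", "tank", "heater"],
    ["water", "heater", "storage"],
    ["bicycle", "axle", "suspension"],
]
labels = ["F24H", "F24H", "B62K"]
vectors, idf = tfidf_vectors(indexed)
candidates = classify(["hot", "water", "storage", "heater"], vectors, idf, labels, k=2)
```

The function returns a ranked list of classes rather than a single class, which matches the way the hybrid system takes several candidates from each method.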
Hybrid system
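Input: the 3 best candidates from each of the 2 methods above
This gives from 3 to 6 distinct candidates, depending on how much the two lists overlap
Build classifiers on the fly, through 1-vs-1 training
Final score of a candidate = sum of the probability values obtained from each binary classifier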
[Diagram. Boxes: Document, Similarity, Semantic, Decision, Classified Document]
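The combination step can be sketched as follows. The binary classifiers trained on the fly are replaced by a hypothetical function `pairwise_prob(a, b)`, and the sketch assumes that each binary classifier returns complementary probabilities (p for one class, 1 - p for the other). The slides do not describe the classifiers in that much detail, so treat this as an illustration of the scoring rule only.

```python
from itertools import combinations


def merge_candidates(similarity_top3, semantic_top3):
    """Union of the two top-3 lists, order preserved: 3 to 6 distinct classes."""
    merged = []
    for label in list(similarity_top3) + list(semantic_top3):
        if label not in merged:
            merged.append(label)
    return merged


def combined_ranking(candidates, pairwise_prob):
    """pairwise_prob(a, b) is the probability that the document belongs to a
    rather than b, according to the binary classifier trained on (a, b).
    The final score of a candidate is the sum of its probabilities over all
    the 1-vs-1 comparisons it takes part in."""
    scores = {label: 0.0 for label in candidates}
    for a, b in combinations(candidates, 2):
        p = pairwise_prob(a, b)
        scores[a] += p
        scores[b] += 1.0 - p
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy stand-in for the on-the-fly binary classifiers (hypothetical numbers).
STRENGTH = {"A61F": 3.0, "A61M": 2.0, "A61K": 1.0, "C08F": 0.5}


def toy_prob(a, b):
    return STRENGTH[a] / (STRENGTH[a] + STRENGTH[b])


candidates = merge_candidates(["A61M", "A61F", "A61K"], ["A61F", "C08F", "A61M"])
ranking = combined_ranking(candidates, toy_prob)
```

With two overlapping top-3 lists the example ends up with 4 candidates, hence 6 binary comparisons.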
Fine-grained classification
Bio-technologies domain (A01H)
19 classes
Learning with 4005 docs, evaluation with 1251 docs.
Use of a validated terminology of the domain (INRA/MIG)
System                               Result
System 1: Semantic Networks          77.91%
System 2: Similarity                 76.59%
System 3: Terminology + System 1     77.34%
System 4: Terminology + System 2     76.86%
System 5: Terminology                51.08%
CLEF IP Results
2.7M documents used for the learning stage (1.3M patents)
600 classes
Learning with documents before 2002, evaluation with documents after 2002
Nb candidates   Similarity method (1)   Semantic and statistical method (2)   Combined method (3)   Delta between (1) and (3)
1               77.5%                   75.15%                                82.1%                 +4.6
2               86.6%                   86.7%                                 92.05%                +5.45
3               91.15%                  91.05%                                95.35%                +4.2
4               93.5%                   93.65%                                96.6%                 +3.1
5               94.8%                   94.7%                                 97%                   +2.2
6               95.8%                   95.25%                                97.05%                +1.25
10              97.05%                  97.25%
20              98.4%                   98.55%
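The table reads naturally as a top-N measure: a document counts as correctly classified at "N candidates" if its true class is among the first N classes proposed. That reading is an assumption, since the slides do not define the metric. Under it, the figures could be computed as in this minimal sketch, where the data and the name `top_k_accuracy` are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_classes, k):
    """Share of documents whose true class appears among the first k
    candidates proposed by the classifier."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_classes)
        if truth in ranked[:k]
    )
    return hits / len(true_classes)


# Toy data (hypothetical, not taken from the CLEF-IP evaluation).
predictions = [
    ["A61K", "C08F", "A61F"],
    ["F24D", "E03B", "F17C"],
    ["B61F", "B62K", "B60R"],
    ["A61F", "A61M", "A61K"],
]
truth = ["A61K", "E03B", "B60R", "A61F"]

acc_1 = top_k_accuracy(predictions, truth, k=1)
acc_2 = top_k_accuracy(predictions, truth, k=2)
acc_3 = top_k_accuracy(predictions, truth, k=3)
```

By construction the measure can only grow with N, which is consistent with every column of the table.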
Pre-classification results
4.5M patents used for the learning stage
100 classes
Evaluation with 130,000 patents
Method                                Precision
Semantic Networks (233,000 terms)     85.4%
Similarity Method                     81.8600%
Patent Office Staff                   80%
Naïve Questions
How can we improve our system's performance?
More patents -> better results
What does 80% mean for patent office staff?
What is the inter-annotator agreement between examiners?
What is the best achievable performance for an automatic classification system?
It is not 100%, that is for sure.
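One common way to put a number on inter-annotator agreement is Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. The slides do not say which measure, if any, was considered, so the sketch below is only an illustration with hypothetical examiner data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same documents:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same class throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical classes assigned by two examiners to the same four patents.
examiner_1 = ["A61K", "A61K", "C08F", "C08F"]
examiner_2 = ["A61K", "C08F", "C08F", "C08F"]
kappa = cohen_kappa(examiner_1, examiner_2)
```

In this toy case the examiners agree on 3 documents out of 4, but half of that agreement is expected by chance, so kappa is well below the raw 75%.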
Suspicious references
US2007215593: Diaper rash prevention apparatus
IPC: A21B1/00 (Bakers’ ovens)
ECLA: A47K11/02 (Sanitary equipment)
US6459426B1: Monolithic integrated circuit implemented in a digital display unit for generating digital data elements from an analog display signal received at high frequencies. “The present invention relates to digital display units used in computer systems”
IPC: B60R25/10F; B62H5/00; B62H5/20 (Vehicles; Cycles)
Non mutually exclusive classes
ECLA: E05D13/04 -> IPC class E05C17/60
but ECLA class E05C17/60 also exists!
ECLA: E05D13/04 Fasteners specially adapted for holding sliding wings open
  E05D13/06 with notches
  E05D13/06 acting by friction
ECLA: E05C17/60 holding sliding wings open
  E05C17/62 using notches
  E05C17/64 by friction
Some confusions
US2006140886: Tanning Aids - Claim 1. A tanning aid comprising a polymethyl methacrylate shaped body which comprises 0.1 to 1.5% …
EPO class: A61K8 Cosmetic preparations
J-Class: C08F Macromolecular compounds…
US2007282244: Glaucoma Implant with Anchor - Claim 1. A method for reducing intraocular pressure…
EPO class: A61M27 Implant devices for drainage of body fluids from one part of the body to another (intraocular A61F9/00)
J-Class: A61F9 Methods or devices for treatment of the eyes
A61F9/007V Apparatus for modifying intraocular pressure, e.g. for glaucoma treatment
Some confusions
GB1077771: Improvements in or relating to hot water storage containers - Claim 1. A hot water storage container of double walled construction, the walls being spaced apart to provide a cavity to receive heat insulation material…
EPO class: E03B11 Arrangements or adaptations of tanks for water supply
J-Class: F24D Domestic hot-water supply systems; Elements or Components therefor
True confusions
EPO: F24H1/18 Water-storage heaters
J-Class: F17C Liquefied gas containers
True confusion
EPO: B62K25 Axle suspension
J-Class: B61F Rail vehicle suspension
France, Paris
Jouve
11, Boulevard de Sébastopol
CS 70004
75036 Paris cedex 01
Tel.: +33 (0) 1.44.76.54.40
Fax: +33 (0) 1.44.76.86.39
Let's design the future of your content together