Invention Information Retrieval and Visualization
Contents:
1. Introduction
2. Background
3. IR Framework
4. Visualization Framework
5. Conclusion
Honggu Lin(u6135394)
1. Introduction

Web Search
• User: Person
• Query: Keywords, short
• Goal of search: Precision-oriented; a few top relevant documents are sufficient

Prior Art Search
• User: Patent analysts
• Query: Patent document, long
• Goal of search: Recall-oriented; the top 100-200 documents are examined

Figure 1: Comparison between Web Search and Prior Art Search
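The precision-oriented versus recall-oriented distinction in Figure 1 can be made concrete with the two standard measures. The following is a minimal sketch; the document ids and relevance judgments are made-up toy data, not results from this project.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Toy data (hypothetical document ids and relevance judgments).
ranked = ["p1", "p7", "p3", "p9", "p4"]
relevant = {"p1", "p3", "p4", "p8"}

p_at_2 = precision_at_k(ranked, relevant, 2)  # web search cares about the top few
r_at_5 = recall_at_k(ranked, relevant, 5)     # prior art search cares about coverage
```

A web searcher is served well when precision in the first few results is high, while a patent analyst needs recall over a much deeper cutoff (100-200 documents in Figure 1).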
2. Background
2.1 Structure of Patent
Figure 2.1: A sample XML file for a patent document from the EPO [1]
• Title
• Abstract
• Description
• Claims
• International Patent Classification Code (IPCR)
• Citations
2.2 Elasticsearch
• A search engine based on Lucene.
• Open source.
• Near real-time search.
• HTTP web interface and schema-free JSON documents.
• Elasticsearch is developed alongside a data-collection and log-parsing engine called Logstash,
and an analytics and visualisation platform called Kibana.
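To illustrate the HTTP and JSON interface mentioned above, here is a small sketch that builds the JSON body of a full-text `match` search request. The index name `patents`, the field name `abstract`, and the localhost URL are assumptions for illustration only; no request is actually sent.

```python
import json


def build_match_query(field, text, size=10):
    """Build the JSON body of a full-text `match` search request."""
    return {"size": size, "query": {"match": {field: text}}}


# "patents" and "abstract" are hypothetical index and field names.
index_name = "patents"
body = build_match_query("abstract", "wind turbine blade", size=100)

# The body would be sent as JSON to the index's _search endpoint, e.g.
#   POST http://localhost:9200/patents/_search
search_path = f"/{index_name}/_search"
payload = json.dumps(body)
```

Because documents are schema-free JSON, the same kind of dictionary can describe both what is indexed and what is searched.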
3. IR Framework
3.1 Patent Retrieval Overall Process
Figure 3.1: Illustration of the process in my patent retrieval system
Diagram components: Query Patents, Patent Preprocess, Query (Re)formulation, Query, Patents in Collection, Patent Preprocess, Indexing, Index statistics (TF-IDF), Indexed Documents, Retrieval Model (Elasticsearch), Retrieved Documents, Feedback
3.2 Data Collection
• Cross Language Evaluation Forum for Intellectual Property evaluation track (CLEF-IP).
• CLEF-IP 2010 contains 2.6 million patent documents and 2,000 topics.
Language:
• EN: 68%
• DE: 24%
• FR: 8%
Figure 3.1: Percentage of English, German, and French patents in the CLEF-IP 2010 collection
Completeness:
• Title: 22%
• Title+Abstract: 10%
• Title+Claims+[Abstract]: 16%
• Title+Description+Claims+[Abstract]: 52%
Figure 3.2: Completeness of the presence of English text in the CLEF-IP 2010 patent collection
3.3 Data Preprocessing
Diagram components: .XML, Format Unify, Section Selection, Language Filter, .JSON
Index the .JSON file in Elasticsearch
Figure 3.3: Illustration of the data preprocessing process
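The preprocessing steps named in Figure 3.3 (language filter, section selection, format unification, XML to JSON) could look roughly like the sketch below. The XML layout is a simplified, hypothetical one invented for this example and is not the real CLEF-IP schema; the function and field names are likewise assumptions.

```python
import json
import xml.etree.ElementTree as ET

# Toy input: a simplified, hypothetical layout, not the real CLEF-IP schema.
SAMPLE_XML = """
<patent id="EP-0000001">
  <title lang="EN">Wind turbine blade</title>
  <title lang="DE">Windturbinenblatt</title>
  <abstract lang="EN">A blade with a reinforced root section.</abstract>
  <claims lang="DE">Blatt mit verstaerkter Wurzel.</claims>
</patent>
"""

SECTIONS = ("title", "abstract", "description", "claims")


def patent_xml_to_json(xml_text, lang="EN", sections=SECTIONS):
    """Keep the chosen sections in one language and emit a flat JSON record."""
    root = ET.fromstring(xml_text.strip())
    record = {"id": root.get("id")}
    for name in sections:
        # Language filter and section selection in one pass.
        texts = [
            " ".join("".join(el.itertext()).split())
            for el in root.iter(name)
            if el.get("lang") == lang
        ]
        # Format unification: every record gets the same keys,
        # with an empty string when a section is missing.
        record[name] = " ".join(texts)
    return json.dumps(record)


doc = json.loads(patent_xml_to_json(SAMPLE_XML))
```

Giving every record the same keys keeps the documents uniform before they are indexed, even though many patents in the collection lack some English sections.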
3.4 Query Reduction
• Section selection
• Term extraction (TF-IDF)
• Technical phrase formation
• Metadata usage (IPCR)
• Section combination
Figure 3.4: Process of Query Formation
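The term extraction step in Figure 3.4 can be sketched as ranking the terms of a query patent by TF-IDF and keeping the top few. The slides do not say which TF-IDF variant is used, so the smoothed IDF below, the tokenizer, and the toy texts are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_tfidf_terms(query_text, collection, k=3):
    """Rank the query's terms by TF-IDF against a document collection."""
    n_docs = len(collection)
    doc_freq = Counter()
    for doc in collection:
        doc_freq.update(set(tokenize(doc)))
    tf = Counter(tokenize(query_text))
    # Assumed variant: raw term frequency times a smoothed IDF.
    scores = {
        term: count * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, count in tf.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy collection and query patent text (made up for this example).
collection = [
    "a method for cooling the turbine blade",
    "a method for coating the steel blade",
    "a method for mixing the polymer",
]
query = ("a cooling channel for cooling a turbine blade, "
         "the channel cooling the blade root")

top_terms = top_tfidf_terms(query, collection, k=3)
```

Words that occur in every document of the collection score zero, so the reduced query keeps only the terms that distinguish the query patent.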
4. Visualization Framework
The query and related patents selected from the results are put in MongoDB, which the Django Web Framework uses to produce these effects:
• Highlight the common area between a query and its related patent.
• Common Word Word-Net
Figure 4.1: Illustration of the process in a Query and Related Patent Visualization System
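One way the "highlight common area" effect could work is to find the words shared by the query and a related patent and wrap them in markup. This is only a sketch: the stopword list, the `<mark>` tag, and the example texts are assumptions, and the slides do not describe how the real system renders highlights.

```python
import re

# A tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "with", "in", "to"}


def common_words(text_a, text_b):
    """Non-stopword terms that appear in both texts."""
    words_a = set(re.findall(r"[a-z]+", text_a.lower())) - STOPWORDS
    words_b = set(re.findall(r"[a-z]+", text_b.lower())) - STOPWORDS
    return words_a & words_b


def highlight(text, shared):
    """Wrap every shared word in <mark> tags, keeping the original casing."""
    def wrap(match):
        word = match.group(0)
        return f"<mark>{word}</mark>" if word.lower() in shared else word
    return re.sub(r"[A-Za-z]+", wrap, text)


query = "A turbine blade with a cooling channel"
related = "Cooling of a steel blade"

shared = common_words(query, related)
marked = highlight(related, shared)
```

The same set of shared words could also feed the common-word view listed above.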
5. Conclusion
• Explore how the results differ across query formulation methods and find the optimal one.
• Visualize the retrieval results in a more intuitive way.
Reference:
[1] Walid Magdy. (2012). Toward Higher Effectiveness for Recall-Oriented Information Retrieval: A Patent Retrieval Case Study. Retrieved from http://doras.dcu.ie/16814/1/WalidMagdyThesis.pdf
Q & A