European Patent Citations - How to count and how to interpret them?
Dietmar Harhoff, Ludwig-Maximilian University Munich (INNO-tec), ZEW and CEPR and EPIP
Karin Hoisl, Ludwig-Maximilian University Munich (INNO-tec) and EPIP
Colin Webb, OECD
1st EPIP Workshop – Milano
Overview
Motivation
Specificities of the EPO Search Process
Raw Data
How to ... Issues
Results
Still to Be Tackled
Where to Get the Data? And when?
Motivation
Patent citations are widely used by economists (and others) in empirical analysis.
The NBER data have had a tremendous impact. They represent a true pioneering effort with high social benefits.
But European patent citations differ considerably from US citations. Many issues have been left unresolved.
Many pitfalls (as we painfully learned) – even suggesting that the pioneering NBER data have problems which should be corrected in the second round using some of the insights described here.
Specificities of the EPO Search Process: Institutional Framework
The search process follows the Guidelines for Examination in the European Patent Office.
Historically, the overall responsibility for search has been with the Directorate General for Searching in The Hague.
Under BEST (Bringing Examination and Search Together), the borderline between search and examination disappears.
Main objective: discovering prior art relevant for determining whether the invention meets the novelty and inventive step requirements.
The search is conducted on the basis of the claims.
Specificities of the EPO Search Process: Termination and Reporting
The search will be terminated if
• the probability of discovering further relevant documents is very low compared to the effort needed, or
• documents are discovered which doubtlessly demonstrate a violation of patentability requirements.
The search report should only include
• the most important documents,
• one of several documents of equal importance,
• the earlier document of two equally important documents.
According to EPO philosophy, a good search report contains all relevant information within a minimum number of citations (Michel/Bettels, 2001).
Specificities of the EPO Search Process: Classification of References
X  particularly relevant documents when taken alone (implies: the claimed invention cannot be considered novel or cannot be considered to involve an inventive step)
Y  particularly relevant if combined with another document of the same category
A  documents defining the general state of the art
O  documents referring to non-written disclosure
P  intermediate documents (documents published between the date of filing and the priority date)
T  documents relating to theory or principle underlying the invention (documents which were published after the filing date and are not in conflict with the application, but were cited for a better understanding of the invention)
E  potentially conflicting patent documents, published on or after the filing date of the underlying invention
D  document already cited in the application
L  document cited for other reasons (e.g., a document which may throw doubt on a priority claim)
Specificities of the EPO Search Process: Consequences for Citation Analysis (1/2)
Difference in interpretation between examiner and applicant citations
• the examiner has to ensure the novelty and inventive step of the invention
• the applicant cites work that is related but significantly different from his invention
• the EPOLINE/REFI data also include references made by the applicant – all issues raised here apply to them, but we ignore them here unless the searcher/examiner has adopted them (D references)
The date of filing of the EP application is used as a reference date for the search
• referenced documents published between the priority date and the filing date may lead to negative citation lags, but they are not taken to threaten novelty
Specificities of the EPO Search Process: Consequences for Citation Analysis (2/2)
Referencing no more than what is absolutely necessary and obligation to favor early documents over later ones
• other documents important for the economic analysis may not appear in the list of references
Examiners should preferably reference documents in the language of the application
• overestimation of the influence of the applicant’s home country and possible distortion in citation counts (see below)
The search only identifies documents to which searchers/examiners have access
• trivial, but important: documents not accessible in databases will not appear in the “paper trail”
Raw Data
OECD/EPO data (described in OECD Discussion Paper Webb/Dernis/Harhoff/Hoisl)
EPOLINE references (12/2004) – updated and checked with REFI (07/2005)
EPOLINE data on procedural aspects (search dates)
OPS/ESPACE data on other than WO/EP documents
How to ... issues 1 – how to deal with timing?
[Figure: timelines for the “citing EP/WO patent”, a “cited DE patent” and a “cited US patent” (with PCT equivalent; marked “?”). Labels: priority date; publication of patent document (grant in the U.S.); search report published; publication of supplementary search report; date of publication of granted patent; two intervals of 18 mths.]
How to ... issues 1 – how to deal with timing?
What is our objective?
• time the “knowledge flow”? (date of the invention – date of the referenced invention/patent becoming public)
• compute time between inventions/state of the art used to characterize inventions? (date of invention = date of priority = date of publication – at least for European patents)
• other?
“wisest” solution might be to take differences between priority dates
How do we deal with references to different “incarnations” of the same application (about 19,200 cases)?
Example from the database:
referencing patent  referenced patent  publ. date  appl. date
EP0106446A1         DE1947057B         19760318    19690917
EP0056080A1         DE1947057A         19700326    19690917
Here: compute lag as difference between (earliest) publication date of referencing search report and (earliest) publication date of referenced document/incarnation
Note: this does not solve the problem with U.S. data (next step: take differences in (earliest) priority dates).
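The lag rule stated here can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are invented for the example, and the search report date used below is a hypothetical value because the slides do not give one. The two incarnations of DE1947057 and their publication dates are taken from the database example.

```python
from datetime import date


def parse_yyyymmdd(text: str) -> date:
    """Parse a date written as YYYYMMDD, the format used in the database example."""
    return date(int(text[:4]), int(text[4:6]), int(text[6:8]))


def earliest_publication(incarnations: dict[str, str]) -> date:
    """Return the earliest publication date among all incarnations of one application."""
    return min(parse_yyyymmdd(d) for d in incarnations.values())


def citation_lag_years(search_report_published: date, referenced_published: date) -> float:
    """Lag in years; negative if the referenced document was published later."""
    return (search_report_published - referenced_published).days / 365.25


# Incarnations of the referenced application, from the database example.
de1947057 = {"DE1947057A": "19700326", "DE1947057B": "19760318"}
referenced = earliest_publication(de1947057)  # the A publication of 1970

# Hypothetical publication date of the referencing search report (an assumption).
search_report = date(1980, 3, 26)
lag = citation_lag_years(search_report, referenced)
```

Whichever incarnation a search report names, the lag is measured from the 1970 publication. A referenced document published after the search report date would give a negative value.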
How to ... issues 2 – where to get the data on non-EP/WO documents?
What do we need (ideally)?
• (earliest) priority date, priority identification
• application date
• publication date
• applicant name (for later: identify “self-referencing”)
3,715,484 unique non-EP/WO entries drawn from OPS server, representing 6,623,877 references
data for 2,223,435 EP/WO entries known from EPO data
still missing: 767,122 documents
full information for 8,130,010 references for which we can compute correct citation lags
How to ... issues 2 – where to get the data on non-EP/WO documents?
cutting down data
WO publications which do not enter the regional phase at the EPO are excluded
6,740,846 entries remaining
• international (WO) search reports (n=2,416,318)
• EP search reports (n=4,324,528) – some of them supplementary search reports for WO applications
referenced patents
• 1,228,165 EP documents (usually used for analysis)
• 559,423 WO documents (sometimes included)
• 4,953,258 other documents (lost in most studies)
• US: 2,423,267 – JP: 650,334 – DE: 834,617 – FR: 448,599 – GB: 417,864
• for 4,357,543 we obtained procedural and applicant data from OPS
• meaning: we can compute citation lags (and other information) for 6,198,413 (91.96%) of 6,740,846 references
comparison of citation lags with and w/o “complete” data
How to ... issues 3 – how to count references and citations?
What do we want to compute?
• counts of a particular incarnation of a document
• count of a particular invention (and the associated property right) being named as relevant prior art
Clearly, it is the second objective.
HHW’s Rule of Counting Citations
A reference to an X-system document should be taken as a valid citation count of a particular Z-system patent if the X-system document is an equivalent of the Z-system patent.
many issues with equivalents (e.g. multiple equivalents - see paper)
data on equivalents obtained from OPS/ESPACE
comparison of citation count distribution with and without corrections
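The counting rule can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the publication numbers are hypothetical, the `equivalents` mapping stands in for the OPS/ESPACE equivalents data, and crediting a citing document only once per cited patent is one possible way to handle multiple equivalents, not necessarily the treatment chosen in the paper.

```python
from collections import Counter


def raw_counts(references, system="EP"):
    """Count only references that name a document of the given system directly."""
    return Counter(cited for _, cited in references if cited.startswith(system))


def hhw_counts(references, equivalents, system="EP"):
    """Also credit references to documents that are equivalents of a `system` patent."""
    credited = set()
    for citing, cited in references:
        if cited.startswith(system):
            targets = [cited]
        else:
            targets = equivalents.get(cited, [])
        for target in targets:
            # A set keeps one credit per (citing, cited patent) pair.
            credited.add((citing, target))
    return Counter(target for _, target in credited)


# Hypothetical (citing, cited) pairs and a hypothetical equivalents table.
references = [
    ("EP9000001", "EP1000001"),
    ("EP9000002", "US5000001"),
    ("EP9000003", "DE3000001"),
    ("EP9000003", "EP1000001"),
    ("EP9000004", "JP0000001"),
]
equivalents = {"US5000001": ["EP1000001"], "DE3000001": ["EP1000001"]}

raw = raw_counts(references)
corrected = hhw_counts(references, equivalents)
```

In this toy data the EP patent receives two citations when only direct references are counted and three once equivalents are taken into account; the JP reference has no known EP equivalent and is credited to no EP patent.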
How to ... issues 4 – how to deal with the NPL references?
total: 1,149,955 NPL references (after taking out references to “Patent Abstracts of Japan” which are implicitly patent literature references)
“highest” EP publication number included: EP1589793
59,433 references not from examiner/searcher (dropped)
239,135 references from WO documents which do not enter the regional phase at the EPO (dropped)
remaining: 851,387 NPL references
• 525,664 references from EP search reports
• 383,247 from international (WO) search reports
on average: 31.3% X refs, 17.1% Y refs, 45.6% A refs, 6.0% other refs and 8.8% D refs
structure of NPL reference types over time
How to ... issues 5 – how to get the date information?
highest publication number EP1474833, data for 1,452,041 documents (currently about 100,000 documents fewer than we have citation data for)
last 2005 date: June 7, 2005
dates of priority, application, publication, publication of search reports (international, supplementary, EP), grant etc.
matched with citations data to compute lags (by year)
Citation Lags: With and without corrections
[Figure: density of citation lags; x-axis: Citation Lag (Years), 0 to 25; y-axis: Density, 0 to .2]
lower quartile 2.28 yrs
median 4.03 yrs
mean 5.03 yrs
upper quartile 6.93 yrs
Source: Harhoff/Hoisl/Webb (2004) – authors’ computations based on EPO/OECD citation database.
Citation Lags: With and w/o use of references to non-EP/non-WO documents
percentile  lags of EP/WO to EP/WO only  lags of EP/WO to non-EP/non-WO  all citation lags
1%          0.3 yrs                      0.7 yrs                         0.6 yrs
5%          1.0 yrs                      1.6 yrs                         1.5 yrs
10%         1.5 yrs                      2.1 yrs                         1.9 yrs
25%         2.3 yrs                      3.8 yrs                         3.2 yrs
50%         4.0 yrs                      8.7 yrs                         6.8 yrs
75%         7.0 yrs                      17.8 yrs                        14.3 yrs
90%         10.6 yrs                     29.7 yrs                        25.8 yrs
95%         13.0 yrs                     41.2 yrs                        35.9 yrs
99%         17.5 yrs                     64.7 yrs                        61.8 yrs
max         25.7 yrs                     132.0 yrs                       132.0 yrs
N           1,438,670                    4,203,811                       5,642,481
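A percentile table of this kind can be computed from a list of lags as sketched below. This is an illustration only: the record layout, the function names and the prefix test for EP/WO documents are assumptions, and the inclusive method of `statistics.quantiles` is just one of several percentile conventions; the slides do not say which one was used.

```python
from statistics import quantiles

PERCENTILES = (1, 5, 10, 25, 50, 75, 90, 95, 99)


def percentile_table(lags):
    """Percentiles, maximum and N for a list of citation lags (in years)."""
    cuts = quantiles(lags, n=100, method="inclusive")  # 99 cut points
    table = {f"{p}%": cuts[p - 1] for p in PERCENTILES}
    table["max"] = max(lags)
    table["N"] = len(lags)
    return table


def lag_tables(records):
    """records: (referenced publication number, lag in years) pairs."""
    within = [lag for ref, lag in records if ref.startswith(("EP", "WO"))]
    outside = [lag for ref, lag in records if not ref.startswith(("EP", "WO"))]
    return {
        "EP/WO to EP/WO only": percentile_table(within),
        "EP/WO to non-EP/non-WO": percentile_table(outside),
        "all citation lags": percentile_table(within + outside),
    }
```

Each group needs at least two lags for `quantiles` to work. Splitting by the type of the referenced document, as here, is what makes the long right tail of references to older national documents visible.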
Citation Lags: With and w/o use of references to non-EP/non-WO documents
[Figure: density of citation lags (citlag, 0 to 60 years; density 0 to .2), two panels (0 and 1), graphs by citation lags for EP/WO only]
Citation Lags: more details
among the oldest prior art referenced
• WO1997034383 references US285345A (publication date 18.9.1883) as a Y reference.
• WO2001058249 references DE45870C (publication date 12.1.1889) as an A reference.
• EP1408383A1 references CH1473A (publication date 31.5.1890) as an X reference.
reference type and citation lag
• X: p25=3.13, p50=6.39, p75=13.48
• Y: p25=2.72, p50=5.98, p75=13.60
• A: p25=3.51, p50=7.33, p75=15.05
Non-Patent Literature: Classification of referenced NPL sources over time
[Figure: NPL Reference Classifications; shares (0% to 100%) of X-type, Y-type, A-type and other NPL references by priority year, 1977 to 2002; N = 785,383]
Non-Patent Literature: Classification of referenced NPL sources over time
[Figure: NPL Reference Classifications - USPTO as ISA; shares (0% to 100%) of X-type, Y-type, A-type and other NPL references by priority year, 1983 to 2001; N = 105,506]
Non-Patent Literature: Classification of referenced NPL sources over time
[Figure: NPL Reference Classifications - All ISAs but USPTO; shares (0% to 100%) of X-type, Y-type, A-type and other NPL references by priority year, 1983 to 2002; N = 282,889]
Patent Literature Data: Classification of Patent Literature References – Raw Data
[Figure: shares (0% to 100%) of patent literature references by source (DE, EP, FR, GB, JP, OT, US, WO), 1978 to 2000; labelled values: 34.7% and 21.2%]
Patent Literature Data: Classification of Patent Literature References – (Partial) HHW Rule
[Figure: shares (0% to 100%) of patent literature references by source (DE, EP, FR, GB, JP, OT, US, WO), 1978 to 2000; labelled values: 29.0% and 38.2%]
Patent Literature Data: Source of EP Patent References – Overall Comparison
Source of Reference (% of all patent references)  DE    EP    FR   GB   JP   OT   US    WO
without (partial) correction for equivalents      13.7  19.0  8.2  7.3  8.8  3.1  33.9  6.0
with (partial) correction for equivalents         13.2  28.8  8.2  7.3  8.2  3.1  29.9  1.3
After (partially) correcting for equivalents, the share of within-EP referencing increases from 19.0% to 28.8% of all patent references
Patent Literature Data: Assessing the Quality of Incoming Applications
[Figure: Average Share of X-Type References; share (%), 0 to 30, by application year, 1980 to 2000; series: US-Priority, JP-Priority, DE-Priority]
Source: Patent Quantity and Patent Quality in Europe, forthcoming in: Brian Kahin and Dominique Foray (eds.), Advancing Knowledge and the Knowledge Economy, MIT Press (2006).
Patent Literature Data: Comparing citation counts – with and w/o corrections
percentile  EP to EP only  EP/WO to EP/WO  with partial HHW rule
1%          0.3 yrs        0.7 yrs         0.6 yrs
5%          1.0 yrs        1.6 yrs         1.5 yrs
10%         1.5 yrs        2.1 yrs         1.9 yrs
25%         2.3 yrs        3.8 yrs         3.2 yrs
50%         4.0 yrs        8.7 yrs         6.8 yrs
75%         7.0 yrs        17.8 yrs        14.3 yrs
90%         10.6 yrs       29.7 yrs        25.8 yrs
95%         13.0 yrs       41.2 yrs        35.9 yrs
99%         17.5 yrs       64.7 yrs        61.8 yrs
max         25.7 yrs       132.0 yrs       132.0 yrs
N           1,438,670      4,203,811       5,642,481
Still to be tackled: “self-citations” and “self-references”
misnomer to start with – term applies only for applicant-initiated references
To determine whether a reference points to prior art generated by the applicant itself (“self-reference”), we have to get the name of the applicant of the referenced document.
Similar for measures of “originality” – we need to get the IPC codes of the referencing and the referenced document.
We have downloaded them – and are in the process of computing the information.
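Both indicators can be sketched in Python once applicant names and IPC codes are available. The sketch rests on assumptions that the slides do not confirm: applicant names are matched after a crude normalization (the list of legal forms is illustrative and far from complete), and originality is taken to be one minus the Herfindahl concentration of IPC classes among the referenced documents, a definition that is common in the patent literature. All names and codes in the example are hypothetical.

```python
import re
from collections import Counter

# Illustrative list of legal-form tokens to ignore when comparing names (an assumption).
LEGAL_FORMS = {"AG", "GMBH", "INC", "LTD", "CORP", "CO", "SA"}


def normalize_applicant(name):
    """Upper-case, strip punctuation and drop legal-form tokens."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_FORMS)


def is_self_reference(citing_applicant, cited_applicant):
    """True if both documents appear to belong to the same applicant."""
    return normalize_applicant(citing_applicant) == normalize_applicant(cited_applicant)


def originality(referenced_ipc_classes):
    """1 minus the sum of squared class shares; None if there are no references."""
    counts = Counter(referenced_ipc_classes)
    total = sum(counts.values())
    if total == 0:
        return None
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

With this definition, a patent whose references all fall into one IPC class has originality 0, and the value rises as the references spread over more classes. Exact string matching after normalization will miss subsidiaries and spelling variants, which is one reason a complete coding of patent holders matters.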
Where to get the data? And when?
complete citation data up to 2001 from Colin Webb at OECD - these data do not have the citation lags for non-EP/WO references
extended dataset with citations up to November 2004 from Dietmar Harhoff at INNO-tec
new data with references till mid-2005 soon (to be done by April 15 or never)
to be published on
• the SING site at CEPR (http://wiki.cepr.org/sing/) – note: there is a WIKI which allows for comments etc.
• the INNO-tec site (http://www.inno-tec.de)
Interpreting Citation Indicators
Assessing the Quality of Incoming Applications
Patent Characteristics by Applicant Type
Citations in Value Equations
Citations in Opposition Likelihood Equations
Citations in Examination Duration Equations
Interpreting Citation Indicators: Citation Statistics by Type of Patent Holder
Variable             Independent Inventor  University  Corporate
                     (N=39,071)            (N=5,434)   (N=550,144)
references           4.88                  3.58        4.15
X-references         0.68                  0.73        0.56
self-references      0.04                  0.05        0.14
NPL references       0.36                  2.37        0.73
citations            1.30                  2.73        1.81
X-citations          0.10                  0.39        0.17
self-citations       0.03                  0.05        0.13
2nd order citations  3.83                  11.50       6.85
claims               9.3                   10.9        7.3
grant lag            4.2 yrs               5.5 yrs     4.3 yrs
Source: Harhoff/Hoisl/Webb (2005) – authors’ computations based on EPO/OECD citation database.
Interpreting Citation Indicators: Citation Indicators in Value Equations
Variable                 patent value - German PATVAL study (ordered probit, N=3,350)
references               n.s.
  X-share                n.s.
  Y-share                n.s.
  share self references  n.s.
citations                0.0482 (3.74)
  X-share                n.s.
  Y-share                n.s.
  share self citations   n.s.
2nd order citations      0.0044 (2.52)
NPL references           n.s.
control variables        technical field, year dummies, add. variables
pseudo-R-squared         0.0062
Source: Harhoff/Hoisl/Webb (2005) – marginal effects (t-stats), authors’ computations based on EPO/OECD citation database.
Interpreting Citation Indicators: Citation Indicators in Opposition Equations
Variable                 opposition incidence                           revocation incidence
                         (binary probit) N=594,647                      (binary probit) N=27,530
references               0.0004 (3.5)                                   n.s.
  X-share                0.0270 (22.9)                                  0.0411 (3.5)
  Y-share                0.0088 (6.8)                                   n.s.
  share self references  -0.0394 (14.4)                                 n.s.
citations                0.0070 (60.9)                                  -0.0066 (7.3)
  X-share                0.0137 (8.7)                                   n.s.
  Y-share                0.0125 (6.5)                                   n.s.
  share self citations   -0.0128 (6.6)                                  -0.0463 (2.4)
2nd order citations      0.0003 (16.0)                                  n.s.
NPL references           -0.0009 (3.3)                                  n.s.
control variables        technical field, year dummies, add. variables  dto. plus additional variables
pseudo-R-squared         0.0693                                         0.0269
Source: Harhoff/Hoisl/Webb (2005) – marginal effects (t-stats), authors’ computations based on EPO/OECD citation database.
Interpreting Citation Indicators: Citation Indicators in Process Duration Equations
Results from competing-risk proportional hazard models, N=215,259
                        OUTCOME
Variable        POOLED  GRANT   GRANT   REFUSAL
references      DELAY   DELAY   n.s.    DELAY
X-share         DELAY   DELAY   ACCEL.  DELAY
GENERALITY      DELAY   n.s.    DELAY   n.s.
ORIGINALITY     DELAY   DELAY   DELAY   DELAY
citations       DELAY   ACCEL.  DELAY   DELAY
NPL references  DELAY   DELAY   DELAY   DELAY
control variables: claims, designated countries, workload at EPO, prediction error, PCT
Source: Harhoff/Wagner (2005) - authors’ computations based on EPO/OECD citation database.
Summary of Results (1/2)
value equation
• only citation counts matter, composition not relevant
• 2nd order citations appear to be relevant
opposition incidence equation
• not number, but composition of references matters considerably
• number of citations proxies for value
• X-citations indicate anticipated interaction between opponents
• 2nd-order citations have some predictive power
• self-citations and references have a strong negative effect on opposition incidence – appear to indicate idiosyncratic research paths
Summary of Results (2/2)
revocation incidence equation
• weak predictive power (as should be!)
• residual information still in X-references and 1st-order citations
duration equations
• heterogeneous effects on different outcomes – pointing to endogenous applicant behavior
• examination of X-classified references apparently more time-consuming
• signal of search report to applicant
Caveats and More Plans
Results for 2nd-order citations warrant attention (and more experimentation). Currently in development: second-order references.
Results for self-referencing surprisingly strong – should be extended using a complete coding of patent holders (for non-EP references).
Still a far cry from a structural model of value, legal robustness and other latent variables – but some progress.
Inclusion of more refined NPL indicators to commence shortly.
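The slides use 2nd-order citations as an indicator without spelling out a definition. The Python sketch below assumes one common reading, namely the total number of citations received by the patents that cite the focal patent. The function names and the single-letter patent labels are hypothetical, and the authors' actual construction may differ.

```python
from collections import defaultdict


def build_cited_by(references):
    """Map each cited patent to the set of patents citing it."""
    cited_by = defaultdict(set)
    for citing, cited in references:
        cited_by[cited].add(citing)
    return cited_by


def first_order_citations(cited_by, patent):
    """Number of patents citing `patent`."""
    return len(cited_by.get(patent, ()))


def second_order_citations(cited_by, patent):
    """Citations received by the patents that cite `patent` (assumed definition)."""
    return sum(len(cited_by.get(citer, ())) for citer in cited_by.get(patent, ()))


# Hypothetical (citing, cited) pairs: B and C cite A; D and E cite B; F cites C.
references = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "B"), ("F", "C")]
cited_by = build_cited_by(references)
```

In this toy network patent A has two first-order citations (from B and C) and three second-order citations (D and E via B, F via C).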