Detecting domain dynamics: Association Rule Extraction and diachronic clustering techniques in support of expertise
Ivana Roche, Maha Ghribi, Nathalie Vedovotto, Claire François, Dominique Besagni, Pascal Cuxac, Dirk Holste, Marianne Hörlesberger, Edgar Schiebel
Detecting domains dynamics GTM 2011 – Atlanta, GA - September 14th, 2011 2
Work context
European project DBF:
– Development and Verification of a Bibliometric Model for the identification of Frontier Research
– Part of a Coordination and Support Action (CSA) for the European Research Council (ERC)
– following the requirements of the High Level Expert Group of the ERC
  • we developed several indicators,
  • including a Proximity indicator
ERC framework (1/2)
First European funding body to support investigator-driven research through open and direct competition
Main goals:
– Scientific excellence as the sole selection criterion
– Major grants for the truly best and most creative researchers
– To identify and explore new opportunities and directions in all fields
Scientific domains (panels):
– Physics and Engineering (PE): 10 panels
– Life Sciences (LS): 9 panels
– Social Sciences and Humanities (SH): 6 panels
Grant application schemes:
– Starting researcher grants (StGs)
– Advanced investigator grants (AdGs)
ERC framework (2/2)
ERC annual budget evolution (2007-2013):
[Figure: ERC annual budget per year, 2007 to 2013; vertical axis in Mio. €, scale 0 to 1800]
Rate of selected proposals:
– StGs (2009): 10% (244 out of 2,503 submitted proposals)
– AdGs (2009): 15% (244 out of 1,584 submitted proposals)
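The rounded selection rates for the 2009 calls can be re-derived from the counts on this slide. The following is a minimal sketch; the dictionary layout and variable names are illustrative, and only the counts come from the slide.

```python
# Selected and submitted proposal counts for the 2009 calls, as quoted on the slide.
calls = {
    "StG 2009": (244, 2503),
    "AdG 2009": (244, 1584),
}

# Selection rate = selected / submitted, expressed as a percentage.
rates = {
    name: 100 * selected / submitted
    for name, (selected, submitted) in calls.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of submitted proposals selected")
```

Rounded to whole percentages, the two values match the 10% and 15% shown on the slide.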
Definition of the Proximity indicator
Scope:
– it is employed to infer the « innovative degree » of the proposal through the dynamic change of the scientific landscape corresponding to the proposal’s allocated panel
Data sources:
– ERC data:
  • Panels description
  • Projects summary
– Bibliographic databases
Hypothesis:
– the closer a proposal is to regions of positive dynamic change, the more innovative it is
Description of the proximity indicator
[Diagram: workflow for the calculation of the PROXIMITY indicator]
Panel side:
– ERC database: panel description
– Translation of main panels into database queries
– Bibliographic database query, giving a DB of bibliographic references
– Data pre-processing and text mining
– Construction of two indexed corpora, one per time window (T1, T2)
– Diachronic clustering analysis (T1, T2)
– Ranking of clusters by novelty degree (T1, T2)
Proposal side:
– ERC database: data from proposals
– Extraction of terminological information
– Similarity of proposal with regard to T2 clusters
Tools
Assisted indexing
– Terminological resources
– TreeTagger
– FastR
Clustering
– Axial K-means (NEURODOC)
– Principal Components Analysis
Fuzzy Association Rule Extraction
Clusters map
Clusters relationships
Novelty index = Inter-period index & Intra-period index
The lower a cluster's Novelty index value, the higher its degree of innovativeness
Calculation of Proximity indicator
T2 clusters:
– ranking of T2 clusters by Novelty index
– categorization of clusters: AAA, AA, A (decreasing innovation)
Proposal:
– text mining / assisted indexing
– keywords
– clusters ordered by decreasing value of similarity (N clusters)
Proximity = geometric mean of similarity
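The last step of this slide, a geometric mean over the similarities between a proposal and N clusters, can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the slides do not name the similarity measure (cosine similarity on binary keyword sets is assumed here), do not fix N, and do not say how the AAA/AA/A categories enter the calculation, so the categories are left out. The function names `keyword_similarity` and `proximity` and the toy keyword sets are hypothetical.

```python
import math


def keyword_similarity(proposal_keywords, cluster_keywords):
    """Cosine similarity between two keyword sets with binary weights.

    Assumption: the slides do not name the similarity measure.
    """
    if not proposal_keywords or not cluster_keywords:
        return 0.0
    shared = len(proposal_keywords & cluster_keywords)
    return shared / math.sqrt(len(proposal_keywords) * len(cluster_keywords))


def proximity(proposal_keywords, t2_clusters, n_clusters=3):
    """Geometric mean of the n_clusters highest proposal-to-cluster similarities."""
    similarities = sorted(
        (keyword_similarity(proposal_keywords, kws) for kws in t2_clusters.values()),
        reverse=True,
    )[:n_clusters]
    if not similarities:
        return 0.0
    # A single zero similarity drives the geometric mean to zero.
    return math.prod(similarities) ** (1 / len(similarities))


# Toy data: cluster names come from the case study, keyword sets are invented.
t2_clusters = {
    "Optical sensor": {"optical sensor", "thin film"},
    "Neural network": {"neural network", "learning", "classification", "training"},
    "Teletraffic": {"queueing"},
}
proposal = {"optical sensor", "thin film", "neural network"}

print(proximity(proposal, t2_clusters, n_clusters=2))
```

Because a geometric mean collapses to zero as soon as one similarity is zero, the choice of N matters: in the toy data, N = 2 gives a positive score while N = 3 gives zero.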
Case study
Starting grant 2009, panel PE07
– Systems and communication engineering: electronic, communication, optical and systems engineering
– 29 proposals, 4 successful
Database: PASCAL from INIST
First corpus:
– Year 2000
– 20,568 records, 21,781 keywords
Second corpus:
– Year 2009
– 19,827 records, 18,475 keywords
Clusters ranking
High | Intermediate | Low
Angiospermae | Optical method | Decision support system
Space remote sensing | Thin film | Optoelectronic device
Statistical simulation | Nanoelectronics | Imagery
Decision aid | Non destructive test | Image processing
Radio frequency identification | Chemical sensor | Computer network
Complementary MOS technology | Smart material | Closed feedback
Data analysis | Microelectromechanical device | System identification
Discrete event system | Wavelet transformation | Photonics
Discrete system | Neural network | Fiber optic sensors
Process control | Particle swarm optimization | Wireless network
Ultrasonic transducer | User interface | Optical fiber network
Control system | Optical sensor | Integrated optics
Hyperspectral imaging sensor | Video signal processing | Signal detection
Microelectronic fabrication | Piezoelectric sensor | Teletraffic
Real time system | Constrained optimization | Wireless LAN
Radiation detector | Actuator | Diffraction grating
Robotics
Noise reduction
Top-ten results
Project proposal ID | Innovativeness degree rank | Expert panel choice (0/1)
PROP_19 | 1 | 0
PROP_23 | 2 | 0
PROP_14 | 3 | 0
PROP_02 | 4 | 0
PROP_08 | 5 | 1
PROP_07 | 6 | 0
PROP_22 | 7 | 0
PROP_06 | 8 | 0
PROP_12 | 9 | 0
PROP_01 | 10 | 1
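One way to read the top-ten table is to compare the share of funded proposals inside the top ten with the share across the whole panel. The sketch below does this with the figures from the slides (29 proposals, 4 successful). Precision and recall at rank 10 are illustrative measures chosen here, not an evaluation the slides report, and the variable names are hypothetical.

```python
# (proposal ID, innovativeness rank, expert panel choice) as listed in the top-ten table.
top_ten = [
    ("PROP_19", 1, 0), ("PROP_23", 2, 0), ("PROP_14", 3, 0), ("PROP_02", 4, 0),
    ("PROP_08", 5, 1), ("PROP_07", 6, 0), ("PROP_22", 7, 0), ("PROP_06", 8, 0),
    ("PROP_12", 9, 0), ("PROP_01", 10, 1),
]

# Figures from the case study slide: panel PE07, Starting grant 2009.
TOTAL_PROPOSALS = 29
TOTAL_FUNDED = 4

funded_in_top_ten = sum(choice for _, _, choice in top_ten)

# Share of the top ten that the expert panel funded.
precision_at_10 = funded_in_top_ten / len(top_ten)
# Share of all funded proposals that appear in the top ten.
recall_at_10 = funded_in_top_ten / TOTAL_FUNDED
# Share of funded proposals across the whole panel, for comparison.
base_rate = TOTAL_FUNDED / TOTAL_PROPOSALS

print(f"funded in top ten: {funded_in_top_ten}")
print(f"precision@10: {precision_at_10:.2f}, recall@10: {recall_at_10:.2f}")
print(f"base rate: {base_rate:.3f}")
```

Two of the four funded proposals appear in the top ten, at ranks 5 and 10. With only 29 proposals in one panel, such figures are indicative at best.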
Remarks
Proximity is only one of 4 indicators.
The process is still being refined:
– categorization of clusters,
– number of clusters used to calculate the indicator.
Limits of the system:
– A concept is found only when it is explicitly stated.
– Using a terminological resource means a new concept is added only once it has gone mainstream.
Conclusion
We used Association Rule Extraction and diachronic clustering to detect the evolution of a domain and to rate projects according to those dynamics.
But how good is it?
We need to:
– do some more tests on other panels,
– meet with the panel experts,
– improve our assisted indexing,
– add some terminological extraction.
Acknowledgements
Project website: http://www.ait.ac.at/dbf
This work was partially funded by the « Ideas » specific Programme of the EU’s 7th Framework Programme for Research and Technological Development (project reference no. 240765)
Thank you