Detecting domain dynamics: Association Rule Extraction and diachronic clustering techniques in support of expertise

Ivana Roche, Maha Ghribi, Nathalie Vedovotto, Claire François, Dominique Besagni, Pascal Cuxac, Dirk Holste, Marianne Hörlesberger, Edgar Schiebel



Detecting domains dynamics GTM 2011 – Atlanta, GA - September 14th, 2011 2

Work context

European project DBF:
– Development and Verification of a Bibliometric Model for the identification of Frontier Research
– Part of a Coordination and Support Action (CSA) for the European Research Council (ERC)
– Following the requirements of the High Level Expert Group of the ERC:
  • we developed several indicators,
  • including a Proximity indicator


ERC framework (1/2)

First European funding body to support investigator-driven research through open and direct competition

Main goals:
– Scientific excellence as the sole selection criterion
– Major grants for the very best and most creative researchers
– To identify and explore new opportunities and directions in all fields

Scientific domains (panels):
– Physics and Engineering (PE): 10 panels
– Life Sciences (LS): 9 panels
– Social Sciences and Humanities (SH): 6 panels

Grant application schemes:
– Starting researcher grants (StGs)
– Advanced investigator grants (AdGs)


ERC framework (2/2)

ERC annual budget evolution (2007-2013):
[Chart: ERC annual budget per year, 2007 to 2013; vertical axis in Mio. €, from 0 to 1800]

Rate of selected proposals:
– StGs (2009): 10% (244 out of 2,503 submitted proposals)
– AdGs (2009): 15% (244 out of 1,584 submitted proposals)


Definition of the Proximity indicator

Scope:
– it is employed to infer the « innovative degree » of the proposal through the dynamic change of the scientific landscape corresponding to the proposal’s allocated panel

Data sources:
– ERC data:
  • Panels description
  • Projects summary
– Bibliographic databases

Hypothesis:
– the closer a proposal is to regions of positive dynamic change, the more innovative it is


Description of the proximity indicator

[Workflow diagram: two branches feeding the calculation of the PROXIMITY indicator]

Panel branch:
– ERC database: panel description
– Translation of main panels into database queries
– Query of the bibliographic database, giving a DB of bibliographic references
– Data pre-processing and text mining
– Construction of two indexed corpora, one per time window (T1, T2)
– Diachronic clustering analysis (T1, T2)
– Ranking of clusters by novelty degree

Proposal branch:
– ERC database: data from proposals
– Extraction of terminological information
– Similarity of proposal with regard to T2 clusters

Output:
– Calculation of PROXIMITY indicator
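The two data-handling steps of this workflow (building the T1 and T2 corpora, then comparing a proposal with T2 material) can be sketched as follows. This is a minimal illustration only: the record layout, the function names `build_windows`, `keyword_profile` and `cosine`, and the choice of cosine similarity are assumptions, since the slides do not say which similarity measure is used. For brevity the whole T2 window is treated as a single cluster.

```python
import math
from collections import Counter

def build_windows(records, t1_years, t2_years):
    """Split keyword-indexed records into the two time windows T1 and T2."""
    t1 = [r for r in records if r["year"] in t1_years]
    t2 = [r for r in records if r["year"] in t2_years]
    return t1, t2

def keyword_profile(records):
    """Count in how many records each keyword appears."""
    counts = Counter()
    for r in records:
        counts.update(set(r["keywords"]))
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight mappings."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy records; real input would come from the bibliographic database.
records = [
    {"year": 2000, "keywords": ["neural network", "process control"]},
    {"year": 2009, "keywords": ["neural network", "wireless network"]},
    {"year": 2009, "keywords": ["wireless network", "photonics"]},
]
t1, t2 = build_windows(records, {2000}, {2009})

# Hypothetical proposal keywords, compared with the T2 keyword profile.
proposal = Counter(["wireless network", "photonics"])
sim = cosine(proposal, keyword_profile(t2))
```

In a fuller version, `keyword_profile` would be applied to each T2 cluster separately, giving one similarity value per cluster.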


Tools

Assisted indexing:
– Terminological resources
– TreeTagger
– FastR

Clustering:
– Axial K-means (NEURODOC)
– Principal Components Analysis

Fuzzy Association Rule Extraction
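The slides name fuzzy association rule extraction without giving a formula. As a hedged illustration, the sketch below uses one common definition in which the minimum acts as the fuzzy "and": support is the mean of min(μ_a, μ_b) over all items, and confidence divides the same sum by the total membership in the antecedent. The data layout, the function names and the idea of linking a T1 cluster to a T2 cluster are assumptions, not the authors' documented method.

```python
def fuzzy_support(memberships, a, b):
    """Fuzzy support of the rule a -> b: mean over items of min(mu_a, mu_b)."""
    if not memberships:
        return 0.0
    total = sum(min(m.get(a, 0.0), m.get(b, 0.0)) for m in memberships)
    return total / len(memberships)

def fuzzy_confidence(memberships, a, b):
    """Fuzzy confidence of a -> b: sum of min(mu_a, mu_b) over sum of mu_a."""
    joint = sum(min(m.get(a, 0.0), m.get(b, 0.0)) for m in memberships)
    antecedent = sum(m.get(a, 0.0) for m in memberships)
    return joint / antecedent if antecedent else 0.0

# Hypothetical data: one dict per keyword, giving its degree of membership
# in a T1 cluster and in a T2 cluster (names are made up for illustration).
memberships = [
    {"T1_sensors": 0.8, "T2_optical_sensor": 0.6},
    {"T1_sensors": 0.5, "T2_optical_sensor": 0.9},
    {"T1_sensors": 0.0, "T2_optical_sensor": 0.4},
]
support = fuzzy_support(memberships, "T1_sensors", "T2_optical_sensor")
confidence = fuzzy_confidence(memberships, "T1_sensors", "T2_optical_sensor")
```

A rule with high confidence in both directions would suggest that the T2 cluster largely continues the T1 cluster; a T2 cluster with no strong rule from any T1 cluster would be a candidate for novelty.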


Clusters map


Clusters relationships

Novelty index = Inter-period index & Intra-period index

The lower a cluster's Novelty index value, the higher its degree of innovativeness
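Because a lower Novelty index means a more innovative cluster, ranking clusters amounts to an ascending sort. The sketch below shows that sort and one possible way to cut the ranked list into the three categories AAA, AA and A used in the calculation of the Proximity indicator. The equal-sized split is purely an assumption; the slides do not give the categorization rule, and the index values shown are invented.

```python
def rank_by_novelty(novelty):
    """Return cluster names from most to least innovative (ascending index)."""
    return sorted(novelty, key=novelty.get)

def categorize(ranked, labels=("AAA", "AA", "A")):
    """Assign each ranked cluster to one of len(labels) near-equal groups.

    Equal-sized groups are an assumed rule, used here only for illustration.
    """
    n, k = len(ranked), len(labels)
    return {name: labels[i * k // n] for i, name in enumerate(ranked)}

# Hypothetical Novelty index values for six T2 clusters.
novelty = {"c1": 0.9, "c2": 0.1, "c3": 0.5, "c4": 0.3, "c5": 0.7, "c6": 0.2}
ranked = rank_by_novelty(novelty)
categories = categorize(ranked)
```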


Calculation of Proximity indicator

Clusters:
– ranking of T2 clusters by Novelty index
– categorization of clusters: AAA, AA, A (decreasing innovation)

Proposal:
– text mining / assisted indexing
– keywords
– similarity to the clusters, sorted by decreasing value (N clusters)

Proximity = geometric mean of the similarities
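The last step can be sketched as a geometric mean over the N highest similarity values. This is a minimal reading of the slide: the function name `proximity`, the parameter `n` and the sample values are assumptions, and the sketch deliberately ignores the AAA/AA/A categories because the slides do not say how they enter the calculation.

```python
import math

def proximity(similarities, n):
    """Geometric mean of the n highest similarity values.

    Returns 0.0 when there are no values or when any retained value is
    non-positive (a zero similarity makes the geometric mean zero).
    """
    top = sorted(similarities, reverse=True)[:n]
    if not top or any(s <= 0 for s in top):
        return 0.0
    return math.exp(sum(math.log(s) for s in top) / len(top))

# Hypothetical similarities between one proposal and four T2 clusters.
score = proximity([0.8, 0.2, 0.4, 0.1], n=3)
```

A geometric mean rewards proposals that are close to several clusters at once: a single very low similarity among the N retained values pulls the score down much more than it would with an arithmetic mean.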


Case study

Starting grant 2009, panel PE07:
– Systems and communication engineering: electronic, communication, optical and systems engineering
– 29 proposals, 4 successful

Database: PASCAL from INIST

First corpus:
– Year 2000
– 20,568 records, 21,781 keywords

Second corpus:
– Year 2009
– 19,827 records, 18,475 keywords


Clusters ranking

High | Intermediate | Low
Angiospermae | Optical method | Decision support system
Space remote sensing | Thin film | Optoelectronic device
Statistical simulation | Nanoelectronics | Imagery
Decision aid | Non destructive test | Image processing
Radio frequency identification | Chemical sensor | Computer network
Complementary MOS technology | Smart material | Closed feedback
Data analysis | Microelectromechanical device | System identification
Discrete event system | Wavelet transformation | Photonics
Discrete system | Neural network | Fiber optic sensors
Process control | Particle swarm optimization | Wireless network
Ultrasonic transducer | User interface | Optical fiber network
Control system | Optical sensor | Integrated optics
Hyperspectral imaging sensor | Video signal processing | Signal detection
Microelectronic fabrication | Piezoelectric sensor | Teletraffic
Real time system | Constrained optimization | Wireless LAN
Radiation detector | Actuator | Diffraction grating

Robotics

Noise reduction


Top-ten results

Project proposal ID | Innovativeness degree rank | Expert panel choice (0/1)
PROP_19 | 1 | 0
PROP_23 | 2 | 0
PROP_14 | 3 | 0
PROP_02 | 4 | 0
PROP_08 | 5 | 1
PROP_07 | 6 | 0
PROP_22 | 7 | 0
PROP_06 | 8 | 0
PROP_12 | 9 | 0
PROP_01 | 10 | 1
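A quick arithmetic check of these results, using only figures given on the slides (29 proposals and 4 successful ones in the case study, plus the ten rows above). The comparison with a random ranking is added here for orientation and is not part of the original analysis; with so few proposals it should not be read as a significance test.

```python
# (proposal ID, innovativeness degree rank, expert panel choice) from the slide
top_ten = [
    ("PROP_19", 1, 0), ("PROP_23", 2, 0), ("PROP_14", 3, 0),
    ("PROP_02", 4, 0), ("PROP_08", 5, 1), ("PROP_07", 6, 0),
    ("PROP_22", 7, 0), ("PROP_06", 8, 0), ("PROP_12", 9, 0),
    ("PROP_01", 10, 1),
]
n_proposals = 29   # proposals submitted to panel PE07 (case study slide)
n_successful = 4   # proposals selected by the expert panel

# Successful proposals that the indicator placed in its top ten.
hits = sum(choice for _, _, choice in top_ten)

# Share of all successful proposals recovered by the top ten.
recall = hits / n_successful

# Number of successful proposals a random top ten would contain on average.
expected_hits = len(top_ten) * n_successful / n_proposals
```

The top ten contains 2 of the 4 proposals chosen by the panel, against an average of about 1.4 for a random selection of ten proposals out of 29.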


Remarks

Proximity is only one of 4 indicators.

The process is still being refined:
– categorization of clusters,
– number of clusters used to calculate the indicator.

The limits of the system:
– A concept is found only when it is explicitly stated.
– Using a terminological resource means a new concept is added only once it goes mainstream.


Conclusion

We used Association Rule Extraction and diachronic clustering to detect the evolution of a domain and to rate projects according to those dynamics.

But how good is it?

We need to:
– do some more tests on other panels,
– meet with the panels' experts,
– improve our assisted indexing,
– add some terminological extraction.


Acknowledgements

Project website: http://www.ait.ac.at/dbf

This work was partially funded by the « Ideas » specific Programme of the EU’s 7th Framework Programme for Research and Technological Development (project reference no. 240765)


Thank you