Linked science presentation 25
-
Upload
francesco-osborne -
Category
Science
-
view
321 -
download
3
description
Transcript of Linked science presentation 25
![Page 1: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/1.jpg)
Clustering Citation Distributions for
Semantic Categorization and
Citation PredictionFrancesco Osbornea , Silvio Peronibc, Enrico Mottaa,
a KMi, The Open University, United Kingdomb Department of Computer Science and Engineering, University
of Bologna, Bologna, Italyc Institute of Cognitive Sciences and Technologies, CNR, Rome,
Italy
October 2014
![Page 2: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/2.jpg)
Is it possible to say who will have a
bigger impact?
![Page 3: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/3.jpg)
Can I exploit this information for semantic
expert search?
![Page 4: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/4.jpg)
Clustering
Distribution
Clustering
of Citation
Distribution
Authors’
data
Clusters of authors
with similar citation
patterns
Extraction of Extraction of
semantic
features
RDF
BiDO Ontology
Our approach
![Page 5: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/5.jpg)
Clustering Citation Distributions
We cluster the citation distributions by
exploiting a bottom-up hierarchical clustering
algorithm.
We thus need to define:
• A norm
• A metric to assess the quality of a set of
clusters
![Page 6: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/6.jpg)
Clustering Citation Distributions
A B
C D
![Page 7: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/7.jpg)
Clustering Citation Distributions
A B
C D
1. dis(A, B) = dis(C, D)
![Page 8: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/8.jpg)
Clustering Citation Distributions
A B
C D
1. dis(A, B) = dis(C, D)
2. dis(A, C) > 0 , dis(B, D) > 0
![Page 9: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/9.jpg)
Clustering Citation Distributions
A B
C D
1. dis(A, B) = dis(C, D)
2. dis(A, C) > 0 , dis(B, D) > 0
3. Can be computed incrementally
![Page 10: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/10.jpg)
Clustering Citation Distributions
A simple way to satisfy these three
requirements is to use a normalized Euclidean
distance:
���
�� �����
� ���
� ���� ���
� � �����
� ���
� ��� ���
/2
![Page 11: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/11.jpg)
Clustering Citation Distributions
We want to maximize the homogeneity of the
cluster populations in the following years.
![Page 12: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/12.jpg)
Standard deviation is not the solution…
![Page 13: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/13.jpg)
Standard deviation is not the solution…
![Page 14: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/14.jpg)
Clustering Citation Distributions
We estimate the homogeneity by computing the weighted average of the MAD:
��� � ��������� ������� �
MAD (Median Absolute Deviation ) is a robust measure of statistical dispersion and it is used to compute the variability of an univariate sample of quantitative data.
![Page 15: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/15.jpg)
Clustering Citation Distributions
We then compute the memberships of all authors in our dataset with the centroids of the resulting clusters.
���!��� �"
∑$�%�&' (')*,,�
$�%�&' (')-,,� ���
�/�./0�
Finally we calculate a number of statistics for estimating the evolution of the members of each clusters.
![Page 16: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/16.jpg)
How can we represent this data?
Bibliometric data are subject to the simultaneous application of
different variables. In particular, one should take into account at
least:
• the temporal association of such data to entities;
• the particular agent who provided such data (e.g., Google
Scholar, Scopus, our algorithm);
• the characterisation of such data in at least two different
kinds, i.e., numeric bibliometric data (e.g., the standard
bibliometric measures such as h-index, journal impact factor,
citation count) and categorial bibliometric data (so as to
enable the description of entities, e.g., authors, according to
specific descriptive categories).
![Page 17: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/17.jpg)
BiDO
![Page 18: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/18.jpg)
Extraction of Semantic features
![Page 19: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/19.jpg)
:hasCurve [ a :Curve ;
:hasTrend :increasing ;
![Page 20: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/20.jpg)
:hasCurve [ a :Curve ;
:hasTrend :increasing ; :hasAccelerationPoint :premature-deceleration ] ;
![Page 21: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/21.jpg)
:hasCurve [ a :Curve ;
:hasTrend :increasing ; :hasAccelerationPoint :premature-deceleration ] ;
:hasSlope [ a :Slope ; :hasStrength :low ;
![Page 22: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/22.jpg)
:hasCurve [ a :Curve ;
:hasTrend :increasing ; :hasAccelerationPoint :premature-deceleration ] ;
:hasSlope [ a :Slope ; :hasStrength :low ; :hasGrowth :logarithmic ] ;
![Page 23: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/23.jpg)
:hasCurve [ a :Curve ;
:hasTrend :increasing ; :hasAccelerationPoint :premature-deceleration ] ;
:hasSlope [ a :Slope ; :hasStrength :low ; :hasGrowth :logarithmic ] ;
:hasOrderOfMagnitude :[243,729) ;
![Page 24: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/24.jpg)
:hasCurve [ a :Curve ;
:hasTrend :increasing ; :hasAccelerationPoint :premature-deceleration ] ;
:hasSlope [ a :Slope ; :hasStrength :low ; :hasGrowth :logarithmic ] ;
:hasOrderOfMagnitude :[243,729) ;
:concernsResearchPeriod :5-years-beginning .
![Page 25: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/25.jpg)
:increasing-with-premature-deceleration-and-low-logarithmic-slope-in-
[243,729)-5-years-beginning a :ResearchCareerCategory ;
:hasCurve [ a :Curve ;
:hasTrend :increasing ; :hasAccelerationPoint :premature-deceleration ] ;
:hasSlope [ a :Slope ; :hasStrength :low ; :hasGrowth :logarithmic ] ;
:hasOrderOfMagnitude :[243,729) ;
:concernsResearchPeriod :5-years-beginning .
![Page 26: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/26.jpg)
:increasing-with-premature-deceleration-and-low-logarithmic-slope-in-[243,729)-
5-years-beginning a :ResearchCareerCategory ;
:hasCurve [ a :Curve ;
:hasTrend :increasing ; :hasAccelerationPoint :premature-deceleration ] ;
:hasSlope [ a :Slope ; :hasStrength :low ; :hasGrowth :logarithmic ] ;
:hasOrderOfMagnitude :[243,729) ;
:concernsResearchPeriod :5-years-beginning .
:john-doe :holdsBibliometricDataInTime [
a :BibliometricDataInTime ;
tvc:atTime [ a time:Interval ; time:hasBeginning :2014-07-11 ] ;
:accordingTo [ a fabio:Algorithm ;
frbr:realization [ a fabio:ComputerProgram ] ] ;
:withBibliometricData
:increasing-with-premature-deceleration-and-low-
logarithmic-slope-in-[243,729)-5-years-beginning .
![Page 27: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/27.jpg)
Evaluation
• We evaluated our method on a dataset of 20000 researchers
working in the field of computer science in the 1990-2010
interval.
• This dataset was derived from the database of Rexplore , a
system to provide support for exploring scholarly data, which
integrates several data sources (Microsoft Academic Search,
DBLP++ and DBpedia).
![Page 28: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/28.jpg)
Evaluation
![Page 29: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/29.jpg)
Evaluation
Y
C18 (1.4%) C22 (2.5%) C25 (2.7%) C28 (2.3%) C29 (8.8%)
range mean range mean range mean range mean range mean
6 420-800 567±98 160-280 209±34 100-180 129±25 60-100 72±14 40-60 39±9
7 440-960 610±120 160-320 225±45 100-200 138±30 60-120 79±18 40-80 45±14
8 440-1020 650±137 160-400 246±58 100-260 158±45 60-160 90±26 40-100 50±18
9 440-1260 699±186 160-440 269±74 100-340 187±68 60-200 104±37 40-120 57±25
10 480-2940 751±411 160-500 292±85 100-400 211±82 60-280 125±57 40-160 68±35
11 480-2480 826±336 180-660 331±112 100-520 241±100 60-540 155±103 40-200 82±47
12 480-3520 914±467 180-860 370±151 100-640 270±126 60-440 166±96 40-260 97±60
![Page 30: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/30.jpg)
Evaluation
![Page 31: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/31.jpg)
Future Works
• Augment the clustering process with a variety of
other features (e.g., research areas, co-authors);
• apply this technique to groups of researchers
rather then single individuals;
• extend BiDO in order to provide a semantically-
aware description of such new features;
• make available a triplestore of bibliometric data
linked to other datasets such as Semantic Web
Dog Food and DBLP.
![Page 32: Linked science presentation 25](https://reader034.fdocuments.in/reader034/viewer/2022042816/55941a0b1a28abf02b8b458e/html5/thumbnails/32.jpg)
Questions?
BiDO Ontology:
http://purl.org/spar/bido