Scientific table type classification in digital library (DocEng 2012)
Scientific Table Type Classification in Digital Library
Seongchan Kim, Keejun Han, Ying Liu
Dept. of Knowledge Service Engineering, KAIST, Korea
Soon Young Kim
Dept. of Overseas Information, KISTI, Korea
Sept. 6, 2012 (3:35-3:55)
Outline
Introduction
Table Type Taxonomy
  IMRAD-Based
  Fine-Grained
Table Type Distribution
Classification
Conclusion
Introduction
• Are there any special types of tables in scientific papers? If yes, what are they?
Table Type Taxonomy
▶ 2,500 tables randomly sampled from 25 randomly selected scientific journals published by Springer from 2006 to 2010
▶ Biomedical and Life Science, Chemistry and Materials Science, Computer Science, Electrical Engineering, and Medicine
▶ TableSeer
▶ We found two taxonomies
  IMRAD-based table taxonomy
  Fine-grained table taxonomy
IMRAD-Based Table Taxonomy
▶ Considering the structural position of tables within a document
▶ Table type is simply decided by the location of the table
  E.g., if a table is in the introduction part of the paper → Introduction table
  Other functions
Scientific Table
  Introduction Table
  Method Table
  Result Table
  Discussion Table
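The location rule above can be sketched in a few lines. This is only an illustration: the keyword map and the function name `imrad_table_type` are hypothetical and not taken from the slides, and real section headings vary far more than this map covers.

```python
# Hypothetical keyword map from section headings to IMRAD table types.
# The keywords are illustrative assumptions, not the authors' rules.
IMRAD_KEYWORDS = {
    "Introduction Table": ("introduction", "background"),
    "Method Table": ("method", "materials", "experimental setup"),
    "Result Table": ("result", "evaluation"),
    "Discussion Table": ("discussion", "conclusion"),
}

def imrad_table_type(section_heading):
    """Return the IMRAD table type for the section that contains a table."""
    heading = section_heading.lower()
    for table_type, keywords in IMRAD_KEYWORDS.items():
        if any(keyword in heading for keyword in keywords):
            return table_type
    return None  # heading does not match any IMRAD part

print(imrad_table_type("1 Introduction"))
print(imrad_table_type("3 Materials and Methods"))
```

A heading that matches no keyword returns `None`, which leaves room for a manual decision rather than forcing a label.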
Fine-Grained Table Taxonomy
▶ Table type is decided by table contents and purposes
Scientific Table
  Definition Table
  Statistic Table
  Survey Table
  Example Table
  Procedure Table
  Experiment Setting Table
  Experiment Result Table
▶ Definition Tables
  Consist of defined terms and their explanations
  Usually appear before the experiment
▶ Statistics/Distribution Tables
  Common statistical or distribution data
  Not related to the current experiment being carried out in the paper
▶ Survey Question/Result Tables
  Contain questionnaires of the survey and the results of those questionnaires
▶ Example Tables
  Show instances that introduce and emphasize something that needs to be explained clearly
▶ Procedure Tables
  Describe the sequence, step, flow, or schedule of the methods
▶ Experiment Setting Tables
  Describe items required for the experiment: configurations, parameters, data, apparatus, etc.
▶ Experiment Result Tables
  Accompanied by a summary describing the output of the experiment
  Some are shown in comparison with other results
Table Type Distribution
By IMRAD Taxonomy
  Introduction   3.7%
  Methods       20.2%
  Results       74.6%
  Discussion     1.5%
By Fine-Grained Taxonomy
  Definition     2.9%
  Statistics     5.0%
  Survey         0.3%
  Example        3.3%
  Procedure      1.6%
  Exp. Setting  14.9%
  Exp. Result   72.0%
▶ Annotation
  2,380 (IMRAD) and 2,324 (fine-grained) of the 2,500 tables had labels agreed on by more than two annotators
  Inter-annotator agreement
  • κ = 0.64 for IMRAD annotation
  • κ = 0.53 for fine-grained annotation
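For readers unfamiliar with chance-corrected agreement, here is a minimal sketch of Cohen's kappa for two annotators. The slides do not say which kappa variant was used or how it was computed across annotators, so the two-annotator form and the toy labels below are assumptions for illustration only.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumes the expected chance agreement is below 1, i.e. the annotators
    did not both assign one single label to every item.
    """
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators for six tables.
annotator_1 = ["Result", "Result", "Method", "Result", "Introduction", "Method"]
annotator_2 = ["Result", "Result", "Method", "Method", "Introduction", "Result"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which is why it is usually preferred over raw percent agreement when one label dominates.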
Experiment
▶ A preliminary classification
  Only textual features from tables
  • Table caption
  • Table reference text
  Textual information obtained from the metadata of TableSeer [ ]
▶ Settings
  Dataset
  • 2,380 tables for IMRAD classification
  • 2,324 tables for fine-grained classification
  10-fold cross-validation
  SVM and Decision Tree in the Weka toolkit (default settings)
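The 10-fold protocol in the settings above can be illustrated with a small index splitter. This is a plain, unstratified sketch written for illustration; it does not reproduce Weka's own cross-validation routine, and the function name `k_fold_indices` and the seed are assumptions.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 2,380 tables, as in the IMRAD data set.
splits = list(k_fold_indices(2380, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each table is used for testing exactly once and for training in the other nine rounds, so the reported scores are averaged over all tables.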
▶ Textual Features
  Given a table caption and reference text T
  Feature selection: top 300 terms by chi-square
▶ Feature term weighting
  Numerical value of a feature: binary, TF, TF-IDF
  TTF-ICF (Table Term Frequency-Inverse Category Frequency)
  Combined version of TTF-ITTF and TF-ICF
  • TTF-ITTF: table search (Liu)
  • TF-ICF: text categorization (Cho and Kim)
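Chi-square term selection, as named in the feature selection step, can be sketched as follows. The slides only state "top 300 terms by chi-square"; scoring each term by its maximum chi-square value over categories, the function names, and the toy data are all assumptions made for this illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 term/category contingency table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def select_terms(docs, labels, top_k=300):
    """Rank terms by their best chi-square score over all categories."""
    categories = set(labels)
    vocabulary = set().union(*docs)
    scores = {}
    for term in vocabulary:
        best = 0.0
        for cat in categories:
            # a: in category with term, b: other categories with term,
            # c: in category without term, d: other categories without term.
            a = sum(1 for doc, y in zip(docs, labels) if y == cat and term in doc)
            b = sum(1 for doc, y in zip(docs, labels) if y != cat and term in doc)
            c = sum(1 for doc, y in zip(docs, labels) if y == cat and term not in doc)
            d = sum(1 for doc, y in zip(docs, labels) if y != cat and term not in doc)
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

# Hypothetical term sets taken from caption plus reference text of four tables.
docs = [
    {"results", "comparison"},
    {"results", "accuracy"},
    {"parameters", "settings"},
    {"parameters", "comparison"},
]
labels = ["Result", "Result", "Method", "Method"]
top_terms = select_terms(docs, labels, top_k=2)
print(top_terms)
```

In this toy data, "comparison" occurs equally often in both categories and scores zero, so it is never selected ahead of a category-specific term.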
C1: T1, T2
C2: T3, T4
W1: appears in T1 and T2
W2: appears in T1 and T3
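One reading of this example is that W1 and W2 occur in the same number of tables, so a table-level inverse frequency cannot tell them apart, while a category-level inverse frequency can. The sketch below works that reading through; the logarithmic IDF-style formulas are a common convention assumed here, not the exact TTF-ICF definition, which the slides do not give.

```python
import math

# Categories and term occurrences exactly as in the example above.
categories = {"C1": ["T1", "T2"], "C2": ["T3", "T4"]}
occurrences = {"W1": {"T1", "T2"}, "W2": {"T1", "T3"}}

def inverse_table_frequency(term):
    """log(number of tables / number of tables containing the term)."""
    n_tables = sum(len(tables) for tables in categories.values())
    return math.log(n_tables / len(occurrences[term]))

def inverse_category_frequency(term):
    """log(number of categories / number of categories containing the term)."""
    hits = sum(1 for tables in categories.values() if occurrences[term] & set(tables))
    return math.log(len(categories) / hits)

for term in ("W1", "W2"):
    print(term, inverse_table_frequency(term), inverse_category_frequency(term))
```

Under these assumed formulas both terms get the same table-level value, but W1, which stays inside C1, gets a positive category-level value while W2, which appears in both categories, gets zero.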
▶ Result
(Cap. = table caption, Ref. = table reference text)

Performance of IMRAD Classification by Features
Features          SVM                    Decision Tree
                  P      R      F        P      R      F
Cap. (Baseline)   0.836  0.506  0.543    0.947  0.550  0.792
Ref.              0.875  0.705  0.761    0.930  0.639  0.73
Cap.+Ref.         0.967  0.784  0.866    0.938  0.746  0.831

Performance of Fine-grained Classification by Features
Features          SVM                    Decision Tree
                  P      R      F        P      R      F
Cap. (Baseline)   0.627  0.333  0.397    0.522  0.271  0.302
Ref.              0.707  0.615  0.649    0.790  0.673  0.716
Cap.+Ref.         0.701  0.657  0.668    0.764  0.62   0.671
Performance of IMRAD Classification by Types
Type            SVM                    Decision Tree
                P      R      F        P      R      F
Introduction    0.968  0.6    0.741    0.907  0.68   0.777
Methods         0.901  0.996  0.943    0.912  0.992  0.95
Results         1      1      1        1      1      1
Discussion      1      0.543  0.704    0.933  0.314  0.44
Macro Avg.      0.967  0.784  0.866    0.938  0.746  0.831
Micro Avg.      0.977  0.976  0.973    0.973  0.975  0.972
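The macro-averaged precision and recall rows can be checked as the unweighted mean of the per-type values. The sketch below does this for the SVM columns of the IMRAD results; the function name `macro_average` is an assumption, and how the slides derive the macro F column is not stated, so it is left out.

```python
# Per-type (precision, recall) for SVM, copied from the IMRAD results by type.
imrad_svm = {
    "Introduction": (0.968, 0.6),
    "Methods": (0.901, 0.996),
    "Results": (1.0, 1.0),
    "Discussion": (1.0, 0.543),
}

def macro_average(per_type):
    """Unweighted mean of precision and recall over all types."""
    n = len(per_type)
    precision = sum(p for p, _ in per_type.values()) / n
    recall = sum(r for _, r in per_type.values()) / n
    return precision, recall

macro_p, macro_r = macro_average(imrad_svm)
print(macro_p, macro_r)
```

The computed values agree with the reported 0.967 and 0.784 to within rounding of the per-type inputs. Because every type counts equally, the weak Introduction and Discussion recall pulls the macro average well below the micro average, which is dominated by the large Results class.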
Performance of Fine-grained Classification by Types
Type            SVM                    Decision Tree
                P      R      F        P      R      F
Definition      0.689  0.609  0.646    0.837  0.522  0.643
Statistics      0.699  0.879  0.779    0.898  0.681  0.775
Survey          0      0      0        0      0      0
Example         0.716  0.725  0.72     0.875  0.525  0.656
Procedure       0.9    0.486  0.632    1      0.649  0.787
Exp. Setting    0.905  0.9    0.902    0.74   0.963  0.837
Exp. Result     1      1      1        1      1      1
Macro Avg.      0.701  0.657  0.668    0.764  0.62   0.671
Micro Avg.      0.947  0.947  0.946    0.94   0.936  0.987
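Assuming the F column is the balanced F-measure (the harmonic mean of precision and recall, which the slides do not spell out), a per-type F value can be recomputed from P and R as sketched below. The zero guard covers rows such as Survey, where both P and R are 0.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # e.g. the Survey row, where nothing was predicted correctly
    return 2 * precision * recall / (precision + recall)

# SVM rows copied from the fine-grained results by type.
print(f_measure(0.905, 0.9))    # Exp. Setting
print(f_measure(0.689, 0.609))  # Definition
print(f_measure(0.0, 0.0))      # Survey
```

Under this assumption the Exp. Setting and Definition rows for SVM are reproduced to within rounding (reported 0.902 and 0.646).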
Conclusion
▶ Introduced our study of table types and classification in scientific papers
  IMRAD-based taxonomy
  Fine-grained taxonomy
▶ Future work
  Developing various features from table layout and contents