Scientific table type classification in digital library (DocEng 2012)

Scientific Table Type Classification in Digital Library
Seongchan Kim, Keejun Han, Ying Liu, Dept. of Knowledge Service Engineering, KAIST, Korea
Soon Young Kim, Dept. of Overseas Information, KISTI, Korea
Sept. 6, 2012 (3:35-3:55)

Description: Presentation for DocEng 2012 on table classification based on IMRAD and fine-grained types

Transcript of Scientific table type classification in digital library (DocEng 2012)

Page 2: Scientific table type classification in digital library (DocEng 2012)

Outline

▶ Introduction
▶ Table Type Taxonomy
• IMRAD-Based
• Fine-Grained
▶ Table Type Distribution
▶ Classification
▶ Conclusion

Page 4: Scientific table type classification in digital library (DocEng 2012)

Introduction

• Are there any special types of tables in scientific papers? If so, what are they?

Page 7: Scientific table type classification in digital library (DocEng 2012)

Table Type Taxonomy

▶ 2,500 tables randomly sampled from 25 randomly selected scientific journals published by Springer from 2006 to 2010

▶ Subject areas: Biomedical and Life Science, Chemistry and Materials Science, Computer Science, Electrical Engineering, and Medicine

▶ TableSeer

▶ We identified two table taxonomies:
• IMRAD-based table taxonomy
• Fine-grained table taxonomy

Page 8: Scientific table type classification in digital library (DocEng 2012)

IMRAD-Based Table Taxonomy

▶ Considers the structural position of tables within a document

▶ Table type is simply decided by the location of the table

E.g., if a table is in the Introduction section of the paper, it is an Introduction table
Other functions

Scientific Table
• Introduction Table
• Method Table
• Result Table
• Discussion Table
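The location rule can be sketched as a lookup from the heading of the section that contains a table to an IMRAD type. This is a minimal sketch, not the authors' implementation: the keyword lists and the function name `imrad_type` are hypothetical, and real papers use far more varied headings.

```python
from typing import Optional

# Hypothetical keyword lists (not from the slides). Order matters: the first
# IMRAD type whose keyword occurs in the heading wins.
SECTION_KEYWORDS = {
    "Introduction": ("introduction", "background", "related work"),
    "Method": ("method", "materials", "experimental setup", "approach"),
    "Result": ("result", "evaluation", "experiment"),
    "Discussion": ("discussion", "conclusion"),
}


def imrad_type(section_heading: str) -> Optional[str]:
    """Return the IMRAD table type implied by the heading of the section
    containing the table, or None if no keyword matches."""
    heading = section_heading.lower()
    for imrad, keywords in SECTION_KEYWORDS.items():
        if any(keyword in heading for keyword in keywords):
            return imrad
    return None


print(imrad_type("2. Materials and Methods"))  # Method
print(imrad_type("4 Results and Discussion"))  # Result
```

A mixed heading such as "Results and Discussion" is simply assigned to the first matching type here; any purely location-based rule has to resolve such cases somehow.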

Page 9: Scientific table type classification in digital library (DocEng 2012)

Fine-Grained Table Taxonomy

▶ Table type is decided by table contents and purposes

Scientific Table
• Definition Table
• Statistic Table
• Survey Table
• Example Table
• Procedure Table
• Experiment Setting Table
• Experiment Result Table

Page 10: Scientific table type classification in digital library (DocEng 2012)

Fine-Grained Table Taxonomy

▶ Definition Tables
• Consist of terms being defined and their explanations
• Usually appear before the experiment

Page 11: Scientific table type classification in digital library (DocEng 2012)

Fine-Grained Table Taxonomy

▶ Statistics/Distribution Tables
• Contain common statistical or distribution data
• Not related to the experiment being carried out in the paper

Page 12: Scientific table type classification in digital library (DocEng 2012)

Fine-Grained Table Taxonomy

▶ Survey Question/Result Tables
• Contain the questionnaires of a survey and the results of those questionnaires

Page 13: Scientific table type classification in digital library (DocEng 2012)

Fine-Grained Table Taxonomy

▶ Example Tables
• Show instances that introduce and emphasize something that needs to be explained clearly

Page 14: Scientific table type classification in digital library (DocEng 2012)

Fine-Grained Table Taxonomy

▶ Procedure Tables
• Describe the sequence, steps, flow, or schedule of the methods

Page 15: Scientific table type classification in digital library (DocEng 2012)

Fine-Grained Table Taxonomy

▶ Experiment Setting Tables
• Describe items required for the experiment: configurations, parameters, data, apparatus, etc.

Page 16: Scientific table type classification in digital library (DocEng 2012)

Fine-Grained Table Taxonomy

▶ Experiment Result Tables
• Accompanied by a summary describing the output of the experiment
• Some are presented in comparison with other results

Page 17: Scientific table type classification in digital library (DocEng 2012)

Table Type Distribution

By IMRAD Taxonomy
• Introduction: 3.7%
• Methods: 20.2%
• Results: 74.6%
• Discussion: 1.5%

By Fine-Grained Taxonomy
• Definition: 2.9%
• Statistics: 5.0%
• Survey: 0.3%
• Example: 3.3%
• Procedure: 1.6%
• Exp. Setting: 14.9%
• Exp. Result: 72.0%

▶ Annotation
• Of the 2,500 tables, 2,380 (IMRAD) and 2,324 (fine-grained) had labels agreed on by more than two annotators
• Inter-annotator agreement: κ = 0.64 for IMRAD annotation; κ = 0.53 for fine-grained annotation
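The slide reports κ without saying which kappa statistic was used. Because each table was labeled by more than two annotators, Fleiss' kappa is a natural candidate; the sketch below computes it from per-table label counts. It assumes every table received the same number of labels, and `fleiss_kappa` is a hypothetical helper, not the authors' code.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of annotators per item.

    ratings[i][j] = number of annotators who put table i into category j;
    every row must sum to the same number of annotators n >= 2.
    """
    n_items = len(ratings)
    n = sum(ratings[0])
    if n < 2 or any(sum(row) != n for row in ratings):
        raise ValueError("every table needs the same number (>= 2) of ratings")
    # Observed agreement: share of agreeing annotator pairs, averaged over tables.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings) / n_items
    # Chance agreement from the overall category proportions.
    n_categories = len(ratings[0])
    proportions = [sum(row[j] for row in ratings) / (n_items * n) for j in range(n_categories)]
    p_e = sum(p * p for p in proportions)
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)


# Three annotators, two tables, categories [Results, Methods].
print(fleiss_kappa([[3, 0], [2, 1]]))
```

Here one table is labeled Results by all three annotators and the other is split 2 to 1, which gives κ ≈ -0.2: raw agreement is high, but so is chance agreement when one category dominates. This is worth keeping in mind given how skewed the label distribution above is.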

Page 19: Scientific table type classification in digital library (DocEng 2012)

Experiment

▶ A preliminary classification using only textual features of each table
• Table caption
• Table reference text

The textual information is obtained from TableSeer metadata [ ]

▶ Settings
• Dataset: 2,380 tables for IMRAD classification and 2,324 tables for fine-grained classification
• 10-fold cross-validation
• SVM and Decision Tree in the Weka toolkit (default settings)
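Weka runs the cross-validation internally. For readers reproducing the setup elsewhere, a stratified 10-fold split can be sketched with the standard library as below. The function names are hypothetical, and stratification is an assumption about the protocol; it matters here because the classes are highly skewed (Results is 74.6% of IMRAD labels, Survey 0.3% of fine-grained labels).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds, dealing each class out round-robin
    so every fold gets roughly the same class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0  # continues across classes so fold sizes stay balanced
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Usage: 10-fold splits over a toy, skewed label list.
labels = ["Results"] * 30 + ["Methods"] * 8 + ["Introduction"] * 2
print([len(test) for _, test in cross_validation_splits(labels)])  # [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
```

Each classifier would be trained on `train` and evaluated on `test` once per fold, with scores aggregated over the ten folds.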

Page 20: Scientific table type classification in digital library (DocEng 2012)

Experiment

▶ Textual Features
• Given a table caption and reference text T
• Feature selection: top 300 terms by chi-square

▶ Feature Term Weighting
• Numerical feature values: binary, TF, TF-IDF, or TTF-ICF (Table Term Frequency-Inverse Category Frequency)
• TTF-ICF is a combined version of TTF-ITTF (table search, Liu) and TF-ICF (text categorization, Cho and Kim)
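Chi-square selection scores each term by how strongly its presence depends on the table type. The following is a minimal sketch. It assumes each table is represented as the set of terms in its caption and reference text, and that a term's score is its maximum chi-square over the types (averaging over types is another common choice). `select_top_terms` is a hypothetical name.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: tables of the category that contain the term
    b: tables of other categories that contain the term
    c: tables of the category without the term
    d: tables of other categories without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_top_terms(tables, labels, k=300):
    """Rank terms by their maximum chi-square over table types; keep the top k.

    tables: list of sets of terms (caption + reference text of each table)
    labels: table type of each table, aligned with `tables`
    """
    n = len(tables)
    vocabulary = set().union(*tables)
    scores = {}
    for term in vocabulary:
        best = 0.0
        for category in set(labels):
            in_cat = sum(1 for y in labels if y == category)
            a = sum(1 for terms, y in zip(tables, labels) if y == category and term in terms)
            b = sum(1 for terms, y in zip(tables, labels) if y != category and term in terms)
            best = max(best, chi_square(a, b, in_cat - a, n - in_cat - b))
        scores[term] = best
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


tables = [{"mean", "sd", "table"}, {"mean", "sd", "p"}, {"defined", "term"}, {"term", "means"}]
labels = ["result", "result", "definition", "definition"]
print(select_top_terms(tables, labels, k=3))  # ['mean', 'sd', 'term']
```

In the toy data, "mean", "sd", and "term" each occur in exactly one table type and score highest, while terms seen in a single table score lower.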

Example:
C1: T1, T2
C2: T3, T4
W1: appears in T1 and T2
W2: appears in T1 and T3
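The toy example above can be worked through to show why a category-based weight helps. The slide does not give the TTF-ICF formula, so this sketch makes several assumptions. It treats C1 and C2 as table types, T1 to T4 as tables, and W1, W2 as terms. It uses the textbook-style definitions IDF = log(N / df) over tables and ICF = log(|C| / cf) over categories. It also takes the TTF-ICF weight as a term's in-table count times ICF. The helper names are hypothetical.

```python
import math
from collections import Counter

# Toy data from the slide: C1 = {T1, T2}, C2 = {T3, T4};
# W1 occurs in T1 and T2, W2 occurs in T1 and T3 (one occurrence each, assumed).
TABLES = {"T1": ["W1", "W2"], "T2": ["W1"], "T3": ["W2"], "T4": []}
CATEGORY_OF = {"T1": "C1", "T2": "C1", "T3": "C2", "T4": "C2"}


def idf(term, tables):
    """Inverse table frequency: log(N / df), df = number of tables containing term."""
    df = sum(1 for terms in tables.values() if term in terms)
    return math.log(len(tables) / df) if df else 0.0


def icf(term, tables, category_of):
    """Inverse category frequency: log(|C| / cf), cf = number of categories containing term."""
    categories = set(category_of.values())
    cf = len({category_of[t] for t, terms in tables.items() if term in terms})
    return math.log(len(categories) / cf) if cf else 0.0


def weights(table_id, tables, category_of):
    """Binary, TF, TF-IDF and an assumed TTF-ICF weight for each term of one table."""
    tf = Counter(tables[table_id])
    return {
        term: {
            "binary": 1,
            "tf": count,
            "tf_idf": count * idf(term, tables),
            "ttf_icf": count * icf(term, tables, category_of),  # assumed form
        }
        for term, count in tf.items()
    }


print(weights("T1", TABLES, CATEGORY_OF))
```

Under these assumptions, both terms occur in two of the four tables, so TF-IDF weights them equally (log 2 each). ICF gives W1 the weight log 2 and W2 zero, because W1 is confined to C1 while W2 spans both categories.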

Page 21: Scientific table type classification in digital library (DocEng 2012)

Experiment

▶ Results

Performance of IMRAD Classification by Features

Features         SVM                    Decision Tree
                 P      R      F        P      R      F
Cap. (Baseline)  0.836  0.506  0.543    0.947  0.550  0.792
Ref.             0.875  0.705  0.761    0.930  0.639  0.73
Cap.+Ref.        0.967  0.784  0.866    0.938  0.746  0.831

Performance of Fine-grained Classification by Features

Features         SVM                    Decision Tree
                 P      R      F        P      R      F
Cap. (Baseline)  0.627  0.333  0.397    0.522  0.271  0.302
Ref.             0.707  0.615  0.649    0.790  0.673  0.716
Cap.+Ref.        0.701  0.657  0.668    0.764  0.62   0.671

Page 22: Scientific table type classification in digital library (DocEng 2012)

Experiment

▶ Results

Performance of IMRAD Classification by Types

Type           SVM                    Decision Tree
               P      R      F        P      R      F
Introduction   0.968  0.6    0.741    0.907  0.68   0.777
Methods        0.901  0.996  0.943    0.912  0.992  0.95
Results        1      1      1        1      1      1
Discussion     1      0.543  0.704    0.933  0.314  0.44
Macro Avg.     0.967  0.784  0.866    0.938  0.746  0.831
Micro Avg.     0.977  0.976  0.973    0.973  0.975  0.972

Page 23: Scientific table type classification in digital library (DocEng 2012)

Experiment

▶ Results

Performance of Fine-grained Classification by Types

Type           SVM                    Decision Tree
               P      R      F        P      R      F
Definition     0.689  0.609  0.646    0.837  0.522  0.643
Statistics     0.699  0.879  0.779    0.898  0.681  0.775
Survey         0      0      0        0      0      0
Example        0.716  0.725  0.72     0.875  0.525  0.656
Procedure      0.9    0.486  0.632    1      0.649  0.787
Exp. Setting   0.905  0.9    0.902    0.74   0.963  0.837
Exp. Result    1      1      1        1      1      1
Macro Avg.     0.701  0.657  0.668    0.764  0.62   0.671
Micro Avg.     0.947  0.947  0.946    0.94   0.936  0.987
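The result tables report macro and micro averages. The sketch below shows how both are typically computed; `macro_scores` and `micro_scores` are hypothetical helpers. Macro F is defined in two ways in common use, so the function returns both. Recomputed from the per-type rows, the IMRAD macro F (0.866 for SVM) matches the harmonic mean of macro P and macro R. The fine-grained macro F (0.668 for SVM) matches the plain mean of the per-type F values. The Decision Tree columns follow the same pattern.

```python
def macro_scores(per_class):
    """Macro-average per-class (precision, recall, F) triples.

    Returns macro P, macro R, and two common macro-F conventions:
    the harmonic mean of macro P and macro R, and the plain mean of per-class F.
    """
    n = len(per_class)
    p = sum(row[0] for row in per_class) / n
    r = sum(row[1] for row in per_class) / n
    f_of_means = 2 * p * r / (p + r) if p + r else 0.0
    mean_of_f = sum(row[2] for row in per_class) / n
    return p, r, f_of_means, mean_of_f


def micro_scores(counts):
    """Micro-average from per-class (tp, fp, fn) counts by pooling them first."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Per-type SVM rows copied from the IMRAD and fine-grained tables.
imrad_svm = [(0.968, 0.6, 0.741), (0.901, 0.996, 0.943), (1, 1, 1), (1, 0.543, 0.704)]
fine_svm = [(0.689, 0.609, 0.646), (0.699, 0.879, 0.779), (0, 0, 0),
            (0.716, 0.725, 0.72), (0.9, 0.486, 0.632), (0.905, 0.9, 0.902), (1, 1, 1)]
print(macro_scores(imrad_svm))
print(macro_scores(fine_svm))
```

The opposite conventions would give about 0.847 (IMRAD) and 0.678 (fine-grained). Micro averaging pools counts over all tables, so it is dominated by the largest classes (Results and Exp. Result). The slides do not give per-class counts, so `micro_scores` can only be exercised with made-up numbers.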

Page 25: Scientific table type classification in digital library (DocEng 2012)

Conclusion

▶ Introduced our study of table types and their classification in scientific papers
• IMRAD-based taxonomy
• Fine-grained taxonomy

▶ Future work
• Developing various features from table layout and contents