Modeling Topic-level Academic Influence in Scientific...

Modeling Topic-level Academic Influence in Scientific Literatures Jiaming Shen, Zhenyu Song, Shitao Li, Zhaowei Tan, Yuning Mao, Luoyi Fu, Li Song, Xinbing Wang Shanghai Jiao Tong University Feb 13, 2016 Jiaming Shen, Zhenyu Song, Shitao Li, Zhaowei Tan, Yuning Mao, Luoyi Fu, Li Song, Xinbing Wang (Shanghai Jiao Tong University) Modeling Topic-level Academic Influence in Scientific Literatures Feb 13, 2016 1 / 37


Modeling Topic-level Academic Influence in Scientific Literatures

Jiaming Shen, Zhenyu Song, Shitao Li, Zhaowei Tan, Yuning Mao, Luoyi Fu, Li Song, Xinbing Wang

Shanghai Jiao Tong University

Feb 13, 2016


Outline

1 Motivation

2 J-Index Framework

3 Reference Topic Model (RefTM): Generative Model, Parameter Estimation

4 Experiments: Datasets, Evaluation Aspects, Evaluation Results

5 Conclusions & Future works


Motivation

When a beginner starts to explore a new field ...


Figure 1 : Result of Google Scholar


Figure 2 : Defects of Google Scholar


Figure 3 : Defects of Google Scholar


Stand on the shoulders of giants

– Isaac Newton


Find those giants !!!


J-Index Framework

Related works

Figure 4: Factors of one paper (content, reference/citation, venue, temporal, author)


1. Pure Citation Number

2. Link-based Algorithm

3. Models combining multiple factors


4. Models analyzing text and citation jointly

J-Index framework belongs here


J-Index Framework

• Three assumptions of J-Index:

1 A paper's academic influence increases as it gains more citations.

2 A paper with stronger citations tends to be more influential.

3 A paper cited by more innovative papers is more influential.

• We define the J-Index as follows:

$$\text{J-Index-Score}(u) = \sum_{c \in C(u)} \lambda(c) \times \delta(c, u)$$

• C(u): the set of paper u's citations (the papers citing u), obtained from the input dataset.

• λ(c): the innovativeness of paper c.

• δ(c, u): the citation strength between paper c and paper u.

• Both λ(c) and δ(c, u) are obtained from the subsequent model.
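The J-Index definition above can be illustrated with a minimal sketch. It assumes the innovativeness and citation-strength values are already available as plain dictionaries; the function name `j_index_scores`, the paper identifiers, and all numbers are hypothetical and chosen only for illustration, not taken from the slides.

```python
from collections import defaultdict


def j_index_scores(citations, innovativeness, strength):
    """Sum innovativeness(c) * strength(c, u) over every paper c citing u.

    citations: iterable of (c, u) pairs meaning "paper c cites paper u".
    innovativeness: dict mapping a paper to its innovativeness value.
    strength: dict mapping a (c, u) pair to its citation strength.
    """
    scores = defaultdict(float)
    for c, u in citations:
        scores[u] += innovativeness[c] * strength[(c, u)]
    return dict(scores)


# Hypothetical toy corpus: B and C cite A, and C also cites B.
citations = [("B", "A"), ("C", "A"), ("C", "B")]
innovativeness = {"A": 0.9, "B": 0.5, "C": 0.8}
strength = {("B", "A"): 0.6, ("C", "A"): 0.25, ("C", "B"): 0.5}

scores = j_index_scores(citations, innovativeness, strength)
for paper, score in sorted(scores.items()):
    print(paper, round(score, 3))
```

Papers that nobody cites (here C) simply do not appear in the result, which corresponds to an empty sum in the definition.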


Reference Topic Model (RefTM)


Generative Model

• Reference Topic Model is one way to obtain the innovativeness λ(c) and the citation strength δ(c, u).

• The intuition: a researcher may write a word based on his/her own idea or “inherit” some thoughts from one of the paper's references.

• Topic Innovation: the word comes from the author's own idea.

• Topic Inheritance: the word comes from one of the cited papers.

• Citation Strength: determines which reference is selected.
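The intuition in the bullets above can be simulated with a short sketch. This is a toy illustration under stated assumptions, not the RefTM model itself: all distributions are fixed by hand rather than drawn from priors, and the function `generate_word`, the vocabularies, the reference names, and the probabilities are hypothetical.

```python
import random


def generate_word(rng, lam, own_topics, references, strengths, ref_topics, topic_words):
    """Generate one word and return (source, topic, word).

    With probability lam the word is an innovation (topic drawn from the
    paper's own topic mixture); otherwise a reference is picked according
    to the citation strengths and the topic is inherited from it.
    """
    if not references or rng.random() < lam:
        source, topic_dist = "own idea", own_topics
    else:
        source = rng.choices(references, weights=strengths, k=1)[0]
        topic_dist = ref_topics[source]
    topic = rng.choices(range(len(topic_dist)), weights=topic_dist, k=1)[0]
    vocab, probs = topic_words[topic]
    word = rng.choices(vocab, weights=probs, k=1)[0]
    return source, topic, word


# Hypothetical two-topic setup with two references.
topic_words = [
    (["topic", "model", "latent"], [0.5, 0.3, 0.2]),
    (["network", "routing", "latency"], [0.4, 0.4, 0.2]),
]
ref_topics = {"ref_A": [0.9, 0.1], "ref_B": [0.2, 0.8]}

rng = random.Random(0)
words = [
    generate_word(rng, 0.7, [0.5, 0.5], ["ref_A", "ref_B"], [0.8, 0.2],
                  ref_topics, topic_words)
    for _ in range(10)
]
for source, topic, word in words:
    print(f"{word:8s} topic={topic} from {source}")
```

In this sketch a larger `lam` means more words come from the paper's own ideas, so each individual reference is selected less often.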


Figure 10 : Generative Model of RefTM


Figure 11 : Generative Model of RefTM


Parameter Estimation

Digression: Intuition of Inference Process

• Roll a die N times; side i shows n_i times.

• What is the best estimate of each side's probability?

• For each side i, the probability is $P_i = \frac{n_i}{N} = \frac{n_i}{\sum_{j=1}^{6} n_j}$

• Smoothing effect: suppose we have already seen each side i $\alpha_i$ times.

• Update: for each side i, the probability is $P_i = \frac{n_i + \alpha_i}{\sum_{j=1}^{6} (n_j + \alpha_j)}$

• Inference: Observation → Parameters

• In RefTM, observations: words & citations; parameters (the ones we are mainly concerned with): λ and δ
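The dice estimate above can be checked with a few lines of code. This is a minimal sketch of the count-based estimate with optional pseudo-counts; the function name `estimate_side_probabilities` and the sample rolls are hypothetical.

```python
from collections import Counter


def estimate_side_probabilities(rolls, pseudo_counts=None, sides=6):
    """Return the estimated probability of each side, 1..sides.

    Without pseudo_counts this is the plain frequency n_i / N.
    With pseudo_counts it is (n_i + a_i) / sum_j (n_j + a_j).
    """
    counts = Counter(rolls)
    if pseudo_counts is None:
        pseudo_counts = [0.0] * sides
    totals = [counts.get(i, 0) + pseudo_counts[i - 1] for i in range(1, sides + 1)]
    denom = sum(totals)
    if denom == 0:
        raise ValueError("need at least one roll or a positive pseudo-count")
    return [t / denom for t in totals]


rolls = [1, 1, 2, 6]  # hypothetical observations
mle = estimate_side_probabilities(rolls)
smoothed = estimate_side_probabilities(rolls, pseudo_counts=[1.0] * 6)

print("frequency:", mle)
print("smoothed: ", smoothed)
```

With only four rolls the plain frequency assigns probability zero to sides 3, 4 and 5, while the smoothed estimate keeps every side possible.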


RefTM Inference: Gibbs Sampling

Figure 12 : Gibbs sampling equations & Algorithm for RefTM


Visualization of RefTM’s output

Papers labeled in the citation graph:

Index 164, J-Index 0.343: Latent Dirichlet Allocation
Index 454, J-Index 0.043: The Author-Topic model for authors and documents
Index 576, J-Index 0.013: GaP: A factor model for discrete data
Index 452, J-Index 0.011: Dynamic Topic Models
Index 381, J-Index 0.032: Correlated Topic Models
Index 763, J-Index 0.009: Multiscale topic tomography
Index 765, J-Index 0.012: Topics over time: a non-Markov continuous-time model of topical trends
Index 744, J-Index 0.011: Modeling and predicting personal information dissemination behavior
Index 586, J-Index 0.011: Group and topic discovery from relations and text

Edge labels give citation strengths ranging from 0.08 to 0.99.

Figure 1: Right hand side is an illustrative citation graph in which the thickness of each edge represents the citation strength and the vertex size indicates a paper's academic influence. Left hand side presents each paper's J-Index and a quantitative measurement of the citation strength.

tiveness of article as well as the citation strength by jointly utilizing the textual content and citation network in scientific literatures.

We conduct extensive experiments on a collection of more than 420,000 research papers with over two million citations. Our results show that RefTM can effectively discover topics of high quality, model paper novelty and predict citation strength. We also calculate the J-Index of all research papers and the results validate its effectiveness in capturing topic-level influence in scientific literatures.

2 Academic Influence Metric

We model a collection of scientific literatures as a directed graph G = (N, E) in which each node e ∈ N represents an article and each edge (u, v) ∈ E indicates a citation from paper u to paper v. Our goal is to find a metric F(·) such that F(e) represents the academic influence of paper e. An illustrative example is shown in Figure 1.

Naturally, we want the metric value to be correlated with the ground truth. However, this ground truth is unobservable, making it not obvious how one may quantify such a notion of “influence”. Consequently, we need some commonsense knowledge when designing this influence metric. Here we have three general assumptions.

Assumption 1. A paper's academic influence increases as it gains more citations.

Citation is the most direct indicator of scientific merit, reflecting the academic influence of a paper. This assumption resonates with the intuition that a paper will increase its influence when there are more papers citing it. Put mathematically, if we denote the set of paper m's citations as C(m), then F(·) should be a monotonically increasing function in terms of |C(m)|, the citation number of paper m. Notice that F(·) is generally not a monotonic function over the whole corpus. A paper with 800 citations may be less influential than another paper with 650 citations due to many other factors like the function of each citation, which leads to our second assumption.

Assumption 2. A paper with stronger citations tends to be more influential.

Many citations are referenced out of “politeness, policy or piety” and have little impact on another work. We need to consider the strength of each citation when measuring an article's academic influence. Therefore, F(·) should include a component function δ(·), defined on edge set E, to assess the citation strength. Moreover, F(v) should increase more if one citation (u, v) has a larger value of δ(u, v). The conception of citation strength enables us to filter those citations made in passing by adding a relatively small influence score.

Assumption 3. A paper cited by more innovative papers is more influential.

In many cases, simply relying on the citation strength falls short of considering the difficulty of obtaining that citation. An innovative paper tends to generate most words from its own ideas, leading to small strengths of all citations associated with it. For this reason, F(·) should contain another node-weight function λ(·) to take into account the innovativeness of each paper.

J-Index

Based on the three above-mentioned assumptions, we introduce J-Index, a quantitative metric modeling topic-level academic influence. J-Index is actually a metric framework, including two key components λ(·) and δ(·), obtained from the subsequent model. We define the J-Index of paper u as follows:

$$\text{J-Index-Score}(u) = \sum_{c \in C(u)} \lambda(c) \times \delta(c, u) \qquad (1)$$

J-Index is calculated as a sum of all positive numbers, and thus the J-Index score of one paper will never decrease as

Figure 13: Right hand side is an illustrative citation graph in which the thickness of each edge represents the citation strength and the vertex size indicates a paper's academic influence. Left hand side presents each paper's J-Index.


Experiments


Datasets

• Dataset 1: a large unsupervised collection of 426728 articles with over 209 million citations.

• Dataset 2: a small supervised collection of 799 papers obtained from (Liu et al. 2010).

• After stop-word removal, the average paper lengths of the two corpora are 83 and 98 words, respectively.


Evaluation Aspects

• Topic Coherence
  1 Metrics: PMI-Score (Newman et al. 2010) and Topic Coherence-Score (Mimno et al. 2011).
  2 Using dataset 1 and an external dataset of 3.34 million papers when calculating PMI-Score.

• Citation Strength Prediction
  1 Using dataset 2, in which the strength of each citation is classified into three levels: “1, 2, 3”.
  2 Metrics: averaged AUC value for decision boundaries “1 vs. 2, 3” and “1, 2 vs. 3”.

• Case Study: Rank INFOCOM


Topic Coherence

Figure 3: Histogram of Citation Number (x-axis: citation number; y-axis: number of papers, ×10^4)

Figure 4: Topic Coherence Evaluation (left panel: PMI-Score against topic number K for LDA and RefTM; right panel: Topic Coherence Score for LDA and RefTM)


Figure 14 : Topic Coherence Evaluation

• PMI-Score: RefTM outperforms LDA by 12% when K = 50.

• Topic Coherence-Score: RefTM outperforms LDA slightly.


Citation Strength Prediction

of more specific topics like “sentiment analysis” and “privacy security”. We extract the title and abstract of each paper as its textual content. After stop-word removal, the average paper lengths of the two corpora are 83 and 98 words, respectively. The distribution of papers' citation numbers is shown in Figure 3.

Evaluation Aspects

We conduct experiments to analyze the performance of RefTM and the effectiveness of J-Index from three aspects.

First, we evaluate the coherence of the topics learned by RefTM, since good topic clustering is the foundation of a good understanding of topic-level academic influence. Two metrics are used to assess topic quality: PMI-Score (Newman et al. 2010) and Topic Coherence-Score (Mimno et al. 2011).

Second, we adopt a prediction task to explore whether RefTM can effectively learn the citation strength, which is another key component of J-Index. We conduct experiments on the relatively small dataset, in which the strength of each citation is manually labelled. We compare the results of RefTM with previous approaches and show that our model predicts citation strength more accurately.

Finally, we validate the effectiveness of J-Index in capturing topic-level academic influence through a case study. We rank all 426728 papers in the first, large dataset by their J-Index scores and compare the results with the corresponding assessments of research scientists.

The specific settings of the hyper-parameters in RefTM and in the comparative methods are discussed in the following subsections.

In all our experiments, we set α = 50/K and β = 0.01, following the convention of (Griffiths and Steyvers 2004). As for the three newly added hyper-parameters in RefTM, we recommend the values � = L̄, ��n = 0.01 · N̄, and ��c = 0.04 · N̄, where L̄ is the average number of references per paper and N̄ is the average paper length.

Topic Coherence Analysis

We evaluate the coherence of each learned topic with two metrics. The first is PMI-Score, the average Pointwise Mutual Information between the most probable words in each topic. A larger PMI-Score indicates a more coherent topic. Calculating PMI-Score requires an external dataset such as Wikipedia. We construct our own reference collection from 3.34 million scientific articles with 395.3 million word tokens to better reflect language usage in the academic domain.
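The PMI-Score described above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the experiments: it assumes probabilities are estimated from document-level co-occurrence in the reference collection, and the function name `pmi_score`, the smoothing constant `eps`, and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def pmi_score(top_words, reference_docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    Probabilities are document frequencies in reference_docs
    (an assumption; other estimators such as sliding windows exist).
    eps avoids log(0) for pairs that never co-occur.
    """
    docs = [set(d) for d in reference_docs]
    n = len(docs)

    def p(*words):
        # Fraction of reference documents containing all given words.
        return sum(all(w in d for w in words) for d in docs) / n

    pairs = list(combinations(top_words, 2))
    total = 0.0
    for a, b in pairs:
        total += math.log((p(a, b) + eps) / (p(a) * p(b) + eps))
    return total / len(pairs)


# Toy reference collection (made up for illustration).
reference_docs = [
    ["topic", "model"],
    ["topic", "model"],
    ["network", "routing"],
    ["network", "routing"],
]
coherent = pmi_score(["topic", "model"], reference_docs)
mixed = pmi_score(["topic", "network"], reference_docs)
```

In this toy collection the pair that always co-occurs scores about log 2, while the pair that never co-occurs receives a large negative value, matching the reading that a larger PMI-Score means a more coherent topic.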

We compare the PMI-Scores of topics generated by LDA and RefTM on the left-hand side of Figure 4. The PMI-Score increases as the number of topics grows from 10 to 50. RefTM outperforms LDA by 12% when the topic number K equals 50, while the two models perform similarly when the number of topics is small.

Topic Coherence-Score is another metric for assessing topic quality. It depends only on the internal training data, specifically the word co-occurrence statistics gathered from the corpus being modeled, and thus, unlike PMI-Score, does not rely on an external reference corpus. The comparative results of LDA and RefTM are shown on the right-hand side of Figure 4, with the number of topics K fixed at 30. RefTM outperforms LDA in terms of the median, lower quartile, and upper quartile.
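For reference, the coherence measure of (Mimno et al. 2011) sums log((D(v_m, v_l) + 1) / D(v_l)) over ordered pairs of a topic's top words, where D counts documents in the training corpus. The sketch below follows that definition; the function name `topic_coherence` and the toy corpus are hypothetical, and it assumes every top word occurs in at least one document.

```python
import math


def topic_coherence(top_words, docs):
    """Coherence of one topic from training-corpus co-occurrence counts.

    top_words: the topic's words, ordered from most to least probable.
    docs: the corpus being modeled, one token list per document.
    Assumes each top word appears in the corpus (document frequency > 0).
    """
    doc_sets = [set(d) for d in docs]

    def df(*words):
        # Number of documents containing all given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_df = df(top_words[m], top_words[l])
            score += math.log((co_df + 1) / df(top_words[l]))
    return score


# Toy corpus (made up for illustration).
docs = [
    ["topic", "model", "lda"],
    ["topic", "model"],
    ["topic", "network"],
    ["routing", "network"],
]
good = topic_coherence(["topic", "model"], docs)
bad = topic_coherence(["topic", "routing"], docs)
```

Scores are zero or negative for typical topics, and values closer to zero indicate more coherent topics, which is consistent with the negative axis range in Figure 4.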

Citation Strength Prediction

We conduct this experiment on the small supervised dataset, in which the strength of each citation is manually classified into three levels, i.e., strong, middle and weak, labeled 1, 2 and 3, respectively. Similar to (Liu et al. 2010), we use the averaged AUC value for the decision boundaries “1 vs. 2, 3” and “1, 2 vs. 3” as the quality measure for prediction performance. A larger AUC value indicates a more accurate prediction. We compare the results with two baseline methods, LDA-JS and LDA-post, from (Dietz, Bickel, and Scheffer 2007). We set the hyper-parameters β = 0.01 and α = 50/K, where the number of topics K ranges from 10 to 50, in all three methods. After reducing the normalization constraint of RefTM in equation (9), we train each model 20 times and present the results in Figure 5. RefTM outperforms the other two methods in all five scenarios.
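The averaged AUC over the two decision boundaries can be sketched as below. This is an illustrative reading of the metric, assuming that a higher predicted strength should correspond to a stronger citation (label 1 is strongest); the helper names `auc` and `averaged_auc` and the toy predictions are hypothetical.

```python
def auc(scores, positives):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one positive and
    one negative example.
    """
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def averaged_auc(pred_strength, labels):
    """Average the AUC of the boundaries "1 vs. 2, 3" and "1, 2 vs. 3".

    labels: 1 = strong, 2 = middle, 3 = weak.
    """
    auc_1_vs_23 = auc(pred_strength, [y == 1 for y in labels])
    auc_12_vs_3 = auc(pred_strength, [y <= 2 for y in labels])
    return (auc_1_vs_23 + auc_12_vs_3) / 2


# Toy predictions (made up for illustration).
labels = [1, 1, 2, 3, 3]
pred = [0.9, 0.4, 0.5, 0.3, 0.1]
result = averaged_auc(pred, labels)
```

In the toy data one strong citation is scored below the middle one, so the first boundary gives 5/6 while the second is perfectly separated, and the average is 11/12.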

Figure 5: Citation Strength Prediction by AUC (x-axis: number of topics; y-axis: AUC; curves: LDA-JS, LDA-POST, RefTM)

Academic Influence Exploration

J-Index is a metric modeling a paper’s influence rather than its quality. These two notions differ in that a paper’s influence may change over time while its quality is fixed. To reduce the bias from different publication dates, we select a subset of 224 papers published at INFOCOM in the same year, 2003, and further adopt J-Index to measure a paper’s own quality. We set the number of topics in RefTM to 20, and rank each paper by its J-Index as well as by its citation number. Note that here “citation number” means the number of citations within the corpus, which is only a fraction of a paper’s overall citations. Due to space limitations, we only

Figure 15 : Citation Strength Prediction measured by averaged AUC

• Reduce the normalization constraint of � in RefTM.

• RefTM clearly outperforms two baseline methods.


Case Study: Rank INFOCOM

Table 2: Top 5 Articles in INFOCOM 2003 ranked by J-Index and by citation number

Top 5 Articles in INFOCOM 2003 ranked by J-Index
Title | J-Index | Citation count
Ad hoc positioning system (APS) using AOA | 6.75 | 115
Performance anomaly of 802.11b | 5.17 | 127
Packet leashes: a defense against wormhole attacks in wireless networks | 4.13 | 74
Unreliable sensor grids: coverage, connectivity and diameter | 4.00 | 82
Sensor deployment and target localization based on virtual forces | 3.61 | 60

Top 5 Articles in INFOCOM 2003 ranked by citation number
Title | J-Index | Citation count
Performance anomaly of 802.11b | 5.17 | 127
Ad hoc positioning system (APS) using AOA | 6.75 | 115
Optimal routing, link scheduling and power control in multihop wireless networks | 2.26 | 109
Sprite: a simple, cheat-proof, credit-based system for mobile ad-hoc networks | 2.43 | 88
Unreliable sensor grids: coverage, connectivity and diameter | 4.00 | 82

show the Top-5 papers in terms of the two different metrics in Table 2.

Although the citation number and J-Index are positively correlated in general, they rank some specific papers differently. For example, the most cited paper, “Performance anomaly of 802.11b” by Heusse, M. et al., is ranked second by J-Index. Another example is “Packet leashes: a defense against wormhole attacks in wireless networks”, which presents a novel mechanism for defending against a severe attack in ad hoc networks called the wormhole attack. J-Index ranks this paper 3rd, up from 11th by citation count. If we take the citation number on Google Scholar, which is based on an enormous data volume, as a partial ground truth, we find that “Packet leashes” is actually ranked 2nd among all papers in INFOCOM 2003, with over 1840 citations. On closer inspection, we find that “Packet leashes” holds a dominant position among the references of the papers that cite it. This explains the behavior of J-Index and further validates its effectiveness in capturing a paper’s novelty.
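To make the comparison concrete, the sketch below orders the seven distinct papers listed in Table 2 by each metric. The values are copied from the table; the helper `ranking` is hypothetical, and the resulting positions are ranks among these seven papers only, not among all 224 INFOCOM 2003 papers (so they differ from the corpus-wide ranks quoted in the text).

```python
# (title, J-Index, citation count within the corpus), copied from Table 2.
papers = [
    ("Ad hoc positioning system (APS) using AOA", 6.75, 115),
    ("Performance anomaly of 802.11b", 5.17, 127),
    ("Packet leashes: a defense against wormhole attacks in wireless networks", 4.13, 74),
    ("Unreliable sensor grids: coverage, connectivity and diameter", 4.00, 82),
    ("Sensor deployment and target localization based on virtual forces", 3.61, 60),
    ("Optimal routing, link scheduling and power control in multihop wireless networks", 2.26, 109),
    ("Sprite: a simple, cheat-proof, credit-based system for mobile ad-hoc networks", 2.43, 88),
]


def ranking(rows, key_index):
    """Return titles sorted by the chosen column, largest value first."""
    ordered = sorted(rows, key=lambda row: row[key_index], reverse=True)
    return [row[0] for row in ordered]


by_j_index = ranking(papers, 1)
by_citations = ranking(papers, 2)

for title in by_j_index:
    j_rank = by_j_index.index(title) + 1
    c_rank = by_citations.index(title) + 1
    # A positive shift means J-Index places the paper higher than citations do.
    print(f"{title[:40]:40s} J-Index rank {j_rank}, citation rank {c_rank}, shift {c_rank - j_rank:+d}")
```

Even on this small subset, the two orderings agree on the top two papers up to a swap, while “Packet leashes” moves up noticeably under J-Index.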

6 Conclusions & Future Work

This paper introduces J-Index, a quantitative metric modeling topic-level academic influence. J-Index encodes each paper’s novelty and its contribution to the articles that cite it. A generative model named Reference Topic Model (RefTM) is further proposed to recover the innovativeness of each paper and the strength of each citation. RefTM jointly utilizes the textual content and citation relationships in scientific literature during training, and thus plays a key role in the calculation of J-Index. Experiments on two real-world datasets demonstrate RefTM’s ability to discover high-quality topics and predict citation strength, and validate the effectiveness of J-Index for modeling topic-level academic influence.

There are several interesting future directions. For example, RefTM can be extended to model more inherent relationships in scientific literature, such as co-authorship, co-reference and co-citation, enabling J-Index to cover more information beyond the word level. Another possible direction is to model the dynamics of the citation network as well as of J-Index. Currently, J-Index is only applicable to a static network and has to be recalculated when new papers are added or as time passes. Therefore, an online version of RefTM, together with an explicit time component in J-Index, would be able to capture influence changes in scientific literature. Finally, we intend to develop a system such as CiteSeer in which J-Index can facilitate a large pool of applications like paper ranking and academic recommendation.


Figure 16 : Citation Strength Prediction measured by averaged AUC

• Rankings by J-Index and by citation number are correlated.

• J-Index favors those papers that propose novel “ideas”.


Conclusions & Future works

• Conclusions:
  1 Model academic influence – facilitate ranking and recommendation.
  2 J-Index framework – consider citation strength and paper’s novelty.
  3 Reference Topic Model – combine citation network into topic model.

• Future works:
  1 RefTM in the incremental citation network.
  2 Consider multiple factors, especially the temporal information.


References

• Dietz, L.; Bickel, S.; and Scheffer, T. 2007. Unsupervised prediction of citation influences. In Proceedings of the 24th International Conference on Machine Learning, 233–240. ACM.

• Liu, L.; Tang, J.; Han, J.; Jiang, M.; and Yang, S. 2010. Mining topic-level influence in heterogeneous networks. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, 199–208. ACM.

• Mimno, D.; Wallach, H. M.; Talley, E.; Leenders, M.; and McCallum, A. 2011. Optimizing semantic coherence in topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 262–272. Association for Computational Linguistics.

• Newman, D.; Lau, J. H.; Grieser, K.; and Baldwin, T. 2010. Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 100–108. Association for Computational Linguistics.


Thank you!


Q & A
