Elpub

Reinventing the Research Article - Seven Challenges in Science Publishing Anita de Waard Researcher Disruptive Technologies, Elsevier Labs NWO - Casimir Grantee, Utrecht University


Page 1: Elpub

Reinventing the Research Article - Seven Challenges in Science Publishing

Anita de Waard

Researcher Disruptive Technologies, Elsevier Labs

NWO - Casimir Grantee, Utrecht University

Page 2: Elpub

Seven ‘known knowns’ in online science publishing:

1. The internet has caused an information overload.

2. Science papers contain facts.

3. The narrative research article is outdated and needs to be replaced.

4. Since words contain meaning,

5. And words (and logic) contain scientific fact,

6. We just need to model them with xml + rdf;

7. And the publishers should stop making all these papers.

Page 3: Elpub

- My own experience (as a researcher):

- Easy: find what I know exists

- OK: finding things I expect (or hope) exist

- Hard: making sure I haven’t missed anything

- However, none of these make me feel overwhelmed.

- Infuriating:

- Trying to respond to people who ask me something

- Managing three email accounts on 4 computers

- Following up on plans and projects

- However, we can improve the delivery of science content online.

1. The internet has caused an information overload

Page 4: Elpub

1. The internet has caused an information overload

- Pick (carve out) a first set of user needs, e.g.:

- Locate

- Understand

- Believe (Be convinced)

- Explore

- But this does not address WHAT you want to Locate, Understand, ...

- Semantic network in pharmacology: ‘Grey out what I already know’

1. How can we model a user’s interest?

Page 5: Elpub

2. Science papers contain facts

- With FEBS Letters Editorial Office in Heidelberg/MINT Database in Rome

- Structured Digital Abstract [Gerstein et al.]: ‘machine-readable XML summary of pertinent facts’

- For FEBS: provide proteins, methods, protein-protein interactions, as given in MINT:

- 2008: authors provide, editors check

- 2009: Word Plug-in tool suggests, authors (and editors) check


2. Can we create an ontology of doubt?

Page 6: Elpub

2. Science papers contain facts

2. Can we create an ontology of doubt?

Page 7: Elpub

3. The narrative RA should be replaced

Aristotle | English term | Quintilian | Description | Cell | APA Style Guide
prooimion | Introduction | exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience. | Introduction | Introduction
prothesis | Statement of Facts | narratio | The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge. | Introduction | Introduction
 | Summary | propositio | Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Abstract | Abstract
 | Division/outline | partitio | Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what's been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory. | Table of Contents | Article Outline
pistis | Proof | confirmatio | Following the division / outline or partitio comes the main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results | Methods, Results
 | Refutation | refutatio | Following the confirmatio or section on proof in a classical oration, comes the refutation. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Discussion | Discussion
epilogos | | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up (see the figures of summary, below). | Discussion | Discussion

Page 8: Elpub

3. The narrative RA should be replaced

- Narrative is how stories are told; ‘the truth can only be told in stories’....

- Narrative is essential for persuasion

3. How can we represent narrative online?

Story: The Story of Goldilocks and the Three Bears

Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins

Story text | Story grammar | Paper element | Paper text
Once upon a time | Time (Setting) | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
a little girl named Goldilocks | Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
She knocked and, when no one answered, | Goal (Theme) | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
At the table in the kitchen, there were three bowls of porridge. | Name (Episode 1) | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
This porridge is too hot! she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
So, she tasted the porridge from the second bowl. | | Data | (data not shown),
This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days
So, she tasted the last bowl of porridge. | | Data | (Figures 1B-1D).
Ahhh, this porridge is just right, she said happily and | Outcome | Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles
she ate it all up. | | Data | (Figure 1F),

Page 9: Elpub

4. Words contain meaning

Sicilian?

- ‘A word is worth a thousand pictures’ (Don Loritz)

- The meaning of words occurs in context and is dependent on knowledge and experience

- This is even more so in science: PSA = Prostate-Specific Antigen or Pot Smokers Association of America?

Page 10: Elpub

4. Words contain meaning

- Cognitive linguistics: language and cognition cannot be separated - language acts are cognitive acts

- Lakoff, metaphor: ‘anger is heat’

- Meaning is created in the mind: a word is not (only) a ‘particle’ but (also) a ‘wave’. Hearing/reading is not unpacking a package, but resonating at a specific frequency. Context is its medium: context-free language does not exist!

4. How do we model cognitive context?

Page 11: Elpub


5. Words (and logic) contain scientific fact

Figure 1. Initiation and Maintenance of G1 Arrest Induced by IR. (A) Stable MCF-7 clones containing either pCDNA3.1 (Neo) or pCDNA3.1-E6 were irradiated (20 Gy), and cellular protein extracts were made 2 hr later, separated on 10% SDS PAGE, and immunoblotted to detect p53 and cyclin D1 proteins.

“In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (Figure 1A).”

“We generated an MCF-7 derivative that expresses the HPV16 E6 protein, which mediates degradation of p53 ([24]).”

24. M. Scheffner, B.A. Werness, J.M. Huibregtse, A.J. Levine and P.M. Howley, The E6 oncoprotein encoded by human papillomavirus types 16 and 18 promotes the degradation of p53. Cell 63 (1990), pp. 1129–1136.

• “[Y]ou can transform a fact into fiction or a fiction into fact just by adding or subtracting references [and data]” – Bruno Latour, ‘Science in Action’, 1987

Page 12: Elpub

5. Words (and logic) contain scientific fact

- Main goal of article is to persuade

- The author is a medium that enables the article to get itself published (a la selfish gene/meme)

- Essential persuasive elements are non-textual

5. How do we represent non-textual elements?


Page 13: Elpub

Discourse Segments

- “A text is made up of Discourse Segments and the relations between them” - Grosz and Sidner, Mann and Thompson, Marcu, Swales

- Discourse Segment Purpose: element that has a consistent rhetorical/pragmatic goal.

- Define for Biological Research Article

Page 14: Elpub

A model of a biology research article:

<EXPERIMENTS>
  <Experiment>
    <Header header="h1">p53-Independent Initiation of G1 Arrest Induced by IR</Header>
    <Fact fact="fa1" factref="br26">Since the transcriptional response by p53 is a relatively slow process,</Fact>
    <Problem problem="p1">we asked whether initiation of a G1 arrest following genotoxic stress requires p53. </Problem>
    <Method method="m1">We generated an MCF-7 derivative </Method>
    <Fact fact="fa2" factref="br24">that expresses the HPV16 E6 protein, which mediates degradation of p53(<Bibref bib="br24">[24]</Bibref>).</Fact>
    <Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (<Figref figref="agami1.gif">Figure 1A).</Figref></Result>
    <Result result="r2">Consistent with this, no induction of p21cip1 by IR was seen in the E6-expressing MCF-7 cells <Figref figref="none.gif">(data not shown).</Figref></Result>
    ...

6. Just model the facts with xml + rdf
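As a rough illustration of what such tagging makes possible, the sketch below parses a shortened copy of the fragment above and lists each discourse segment with its type. It is not part of the original slides: the closing tags for Experiment and EXPERIMENTS are added only so the excerpt parses, and the names `SEGMENT_TAGS` and `list_segments` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened copy of the slide's fragment. The two closing tags at the end are
# an assumption added here so that the excerpt is well-formed XML.
FRAGMENT = """
<EXPERIMENTS>
  <Experiment>
    <Header header="h1">p53-Independent Initiation of G1 Arrest Induced by IR</Header>
    <Fact fact="fa1" factref="br26">Since the transcriptional response by p53 is a relatively slow process,</Fact>
    <Problem problem="p1">we asked whether initiation of a G1 arrest following genotoxic stress requires p53. </Problem>
    <Method method="m1">We generated an MCF-7 derivative </Method>
    <Fact fact="fa2" factref="br24">that expresses the HPV16 E6 protein, which mediates degradation of p53(<Bibref bib="br24">[24]</Bibref>).</Fact>
    <Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (<Figref figref="agami1.gif">Figure 1A).</Figref></Result>
  </Experiment>
</EXPERIMENTS>
"""

# Segment types that occur in this excerpt (hypothetical helper constant).
SEGMENT_TAGS = {"Fact", "Problem", "Method", "Result"}


def list_segments(xml_text):
    """Return (segment type, flattened text) pairs in document order."""
    root = ET.fromstring(xml_text)
    segments = []
    for experiment in root.iter("Experiment"):
        for child in experiment:
            if child.tag in SEGMENT_TAGS:
                # itertext() also collects text inside nested Bibref/Figref tags.
                text = " ".join("".join(child.itertext()).split())
                segments.append((child.tag, text))
    return segments


segments = list_segments(FRAGMENT)
counts = Counter(tag for tag, _ in segments)
for tag, text in segments:
    print(f"{tag:8} {text}")
```

Counting the segment types per section in this way is one plausible route to tables like "Segments vs. Sections".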

Page 15: Elpub


Page 16: Elpub

Segment | Introduction | Method | Results | Discussion | Total
Fact | 63 | 0 | 104 | 37 | 204
Problem | 20 | 0 | 10 | 15 | 45
Goal | 2 | 0 | 72 | 6 | 80
Method | 2 | all | 129 | 6 | 137
Result | 10 | 0 | 230 | 44 | 284
Implication | 14 | 0 | 100 | 36 | 150
Hypothesis | 10 | 0 | 33 | 26 | 69
Total | 121 | 0 | 678 | 170 | 969

Segments vs. Sections

Page 17: Elpub

Tense | Fact | Problem | Goal | Method | Result | Implication | Hypothesis | Total
Present active | 72 (46%) | 27 (60%) | 15 (23%) | 7 (7%) | 37 (16%) | 69 (51%) | 38 (55%) | 265
Present passive | 5 (3%) | 2 (4%) | 2 (3%) | 1 (1%) | 1 (0%) | 11 (8%) | 1 (1%) | 23
Past active | 18 (11%) | 5 (11%) | 11 (17%) | 48 (47%) | 122 (54%) | 16 (12%) | 8 (12%) | 228
Past passive | 25 (16%) | 2 (4%) | 1 (2%) | 17 (17%) | 21 (9%) | 1 (1%) | 5 (7%) | 72
Future | 2 (1%) | 3 (7%) | 0 (0%) | 0 (0%) | 1 (0%) | 0 (0%) | 0 (0%) | 6
Imperfect: "to" | 13 (8%) | 2 (4%) | 32 (50%) | 2 (2%) | 20 (9%) | 14 (10%) | 7 (10%) | 90
Gerund ("ing") | 22 (14%) | 4 (9%) | 3 (5%) | 28 (27%) | 23 (10%) | 24 (18%) | 10 (14%) | 114
Total | 157 (100%) | 45 (100%) | 64 (100%) | 103 (100%) | 225 (100%) | 135 (100%) | 69 (100%) | 798

Segment Tense

Page 18: Elpub

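 | Fact | Hypothesis | Problem | Goal | Method | Result | Implication | End | Total
Start | 18 | 3 | 1 | 8 | 2 | 2 | 4 | 0 | 38
Fact | 83 | 22 | 13 | 17 | 9 | 31 | 12 | 1 | 188
Hypothesis | 20 | 5 | 3 | 7 | 6 | 2 | 6 | 3 | 52
Problem | 9 | 7 | 7 | 2 | 3 | 5 | 3 | 3 | 39
Goal | 7 | 0 | 2 | 4 | 46 | 6 | 0 | 0 | 65
Method | 13 | 2 | 3 | 10 | 25 | 54 | 3 | 0 | 110
Result | 23 | 9 | 4 | 6 | 16 | 85 | 78 | 6 | 227
Implication | 13 | 6 | 4 | 12 | 11 | 30 | 12 | 25 | 113
Total | 186 | 54 | 37 | 61 | 118 | 215 | 118 | 38 | 827

Segment order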
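One way to read the segment-order counts above is as a first-order transition table. The sketch below assumes that rows are the preceding segment and columns the following one (suggested by the Start row and End column, but not stated on the slide) and normalizes each row by the sum of its own cells. The function names are hypothetical.

```python
# Cell counts copied from the "Segment order" table.
# Assumption: each row is the preceding segment, each column the following one.
COLUMNS = ["Fact", "Hypothesis", "Problem", "Goal", "Method", "Result", "Implication", "End"]
COUNTS = {
    "Start":       [18, 3, 1, 8, 2, 2, 4, 0],
    "Fact":        [83, 22, 13, 17, 9, 31, 12, 1],
    "Hypothesis":  [20, 5, 3, 7, 6, 2, 6, 3],
    "Problem":     [9, 7, 7, 2, 3, 5, 3, 3],
    "Goal":        [7, 0, 2, 4, 46, 6, 0, 0],
    "Method":      [13, 2, 3, 10, 25, 54, 3, 0],
    "Result":      [23, 9, 4, 6, 16, 85, 78, 6],
    "Implication": [13, 6, 4, 12, 11, 30, 12, 25],
}


def transition_probabilities(counts):
    """Normalize each row so its cells sum to 1 (estimated P(next | current))."""
    probs = {}
    for source, row in counts.items():
        total = sum(row)
        probs[source] = {target: n / total for target, n in zip(COLUMNS, row)}
    return probs


def most_likely_next(probs, source):
    """Return the segment type that most often follows `source`."""
    return max(probs[source], key=probs[source].get)


probs = transition_probabilities(COUNTS)
for source in COUNTS:
    target = most_likely_next(probs, source)
    print(f"{source:12} -> {target:12} {probs[source][target]:.2f}")
```

Under this reading a Goal is most often followed by a Method (46 of 65 cases) and a Method by a Result (54 of 110), which matches the goal, method, result chain of the Results section.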

Page 19: Elpub

Discourse: A Fact(ory)

[Diagram. Segment types shown: fact, problem, hypothesis, goal, method, result, implication. Section labels: introduction, results, discussion. Views: own view, shared view. Realm labels: hypothetical realm (might, would); realm of activity (to test, to see); realm of models: present; realm of experience: past. Other labels: incongruity/ignorance, "to", "we", "resulting in", "suggests that".]

Page 20: Elpub

6. Just model the facts with xml + rdf

- In practice: ScienceDirect does not use our XML... (shhh....)

- At Elsevier: Project Harpoon: ‘stab’ the document with metadata, asynchronous, linked in (XPath/XQuery), distributed

- In XML - how to access a phrase inside an article:

- access inside a PDF by coordinates? Format, content changes

- add IDs to every single element? Format, content, version changes?

- How to represent relations, even if we know where they link?
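The idea of metadata that lives outside the document and points into it by path can be sketched as follows. This is not Project Harpoon itself, whose design is not described here. It is a minimal illustration using the limited XPath subset in Python's standard library, with hypothetical relation names and a made-up annotation list. The third annotation deliberately targets an ID that does not exist, to show how a pointer breaks when the document changes.

```python
import xml.etree.ElementTree as ET

# A small article fragment with IDs on each element (based on the model above).
ARTICLE = """<Experiment>
  <Method method="m1">We generated an MCF-7 derivative</Method>
  <Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells.</Result>
  <Result result="r2">Consistent with this, no induction of p21cip1 by IR was seen in the E6-expressing MCF-7 cells.</Result>
</Experiment>"""

# Stand-off metadata, stored apart from the article and linked in by path.
# The relation names and values are hypothetical examples.
ANNOTATIONS = [
    {"target": ".//Result[@result='r1']", "relation": "supported-by", "value": "Figure 1A"},
    {"target": ".//Method[@method='m1']", "relation": "leads-to", "value": "r1"},
    {"target": ".//Result[@result='r9']", "relation": "supported-by", "value": "Figure 9"},
]


def resolve(article_xml, annotations):
    """Split annotations into those whose path still resolves and those that dangle."""
    root = ET.fromstring(article_xml)
    resolved, dangling = [], []
    for note in annotations:
        element = root.find(note["target"])
        if element is None:
            dangling.append(note)
        else:
            resolved.append((element.tag, element.text.strip(), note["relation"], note["value"]))
    return resolved, dangling


resolved, dangling = resolve(ARTICLE, ANNOTATIONS)
for tag, text, relation, value in resolved:
    print(f"{tag}: {relation} {value} :: {text[:40]}...")
print("dangling pointers:", [note["target"] for note in dangling])
```

The dangling entry illustrates the concern in the bullets above: a pointer by ID or path only holds as long as format, content and version stay the same.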

6. How can we better model discourse elements (and relations)?


Page 21: Elpub

7. And publishers should stop making all those papers.

- 6 uses of a RA:

- job application

- report card

- thesis

- conference tickets

- research assessment

- and yes, by the way, reporting on scientific work.

- Scientists are evaluated largely based on publications: this enables their production to be evaluated by non-specialists

- This places undue stress on quantity, conformity (for fear of being rejected), and publishing for its own sake.

7. How can we disentangle communication and evaluation?


Page 22: Elpub

Seven ‘Known Unknowns’ in Online Science Publishing

1. How can we model a user’s interest?

2. Can we create an ontology of doubt?

3. How can we represent narrative online?

4. How do we model cognitive context?

5. How do we represent non-textual elements?

6. How can we better model discourse elements and relations?

7. How can we disentangle communication and evaluation?

Page 23: Elpub

The Elsevier Grand Challenge: Knowledge Enhancement in the Life Sciences is a contest created to improve the way scientific information is communicated and used. The contest invites members of the scientific community to describe and prototype a tool to improve the interpretation and identification of meaning in (online) journals and text databases relating to the life sciences.

Specifically we are looking for new ways to:

1. improve the process/methods/results of creating, reviewing and editing scientific content

2. interpret, visualize or connect the knowledge more effectively, and/or

3. provide tools/ideas for measuring the impact of these improvements.

Abstracts are now invited - Submissions will close on July 15th, 2008.

- Finalists will be invited to present their vision papers in a public symposium, at which the Panel of Judges will announce the winners.

- The first place winner will be awarded a cash prize of US$35,000.

- The second place winner a cash prize of US$15,000.

- All finalists will receive free trial access to ScienceDirect and Scopus for a year.

http://www.elseviergrandchallenge.com/

Page 24: Elpub

Unknown unknowns?

Would you care to correct/contradict/join me?

Anita de Waard,

http://people.cs.uu.nl/anita
