Transcript of Elpub
Reinventing the Research Article - Seven Challenges in Science Publishing
Anita de Waard
Researcher Disruptive Technologies, Elsevier Labs
NWO - Casimir Grantee, Utrecht University
Seven ‘known knowns’ in online science publishing:
1. The internet has caused an information overload.
2. Science papers contain facts.
3. The narrative research article is outdated and needs to be replaced.
4. Since words contain meaning,
5. And words (and logic) contain scientific fact,
6. We just need to model them with xml + rdf;
7. And the publishers should stop making all these papers.
- My own experience (as a researcher):
- Easy: find what I know exists
- OK: find what I expect or hope exists
- Hard: making sure I haven’t missed anything
- However, none of these make me feel overwhelmed.
- Infuriating:
- Trying to respond to people who ask me something
- Managing three email accounts on 4 computers
- Following up on plans and projects
- However, we can improve the delivery of science content online.
1. The internet has caused an information overload
- Pick (carve out) a first set of user needs, e.g.:
- Locate
- Understand
- Believe (Be convinced)
- Explore
- But this does not address WHAT you want to Locate, Understand, ...
- Semantic network in pharmacology: ‘Grey out what I already know’
1. How can we model a user’s interest?
2. Science papers contain facts
- With FEBS Letters Editorial Office in Heidelberg/MINT Database in Rome
- Structured Digital Abstract [Gerstein et al.]: ‘machine-readable XML summary of pertinent facts’
- For FEBS: provide proteins, methods, protein-protein interactions, as given in MINT:
- 2008: authors provide, editors check
- 2009: Word Plug-in tool suggests, authors (and editors) check
2. Can we create an ontology of doubt?
3. The narrative RA should be replaced
Parts of the classical oration (Aristotle, Quintilian) and the corresponding sections in Cell and the APA Style Guide:
- Introduction. Aristotle: prooimion. Quintilian: exordium. Cell: Introduction. APA Style Guide: Introduction.
  The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal of ethos in order to establish credibility with the audience.
- Statement of Facts. Aristotle: prothesis. Quintilian: narratio. Cell: Introduction. APA Style Guide: Introduction.
  The second part of a classical oration, following the introduction or exordium. The speaker here provides a narrative account of what has happened and generally explains the nature of the case. Quintilian adds that the narratio is followed by the propositio, a kind of summary of the issues or a statement of the charge.
- Summary. Quintilian: propositio. Cell: Abstract. APA Style Guide: Abstract.
  Coming between the narratio and the partitio of a classical oration, the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
- Division/outline. Quintilian: partitio. Cell: Table of Contents. APA Style Guide: Article Outline.
  Following the statement of facts, or narratio, comes the partitio or divisio. In this section of the oration, the speaker outlines what will follow, in accordance with what's been stated as the status, or point at issue in the case. Quintilian suggests the partitio is blended with the propositio and also assists memory.
- Proof. Aristotle: pistis. Quintilian: confirmatio. Cell: Results. APA Style Guide: Methods, Results.
  Following the division / outline or partitio comes the main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
- Refutation. Quintilian: refutatio. Cell: Discussion. APA Style Guide: Discussion.
  Following the confirmatio or section on proof in a classical oration, comes the refutation. As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
- Aristotle: epilogos. Quintilian: peroratio. Cell: Discussion. APA Style Guide: Discussion.
  Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up (see the figures of summary, below).
3. The narrative RA should be replaced
- Narrative is how stories are told; ‘the truth can only be told in stories’...
- Narrative is essential for persuasion
3. How can we represent narrative online?
Story grammar applied to a story and to a paper
Story: The Story of Goldilocks and the Three Bears
Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
Setting
- Story grammar: Time. Paper: Background.
  Story: Once upon a time
  Paper: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
- Story grammar: Characters. Paper: Objects of study.
  Story: a little girl named Goldilocks
  Paper: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
- Story grammar: Location. Paper: Experimental setup.
  Story: She went for a walk in the forest. Pretty soon, she came upon a house.
  Paper: studied and compared in vivo effects and interactions to those of the human protein
Theme
- Story grammar: Goal. Paper: Research goal.
  Story: She knocked and, when no one answered,
  Paper: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
- Story grammar: Attempt. Paper: Hypothesis.
  Story: she walked right in.
  Paper: Atx-1 may play a role in the regulation of gene expression
Episode 1
- Story grammar: Name. Paper: Name.
  Story: At the table in the kitchen, there were three bowls of porridge.
  Paper: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
- Story grammar: Subgoal. Paper: Subgoal.
  Story: Goldilocks was hungry.
  Paper: test the function of the AXH domain
- Story grammar: Attempt. Paper: Method.
  Story: She tasted the porridge from the first bowl.
  Paper: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
- Story grammar: Outcome. Paper: Results.
  Story: This porridge is too hot! she exclaimed.
  Paper: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
- Paper: Data.
  Story: So, she tasted the porridge from the second bowl.
  Paper: (data not shown),
- Story grammar: Outcome. Paper: Results.
  Story: This porridge is too cold, she said
  Paper: both genotypes show many large holes and loss of cell integrity at 28 days
- Paper: Data.
  Story: So, she tasted the last bowl of porridge.
  Paper: (Figures 1B-1D).
- Story grammar: Outcome. Paper: Results.
  Story: Ahhh, this porridge is just right, she said happily and
  Paper: Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles
- Paper: Data.
  Story: she ate it all up.
  Paper: (Figure 1F),
4. Words contain meaning
Sicilian?
- ‘A word is worth a thousand pictures’ (Don Loritz)
- The meaning of words occurs in context and is dependent on knowledge and experience
- This is even more so in science: PSA = Prostate-Specific Antigen or Pot Smokers Association of America?
4. Words contain meaning
- Cognitive linguistics: language and cognition cannot be separated - language acts are cognitive acts
- Lakoff, metaphor: ‘anger is heat’
- Meaning is created in the mind: a word is not (only) a ‘particle’ but (also) a ‘wave’. Hearing/reading is not unpacking a package, but resonating at a specific frequency; context is its medium. Context-free language does not exist!
4. How do we model cognitive context?
5. Words (and logic) contain scientific fact
Figure 1. Initiation and Maintenance of G1 Arrest Induced by IR. (A) Stable MCF-7 clones containing either pCDNA3.1 (Neo) or pCDNA3.1-E6 were irradiated (20 Gy), and cellular protein extracts were made 2 hr later, separated on 10% SDS PAGE, and immunoblotted to detect p53 and cyclin D1 proteins.
“In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (Figure 1A).”
“We generated an MCF-7 derivative that expresses the HPV16 E6 protein, which mediates degradation of p53 ([24]).”
24. M. Scheffner, B.A. Werness, J.M. Huibregtse, A.J. Levine and P.M. Howley, The E6 oncoprotein encoded by human papillomavirus types 16 and 18 promotes the degradation of p53. Cell 63 (1990), pp. 1129–1136.
- “[Y]ou can transform a fact into fiction or a fiction into fact just by adding or subtracting references [and data]” - Bruno Latour, ‘Science in Action’, 1987
5. Words (and logic) contain scientific fact
- Main goal of article is to persuade
- The author is a medium that enables the article to get itself published (à la selfish gene/meme)
- Essential persuasive elements are non-textual
5. How do we represent non-textual elements?
5. Words (and logic) contain scientific fact
Discourse Segments
- “A text is made up of Discourse Segments and the relations between them” - Grosz and Sidner, Mann and Thompson, Marcu, Swales
- Discourse Segment Purpose: element that has a consistent rhetorical/pragmatic goal.
- Define for Biological Research Article
A model of a biology research article:
<EXPERIMENTS>
  <Experiment>
    <Header header="h1">p53-Independent Initiation of G1 Arrest Induced by IR</Header>
    <Fact fact="fa1" factref="br26">Since the transcriptional response by p53 is a relatively slow process,</Fact>
    <Problem problem="p1">we asked whether initiation of a G1 arrest following genotoxic stress requires p53. </Problem>
    <Method method="m1">We generated an MCF-7 derivative </Method>
    <Fact fact="fa2" factref="br24">that expresses the HPV16 E6 protein, which mediates degradation of p53(<Bibref bib="br24">[24]</Bibref>).</Fact>
    <Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (<Figref figref="agami1.gif">Figure 1A).</Figref></Result>
    <Result result="r2">Consistent with this, no induction of p21cip1 by IR was seen in the E6-expressing MCF-7 cells <Figref figref="none.gif">(data not shown).</Figref></Result>
    ...
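A minimal sketch of how a marked-up fragment like the one above could be read by a program, assuming Python and its standard `xml.etree.ElementTree` module. The closing `</Experiment>` and `</EXPERIMENTS>` tags are added here only so that the excerpt parses (the slide elides the rest of the article), and the names `count_segments` and `segment_text` are hypothetical helpers, not part of the model itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Excerpt from the slide; the two closing tags at the end are an assumption
# added so that the fragment is well-formed.
FRAGMENT = """<EXPERIMENTS><Experiment>
<Header header="h1">p53-Independent Initiation of G1 Arrest Induced by IR</Header>
<Fact fact="fa1" factref="br26">Since the transcriptional response by p53 is a relatively slow process,</Fact>
<Problem problem="p1">we asked whether initiation of a G1 arrest following genotoxic stress requires p53. </Problem>
<Method method="m1">We generated an MCF-7 derivative </Method>
<Fact fact="fa2" factref="br24">that expresses the HPV16 E6 protein, which mediates degradation of p53(<Bibref bib="br24">[24]</Bibref>).</Fact>
<Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells (<Figref figref="agami1.gif">Figure 1A).</Figref></Result>
<Result result="r2">Consistent with this, no induction of p21cip1 by IR was seen in the E6-expressing MCF-7 cells <Figref figref="none.gif">(data not shown).</Figref></Result>
</Experiment></EXPERIMENTS>"""

# The seven segment types used in the deck's tables.
SEGMENT_TAGS = {"Fact", "Problem", "Goal", "Method", "Result",
                "Implication", "Hypothesis"}


def count_segments(xml_text):
    """Count how often each discourse-segment tag occurs in the fragment."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el.tag in SEGMENT_TAGS)


def segment_text(element):
    """Return the segment's text, including nested Bibref/Figref text."""
    return " ".join("".join(element.itertext()).split())


counts = count_segments(FRAGMENT)
print(dict(counts))

root = ET.fromstring(FRAGMENT)
for result in root.iter("Result"):
    print(result.get("result"), "->", segment_text(result))
```

Counting tags this way is how per-section tallies of segment types could be produced from a fully marked-up article.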
6. Just model the facts with xml + rdf
Segments vs. Sections
             Introduction  Method  Results  Discussion  Total
Fact         63            0       104      37          204
Problem      20            0       10       15          45
Goal         2             0       72       6           80
Method       2             all     129      6           137
Result       10            0       230      44          284
Implication  14            0       100      36          150
Hypothesis   10            0       33       26          69
Total        121           0       678      170         969
Segment Tense
                 Fact        Problem     Goal        Method      Result      Implication  Hypothesis  Total
Present active   72 (46%)    27 (60%)    15 (23%)    7 (7%)      37 (16%)    69 (51%)     38 (55%)    265
Present passive  5 (3%)      2 (4%)      2 (3%)      1 (1%)      1 (0%)      11 (8%)      1 (1%)      23
Past active      18 (11%)    5 (11%)     11 (17%)    48 (47%)    122 (54%)   16 (12%)     8 (12%)     228
Past passive     25 (16%)    2 (4%)      1 (2%)      17 (17%)    21 (9%)     1 (1%)       5 (7%)      72
Future           2 (1%)      3 (7%)      0 (0%)      0 (0%)      1 (0%)      0 (0%)       0 (0%)      6
Imperfect: "to"  13 (8%)     2 (4%)      32 (50%)    2 (2%)      20 (9%)     14 (10%)     7 (10%)     90
Gerund ("ing")   22 (14%)    4 (9%)      3 (5%)      28 (27%)    23 (10%)    24 (18%)     10 (14%)    114
Total            157 (100%)  45 (100%)   64 (100%)   103 (100%)  225 (100%)  135 (100%)   69 (100%)   798
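As a rough illustration of how the tense counts above can be read, the following Python sketch looks up the most frequent tense for each segment type. The counts are copied from the table; the dictionary layout and the function names `dominant_tense` and `tense_share` are assumptions made for this example, not part of the original analysis.

```python
SEGMENTS = ["Fact", "Problem", "Goal", "Method", "Result",
            "Implication", "Hypothesis"]

# Rows of the Segment Tense table, one count per segment type
# in the order given by SEGMENTS.
TENSE_COUNTS = {
    "Present active":  [72, 27, 15, 7, 37, 69, 38],
    "Present passive": [5, 2, 2, 1, 1, 11, 1],
    "Past active":     [18, 5, 11, 48, 122, 16, 8],
    "Past passive":    [25, 2, 1, 17, 21, 1, 5],
    "Future":          [2, 3, 0, 0, 1, 0, 0],
    'Imperfect: "to"': [13, 2, 32, 2, 20, 14, 7],
    'Gerund ("ing")':  [22, 4, 3, 28, 23, 24, 10],
}


def dominant_tense(segment):
    """Return the tense with the highest count for this segment type."""
    column = SEGMENTS.index(segment)
    return max(TENSE_COUNTS, key=lambda tense: TENSE_COUNTS[tense][column])


def tense_share(tense, segment):
    """Fraction of a segment type's occurrences that use the given tense."""
    column = SEGMENTS.index(segment)
    total = sum(row[column] for row in TENSE_COUNTS.values())
    return TENSE_COUNTS[tense][column] / total


for segment in SEGMENTS:
    tense = dominant_tense(segment)
    print(f"{segment}: {tense} ({tense_share(tense, segment):.0%})")
```

In this data, results and methods are mostly reported in the past active, facts and implications in the present active, and goals with a "to" construction.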
Segment order
             Fact  Hypothesis  Problem  Goal  Method  Result  Implication  End  Total
Start        18    3           1        8     2       2       4            0    38
Fact         83    22          13       17    9       31      12           1    188
Hypothesis   20    5           3        7     6       2       6            3    52
Problem      9     7           7        2     3       5       3            3    39
Goal         7     0           2        4     46      6       0            0    65
Method       13    2           3        10    25      54      3            0    110
Result       23    9           4        6     16      85      78           6    227
Implication  13    6           4        12    11      30      12           25   113
Total        186   54          37       61    118     215     118          38   827
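One way to read the segment-order counts above is as a transition table. The Python sketch below assumes that each row is the preceding segment and each column the segment that follows it (the slide does not state this, but the Start row and End column suggest it). It normalises each row by the sum of its own cells, not by the printed Total column; the function names are hypothetical.

```python
NEXT_SEGMENTS = ["Fact", "Hypothesis", "Problem", "Goal", "Method",
                 "Result", "Implication", "End"]

# Cell counts copied from the Segment order table (Total row and column omitted).
TRANSITIONS = {
    "Start":       [18, 3, 1, 8, 2, 2, 4, 0],
    "Fact":        [83, 22, 13, 17, 9, 31, 12, 1],
    "Hypothesis":  [20, 5, 3, 7, 6, 2, 6, 3],
    "Problem":     [9, 7, 7, 2, 3, 5, 3, 3],
    "Goal":        [7, 0, 2, 4, 46, 6, 0, 0],
    "Method":      [13, 2, 3, 10, 25, 54, 3, 0],
    "Result":      [23, 9, 4, 6, 16, 85, 78, 6],
    "Implication": [13, 6, 4, 12, 11, 30, 12, 25],
}


def transition_probabilities(counts):
    """Turn each row of counts into fractions that sum to 1."""
    probabilities = {}
    for source, row in counts.items():
        row_total = sum(row)
        probabilities[source] = {
            target: count / row_total
            for target, count in zip(NEXT_SEGMENTS, row)
        }
    return probabilities


def most_likely_next(source, probabilities):
    """Return the segment type that most often follows `source`."""
    row = probabilities[source]
    return max(row, key=row.get)


PROBABILITIES = transition_probabilities(TRANSITIONS)
for source in TRANSITIONS:
    target = most_likely_next(source, PROBABILITIES)
    print(f"{source} -> {target} ({PROBABILITIES[source][target]:.0%})")
```

Read this way, a goal is most often followed by a method, and a method by a result.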
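Discourse: A Fact(ory)
[Diagram: the discourse segments (fact, problem, hypothesis, goal, method, result, implication) arranged across the introduction, results and discussion sections and divided into own view and shared view. A problem arises from incongruity/ignorance among facts. Labelled tense realms: hypothetical realm (might, would); realm of activity (to test, to see); realm of models (present); realm of experience (past). Connecting phrases: ‘to’, ‘we’, ‘resulting in’, ‘suggests that’.]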
6. Just model the facts with xml + rdf
- In practice: ScienceDirect does not use our XML... (shhh....)
- At Elsevier: Project Harpoon: ‘stab’ the document with metadata, asynchronous, linked in (XPath/XQuery), distributed
- In XML - how to access a phrase inside an article:
- access inside a PDF by coordinates? Format, content changes
- add IDs to every single element? Format, content, version changes?
- How to represent relations, even if we know where they link?
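The idea of attaching metadata from outside the document and linking it in by path can be sketched as stand-off annotation. The Python example below is only an illustration under stated assumptions: it uses the limited XPath subset of the standard `xml.etree.ElementTree` module, the annotation records, their labels and the `resolve` function are hypothetical, and it is not a description of how Project Harpoon works.

```python
import xml.etree.ElementTree as ET

# A small document with an ID-like attribute on every segment.
DOCUMENT = """<Experiment>
<Method method="m1">We generated an MCF-7 derivative</Method>
<Result result="r1">In the presence of E6, p53 stabilization in response to IR was almost completely prevented in MCF-7 cells</Result>
</Experiment>"""

# Annotations live outside the document: a path to the element,
# the phrase to mark, and an illustrative label.
ANNOTATIONS = [
    {"path": ".//Result[@result='r1']", "phrase": "p53 stabilization", "label": "process"},
    {"path": ".//Method[@method='m1']", "phrase": "MCF-7", "label": "cell line"},
    {"path": ".//Result[@result='r9']", "phrase": "p21cip1", "label": "protein"},
]


def resolve(root, annotation):
    """Locate an annotated phrase; return None if the anchor no longer matches."""
    target = root.find(annotation["path"])
    if target is None:
        return None  # the element was removed or renumbered
    text = "".join(target.itertext())
    start = text.find(annotation["phrase"])
    if start < 0:
        return None  # the wording changed in this version
    end = start + len(annotation["phrase"])
    return {"label": annotation["label"], "start": start, "end": end,
            "text": text[start:end]}


root = ET.fromstring(DOCUMENT)
for annotation in ANNOTATIONS:
    print(annotation["path"], "->", resolve(root, annotation))
```

The third annotation points at an element that does not exist and resolves to `None`, which is the fragility the bullets above describe: a path-based link breaks as soon as the format, content or version of the document changes.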
6. How can we better model discourse elements (and relations)?
7. And publishers should stop making all those papers.
- 6 uses of a RA:
- job application
- report card
- thesis
- conference tickets
- research assessment
- and yes, by the way, reporting on scientific work.
- Scientists are evaluated largely based on publications: this enables their production to be evaluated by non-specialists
- This places undue stress on quantity, conformity (for fear of being rejected), and publishing for its own sake.
7. How can we disentangle communication and evaluation?
Seven ‘Known Unknowns’ in Online Science Publishing
1. How can we model a user’s interest?
2. Can we create an ontology of doubt?
3. How can we represent narrative online?
4. How do we model cognitive context?
5. How do we represent non-textual elements?
6. How can we better model discourse elements and relations?
7. How can we disentangle communication and evaluation?
The Elsevier Grand Challenge: Knowledge Enhancement in the Life Sciences is a contest created to improve the way scientific information is communicated and used. The contest invites members of the scientific community to describe and prototype a tool to improve the interpretation and identification of meaning in (online) journals and text databases relating to the life sciences.
Specifically we are looking for new ways to:
1. improve the process/methods/results of creating, reviewing and editing scientific content
2. interpret, visualize or connect the knowledge more effectively, and/or
3. provide tools/ideas for measuring the impact of these improvements.
Abstracts are now invited - Submissions will close on July 15th, 2008.
- Finalists will be invited to present their vision papers in a public symposium, at which the Panel of Judges will announce the winners.
- The first place winner will be awarded a cash prize of US$35,000.
- The second place winner will be awarded a cash prize of US$15,000.
- All finalists will receive free trial access to ScienceDirect and Scopus for a year.
http://www.elseviergrandchallenge.com/
Unknown unknowns?
Would you care to correct/contradict/join me?
Anita de Waard,
http://people.cs.uu.nl/anita