Accessing the data: going beyond what the author wanted to tell you Brian McMahon International Union of Crystallography 5 Abbey Square, Chester CH1 2HU, UK [email protected] Interactive Publications and the Record of Science ICSTI Winter Workshop Paris, Monday, February 8, 2010


Page 1

Accessing the data: going beyond what the author wanted to tell you

Brian McMahon
International Union of Crystallography
5 Abbey Square, Chester CH1 2HU, UK
[email protected]

Interactive Publications and the Record of Science

ICSTI Winter Workshop

Paris, Monday, February 8, 2010

Page 2

PDFs and data impoverishment

Henry Rzepa, 19 December 2009:

Publishers are likely to love interactive PDF, since it is easy to archive. However ... such objects are data impoverished. Whereas with Jmol, one is obliged to provide semantically accurate data (e.g. CML or equivalent), the PDF object is simply a (pre)rendering of that data. Thus reconstituting a useful molecule from Jmol is trivial (and that reconstitution can then be used for many other purposes), reconstituting a molecule from a 3D PDF is likely to be non trivial, and will almost certainly suffer information loss compared to the original data. By all means, provide both, but I strongly urge that a 3D PDF should not be the only object provided.

http://www.mail-archive.com/[email protected]/msg13417.html

Page 3

Jmol interactive visualizations

• Not new: Biochem J. (2008). 412, 399–413

• Bespoke design / implementation

• Expensive

• Requires consultation

• Supplementary information

Page 4

Jmol

Then (ca. 2004):

• Protein structures (RasMol)

• Small organic chemical molecules (Chime)

Now:

• Crystal lattices (symmetry)

• Inorganic materials (coordination polyhedra)

• Displacement ellipsoids

• Symmetry operations

• Electron orbitals

• Electron density maps

The right tool for the job

Page 5

Making it easier to use

• Editing toolkit http://submission.iucr.org/jtkt

• High-quality immediate visual feedback

• Context-sensitive help

• Manuals, examples, tutorials

• Reference: McMahon, B. & Hanson, R.M. (2008). J. Appl. Cryst. 41, 811-814. A toolkit for publishing enhanced figures

Page 6

Interactive molecular visualizations enhance understanding

Acta Cryst. (2008). F64, 156-162

• Rotate

• Modify orientation

• Alternative representations

• Overlay representations

• Interrogate

Page 7

Infrastructure for publication workflow

• Server/client architecture

• Ability to create interactive figures before or during article submission/review

• Opportunity for peer review/revision

• Auto-generation of static equivalent

• Easy generation/activation of multiple scripts to provide alternative views

Page 8

Requirements for routine publication of enhanced figures

• Platform independence

• Web access for authors

• Serving visualization application and data

• Integration into submission/review procedures

• Integration into journal production workflow

• Automated generation of static copy (for failsafe/PDF edition/archiving)

• Authoring tools

Page 9

The authoring environment

• The author uploads a data file (CIF)

• The system provides different default styles according to the type of structure

• The author edits and annotates the view

• The author may supply additional scripts

• The author saves the result as an enhanced figure + publication-quality static figure
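
The second step above, offering a default style according to the type of structure, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the toolkit's implementation: the function name `default_style`, the three style labels and the classification rules are all hypothetical. The only things taken from the CIF standards are the data names tested for (`_atom_site_aniso_U_11` from the core dictionary and the dotted `_atom_site.` category used by mmCIF).

```python
import re


def default_style(cif_text: str) -> str:
    """Guess a default display style from the data names present in a CIF.

    Hypothetical helper: the style labels returned here are placeholders,
    not names used by any real toolkit or viewer.
    """
    # Collect every data name, i.e. a token starting with "_" at the start
    # of a line. Simplification: lines inside multi-line text fields are
    # not excluded.
    names = {
        m.group(1).lower()
        for m in re.finditer(r"^\s*(_\S+)", cif_text, re.MULTILINE)
    }
    if any(n.startswith("_atom_site.") for n in names):
        # Dotted category.item names are the mmCIF convention, which is
        # what macromolecular (PDB) entries use.
        return "cartoon"
    if "_atom_site_aniso_u_11" in names:
        # Anisotropic displacement parameters are present.
        return "displacement-ellipsoids"
    return "ball-and-stick"
```

The point of the sketch is that the choice is driven by what the data file contains rather than by anything the author has to declare, so the author starts from a sensible view and only then edits and annotates it.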

Page 10

Saving the enhanced figure

• Interactive applet

• Active scripts provided by the author

• High-resolution static image

• Option to view dynamic or static image online

• Link to allow peer review
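
One way to picture what is saved is as a small record that keeps the data, the author's scripts and the static image together, so that a page can fall back to the static image when the applet cannot run. The sketch below is an assumption made for illustration: the class `EnhancedFigure`, its field names and the `render_plan` method are hypothetical and do not describe the actual IUCr storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EnhancedFigure:
    """Hypothetical record of what is stored for one enhanced figure."""

    data_file: str                 # e.g. the CIF uploaded by the author
    static_image: str              # high-resolution static equivalent
    scripts: Dict[str, str] = field(default_factory=dict)  # label -> viewer script
    review_url: Optional[str] = None  # link handed to referees, if any

    def render_plan(self, viewer_available: bool) -> dict:
        """Describe what a page should show.

        The interactive view is used when the viewer can run; otherwise the
        static image acts as the failsafe.
        """
        if viewer_available and self.data_file:
            return {
                "mode": "interactive",
                "data": self.data_file,
                "scripts": sorted(self.scripts),  # script labels only
            }
        return {"mode": "static", "image": self.static_image}
```

Keeping the static image in the same record as the data and scripts is what lets the same figure serve the online edition, the PDF edition and the archive.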

Page 11

The toolkit editing interface

• Essential tool for authors

• Accommodates novice and advanced users

• Tabbed interface allows authors to concentrate on scientific aspects of visualization

• Presets tuned to journal style requirements

• Live testing, preview and feedback mechanisms

Page 12

Submission/review

• Author may prepare enhanced figure ahead of publication

• Simply enter URL of edit workspace when asked to ‘upload source files’

• Presented alongside other conventional figures

• Available for peer review

• Can be edited in response to referee comments

Page 13

Interactive authorship: publBio

http://publbio.iucr.org

• Start with the data (PDB); example: 3jw1

• Add structured text

• Online look-up:

  • authors

  • references

  • crystallization solution components

• Validation

  • references

• Visualisation (Jmol)

• Update data file as submission vehicle

Page 14

Uniform (compatible) markup systems

• Crystallographic Information Framework (CIF)

  • Treat data/metadata, text/numerical data as peers

  • Domain-specific extensions (dictionaries = ontologies)

  • Image format

• Some data fields may need to contain richer content

  • Text markup

  • Mathematical equations

  • Interactive figure scripts

• Machine validation of dictionary attributes

• Methods
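
To make "data/metadata as peers" and "machine validation of dictionary attributes" concrete, here is a minimal sketch. It is deliberately simplified and rests on several assumptions: `parse_simple_cif` handles only single-line `_name value` items (no loops, no multi-line text fields), and `MINI_DICTIONARY` is a hand-written stand-in for a real CIF dictionary, with types and ranges chosen for illustration. The function names are hypothetical; the data names themselves are from the core CIF dictionary, apart from `_my_private_item`, which is made up to show an undefined item.

```python
import re
import shlex

# Hand-written stand-in for a CIF dictionary (assumption, for illustration).
MINI_DICTIONARY = {
    "_cell_length_a": {"type": "numb", "min": 0.0},
    "_cell_angle_alpha": {"type": "numb", "min": 0.0, "max": 180.0},
    "_publ_section_title": {"type": "char"},
}

# A number with an optional standard uncertainty in parentheses, e.g. 10.234(2).
NUMB = re.compile(r"([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)(?:\(\d+\))?")


def parse_simple_cif(text):
    """Return {data_name: value} for single-line items only."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skips data_ headers, loop_ keywords, loop rows
        parts = shlex.split(line)  # honours 'quoted strings'
        if len(parts) == 2:
            items[parts[0]] = parts[1]
    return items


def validate(items, dictionary):
    """Check each item against the dictionary; return a list of messages."""
    errors = []
    for name, value in items.items():
        spec = dictionary.get(name)
        if spec is None:
            errors.append(f"{name}: not defined in dictionary")
            continue
        if spec["type"] != "numb":
            continue  # character data: nothing further checked here
        match = NUMB.fullmatch(value)
        if match is None:
            errors.append(f"{name}: '{value}' is not a number")
            continue
        number = float(match.group(1))
        low = spec.get("min", float("-inf"))
        high = spec.get("max", float("inf"))
        if not low <= number <= high:
            errors.append(f"{name}: {number} outside allowed range")
    return errors
```

Note that the article title and the unit-cell length sit side by side in the same file and pass through the same check, which is the sense in which text and numerical data are treated as peers.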

Page 15

Conclusions

• The working scientist really wants to interact with the data

• What interactive PDF offers is currently limited

• Publishers should develop compatible architectures

• Need domain-specific implementations (learned societies)

• Investment in new applications; integration with workflow

• Education for a new paradigm

• Archiving

  • requires more standardisation

  • proper compound document model

  • concentrate on data (or semantic content), not the implementation

  • ‘record not what it looks like, but what you are looking at’

• Distributed content sources

  • data not necessarily integral part of document

  • retrieval of non-discrete data sets