376 sspin2011 bradleyallen

Innovation and the STM publisher of the future

Bradley P. Allen, Elsevier Labs

Innovation Session, SSP IN Conference 2011

Arlington, VA, USA

2011-09-19

Peak physical media

• “Music Sales”, New York Times, 1 August 2009. http://www.nytimes.com/imagepages/2009/08/01/opinion/01blow.ready.html

• “Initial Circs per student”, William Denton, 31 January 2011. http://www.miskatonic.org/2011/01/31/initial-circs-student

• “Rise of e-book Readers to Result in Decline of Book Publishing Business”, Steven Mather, iSuppli, 28 April 2011. http://www.isuppli.com/Home-and-Consumer-Electronics/News/Pages/Rise-of-e-book-Readers-to-Result-in-Decline-of-Book-Publishing-Business.aspx

A simple model of the evolution of publishing

Print era: 1600s – 1980

• Packaged as books and articles

• Physically distributed

• Access and discovery through libraries

Digital Library era: 1980 – 2010s

• Packaged as books and articles

• Digitally distributed

• Access and discovery through search engines

Platform-as-a-Service era: 2010s

• Packaged as apps and APIs

• Digitally distributed

• Access and discovery through social networks

Facets of STM publishing in the PaaS era

• Process Type: Acquisition; Extract, Load and Transform; Enhancement; Indexing; Discovery and Access; Composition; Delivery

• Activity: Submitting, Crawling, Syndicating, Formatting, Mapping, Cleansing, Indexing, Querying, Updating, Storing, Annotating, Subject tagging, Classification, Entity recognition, Entity extraction, Fact extraction, Clustering, Aggregating, Ordering, Summarizing, Filtering, Analysis, Data science, Rendering, Design, Publishing, Accessing, Retrieving, Deleting

• Entity: Author, Supplier, Web site, Typesetter, Automated process, Subject matter expert, Search engine, Content repository, Entity registry, Product catalog, Editor, Reviewer, User, Designer, Developer, E-book, Mobile app, Mobile-enhanced Web site, API

• Content Type: Article, Book, Media object, Entity record, Asset metadata, Relational metadata, Provenance metadata, Usage metadata, Taxonomy, Ontology, User-generated content

STM publishing as business intelligence

Surajit Chaudhuri, Umeshwar Dayal, and Vivek Narasayya. 2011. An overview of business intelligence technology. Commun. ACM 54, 8 (August 2011), 88-98. http://doi.acm.org/10.1145/1978542.1978562

Some scenarios to compare the two digital eras

Scenario: A new medical term relevant to an emerging healthcare issue (e.g. a new type of avian flu virus) needs to be incorporated into a search index immediately

• Digital Library era: Organizational governance issues about how taxonomies are updated, coupled with manually intensive workflows and ad hoc approaches to content tagging, inhibit rapid response

• Platform-as-a-Service era: A single, automated and standardized taxonomy management and content enhancement workflow allows rapid and timely update of search applications

Scenario: Application developers want to mash up epidemiological data with medical journal articles to create a topic-specific Web resource

• Digital Library era: Data silos without easy means of programmatic access by developers, coupled with governance and business model questions, inhibit data reuse

• Platform-as-a-Service era: Content API and single-point-of-access repository allow data and content to be accessed, discovered and reused across multiple applications

Scenario: Digital library developers want to stage content into a single repository for unified search index generation

• Digital Library era: Duplication of core content leads to synchronization and quality control issues

• Platform-as-a-Service era: Consolidation of duplicate repositories into a single point of truth across all content, accessible and discoverable through a Content API, eliminates the need for duplication and synchronization

Scenario: Third-party solutions providers want to integrate content (e.g. tagged medical journal articles, medical taxonomies) into point-of-care solutions

• Digital Library era: No standards, no APIs for point-of-care content integration across all content and data

• Platform-as-a-Service era: Standards and APIs that scale across multiple partners, for all content types, for all delivery formats

Scenario: Publishers want to deliver their content to tablets and e-readers in delivery formats that take advantage of the displays and interaction modalities on those devices

• Digital Library era: No clear standard or approach for targeting emerging eReader and tablet devices; multiple and divergent approaches lead to siloed solutions and duplication of effort

• Platform-as-a-Service era: Web and industry standards for eReader and tablet devices supported as part of standard automated processing into delivery channel-specific formats, regularly updated and exposed through a Content API

Scenario: Journal publisher wants to integrate content enhancements across multiple subject matter areas to add value to products leveraging Article of the Future technology

• Digital Library era: No single point of access to content enhancements, no standards for content enhancement suppliers and partners to deliver enhancements for integration

• Platform-as-a-Service era: Easy access to multiple opportunities for content enhancements embedded in standard next-generation article formats and provided using standard content enhancement formats
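
The first scenario above (a new medical term that must reach the search index quickly) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Taxonomy` class, the `build_index` helper, the concept identifiers and the sample documents are all invented for the example, and simple phrase matching stands in for a real content enhancement workflow.

```python
import re


class Taxonomy:
    """Hypothetical single, shared taxonomy: label -> concept identifier."""

    def __init__(self):
        self._terms = {}

    def add_term(self, label, concept_id):
        self._terms[label.lower()] = concept_id

    def tag(self, text):
        """Return the sorted concept ids whose labels occur in the text."""
        lowered = text.lower()
        found = set()
        for label, concept_id in self._terms.items():
            if re.search(r"\b" + re.escape(label) + r"\b", lowered):
                found.add(concept_id)
        return sorted(found)


def build_index(docs, taxonomy):
    """Automated tagging step: map each concept id to the documents it tags."""
    index = {}
    for doc_id, text in docs.items():
        for concept_id in taxonomy.tag(text):
            index.setdefault(concept_id, []).append(doc_id)
    return index


# Made-up documents and concept identifiers, for illustration only.
docs = {
    "a1": "Outbreak of H7N9 avian flu reported",
    "a2": "Seasonal influenza vaccination",
}
taxonomy = Taxonomy()
taxonomy.add_term("avian flu", "concept:avian-flu")
taxonomy.add_term("influenza", "concept:influenza")
index_before = build_index(docs, taxonomy)

# A new term is added once, in one place, and the index is regenerated.
taxonomy.add_term("H7N9", "concept:h7n9")
index_after = build_index(docs, taxonomy)
print(sorted(index_after))
```

The point of the sketch is the shape of the workflow: one taxonomy update followed by an automated re-tagging pass, with no per-application manual step.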

Goals for the publisher of the future

• Craft content acquisition, production and management systems that support with equal capability and flexibility a broad range of content types and delivery channels

• Make it easy for authors, editors and reviewers to work with bundles of content and data in the aggregate

• Make it easy to discover and access, across all content assets, information in fragments smaller than the unit of publication

• Then make it easy to aggregate and compose these fragments into new products and services

• Leverage the tremendous power of Web architectural standards and formats to increase the ease of content integration and interoperability

New requirements for content management

• Broad range of content types
– Must treat video, audio, images, datasets, metadata and knowledge organization systems as first-class objects, in addition to articles and books

• Standards-based
– Web-standard formats to support ease of integration and interoperability

• Fine-grained
– Must be decomposable into and addressable in fragments smaller than the unit of publication; e.g., down to the level of specific words, phrases, images, table cells in articles or book chapters, key frames and segments in videos

• Discoverable
– Must be easily located across all levels of granularity

• Accessible
– Must be easily accessed through content creation, retrieval, update and deletion (CRUD) services

• Flexible
– New content types and associated schemas must be easily added through configuration

• Reusable
– It must be efficient for product developers to aggregate and compose content fragments into new products

• Modifiable
– Support the enhancement and correction of content at any time following creation

• Broad range of delivery formats
– Content standards and services must support fulfillment, delivery and presentation across desktop, notebook, tablet and mobile computing devices
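
As a rough sketch of how the fine-grained, accessible, flexible and modifiable requirements might fit together, the Python below keeps content in memory and addresses fragments with a `#`-suffixed identifier. The `ContentRepository` class, its method names, the fragment identifiers and the addressing scheme are all assumptions made for illustration; they do not describe an existing product or API.

```python
class ContentRepository:
    """Toy in-memory store with CRUD operations on content and fragments."""

    def __init__(self):
        self._schemas = {}  # content type -> set of required field names
        self._items = {}    # content id -> {"type", "fields", "fragments"}

    def register_type(self, content_type, required_fields):
        """Flexible: new content types are added through configuration."""
        self._schemas[content_type] = set(required_fields)

    def create(self, content_id, content_type, fields, fragments=None):
        missing = self._schemas[content_type] - set(fields)
        if missing:
            raise ValueError("missing fields: " + ", ".join(sorted(missing)))
        self._items[content_id] = {
            "type": content_type,
            "fields": dict(fields),
            "fragments": dict(fragments or {}),
        }

    def retrieve(self, address):
        """Fine-grained: 'id' returns the item, 'id#frag' one fragment."""
        content_id, _, fragment_id = address.partition("#")
        item = self._items[content_id]
        return item["fragments"][fragment_id] if fragment_id else item

    def update(self, address, value):
        """Modifiable: replace a fragment, or merge field values into an item."""
        content_id, _, fragment_id = address.partition("#")
        item = self._items[content_id]
        if fragment_id:
            item["fragments"][fragment_id] = value
        else:
            item["fields"].update(value)

    def delete(self, content_id):
        del self._items[content_id]


repo = ContentRepository()
repo.register_type("article", ["title"])
repo.register_type("dataset", ["title", "format"])  # no code change needed
repo.create(
    "art1",
    "article",
    {"title": "An example article"},
    fragments={"table1.r2c3": "42.0", "fig1": "Example caption"},
)
repo.update("art1#table1.r2c3", "42.5")  # correct a single table cell
print(repo.retrieve("art1#table1.r2c3"))
```

In a real system these operations would sit behind HTTP services rather than method calls; the sketch only shows that one addressing scheme can serve both whole items and fragments.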

Leveraging Web standards for sharing

1. Use URIs to name things

2. Use HTTP URIs so they can be looked up

3. Return useful data when things are looked up

4. Include links to other things in the returned data
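
The four rules above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any production system: the `http://example.org/` URIs, the record fields, and the `lookup` and `crawl` helpers are assumptions made for the example, and an in-memory dictionary stands in for real HTTP requests.

```python
# Hypothetical in-memory "web" of records; a real deployment would serve
# these over HTTP instead of reading them from a dictionary.
BASE = "http://example.org/id/"  # assumed namespace, not a real service

RECORDS = {
    BASE + "article/1": {
        "id": BASE + "article/1",  # rule 1: a URI names the thing
        "type": "Article",
        "title": "An example article",
        "links": {
            "mentions": [BASE + "entity/a"],
            "hasPart": [BASE + "media/1"],
        },
    },
    BASE + "entity/a": {
        "id": BASE + "entity/a",
        "type": "Entity record",
        "label": "An example entity",
        "links": {"mentionedIn": [BASE + "article/1"]},
    },
    BASE + "media/1": {
        "id": BASE + "media/1",
        "type": "Media object",
        "label": "An example figure",
        "links": {},
    },
}


def lookup(uri):
    """Rules 2 and 3: an HTTP URI can be looked up and returns useful data."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("expected an HTTP URI: " + uri)
    return RECORDS[uri]


def crawl(start_uri):
    """Rule 4: follow the links in each record to discover related things."""
    seen, queue = [], [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for targets in lookup(uri)["links"].values():
            queue.extend(targets)
    return seen


for uri in crawl(BASE + "article/1"):
    print(lookup(uri)["type"], uri)
```

Starting from a single article URI, a client that knows nothing else about the data can reach the entity record and the media object simply by following links.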

“Linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.”

Tennison J, 2010. Why Linked Data for data.gov.uk? http://www.jenitennison.com/blog/node/140

Shotton D, Portwin K, Klyne G, Miles A, 2009. Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol 5(4): e1000361. doi:10.1371/journal.pcbi.1000361

From books and articles to evolving research objects

• Process: Acquire; Transform, Enhance, Compose; Deliver

• Linked data objects: Article, Entity record, Media object, Relational metadata

Leveraging consumer Web innovations

• Emergent technologies driven by consumer Web applications emphasize design choices that focus on delivering cheap, robust and scalable Web applications
– Schemaless document stores provide read/write at Web scale with support for analytics
  • For more dynamic, fine-grained content and linked data
  • For easier usage and citation analysis, bibliometrics and scientometrics
– Web application development frameworks that leverage HTML5/CSS/JS to deliver across desktops, notebooks, tablets and smartphones
– Deploying in the cloud and moving scale-out from development to operations to reduce time-to-market and the cost of failure for emerging, niche publishing opportunities

• As we shift to the Platform-as-a-Service era, these features become an important part of the STM publishing technology stack
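
To make the point about schemaless stores and usage analysis concrete, here is a small Python sketch. A plain list of dictionaries stands in for a document store, and the event fields, identifiers and the `count_by_item` helper are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical usage events; records need not share the same fields,
# which is the property a schemaless document store provides.
EVENTS = [
    {"kind": "view", "content": "art1", "device": "tablet"},
    {"kind": "view", "content": "art1#fig1"},
    {"kind": "cite", "content": "art1", "cited_by": "art2"},
    {"kind": "view", "content": "art2", "device": "phone", "referrer": "social"},
]


def count_by_item(events, kind):
    """Count events of one kind per top-level content item."""
    counts = Counter()
    for event in events:
        if event.get("kind") == kind:
            # Roll fragment-level events up to the item that contains them.
            item_id = event["content"].split("#", 1)[0]
            counts[item_id] += 1
    return counts


views = count_by_item(EVENTS, "view")
citations = count_by_item(EVENTS, "cite")
print(views.most_common(), dict(citations))
```

Because fragment-level events carry the same identifier scheme as whole items, usage can be reported at either level without changing how the events are stored.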

Examples from Elsevier: Linked Data Repository

Examples from Elsevier: SciVal

Examples from Elsevier: SciVerse

The publisher of the future as lean startup

• This stuff is not just for big publishers

• These are the tools that new consumer Internet businesses are using to create new products and services today… quickly and on the cheap

• Smaller publishers and societies can use lean startup techniques to drive app and API design and development starting from existing web presences and third-party APIs

Example: Impact metrics in Klout

Example: Content acquisition using Github

Example: SciVerse/Mendeley integration

Challenges for the publisher of the future

• When content can be mashed up at a fine level of granularity using multiple third-party APIs, what are the rights associated with the resulting product? What are the appropriate business models?

• What standards should there be for research objects?

• Who gets credit for research objects? How is impact determined and reputation managed?

• What is an acceptable trade-off between content flexibility and high-touch presentation design?

In summary

• STM publishing is only beginning the transition from print to online

• Articles and books are no longer sufficient containers for scholarly communication

• Tools to effect this change come from the consumer Internet and the business intelligence worlds

• Publishers of the future will leverage the best practices emerging around these tools to create innovative new products to serve their communities
