Dynamic Potential of Semantic Enrichment



Presented at the 2011 Allen Press Emerging Trends in Scholarly Publishing Seminar, 14 April 2011, Washington, DC

Transcript of Dynamic Potential of Semantic Enrichment

Page 1: Dynamic Potential of Semantic Enrichment

The Dynamic Potential of Semantic Enrichment

Allen Press Emerging Trends in Scholarly Publishing™ Seminar 14 April 2011

Pam Harley

VP, Product & Market Development

Semedica™, a division of Silverchair

[email protected] (434) 296-6333 x372

or, Everything You Always Wanted to Know About Semantic Enrichment

OK, not everything. Not even most things. Just some things you probably should be aware of.

Page 2: Dynamic Potential of Semantic Enrichment

Why me?

Me: 20+ years in STM publishing, many hats worn

print, digital

books, journals, news, continuing education…

editorial, production, product development

Silverchair: 10+ years working with STM publishers to build products and features from semantically tagged content

Page 3: Dynamic Potential of Semantic Enrichment

Here’s the plan

WHAT is semantic enrichment

WHY you should care (benefits)

HOW to get started

(with a few side trips to make sure we’re all on the same page re: lingo)


Page 4: Dynamic Potential of Semantic Enrichment

First…

DON’T do what I’m about to do

Don’t start by exploring technology

(Hint: Start with user stories)

Page 5: Dynamic Potential of Semantic Enrichment

What’s a user story?

A user story captures what the user wants to achieve: who wants the functionality, and why it allows that user to achieve something useful

Page 6: Dynamic Potential of Semantic Enrichment

Creating user stories

Focus your tagging strategy on user stories, that is, how people want to use your content:

What tasks are they trying to do when they use your product?

What answers are they looking for?

At what point in their workflow is your product used?

Almost all information sites have multiple user stories. Know them for your products

Remember that your organization is also a key user of your product

Page 7: Dynamic Potential of Semantic Enrichment

WHAT is semantic…

enrichment

tagging

markup

indexing

fingerprinting

classification

categorization

?


Page 8: Dynamic Potential of Semantic Enrichment

Semantics are about meaning

The meaning of content is currently written for human understanding, not computers

Semantics adds a layer of meaning to your content, so that computers can make sense of it and build connections to it

Semantic metadata answers the most important question of all for content producers and users:

What is this content about?

captured in a way that computers can process


Page 9: Dynamic Potential of Semantic Enrichment

“Atomizing” information

A semantic approach requires you to go beyond documents and think of your content as data

Semantic markup allows knowledge in your publications to be acted on as distinct bits of data

For example:

1 practice guideline = 1 document OR 1 practice guideline = 312 distinct pieces of data


Page 10: Dynamic Potential of Semantic Enrichment

Taxonomy is the semantic foundation

Taxonomy is the framework for the semantic layer and semantic tagging

It allows…

Normalization

Consistency in tagging

Concept grouping and hierarchical relationships

Integrations/interoperability (internal and external)
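As a rough illustration of concept grouping through hierarchy, the sketch below expands one concept to everything narrower beneath it. The tiny hierarchy and the `descendants` helper are hypothetical, invented for this example rather than drawn from any real taxonomy.

```python
# Hypothetical parent -> children links for a tiny taxonomy fragment.
CHILDREN = {
    "infectious disease": ["bacterial infection", "viral infection"],
    "bacterial infection": ["clostridium difficile infection"],
    "viral infection": [],
    "clostridium difficile infection": [],
}


def descendants(concept):
    """Return the concept plus every narrower concept beneath it."""
    found = [concept]
    for child in CHILDREN.get(concept, []):
        found.extend(descendants(child))
    return found


print(descendants("infectious disease"))
```

A search or browse feature could use an expansion like this so that content tagged with a narrow concept is also found under its broader parents.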


Page 11: Dynamic Potential of Semantic Enrichment

Equivalent relationships are critical

Synonyms, abbreviations, jargon, misspellings, and codes are a critical component

They are necessary to normalize the natural and constantly evolving variations in the language that authors use to describe concepts and searchers use to find them

They vastly improve the performance of autotagging systems

Precise strings are easier to match programmatically, and a thesaurus magnifies the number of strings available to match to a given concept

Page 12: Dynamic Potential of Semantic Enrichment

Normalization

Authors use different terminology to represent the same topics

Examples:

Synonyms (newborn = neonate)

Abbreviations (GHB = gamma hydroxybutyrate)

Shorthand (c diff = clostridium difficile)

Searches for these language variations produce different results

A semantic layer controlled by a taxonomy/ thesaurus normalizes these variations
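A minimal sketch of this kind of normalization, assuming a small hand-built thesaurus. The three term pairs come from the examples on this slide; the concept IDs (`C001` and so on) and the `normalize` helper are hypothetical.

```python
# Hypothetical mini-thesaurus: each concept ID has one preferred term
# and a set of equivalent strings (synonyms, abbreviations, shorthand).
THESAURUS = {
    "C001": {"preferred": "neonate", "equivalents": {"newborn", "neonate"}},
    "C002": {"preferred": "gamma hydroxybutyrate",
             "equivalents": {"ghb", "gamma hydroxybutyrate"}},
    "C003": {"preferred": "clostridium difficile",
             "equivalents": {"c diff", "clostridium difficile"}},
}

# Invert the thesaurus once so every variant points at its concept ID.
VARIANT_TO_CONCEPT = {
    variant: concept_id
    for concept_id, entry in THESAURUS.items()
    for variant in entry["equivalents"]
}


def normalize(term):
    """Return (concept_id, preferred_term) for a variant, or None if unknown."""
    key = " ".join(term.lower().split())  # case-fold and collapse whitespace
    concept_id = VARIANT_TO_CONCEPT.get(key)
    if concept_id is None:
        return None
    return concept_id, THESAURUS[concept_id]["preferred"]


print(normalize("Newborn"))
print(normalize("GHB"))
print(normalize("C  diff"))
```

Because both the author's wording and the searcher's query resolve to the same concept ID, a search for any variant can return the same results.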


Page 13: Dynamic Potential of Semantic Enrichment

Normalization in action at McGraw-Hill’s AccessEmergency Medicine


Page 14: Dynamic Potential of Semantic Enrichment

Consistency in tagging


Page 15: Dynamic Potential of Semantic Enrichment

Dynamic concept grouping and hierarchical relationships


Page 16: Dynamic Potential of Semantic Enrichment

Hooks for integrations/ interoperability


Page 17: Dynamic Potential of Semantic Enrichment

Where does a taxonomy come from?

Your content collection

Inputs from your users (e.g., author keywords, search logs)

Subject matter expert consultation

Industry standard terminologies

Source for concepts, equivalents, guidance on hierarchy


Page 18: Dynamic Potential of Semantic Enrichment

The importance of industry standard terminologies

Your taxonomy must be able to interact with standards of your domain to forge meaningful external integrations

Many terminologies are in use in different scientific domains (UMLS, ACS, ACM, AIP, IEEE, OSA, EPA, NASA, USGS…). Investigate what’s available

Great case example for domain-level taxonomy:

For medical content, the UMLS Metathesaurus maps together 100+ constituent health care vocabularies (MeSH, SNOMED, ICD, RxNorm…) to support health care interoperability

Page 19: Dynamic Potential of Semantic Enrichment

Don’t reinvent the wheel!

If there’s a taxonomy available that’s a good fit, use it

BUT make sure you have a mechanism for adapting it to meet the needs of

your content

your users

the pace of change/new concepts in your field

[Note to STM publishers in cutting-edge areas: You can’t wait for the standards to catch up to your research output—you’ll need to be able to add concepts at the time of publication]


Page 20: Dynamic Potential of Semantic Enrichment

Ongoing taxonomy management

Taxonomies must be continually enhanced as your domain evolves, your content set grows, and your user needs and expectations change

Make sure it is easy to update your taxonomy and make it available to your systems (tagging, web applications), ideally in real time

Taxonomies should always be considered a work in progress!

Page 21: Dynamic Potential of Semantic Enrichment

Application of taxonomy to content—semantic tagging

Semantic tagging is the insertion of semantic information at the level of XML elements

Example: <root-term termID="47521">t cells, regulatory</root-term>

Tagging can be embedded directly in XML, provided as separate reference files, or placed in database tables that reference elements

If the content is inaccessible (e.g., images and videos, PDFs) tagging can be placed in header files
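To show how embedded tags become data an application can act on, here is a small sketch that reads them back out with Python's standard `xml.etree.ElementTree` module. Only the `<root-term>` element and its `termID` attribute come from the example above; the surrounding `<p>` sentence and the `extract_tags` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph; only the <root-term> element and its termID
# attribute come from the slide's example.
SAMPLE = (
    '<p>Suppression is mediated by '
    '<root-term termID="47521">t cells, regulatory</root-term>'
    ' in this model.</p>'
)


def extract_tags(xml_text):
    """Return a list of (termID, tagged text) pairs found in the XML."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("termID"), (el.text or "").strip())
        for el in root.iter("root-term")
    ]


print(extract_tags(SAMPLE))
```

The same pairs could just as well be written to a separate reference file or a database table keyed to the element, matching the storage options listed above.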


Page 22: Dynamic Potential of Semantic Enrichment

Who/what tags?

Automated tagging: software analyzes content, adds tags based on concept matching, patterns, grammar

Pros: Highly scalable, good at finding trends in large bodies of content. Sometimes the only option for very large data sets

Cons: False positives, missed concepts

Manual tagging: humans with appropriate expertise (sometimes called Subject Matter Experts, or SMEs) read the content and apply tags

Pros: Precise, ideal when clinical judgment is required

Cons: Cost-prohibitive for large volumes of content, hard to scale, inconsistent (humans make subjective choices!)

Hybrid: automated process followed by manual review/modification

For high-value, specialized sites (such as clinical decision support sites that require “one best answer” results) this extra human touch can be necessary

Some content types aren’t accessible to automated systems (multimedia)
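The hybrid approach can be sketched as a simple dictionary-based autotagger that accepts unambiguous matches and queues ambiguous ones for a human. This is a toy illustration, not how any particular tagging product works; the `TERM_INDEX` table, the concept IDs, and the `autotag` helper are all hypothetical.

```python
import re

# Hypothetical lookup table: matchable string -> candidate concept IDs.
# A string with more than one candidate is treated as ambiguous.
TERM_INDEX = {
    "neonate": ["C001"],
    "newborn": ["C001"],
    "ghb": ["C002"],
    "cold": ["C010", "C011"],  # e.g. common cold vs. low temperature
}


def autotag(text):
    """Split matches into auto-accepted tags and a manual review queue."""
    accepted, review = [], []
    for word in re.findall(r"[a-z]+", text.lower()):
        candidates = TERM_INDEX.get(word)
        if not candidates:
            continue  # no concept matched this word
        if len(candidates) == 1:
            accepted.append((word, candidates[0]))
        else:
            review.append((word, candidates))  # a human picks the concept
    return accepted, review


accepted, review = autotag("The newborn presented with a cold after GHB exposure.")
print(accepted)
print(review)
```

Single-word matching is a deliberate simplification; real systems also use patterns and grammar, as noted above, which is where false positives and missed concepts tend to arise.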


Page 23: Dynamic Potential of Semantic Enrichment

Tagging for different uses

[Figure: mock article with Disease, Diagnosis, and Treatment sections, references, a table, and a figure, with tags marked at three levels]

<Collections> What “buckets” does this content object belong in?

Assignment of content into topical collections for major site navigation or product definition

topic collections; microsites; virtual journals…

<Section Summaries> What is this section/article/chapter about?

Most significant topics discussed at the article/chapter/section (wrapper) level

answers to clinical questions; review; skills assessment…

<Entities> What is this thing?

Important concepts at the paragraph/list/table/figure (granular) level

complex search queries; concept overlap analysis; specific entity types like drugs, genes, clinical trials, manufacturers…

Page 24: Dynamic Potential of Semantic Enrichment

WHY

should you care

(What are the benefits?)


Page 25: Dynamic Potential of Semantic Enrichment

Failure of the status quo

Information scarcity is no longer the issue. Attention scarcity is the problem.

The publisher’s role in information curation and filtering has never been more important. However, the tools for doing that work are changing.

“Information is a source of learning. But unless it is organized, processed, and available to the right people in a format for decision making, it is a burden, not a benefit.”– William Pollard, Physicist


Page 26: Dynamic Potential of Semantic Enrichment

Search accuracy, precision

Faster, more accurate and reliable answers to questions enhance user productivity and thus improve your application’s usability and user satisfaction ratings.

The accuracy threshold for STM information is very high! Users increasingly will not tolerate ambiguous results.

Time-strapped users are struggling with information overload—fewer, better answers are often preferred.

Tagging allows exposure of hard-to-find media like images and videos.

Page 27: Dynamic Potential of Semantic Enrichment

“Which did you mean?” at McGraw-Hill’s AccessMedicine


Page 28: Dynamic Potential of Semantic Enrichment


Page 29: Dynamic Potential of Semantic Enrichment

Pathways to related content

Related search terms

Links to related content within and across resources

Dynamically generated as new content is added

Goal: Increase serendipitous discovery, site stickiness, and usage metrics like number of page views and time on site
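One simple way to generate related-content links from tags is to rank items by how many concepts they share. The sketch below uses Jaccard overlap as an assumed scoring method (the slides do not specify one); the article IDs, concept IDs, and helper names are hypothetical.

```python
# Hypothetical articles and the concept IDs tagged on each.
ARTICLE_TAGS = {
    "article-a": {"C001", "C002", "C003"},
    "article-b": {"C002", "C003", "C004"},
    "article-c": {"C009"},
}


def jaccard(a, b):
    """Shared tags divided by all tags across both items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(article_id, limit=5):
    """Rank other articles by tag overlap, dropping those with none."""
    source = ARTICLE_TAGS[article_id]
    scored = [
        (jaccard(source, tags), other)
        for other, tags in ARTICLE_TAGS.items()
        if other != article_id
    ]
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:limit]]


print(related("article-a"))
```

Because the ranking is computed from whatever is in the tag index at request time, newly added content shows up in related links without manual linking.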


Page 30: Dynamic Potential of Semantic Enrichment


Page 31: Dynamic Potential of Semantic Enrichment


Page 32: Dynamic Potential of Semantic Enrichment

Contextual integrations

Internally—across titles and content types (journals, books, videos, images, e-learning…)

Externally—with partners and external data sets

Increasingly important to integrate content into customer workflows, bringing content to them in context as they do their daily work:

clinicians at the point of care

students as they prepare for exams

Page 33: Dynamic Potential of Semantic Enrichment

New products

Content recycling: Create new products from content you already have

Image collections

Mashup and micro products that serve specialized audiences and fit specific workflows

Topically constructed objects like virtual journals, knowledge environments, coursepacks, learning objects

You can cost-effectively create niche products not possible before

Page 34: Dynamic Potential of Semantic Enrichment

AIP/APS virtual journals


Page 35: Dynamic Potential of Semantic Enrichment

Search engine optimization

Granular topic exposure leads to better ranking in major search engines

Next wave of discovery tools (intelligent agents, virtual research assistants) will give greater weight to content they can understand

Tags can also be exposed to help create auto-extracts for content that doesn’t have abstracts (like book chapters)


Page 36: Dynamic Potential of Semantic Enrichment


Page 37: Dynamic Potential of Semantic Enrichment

Semantic users

As users search and navigate semantic content, you can attach the tags on that content to them

A semantic profile for a user can be created from his/her site activity

What topics are they interested in? How are their interests evolving?

Use this information to create personalized information services

Excellent method for encouraging anonymous institutional users to register/log in

Use topical affinities between users to create communities of practice—groups of people who share a passion for something they do and learn how to do it better through social interaction
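A semantic profile can be approximated by counting the concepts tagged on whatever a user has viewed. The sketch below is one assumed way to do that with the standard library's `Counter`; the content IDs, concept IDs, and `build_profile` helper are hypothetical.

```python
from collections import Counter

# Hypothetical concept IDs tagged on each piece of content.
CONTENT_TAGS = {
    "article-a": ["C001", "C002"],
    "article-b": ["C002", "C003"],
    "video-c": ["C002"],
}


def build_profile(viewed_ids):
    """Count how often each concept appears in the content a user viewed."""
    profile = Counter()
    for content_id in viewed_ids:
        profile.update(CONTENT_TAGS.get(content_id, []))
    return profile


profile = build_profile(["article-a", "article-b", "video-c"])
print(profile.most_common(1))
```

Comparing profiles built over different time windows would be one way to see how a user's interests are evolving, and comparing profiles across users would surface the topical affinities mentioned above.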


Page 38: Dynamic Potential of Semantic Enrichment

Contextual advertising

Match article and ad semantic tags to precisely target ads based on topic

OR, block ads from appearing next to articles on related topics

OR (even better): Alternative advertising method

Advertising can be targeted to the user profile, not just the article

Avoid targeting editorially sensitive pages but still deliver ads that match that user’s interests on neutral pages or alerts

For classified/job ad targeting, user interests can be matched up with demographics like location
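The three options above (match ads to the article, block ads on related topics, or target the user profile instead) can be sketched with set intersections over tags. The ad inventory, concept IDs, and function names here are hypothetical, and the rule that profile-targeted ads are simply suppressed on sensitive pages is an assumption about how the policy might be applied.

```python
# Hypothetical ads, each tagged with the concepts it targets.
ADS = {
    "ad-statin": {"C100"},
    "ad-inhaler": {"C200"},
}


def ads_for_page(page_tags, mode="match"):
    """Pick ads by article topic: 'match' targets it, 'block' avoids it."""
    if mode == "match":
        return sorted(ad for ad, tags in ADS.items() if tags & page_tags)
    if mode == "block":
        return sorted(ad for ad, tags in ADS.items() if not (tags & page_tags))
    raise ValueError(f"unknown mode: {mode}")


def ads_for_user(user_interests, page_is_sensitive):
    """Pick ads by user profile, but never on editorially sensitive pages."""
    if page_is_sensitive:
        return []
    return sorted(ad for ad, tags in ADS.items() if tags & user_interests)


print(ads_for_page({"C100"}, mode="match"))
print(ads_for_page({"C100"}, mode="block"))
print(ads_for_user({"C200"}, page_is_sensitive=False))
```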


Page 39: Dynamic Potential of Semantic Enrichment

What about mobile?

Reduction in number of clicks!

Precision in search

Quick links to what most users need

Targeted navigation that leads to the most important content (answers to clinical questions)

Page 40: Dynamic Potential of Semantic Enrichment

HOW

to get started


Page 41: Dynamic Potential of Semantic Enrichment

Questions for you and your application/hosting providers

What are your user stories/use cases?

What are the business benefits/ROI for your organization?

What content do you need to tag, how is that content delivered, and can those delivery systems/platforms use taxonomy and tagging in a way that supports your user needs?

What’s your plan for keeping your taxonomy up to date?

Can your “living” taxonomy be integrated into your applications? In real time as you make updates?


Page 42: Dynamic Potential of Semantic Enrichment

Questions for semantic tech providers

Does the technology support your user stories/ use cases?

Does it offer/integrate with a constantly evolving taxonomy?

Does it meet the accuracy threshold for your users and your content?

Can it tag at the depth—both granular and summary level—necessary? Figures and tables? Top-level collections?


Page 43: Dynamic Potential of Semantic Enrichment

The semantic user story

I am specifically identifying -------------- because -------------- is very important to my -------------- users when they are --------------.

Page 44: Dynamic Potential of Semantic Enrichment

The semantic user story

I am specifically identifying concise disease treatment content because immediate access to treatment options is very important to my emergency physician users when they are seeing 20 patients an hour.

Page 45: Dynamic Potential of Semantic Enrichment

McGraw-Hill: metadata targeted to deliver fast, concise treatment info to ER doc


Page 46: Dynamic Potential of Semantic Enrichment

The semantic user story

I am specifically identifying skin disorder images on all body locations and all types of skin because visual diagnosis is very important to my family physician users.

Page 47: Dynamic Potential of Semantic Enrichment

Derm101: images displayed in diagnosis search results


Page 48: Dynamic Potential of Semantic Enrichment

What are your user stories?

Problems/needs to solve for your users

Delivering top quality care under serious time constraints

Explosion of new research to keep up with and integrate into practice

Need to pass a licensing exam

Problems/needs to solve for your organization

Creating new products that grow and diversify revenue

Creating more value from advertising

Gaining insight into users

Page 49: Dynamic Potential of Semantic Enrichment

Thank you!

Pam Harley

VP, Product & Market Development

Semedica™, a division of Silverchair

[email protected] (434) 296-6333 x372 www.silverchair.com www.semedica.com

“Organizing is what you do before you do something, so that when you do it, it is not all mixed up.”

–A. A. Milne
