5-14-13 An Introduction to VIVO Presentation Slides

Post on 01-Nov-2014


Hot Topics: The DuraSpace Community Webinar Series, Series Five: "VIVO: Research Discovery and Networking." Webinar #1: An Introduction to VIVO, May 14, 2013. Presented by: Dean Krafft, Chief Technology Strategist at Cornell University Library and Chair of the VIVO-DuraSpace Management Committee; Brian Lowe, Semantic Applications Programmer, Cornell; and Jon Corson-Rikert, VIVO Development Lead, Cornell


May 14, 2013 Hot Topics: DuraSpace Community Webinar Series

Hot Topics: The DuraSpace Community Webinar Series

Series Five:

“VIVO: Research Discovery & Networking”

Curated by Dean Krafft


Webinar 1: Overview of VIVO

Presented by:

Brian Lowe, Semantic Applications Programmer, Cornell

Jon Corson-Rikert, VIVO Development Lead, Cornell

Dean Krafft, Chief Technology Strategist at Cornell University Library and Chair of the VIVO-DuraSpace Management Committee

What is VIVO?

• A semantic-web-based researcher and research discovery tool

– People plus much more

• Institution-wide, publicly-visible information

– For external as well as internal audiences

• An open, shared platform for connecting scholars, communities, campuses, and countries using Linked Open Data

How did we get here?

31 authors

6 institutions

A brief VIVO history

2003-2005 First realization for the life sciences at Cornell, as a relational database

2006-2008 Expansion to all disciplines at Cornell, and conversion to Semantic Web

2009-2012 National Institutes of Health-sponsored VIVO: Enabling the National Networking of Scientists project transforms VIVO to a multi-institutional open source platform

2013-2014 VIVO Incubator Project with DuraSpace for open community development

Major opportunity, 2009

NIH … “invites applications designed to develop, enhance, or extend infrastructure for connecting people and resources to facilitate national discovery of individuals and of scientific resources by scientists and students to encourage interdisciplinary collaboration and scientific exchange.”

National partnership

2009

VIVO Collaboration

Cornell University

Dean Krafft (Cornell PI), Manolo Bevia, Jim Blake, Nick Cappadona, Brian Caruso, Jon Corson-Rikert, Elly Cramer, Medha Devare, Elizabeth Hines, Huda Khan, Depak Konidena, Brian Lowe, Joseph McEnerney, Holly Mistlebauer, Stella Mitchell, Anup Sawant, Christopher Westling, Tim Worrall, Rebecca Younes

University of Florida

Mike Conlon (VIVO and UF PI), Beth Auten, Michael Barbieri, Chris Barnes, Kaitlin Blackburn, Cecilia Botero, Kerry Britt, Erin Brooks, Amy Buhler, Ellie Bushhousen, Linda Butson, Chris Case, Christine Cogar, Valrie Davis, Mary Edwards, Nita Ferree, Rolando Garcia-Milan, George Hack, Chris Haines, Sara Henning, Rae Jesano, Margeaux Johnson, Meghan Latorre, Yang Li, Jennifer Lyon, Paula Markes, Hannah Norton, James Pence, Narayan Raum, Nicholas Rejack, Alexander Rockwell, Sara Russell Gonzalez, Nancy Schaefer, Dale Scheppler, Nicholas Skaggs, Matthew Tedder, Michele R. Tennant, Alicia Turner, Stephen Williams

Indiana University

Katy Borner (IU PI), Kavitha Chandrasekar, Bin Chen, Shanshan Chen, Ryan Cobine, Jeni Coffey, Suresh Deivasigamani, Ying Ding, Russell Duhon, Jon Dunn, Poornima Gopinath, Julie Hardesty, Brian Keese, Namrata Lele, Micah Linnemeier, Nianli Ma, Robert H. McDonald, Asik Pradhan Gongaju, Mark Price, Michael Stamper, Yuyin Sun, Chintan Tank, Alan Walsh, Brian Wheeler, Feng Wu, Angela Zoss

Ponce School of Medicine

Richard J. Noel, Jr. (Ponce PI), Ricardo Espada Colon, Damaris Torres Cruz, Michael Vega Negrón

The Scripps Research Institute

Gerald Joyce (Scripps PI), Catherine Dunn, Sam Katkov, Brant Kelley, Paula King, Angela Murrell, Barbara Noble, Cary Thomas, Michaeleen Trimarchi

Washington University School of Medicine in St. Louis

Rakesh Nagarajan (WUSTL PI), Kristi L. Holmes, Caerie Houchins, George Joseph, Sunita B. Koul, Leslie D. McIntosh

Weill Cornell Medical College

Curtis Cole (Weill PI), Paul Albert, Victor Brodsky, Mark Bronnimann, Adam Cheriff, Oscar Cruz, Dan Dickinson, Richard Hu, Chris Huang, Itay Klaz, Kenneth Lee, Peter Michelini, Grace Migliorisi, John Ruffing, Jason Specland, Tru Tran, Vinay Varughese, Virgil Wong

This project is funded by the National Institutes of Health, U24 RR029822, “VIVO: Enabling National Networking of Scientists”

What does VIVO do?

• Integrates multiple sources of data

– Systems of record

– Faculty activity reporting

– External sources (e.g., Scopus, PubMed, NIH RePORTER)

• Provides a review and editing interface

– Single sign-on for self-editing or by proxy

• Provides integrated, filterable feeds to other websites

People

People and what they do

Structured data for visualizations

Enabling an (inter)national network

• Open software

• Open data

• Local control

• Decentralized infrastructure

What does VIVO model?

• People and more

– Organizations, grants, programs, projects, publications, events, facilities, and research resources

• Relationships among the above

– Meaningful

– Bidirectional

– Navigable context

• Links to URIs elsewhere

– Concepts, identifiers

– People, places, organizations, events
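As a rough illustration of what "bidirectional" and "navigable" relationships can mean in practice, the sketch below stores each relationship together with its inverse so either end can be used as a starting point. The property names (`authorOf`, `hasAuthor`, `memberOf`, `hasMember`) and the identifiers are hypothetical and are not taken from the VIVO ontology.

```python
from collections import defaultdict

# Hypothetical property names chosen for illustration only; they are not
# taken from the VIVO ontology.
INVERSE = {
    "authorOf": "hasAuthor",
    "memberOf": "hasMember",
}
# Make the table symmetric so a lookup works from either direction.
INVERSE.update({inv: prop for prop, inv in list(INVERSE.items())})


class Graph:
    """Tiny in-memory graph that records every relationship in both directions."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> {(property, other node)}

    def relate(self, subject, prop, obj):
        """Store subject --prop--> obj and, if known, the inverse edge."""
        self.edges[subject].add((prop, obj))
        inverse = INVERSE.get(prop)
        if inverse is not None:
            self.edges[obj].add((inverse, subject))

    def neighbors(self, node):
        """Return everything reachable in one step from node, sorted."""
        return sorted(self.edges[node])


g = Graph()
g.relate("person:alice", "authorOf", "pub:1")
g.relate("person:alice", "memberOf", "org:library")

print(g.neighbors("pub:1"))        # navigate from the publication to its author
print(g.neighbors("org:library"))  # navigate from the organization to its member
```

Storing the inverse at write time is only one design choice; a semantic-web system could equally derive inverse links by reasoning over the ontology.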

Typical data sources

• HR – people, appointments

• Research administration – grants & contracts

• Registrar – courses

• Faculty reporting system(s) – publications, service, research areas, awards

• Events calendar

• Internal and external news

• External repositories – e.g., PubMed, Scopus

Value for institutions

• Common data substrate

– Public, granular and direct

– Discovery via external and internal search engines

– Available for reuse at many levels

• Distributed curation

– E.g., affiliations beyond what HR system tracks

– Data coordination across functional silos

– Feeding changes back to systems of record

– Direct linking across campuses

• Data that is visible gets fixed

The Semantic Web

• Turn data into a web of simple links

• Use ontology to explain how things are linked

• Use reasoning to add new links automatically

• Be flexible and extensible
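The three ideas on this slide (data as simple links, an ontology that explains the links, and reasoning that adds new links) can be sketched in a few lines. This is a minimal illustration, not how VIVO implements reasoning: the `ex:` names are hypothetical, and the single rule shown follows the style of the RDFS subclass rule (if x has type C and C is a subclass of D, then x has type D).

```python
# Data: a set of (subject, predicate, object) links. All "ex:" names are
# hypothetical and exist only for this illustration.
data = {
    ("ex:alice", "rdf:type", "ex:FacultyMember"),
    ("ex:alice", "ex:authorOf", "ex:paper1"),
}

# Ontology: statements that explain how the classes relate to each other.
ontology = {
    ("ex:FacultyMember", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
}


def infer_types(data, ontology):
    """Add (x rdf:type D) whenever (x rdf:type C) and (C rdfs:subClassOf D).

    The rule is applied repeatedly until no new link appears, so chains of
    subclass statements are followed to the end.
    """
    superclasses = {}
    for cls, pred, parent in ontology:
        if pred == "rdfs:subClassOf":
            superclasses.setdefault(cls, set()).add(parent)

    inferred = set(data)
    changed = True
    while changed:
        changed = False
        for subj, pred, obj in list(inferred):
            if pred != "rdf:type":
                continue
            for parent in superclasses.get(obj, ()):
                new_link = (subj, "rdf:type", parent)
                if new_link not in inferred:
                    inferred.add(new_link)
                    changed = True
    return inferred


closed = infer_types(data, ontology)
for link in sorted(closed - data):
    print("inferred:", link)
```

Because the rule lives in the ontology rather than in the data, adding a new class or subclass statement extends what can be inferred without touching existing records, which is one sense in which the approach stays flexible and extensible.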

The VIVO ontology

• Describe people and organizations in the process of doing research

• Stay discipline neutral

• Use existing scientific domain terminology to describe content of research

What is Linked Open Data (LOD)?

• Data

– Structured information, not just documents with text

– A common, simple format

• Open

– Available, visible, mine-able

– Anyone can post, consume, and reuse

• Linked

– Directly by reference

– Indirectly through common references and inference
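To make "linked indirectly through common references" concrete, the sketch below merges two small datasets and finds the records that point at the same URI. Everything here is an assumption made for illustration: the `example` URIs, the `ex:researchArea` property, and the two "sites" are invented and do not correspond to real VIVO installations.

```python
# Two independently published datasets, each a set of
# (subject, predicate, object) links. All URIs are invented.
site_a = {
    ("http://a.example/person/1", "ex:researchArea", "http://concepts.example/climate"),
    ("http://a.example/person/1", "ex:name", "Researcher A"),
}
site_b = {
    ("http://b.example/person/7", "ex:researchArea", "http://concepts.example/climate"),
    ("http://b.example/person/7", "ex:name", "Researcher B"),
}


def is_uri(value):
    """Treat anything that looks like an HTTP(S) address as a reference."""
    return value.startswith(("http://", "https://"))


def shared_references(a, b):
    """Return the URIs that both datasets point to."""
    refs_a = {obj for _, _, obj in a if is_uri(obj)}
    refs_b = {obj for _, _, obj in b if is_uri(obj)}
    return refs_a & refs_b


def linked_through(uri, graph):
    """Return every subject in the graph that references the given URI."""
    return sorted({subj for subj, _, obj in graph if obj == uri})


# Because both sites use the same simple format, merging is a set union.
merged = site_a | site_b

for uri in sorted(shared_references(site_a, site_b)):
    print(uri, "connects", linked_through(uri, merged))
```

Neither site needs to know about the other in advance; the connection appears only because both chose the same identifier for the shared concept.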

Linked Open Data

Linked data indexed for search

[Diagram: Linked Open Data sources (Ponce VIVO, WashU VIVO, IU VIVO, Cornell Ithaca VIVO, Weill Cornell VIVO, UF VIVO, Scripps VIVO, other VIVOs, eagle-i research resources, Harvard Profiles RDF, DigitalVita RDF, Iowa Loki RDF) shown together with vivosearch.org, its Solr search index, and another Solr index]

Implementation challenges

• A simple idea – take the basic public information about researchers at Cornell and make it easy to find for academic purposes

• Why is this hard?

Policy issues

• Dirty data

• Lack of even common definitions of organizations or of who counts as faculty

• Data ownership

• Many dimensions of privacy

• Short-term “go it alone” vs. common good

Enter data once, use it many times

Weill Cornell research reporting

• How has the number of publications co-authored with other institutions changed year to year?
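A question like this becomes a short query once publication data is structured. The sketch below is only an illustration of the calculation: the records are invented sample data, not figures from Weill Cornell or any VIVO instance, and the record layout and institution names (`Home`, `Partner A`, `Partner B`) are assumptions.

```python
from collections import Counter

# Invented sample records for illustration only. Each publication lists the
# institutions of its authors.
publications = [
    {"year": 2010, "institutions": {"Home", "Partner A"}},
    {"year": 2010, "institutions": {"Home"}},
    {"year": 2011, "institutions": {"Home", "Partner A"}},
    {"year": 2011, "institutions": {"Home", "Partner B"}},
    {"year": 2012, "institutions": {"Home", "Partner B"}},
]


def external_coauthorship_by_year(pubs, home):
    """Count, per year, publications with at least one non-home institution."""
    counts = Counter()
    for pub in pubs:
        if pub["institutions"] - {home}:
            counts[pub["year"]] += 1
    return dict(sorted(counts.items()))


def year_over_year(counts):
    """Return the change in count from each year to the next year present."""
    years = sorted(counts)
    return {
        later: counts[later] - counts[earlier]
        for earlier, later in zip(years, years[1:])
    }


by_year = external_coauthorship_by_year(publications, home="Home")
print(by_year)
print(year_over_year(by_year))
```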

Multi-institutional scenarios for VIVO

• Multiple campuses of one university

• University and federal lab connections

– E.g., Colorado ties with regional federal labs

• Consortia – 60 CTSAs

• International

– 13 Netherlands universities and the National Library

– AgriVIVO

Benefits across institutions

• Sharing experience provides clarity and new ideas

• Incentives from sharing development, tools, customizations

• Potential data-level connectivity

– Research is happening increasingly in teams that span institutions

– Meeting the needs of short and long-term virtual organizations

From outputs to outcomes

• Outputs like papers and patents can be tracked

– Collaborative ontology effort to adequately represent the humanities

• Outcomes such as economic impact or societal benefit are much harder to identify

• Questions about return on research investment beg for consistent, comparable data

– over time

– across institutions

– across domains

International engagement


Partnerships – ORCID

• Open Researcher and Contributor ID

– Attribution for works of any type

• ORCID and VIVO

– ORCID is an attribute in a VIVO profile

– Tools being tested for submission of researcher registrations from VIVO

http://orcid.org
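If an ORCID iD is stored as a profile attribute, a system may want to check its form before accepting it. The sketch below is an assumption about how such a check could look, not a description of VIVO's tools: it verifies the final check character of a 16-character ORCID iD using the ISO 7064 MOD 11-2 calculation that ORCID documents for its identifiers. The function names are hypothetical.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character.

    Hyphens are ignored, so both "0000-0002-1825-0097" and
    "0000000218250097" are accepted as input forms.
    """
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this only catches typing errors; it does not confirm that the iD is registered or that it belongs to the person in the profile.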

VIVO/DuraSpace Partnership

• DuraSpace is a not-for-profit organization supporting the DSpace and Fedora repositories

• Serves as the open source community home for future VIVO development

• Provides a legal and financial framework, extensive tools, and a proven track record of managing community-developed open source projects

• Joint two-year initial governance based on founding sponsors, management team, and dedicated development and leadership effort

The VIVO Community

Meeting about VIVO

• 2nd Australian VIVO Days in February

• CU Boulder hosted 50 attendees for the 3rd VIVO Implementation Fest in April

• May 20th VIVO event for New York City area institutions

• August 2013 will be the 4th Annual VIVO Conference – approximately 200-250 attendees, with workshops, papers, keynotes, invited talks, and posters

Research Informatics Infrastructure

• USDA is adopting VIVO for intramural research, and is also using it to knit together data from its 7 major agencies to fulfill reporting mandates to the Office of Science & Technology Policy and Congress

• National Center for Atmospheric Research (NCAR) is piloting VIVO to coordinate large, multi-year, multi-institutional, multi-instrument research projects

Research Informatics Infrastructure – cont.

• Accurate, structured VIVO data can feed external profiling and discovery systems (ORCID, Google Scholar, Academic Analytics, etc.)

• VIVO extensibility allows it to represent research resources and tie them to research datasets, publications, and researchers, promoting data discovery and reuse

VIVO for atmospheric and space physics

CTSAconnect and the ISF

• VIVO and eagle-i team members won NIH funding in 2012 for a project to unify their ontologies and extend both in the clinical domain

• The unified ontology is known as the Integrated Semantic Framework, or ISF

• VIVO 1.6 and eagle-i’s next release will use the ISF

• This combined ontology is modular to allow selective data population based on local needs

Tying biomedical research to clinical delivery

Challenges

• Communicating VIVO’s goals to faculty, administrators, funders, and other institutions

• Adapting to constant changes in data sources

• Fully exploiting the opportunities provided by VIVO linked open data

• Co-existing in a world where not everyone uses VIVO

• Positioning VIVO on a sustainable path

Next Webinar: Case Studies

• Tuesday, June 4

• Colorado

• Duke

• Brown

• Weill Cornell Medical College

3rd Webinar – Technical Deep Dive

• Tuesday, June 11

• Ontology & Linked Data

• Open source technologies used

• What’s coming in v1.6

• VIVO technical community touch points

• Many ways to participate, benefit, and contribute


Questions?