Technical Issues of Connecting GeoData within and Between Governmental Agencies: Focus on NSF Research Data

Cyndy Chandler
Biological and Chemical Oceanography Data Management Office
Woods Hole Oceanographic Institution

GeoData 2014 ~ 18 June 2014 ~ NCAR Center Green Campus, Boulder, Colorado


Page 2

Scope: NSF GeoData

• NSF funded, hypothesis-driven, ocean science research projects from
    Division of Ocean Sciences (OCE)
      • OCE Biology and Chemistry
    Division of Polar Programs (PLR)
      • Antarctic Research
        ANT Antarctic Organisms and Ecosystems

Page 3

Connectivity Challenges

• Goals:
    linking content at distributed repositories
    improved interoperability
• Technical strategies/solutions:
    metadata content standards
    controlled vocabularies
    Linked Data
• Not just technical:
    cultural conditions, behaviors
    research data lifecycle: “proposal to preservation”

Page 4

An example

• A researcher reads a paper
    We have already assumed they have found and are able to retrieve the paper:
    http://www.pnas.org/content/111/22/8089.full
    Patrick Martin, Sonya T. Dyhrman, Michael W. Lomas, Nicole J. Poulton, and Benjamin A. S. Van Mooy (2014) “Accumulation and enhanced cycling of polyphosphate by Sargasso Sea plankton in response to low phosphorus” PNAS 2014 111 (22) 8089-8094; published ahead of print April 21, 2014, doi:10.1073/pnas.1321719111

Page 5

Example (cont’d)

there is a data supplement

DOI

Page 6

What do I know?

• Publication: PNAS, has a DOI, has data suppl.
• Person name (author): Benjamin Van Mooy
• Dates of activity: 2010 and 2012
• Location keywords: Sargasso Sea
• Cruise: on vessel Knorr
• Data keywords: plankton, polyphosphate, lipid

general knowledge

domain specific

Page 7

Research is a game of Connect the Dots

• the dots are entities of information and data from distributed repositories

Page 8

Connect the Dots

• Some catalogs or repositories are already connected, making it easier to “connect the dots”

Page 9

Connect the Dots

• Some catalogs (repositories) are already connected, making it easier to “connect the dots”
• Dot #3 is a piece of information held in common (e.g. cruise ID)
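
The role of a shared piece of information such as a cruise ID can be illustrated with a small sketch. It assumes two independent catalogs that both record the same identifier. All records, field names and IDs below are made up for illustration and do not come from any real repository.

```python
# Two hypothetical catalogs held by different repositories.
# Every record, field name and ID here is invented for illustration.
cruise_catalog = [
    {"cruise_id": "CRUISE-001", "vessel": "Knorr", "year": 2012},
    {"cruise_id": "CRUISE-002", "vessel": "Example Vessel", "year": 2010},
]
dataset_catalog = [
    {"dataset": "polyphosphate", "cruise_id": "CRUISE-001"},
    {"dataset": "lipid", "cruise_id": "CRUISE-001"},
    {"dataset": "nutrients", "cruise_id": "CRUISE-999"},  # no matching cruise
]


def connect_on(key, left, right):
    """Pair each left record with every right record that shares the key value."""
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)
    return [(l, r) for l in left for r in index.get(l[key], [])]


# The shared cruise ID is the "dot" that joins the two catalogs.
links = connect_on("cruise_id", cruise_catalog, dataset_catalog)
for cruise, dataset in links:
    print(cruise["vessel"], cruise["year"], "->", dataset["dataset"])
```

Records without a common identifier (here the "nutrients" dataset) stay unconnected, which is the situation the slide describes for catalogs that do not yet share IDs.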

Page 10

Connect the Dots

• Some catalogs or repositories are already connected


Page 12

Connect the Dots

Persistent identifiers
• for publications (DOI)
• for data (DOI)
• for people (ORCID)
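
As a rough sketch of how a machine client might handle these identifiers, the following turns a DOI into a resolver URL and checks the ORCID check digit (ISO 7064 MOD 11-2). The function names are illustrative. The DOI is the one cited earlier in this talk, and the ORCID iD used is the sample identifier from ORCID's own documentation, not one belonging to any author named here.

```python
def doi_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI, accepting an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return "https://doi.org/" + doi


def orcid_check_digit(base15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of an ORCID iD."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()


print(doi_url("doi:10.1073/pnas.1321719111"))
print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD from ORCID documentation
```

A check like this only confirms that an identifier is well formed; whether it resolves to the intended publication, dataset or person still depends on the authoritative source that issued it.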

Page 13

Connect the Dots

• metadata
• negotiated, shared, common IDs
• persistent IDs from authoritative sources
• controlled vocabularies
    local terms mapped to community-wide terms identified by URIs
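
Mapping local terms to community-wide terms can be sketched as a simple lookup table. The vocabulary base URI and all term URIs below are hypothetical placeholders (example.org), not entries from any real vocabulary service, and the local spellings are invented.

```python
# Hypothetical community vocabulary namespace; not a real vocabulary server.
COMMUNITY_VOCAB = "http://vocab.example.org/term/"

# Local spellings used by different (made-up) repositories, keyed in lowercase,
# each mapped to the URI of one community-wide term.
LOCAL_TO_URI = {
    "polyp": COMMUNITY_VOCAB + "polyphosphate",
    "polyphosphate": COMMUNITY_VOCAB + "polyphosphate",
    "sargasso": COMMUNITY_VOCAB + "sargasso-sea",
    "sargasso sea": COMMUNITY_VOCAB + "sargasso-sea",
}


def to_community_uri(local_term):
    """Return the community-wide URI for a local term, or None if it is unmapped."""
    return LOCAL_TO_URI.get(local_term.strip().lower())


def same_concept(term_a, term_b):
    """Two local terms denote the same concept if they map to the same URI."""
    uri_a = to_community_uri(term_a)
    return uri_a is not None and uri_a == to_community_uri(term_b)


print(same_concept("polyP", "Polyphosphate"))
print(to_community_uri("Sargasso Sea"))
```

Comparing URIs rather than strings is what lets two repositories with different local spellings agree that they are describing the same thing.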

Page 14

Connect the Dots

• metadata
• negotiated, shared, common IDs
• persistent IDs from authoritative sources
• controlled vocabularies
• semantic markup to provide context and establish relationships
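
One way to picture relationships established by semantic markup is as subject-predicate-object statements. The sketch below uses plain Python tuples rather than an RDF library. Apart from the paper's DOI, every URI, predicate name and relationship is a hypothetical placeholder chosen for illustration.

```python
EX = "http://example.org/"  # hypothetical namespace for illustration only
PAPER = "https://doi.org/10.1073/pnas.1321719111"  # DOI of the paper in the example

# Each statement is (subject, predicate, object). Predicates and most URIs are invented.
triples = {
    (PAPER, EX + "hasAuthor", EX + "person/van-mooy"),
    (PAPER, EX + "hasStudyArea", EX + "place/sargasso-sea"),
    (EX + "dataset/polyphosphate", EX + "describedIn", PAPER),
    (EX + "dataset/polyphosphate", EX + "collectedOn", EX + "cruise/CRUISE-001"),
    (EX + "cruise/CRUISE-001", EX + "usedVessel", EX + "vessel/knorr"),
}


def match(s=None, p=None, o=None):
    """Return all statements matching the given pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def neighbors(node):
    """Return every node one link away from the given node, in either direction."""
    outgoing = {o for s, _, o in triples if s == node}
    incoming = {s for s, _, o in triples if o == node}
    return outgoing | incoming


for statement in match(s=PAPER):
    print(statement)
print(sorted(neighbors(PAPER)))
```

The predicate in each statement carries the context: the same two dots can be linked by "hasAuthor" or "describedIn", and a machine client can follow either kind of link without a human interpreting the page.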

Page 15

context matters

Semantic Web technologies can help

Page 16

Connect the Dots

• Technical strategies/solutions:
    metadata … more metadata
    standards-compliant metadata
    globally unique persistent identifiers from authoritative sources
    controlled vocabularies (local & community-wide)
    semantic markup
    Linked Data*
• Support transition from human to machine clients

*Linked Data: Bizer, Heath, Berners-Lee, 2009; doi:10.4018/jswis.2009081901

Page 17

Progress since 2011

What has made the difference?

Program manager involvement
• Consequences for PIs for not making data available
• Long-term commitment (funding, active engagement)

Changing expectations from originators
• Marine ecosystem research requires access to many different kinds of data

Page 18

Progress since 2011

What has made the difference?

Community organizations
• NSF EarthCube: funding to establish partnerships with other data managers, computer scientists and geoscientists
• ESIP: opportunity to work with people from other communities doing similar work
    discussions focus on challenges, activities deliver results
• RDA: global organization to foster data sharing
• International efforts with a domain focus (e.g. ocean)

Page 19

Modern data infrastructure requires

Semantic Web Technologies involve

inspired by (2013)