Phase II of DataONE (2014-2019) · 2020. 7. 13. · DataONE Summer Internship Program will actively...


After a very successful first five years, Phase II of DataONE is now slated (and recommended by NSF) to begin August 1, 2014. In this new phase, DataONE will maintain and expand the current cyberinfrastructure (CI) and create new CI that will enable more global, open, and reproducible science. We will do so through four interrelated CI activities that are supported by the DataONE team of developers and the CI Working Group.

First, we will significantly expand the volume and diversity of data available to researchers through the DataONE Federation of repositories (i.e., Member Nodes) for large-scale scientific innovation and discovery. DataONE will create lightweight and easily deployed “Slender Node” software and develop DataONE compatibility for common repository software systems (e.g. DSpace and others) that are already deployed in hundreds of high-value repositories worldwide.

Second, we will incorporate innovative and high-value features into the DataONE CI. These new features include: 1) measurement search to leverage semantic technologies and enable highly precise data discovery and recall of data needed by researchers; 2) tracking the data through creation, all transformations, and analyses (provenance) to enable more reproducible science by storing and indexing provenance trace information that can be used to both reproduce scientific data processing and analysis steps and to discover specific data sources by examining the documented workflows; and 3) data extraction, subsetting and processing services to enable researchers at any location to more easily participate in “big data” initiatives (e.g. working with data from large environmental observatories and participating in broad-scale synthesis and modeling endeavors). These three new sets of features will dramatically improve data discovery; further support reproducible and open science; and enable scientists from any institution, independent of networking capacity, to extract subsets of large data sets held in DataONE-affiliated repositories for processing and interpretation.

Phase II of DataONE (2014-2019)

Volume 2 Issue 4

©2014 DataONE 1312 Basehart SE University of New Mexico Albuquerque NM 87106

Third, we will maintain and improve core CI software and services (e.g., Coordinating and Member Node software stacks and key components of the Investigator Toolkit) so that the user experience continues to improve, new services can be easily added over time, and the CI can be readily upgraded as operating system and other supporting software systems continue to evolve.

Fourth, we will increase the number of Member Nodes (size of the Federation) while maintaining cybersecurity and trust. Both of these activities respond to the need for DataONE network continuity and reliability that are critical to maintaining community trust and enabling researchers to achieve their science objectives.

Four working groups, each composed of experts from computer and information sciences, domain sciences, and cyber-enabled learning, will guide and contribute to DataONE CI development and usability, sustainability, and education and outreach. The CI Working Group will coordinate core CI research and development, including the addition of new services such as provenance tracking and semantically enabled measurement search. The Usability and Assessment Working Group will help DataONE understand community needs and expectations, and constantly improve the CI via feedback from usability analysis. The Community Engagement and Outreach Working Group will ensure that community needs are met and that education activities and materials achieve optimal impact. The Sustainability and Governance Working Group will empower the community to drive the organization’s governance structure and sustainability strategies, ensuring that DataONE can sustain services and evolve to meet the needs of researchers, libraries, sponsors, and other stakeholders for decades to come.

In addition to developing robust and powerful infrastructure, DataONE aims to change the scientific culture by promoting good data stewardship practices. Our specific goals are to: 1) build a community of

Photo: Phase I DataONE Team at the 2014 All Hands Meeting in Park City, Utah. Credit: Grace Lerner.


Summer 2014

CoverSTORY cont’d

Each Member Node within the DataONE federation completes a description document summarizing the content, technical characteristics and policies of their resources. These documents can be found on the DataONE.org site at bit.ly/D1CMNs. In each newsletter issue we will highlight one of our current Member Nodes.

The Long Term Ecological Research (LTER) Network http://www.lternet.edu/

The largest and longest-lived ecological network in the United States, LTER provides the scientific expertise, research platforms, and long-term datasets necessary to document and analyze environmental change.

The Network brings together a multi-disciplinary group of more than 2000 scientists and graduate students. The 26 LTER sites encompass diverse ecosystems in the continental United States, Alaska, Antarctica and islands in the Caribbean and the Pacific—including deserts, estuaries, lakes, oceans, coral reefs, prairies, forests, alpine and Arctic tundra, urban areas, and production agriculture.

The LTER program was founded in 1980 with the recognition that long-term and broad-scale research is necessary for truly understanding environmental phenomena. The program was designed to provide the long-term data and information that is needed for informed decision making from a broad range of key ecosystems. The program is unique in three ways:

1. The research is located at specific sites chosen to represent major ecosystem types or natural biomes

2. It emphasizes the study of phenomena over long periods of time, based upon data collection in five core areas

3. Projects include significant integrative, cross-site, network-wide research

Research at LTER sites includes experiments, databases, and research programs for use by both Network and other scientists. This research provides the opportunity to test important ecological or ecosystem theories including, but not limited to, ecosystem stability, biodiversity, community structure, and energy flow. Recognizing that the value of long-term data extends beyond use at any individual site, the LTER Network makes data collected by all LTER sites broadly accessible to all investigators through its Network Information System.

LTER’s DataONE Member Node further expands the discoverability and use of its multi-faceted data to a broad and diverse research community, while broadening DataONE’s coverage of environmental and ecosystem data. The LTER Network has been a founding partner with DataONE since 2009 and continues to support DataONE’s design and development efforts.

MemberNodeDESCRIPTION

stakeholders through active engagement with data repositories and the broad community of scientists; and 2) educate scientists about good data life cycle practices through effective education, outreach and training activities and experiences. Community engagement in the biweekly Member Node Forum and the annual meeting of the DataONE Users Group will support expansion of the data content and services provided to and needed by the research community. A new DataONE webinar series and education resources (e.g., best practices and software tools, learning modules) will enable researchers to better steward their data and take advantage of the myriad services and tools available through DataONE. The DataONE Summer Internship Program will actively involve students in CI development and related DataONE activities such as creating and providing web-based educational resources.

We are grateful for the confidence and support provided by NSF and look forward to another challenging and productive five years of CI development that will facilitate research innovation and good data stewardship. Special thanks and recognition also go to the project staff, numerous community volunteers and students, as well as the DataONE Users Group who are guiding and supporting the evolution of DataONE.

— Bill Michener

Principal Investigator


CyberSPOT

CyberInfrastructure Update

The DataONE infrastructure continues production operations with an increasing number of Member Nodes and volume of data available through the DataONE service interfaces. New Member Nodes that have come online since the last newsletter include the Earth Data Analysis Center (EDAC) and the European Long Term Ecological Research site (LTER-Europe). This brings the total number of nodes to 21 (including three replication target nodes) providing access to almost 200,000 publicly readable, current version objects (84,000 data, 70,000 metadata, and 42,000 resource maps), and an overall total of more than 475,000 objects accessible through the DataONE federation.

The complete list of current Member Nodes includes:

• EDAC Gstore Repository
• LTER Europe Member Node
• ORNL DAAC
• Knowledge Network for Biocomplexity
• Cornell Lab of Ornithology - eBird
• University of Kansas - Biodiversity Institute
• PISCO MN
• SEAD Virtual Archive
• LTER Network Member Node
• Gulf of Alaska Data Portal
• Dryad Digital Repository
• Merritt Repository
• ONEShare Repository
• TFRI Data Catalog
• USA National Phenology Network
• USGS Core Sciences Clearinghouse
• SANParks Data Repository
• ESA Data Registry
• DataONE ORC Dedicated Replica Server
• DataONE UCSB Dedicated Replica Server
• DataONE UNM Dedicated Replica Server

Topic: Finding Data

DataONE catalogues all content exposed through participating Member Nodes and retrieves copies of metadata that are stored as replicas on the Coordinating Nodes. This collection of metadata is processed and indexed by the Coordinating Node to enable efficient search across the entire collection of metadata and so, effectively, across all the content exposed by the Member Nodes. Information is extracted from different elements of data packages. Science metadata documents (such as EML, FGDC, and ISO19115) provide information about the scientific purpose and use of the package, resource maps provide details of relationships between data and metadata within a package, and system metadata provides details such as the file size and type of the package elements.

Figure 1. Elements of a simple data package. The package contains science data described by science metadata. The package itself is defined by a resource map document, and each data, science metadata, and resource map document has system metadata to describe the file characteristics of the document.
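The package elements described above can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names, identifiers and sizes are hypothetical and are not taken from the DataONE software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SystemMetadata:
    """File-level details kept for every object (hypothetical field names)."""
    identifier: str
    format_id: str
    size_bytes: int


@dataclass
class PackageObject:
    """A data, science metadata, or resource map document plus its system metadata."""
    kind: str  # "data", "science_metadata", or "resource_map"
    sysmeta: SystemMetadata


@dataclass
class DataPackage:
    """A resource map that ties one science metadata document to the data it describes."""
    resource_map: PackageObject
    science_metadata: PackageObject
    data: List[PackageObject] = field(default_factory=list)

    def members(self) -> List[PackageObject]:
        # Every element of the package, each carrying its own system metadata.
        return [self.resource_map, self.science_metadata, *self.data]

    def total_size(self) -> int:
        return sum(obj.sysmeta.size_bytes for obj in self.members())


# Made-up example values, for illustration only.
package = DataPackage(
    resource_map=PackageObject(
        "resource_map", SystemMetadata("example-rmap", "resource map", 2000)),
    science_metadata=PackageObject(
        "science_metadata", SystemMetadata("example-eml", "EML", 15000)),
    data=[PackageObject("data", SystemMetadata("example-csv", "CSV", 480000))],
)
print(len(package.members()), package.total_size())
```

In this sketch the index would draw descriptive text from the science metadata object, relationships from the resource map, and file size and type from each object's system metadata.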

An Investigator Tool such as ONEMercury uses the Coordinating Node content index exposed through the DataONE service interfaces to help users find data packages that match criteria specified in the search interface. Searches may be very general, matching the search term against the combined text extracted from the science metadata documents, or very specific, matching a value against a specific field such as the name of a measurement field. In all cases, a package is returned when the values stored in the search index match the query expression provided by the user.
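The difference between a general and a field-specific search can be sketched with a toy in-memory index. This is not the DataONE query interface; the index entries, field names and function names below are invented for illustration.

```python
# Toy search index: each entry holds values extracted from one package's metadata.
# All identifiers and values here are made up.
INDEX = [
    {"id": "pkg-1", "text": "soil temperature at a grassland site",
     "attribute": ["soil_temp", "depth"]},
    {"id": "pkg-2", "text": "bird counts and air temperature",
     "attribute": ["count", "air_temp"]},
    {"id": "pkg-3", "text": "stream chemistry",
     "attribute": ["nitrate", "depth"]},
]


def search_text(index, term):
    """General search: match the term against the combined metadata text."""
    term = term.lower()
    return [doc["id"] for doc in index if term in doc["text"].lower()]


def search_field(index, field_name, value):
    """Specific search: match a value against a single named field."""
    return [doc["id"] for doc in index if value in doc.get(field_name, [])]


general = search_text(INDEX, "temperature")
specific = search_field(INDEX, "attribute", "depth")
print(general, specific)
```

The general search casts a wide net over free text, while the fielded search only returns packages whose measurement names contain the exact value.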

Searching across a very heterogeneous collection such as represented by the DataONE federation requires that terms extracted from metadata are consistent. Such consistency applies to both the concepts (fields) being stored in the index as well as the values. For example, the starting date for data relevance is expressed in different fields in different metadata standards. In EML for example, one would use dataset/coverage/temporalCoverage/rangeOfDates/beginDate/calendarDate whereas in FGDC, the metadata/idinfo/timeperd/timeinfo/rngdates/begdate field would be used. The index processor in DataONE contains rules that map the concept of the beginning date from EML and FGDC metadata standards to a common concept. Furthermore, the representation of dates in EML is relatively consistent, whereas date representation in FGDC can be much more relaxed. During the processing of science

Figure 2. The index processing pipeline of DataONE which maps fields from different types of metadata to common concepts, and processes field values to a common representation.


The Sustainability & Governance Working Group (S&G WG) is tasked with the hard stuff: developing a responsive and transparent governance structure; establishing policies and procedures that support DataONE’s mission; and implementing a sustainable business model. These are all hard tasks, but they will ultimately move DataONE from a largely NSF-funded initiative to a sustainable network by 2020.

However, we won’t be able to get there alone and will rely on the input of the broader DataONE community for success. Over the next several years the S&G WG will focus on research, development, and governance implementation of the marketing plans, business plans, and sustainability strategy to establish DataONE as a long-term sustainable network for Earth, biological and environmental science data. Essentially, the S&G WG will tackle the age-old question of where the buck stops.

In order to tackle the hard stuff the S&G WG has actively engaged representatives from the DataONE community, including computer and information scientists, digital librarians and repository managers, domain scientists, business experts and consultants in formulating our initial governance, marketing and business strategies. During the next five years, Patricia Cruse and William Michener will work with a small team of individuals with similar diverse expertise to:

• Perform a comprehensive review of various sustainability models, including successful and failed models, related to environmental research networks and data preservation efforts.

• Evaluate potential governance models and recommend an appropriate governance strategy and model for the organization.

• Develop and recommend a DataONE sustainability strategy, including economic and technical sustainability, which includes a 5 & 10 Year Business Plan, a Marketing Plan, and a Development plan with respect to funding strategies and approaches.

• Coordinate and support External Advisory Board activities.

— Bill Michener University of New Mexico

— Patricia Cruse California Digital Library

metadata, the DataONE index processor ensures that the representation of dates in the search index is consistent. This means that searches will have better precision and recall than simply indexing the literal information as expressed in the metadata. This mapping and normalization process is applied during population of the search index as depicted in Figure 2.
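The two steps, mapping fields to a common concept and normalizing values, can be sketched as follows. The two paths are the ones quoted in this article; the concept name `beginDate`, the accepted date formats, and the choice to fill missing months or days with 01 are assumptions made for this example and do not describe the actual index processor rules.

```python
from datetime import datetime

# Rules mapping standard-specific paths to one common concept.
# The paths come from the article; "beginDate" is a hypothetical concept name.
FIELD_RULES = {
    "EML": {
        "dataset/coverage/temporalCoverage/rangeOfDates/beginDate/calendarDate": "beginDate",
    },
    "FGDC": {
        "metadata/idinfo/timeperd/timeinfo/rngdates/begdate": "beginDate",
    },
}

# Assumed input formats, chosen by string length to avoid ambiguous parsing.
DATE_FORMATS = {10: "%Y-%m-%d", 8: "%Y%m%d", 6: "%Y%m", 4: "%Y"}


def normalize_date(raw):
    """Return the date as YYYY-MM-DD; missing month or day defaults to 01."""
    value = raw.strip()
    fmt = DATE_FORMATS.get(len(value))
    if fmt is None:
        raise ValueError(f"unrecognized date representation: {raw!r}")
    return datetime.strptime(value, fmt).strftime("%Y-%m-%d")


def index_record(standard, extracted):
    """Map extracted (path -> raw value) pairs to common, normalized fields."""
    rules = FIELD_RULES[standard]
    record = {}
    for path, raw in extracted.items():
        concept = rules.get(path)
        if concept is not None:
            record[concept] = normalize_date(raw)
    return record


eml = index_record("EML", {
    "dataset/coverage/temporalCoverage/rangeOfDates/beginDate/calendarDate": "2012-07-01",
})
fgdc = index_record("FGDC", {
    "metadata/idinfo/timeperd/timeinfo/rngdates/begdate": "20120701",
})
print(eml, fgdc)
```

Both records end up with the same field name and the same date string, so a single query on the common concept matches packages described in either standard.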

This approach provides a foundation for progressive refinement of the metadata processing rules: as more metadata becomes available through the DataONE federation, a greater understanding of the range of terms and values can be used to tune the index. It also allows for more advanced processing, such as semantic term alignment, so that different representations of, for example, measurement fields can be expressed as a common concept by mapping through appropriate vocabularies and ontologies.
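As a rough illustration of term alignment, a lookup table can stand in for a vocabulary or ontology. The labels and the preferred terms below are invented; a real alignment would draw on curated vocabularies rather than a hand-written dictionary.

```python
# Hypothetical vocabulary: variant measurement labels -> one preferred concept.
VOCABULARY = {
    "air_temp": "air temperature",
    "airtemp": "air temperature",
    "temperature, air": "air temperature",
    "precip": "precipitation",
    "ppt": "precipitation",
}


def align_term(label, vocabulary=VOCABULARY):
    """Return the preferred concept for a label, or the cleaned label if unknown."""
    key = label.strip().lower()
    return vocabulary.get(key, key)


aligned = [align_term(name) for name in ["AirTemp", " air_temp ", "PPT", "nitrate"]]
print(aligned)
```

Labels that are not in the vocabulary pass through unchanged, which keeps them searchable until a mapping is added.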

Figure 3: Counts of data/metadata/resource maps uploaded to DataONE since release in July 2012

Cartoon published with permission of I.B. (Bill) Nelson. Many thanks.

CyberSPOT cont’d

WorkingGroupFOCUS

They say, “Technology is the easy part, it’s the cultural stuff that is hard”


IssueHIGHLIGHT: Data Carpentry Workshop

On May 8 and 9, 2014, 4 instructors, 4 assistants, and 27 learners filed into the largest meeting space at the National Evolutionary Synthesis Center (NESCent) for the inaugural Data Carpentry bootcamp. Data Carpentry is modeled on Software Carpentry, but focuses on tools and practices for more productively managing and manipulating data. The inaugural group of learners for this bootcamp was very diverse. They included graduate students, postdocs, faculty and staff, from three of the largest local research universities (Duke University, University of North Carolina, and North Carolina State University). Over 55% of the attendees were women and research areas ranged from evolutionary biology and ecology to microbial ecology, fungal phylogenomics, marine biology, and environmental engineering. One participant was even a library scientist from Duke Library.

Acquiring data has become easier and less costly, including in many fields of biology. Hence, we expected that many researchers would be interested in Data Carpentry to help manage and analyze their increasing amounts of data. To get a better idea of the breadth of perspectives that learners brought to the course, we started by asking learners why they were attending. The responses reflected a broad spectrum of the daily data wrangling challenges researchers face:

• I'm tired of feeling out of my depth on computation and want to increase my confidence.

• I usually manage data in Excel and it's terrible and I want to do it better.

• I'm organizing GIS data and it's becoming a nightmare.

• This workshop sounds like a good way to dive in head first.

• My advisor insists that we store 50,000 barcodes in a spreadsheet, and something must be done about that.

• I want to teach a reproducible research class.

• I'm having a hard time analyzing microarray, SNP or multivariate data with Excel and Access.

• I want to use public data.

• I work with faculty at undergrad institutions and want to teach data practices, but I need to learn it myself first.

• I'm interested in going in to industry and companies are asking for data analysis experience.

• I'm trying to reboot my lab's workflow to manage data and analysis in a more sustainable way.

• I'm re-entering data over and over again by hand and know there's a better way.

• I have overwhelming amounts of NGS data.

The instructors discussed many of these kinds of scenarios during the months of planning that preceded the event. Therefore we were hopeful that the curriculum elements we chose from the many potentially useful subjects would address what many of the learners were hoping to get out of the course. Here is what we finally decided to teach, and the lessons we learned from that as well as from the feedback we received from the learners.

We taught four different sections:

1. Wrangling data in the shell (bash): differences between Excel documents and plain text; getting plain text out of Excel; navigating the bash shell; exploring, finding and subsetting data using cat, head, tail, cut, grep, find, sort, uniq (Karen Cranston)

2. Managing and analyzing data in R: navigating R studio; importing tabular data into dataframes; subsetting dataframes; basic statistics and plotting (Tracy Teal)

3. Managing, combining and subsetting data in SQL: database structure; importing CSV files into SQLite; querying the database; creating views; building complex queries (Ethan White)

4. Creating repeatable workflows using shell scripts: putting shell commands in a script; looping over blocks of commands; chaining together data synthesis in SQL with data analysis in R (Hilmar Lapp)
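To give a flavor of the SQL section (importing a CSV file into SQLite, querying it, and creating a view), here is a minimal sketch using Python's standard library. It is not the workshop material: the table name, column names and the four data rows are made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for a CSV file exported from a spreadsheet.
CSV_TEXT = """species,plot,weight
DM,1,40
DM,2,48
PF,1,7
PF,2,9
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surveys (species TEXT, plot INTEGER, weight REAL)")

# Import: each CSV row becomes a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
conn.executemany(
    "INSERT INTO surveys VALUES (:species, :plot, :weight)", rows
)

# A view stores a query so the summary can be reused like a table.
conn.execute(
    "CREATE VIEW mean_weight AS "
    "SELECT species, AVG(weight) AS mean_weight FROM surveys GROUP BY species"
)

result = conn.execute(
    "SELECT species, mean_weight FROM mean_weight ORDER BY species"
).fetchall()
print(result)
```

The summary produced by the view could then be exported back to CSV for analysis in R, which is the kind of chaining that section 4 of the curriculum describes.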

This was the first-ever bootcamp of this kind, so after it was all done, we had a lot of ideas for future improvements:

Photograph from the recent Data Carpentry Workshop held at NESCent, May 8th-9th. 4 instructors, 4 assistants and 27 attendees. Image courtesy of Deb Paul.


• The SQL section should come before the R section! It makes more sense in terms of workflow (extract subset of data; export to CSV for analyses) but is also an easier entry for learners (easier syntax, can see data in Firefox plugin). The learners seemed to get SQL: there were fewer red sticky notes and questions were more about transfer ("how would I structure this other query") than comprehension ("how do I correct bash / R syntax").

• Each section should include discussion about how to structure data and files to make one's life easier. Ethan did this for the SQL section, and it was very effective.

• Students were already motivated when they came to the bootcamp; they didn't need to be convinced that what we were teaching was important. Many people are already struggling with data, and are hungry for better tools and practices. Our bootcamp filled up in less than 24 hours after opening registration, and there was virtually no attrition despite zero tuition costs—everyone showed up, and every learner stayed until the end of day 2.

• What the best tool is for a particular job is still a big question. When would I use bash vs R vs SQL? Learners brought this up repeatedly, and we didn't always have good answers that didn't involve hand waving, perhaps in part because the answer depends so much on context and the problem at hand.

• +1 for using a real (published!) data set that was relevant to at least some of the participants; for using this same data set throughout the course; and for having an instructor with intimate knowledge of the data (could explain some of the quirks of the data). #squirrelcannon

• For the shell scripting section, an outline and/or concept map would have been useful to give learners a good idea upfront of what we were trying to accomplish. Without this, some learners (and helpers!) were confused about which endpoint we were working towards.

• People who fall behind need a good way to catch up. Ways to do this include providing a printed cheat sheet of commands at the start of the session; providing material online (unlike the well polished Software Carpentry material, the material for Data Carpentry is still in the early stages of online documentation); and having one helper dedicated to entering commands in the Etherpad.

• There is great demand for this type of course. Even without charging a fee, we didn't have any empty seats the first day, and 100% of attendees returned for the second day. Also, there were 62 people on the wait list! And we know that many people didn't even sign up for the wait list, even though they were interested.

There were also various things we wanted to teach but that came under the chopping block due to lack of time and other reasons. One of these, and one that learners asked about repeatedly, was the subject of "getting data off the web". It will take more thought to pin down what that should actually mean as part of a Data Carpentry bootcamp aimed at zero barrier to entry. It might mean using APIs to access data from NCBI or GBIF, but it's far from clear whether that would meet learners' needs. For most general-purpose data repositories, such as Dryad, most of their data are too messy to use without extensive cleanup.

All of the helpers including Darren Boss (iPlant), Matt Collins (iDigBio), Deb Paul (iDigBio), and Mike Smorul (SESYNC) did a great job of helping the students pick up new data skills. Finally, we'd like to thank our sponsors for their support, including NESCent for hosting the event and keeping us nourished, and the Data Observation Network for Earth (DataONE), without whom this event wouldn't have taken place.

— Karen Cranston National Evolutionary Synthesis Center

(re-published from http://software-carpentry.org/blog/2014/05/our-first-data-carpentry-workshop.html with permission)

UpcomingEVENTS

Members of the DataONE Team will be at the following events. Full information on training activities can be found at bit.ly/D1Training and our calendar is available at bit.ly/D1Events.

Jul. 6-7: DataONE Users Group, Frisco, CO. http://www.dataone.org/dataone-users-group

Jul. 8-11: Federation of Earth Science Information Partners, Frisco, CO. http://commons.esipfed.org/2014SummerMeeting

Aug. 10-15: Ecological Society of America Annual Meeting, Sacramento, CA. http://esa.org/am/

Aug. 25-27: DataCite Annual Conference, Nancy, France. http://www.datacite.org/events

Sep. 22-24: Research Data Alliance Plenary Meeting, Amsterdam, Netherlands. https://www.rd-alliance.org/rda-fourth-plenary-meeting.html

Oct. 27-31: GLEON 16, Orford, QC, Canada. http://www.gleon.org/meetings/gleon16/main


TheDUGout

Hello DUG Members,

The DataONE Users Group meets this week to share information, networking opportunities and a beautiful locale.

This year’s meeting will be held at Copper Mountain Resort in Frisco, Colorado, immediately preceding and colocated with the ESIP annual meeting. There will be updates on Phase I activities and the plans for DataONE Phase II (see lead article). The format will be similar to last year’s meeting in that we will have breakout sessions, roundtable sessions, and a poster session. There will be breakout sessions to learn more about DataONE tools, education and outreach efforts, and a Member Node forum. We will also have roundtable discussions about open access, data documentation, and more. An addition from last year is that in response to feedback, we will be live streaming the plenary sessions. Links to these, and the agenda, can be found on the DUG page at http://www.dataone.org/dataone-users-group.

Another exciting change in this year’s program is that it will include a half-day long DMPTool workshop on July 7 from 1 - 5 pm. The new version of the DMPTool was just recently released. If you have not had a chance to see the new version, you will be impressed with its look and functionality. The partners have planned a workshop to introduce new and existing users to data management plan requirements and features of the new tool for specific user groups. The session will culminate with hands-on practice using the tool. And again, we will be streaming the first part of this session so please join us remotely.

This is an exciting time for DataONE as it moves into its second phase. As always, if you have any comments or questions, please direct them to us, the DUG chairs. We hope to see you next week.

— Andrew Sallans, Chair, DataONE Users Group, University of Virginia Library

— Chris Eaker, Vice-Chair, DataONE Users Group, University of Tennessee Library

FeaturedRESOURCE

Member Node Dashboard

Enhanced exposure for DataONE Member Nodes provides users with real time information on current data holdings within the DataONE network of repositories. The new ‘Member Node Dashboard’ shows the volume and number of data and metadata files discoverable through the DataONE network, in addition to charting file uploads across time. A comprehensive table of active, upcoming and replication nodes provides users with current information on Member Node status (up, down, unknown), direct links to the repository homepage, a detailed Member Node description document and access to individual Member Node statistics pages to explore data metrics, including information on when the Member Node joined DataONE and when newest content was added.

“The new Member Node dashboard [creates] an informative and visually appealing space for Member Nodes,” commented Inna Kouper of the SEAD Virtual Archive, a DataONE Member Node. “The chart provides an overview of content availability and the tables give a quick overview of nodes with links to their pages. It’s quite impressive, [...] the growth of data and nodes is obvious.”

This newly designed interface provides users a rapid snapshot of the content within DataONE and showcases the holdings of our Member Node partners. We look forward to growing the Member Node network and exploring ways to enhance discoverability of Member Node holdings.

For more information on partnering with DataONE as a Member Node see: http://www.dataone.org/member-node-deployment-process.

Summer 2014

1312 Basehart SE
University of New Mexico
Albuquerque, NM 87106

Fax: 505.246.6007

DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.

Project Director:

William [email protected]

505.814.7601

Executive Director:

Rebecca [email protected]

505.382.0890

Director of Community Engagement and Outreach:

Amber [email protected]

505.205.7675

Director of Development and Operations:

Dave [email protected]

Outreach Update

Summer is a busy time for Community Engagement and Outreach activities at DataONE. The Summer Internship Program is in full swing, the DataONE Users Group meeting is held every July, and we always have a team presence at the Ecological Society of America (ESA) meeting. And yet we still manage to enjoy a few leisurely sunny days with ice cream. (That last part might just be me.)

Our summer interns are in their fifth or sixth week of a nine-week project (depending on start date) and have made significant progress. As in previous years, this sixth cohort of summer interns has been providing community updates on their activities via the DataONE notebooks site. With projects ranging from ontology search, metadata standards and provenance trace to citizen science methods, data videos and screencasting, there will be something of interest to everyone, and I encourage you to take a look and provide feedback (https://notebooks.dataone.org/).

We are extremely grateful to the mentors who provide their time, expertise and support to the interns, and we are proud of what the program has achieved over the last six years. We are also pleased to be able to continue the DataONE Summer Internship Program into Phase II.

DataONE 2014 Summer Interns:

Kate Chastain
Data Annotation: Integrating User Management into the DataONE Metadata Environment

Yue Liu
Integrating Ontology Search and Recommendation into the DataONE Metadata Environment

Yurong He
Tuning the Citizen Science “Instrument” for Gathering Data While Documenting Data Quality

Heejun Kim
Tuning the Citizen Science “Instrument” for Gathering Data While Documenting Data Quality

James Michaelis
Providing Provenance Trace in OPeNDAP Hyrax Served Science Data in a DataONE Member Node

Tinahong Song
Understanding and Using Provenance from Digital Notebooks

Kate Alderte
Community Sustainable Scientific Metadata Standards Directory

Yan Gao
SimilarityExplorer: Inspire Climate Science Discovery Through Advanced Big Data Analysis

Becky Beamer
Creating Engaging Video Shorts for Stories About Data Management and Sharing

Heather Heinz
Developing Screencast Tutorials for DataONE Tools and Resources

In ESA news, we have a number of sessions planned, and DataONE team members will be there throughout the meeting: at our sessions, presenting in other sessions, and at the DataONE booth (#115, adjacent to LTER). Stop by to take a look at some newly developed screencast materials, for a demo of DataONE search, or to pick up some stickers and other goodies (yes, there will most likely be chocolate). We will also be covering DataONE search and screencasts during our Ignite sessions on tools, tips and techniques for working with ecological data. Ignite sessions are fun, fast-paced 5-minute presentations, so in one short hour you will be exposed to a suite of DataONE and partner tools and equipped with some preliminary skills that will take you through the Data Life Cycle. These ESA sessions and other DataONE coordinated events are listed under “DataONE at the ESA”, but be sure to visit http://www.dataone.org/training-activities for updated ESA information, including other presentations by DataONE team members.

In the meantime, enjoy the summer and hopefully we’ll see you next week at the DUG meeting. If you can’t make it, stop by remotely - your input is always welcome.

DataONE at the ESA

IGN3: Tools for Working With Ecological Data
Tues Aug 12, 8:00 AM - 9:30 AM

IGN4: Tips and Techniques for Working with Ecological Data
Tues Aug 12, 10:00 AM - 11:30 AM

SS17: Creating Effective Data Management Plans for Ecological Research
Tues Aug 12, 8:00 PM - 10:00 PM

IGN15: Science Communication
Fri Aug 15, 8:00 AM - 9:30 AM

Booth #115

See http://www.dataone.org/training-activities for updates.