"What does 'Full Life-Cycle' Data Management Mean ?"

What does “Full Life-Cycle” Data Management Mean ? “BIG DATA” US Office of Personnel Management, March 14, 2013

description

Presentation made to US Office of Personnel Management Community of Practice on Big Data

Transcript of "What does 'Full Life-Cycle' Data Management Mean ?"

Page 1: "What does 'Full Life-Cycle' Data Management Mean ?"

What does “Full Life-Cycle” Data Management Mean ?

“BIG DATA” US Office of Personnel Management

March 14, 2013

Page 2: "What does 'Full Life-Cycle' Data Management Mean ?"

“As required by the National Archives and Records Administration (NARA) in 36 CFR Chapter XII, Subchapter B, Records Management, Federal agencies are responsible for creating and maintaining authentic, reliable, and usable records and ensure that they remain so for the length of their authorized retention period.”

http://www.archives.gov/records-mgmt/toolkit/pdf/ID373.pdf

Page 3: "What does 'Full Life-Cycle' Data Management Mean ?"

First, a brief digression concerning graphics…

Edward Tufte’s favorite…

Page 4: "What does 'Full Life-Cycle' Data Management Mean ?"

DISCRETION…

Exercise care in the selection of graphic formats: not all graphics enhance understanding, and some may confuse…

Lacking effective compound graphics, simplicity and the use of multiple graphic images may be more effective.

The New York Times often produces exemplary graphics that compress complex data and complex relationships…

Page 5: "What does 'Full Life-Cycle' Data Management Mean ?"

NYT: “LEADING CAUSES OF CANCER DEATHS”

http://www.nytimes.com/imagepages/2007/07/29/health/29cancer.graph.web.html

Page 6: "What does 'Full Life-Cycle' Data Management Mean ?"

“Data” ? [technical definition]

“…’data’ are defined as any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.” -- Program Solicitation 07-601 “Sustainable Digital Data Preservation and Access Network Partners (DataNet)”

Taken in this broadest possible sense, “data” are thus simply electronic coded forms of information. And virtually anything can be represented as “data” so long as it is electronically machine-readable.

Page 7: "What does 'Full Life-Cycle' Data Management Mean ?"

“Data” [epistemic definition – addressing the meaning of data]

“Measurements, observations or descriptions of a referent -- such as an individual, an event, a specimen in a collection or an excavated/surveyed object -- created or collected through human interpretation (whether directly “by hand” or through the use of technologies)”

-- AnthroDPA Working Group on Metadata (May, 2009) [funded by Wenner-Gren Foundation and US NSF]

Page 8: "What does 'Full Life-Cycle' Data Management Mean ?"

“Experiments to determine the density of the earth,” by Henry Cavendish, ESQ., F.R.S. AND A.S. Read June 21, 1798 (From the Philosophical Transactions of the Royal Society of London for the year 1798, Part II, pp. 469-526)

From: http://www.archive.org/details/lawsofgravitatio00mackrich

Page 9: "What does 'Full Life-Cycle' Data Management Mean ?"

USDA – NATURAL RESOURCES CONSERVATION SERVICE

Page 10: "What does 'Full Life-Cycle' Data Management Mean ?"

sbid battery datetime heater_voltage Manz1Sap1 Manz1Sap2 Manz1Sap3 Manz1Sap4 Manz2Sap5 Manz2Sap6 Manz2Sap7 Manz3Sap10 Manz3Sap8 Manz3Sap9 Manz4Sap11 timestamp Datagap Julian
2 12.365 1196796112 2018.8 0.5585 0.51029 0.55517 0.54354 0.6067 0.52858 0.55351 0.59008 0.59506 0.60337 0.56514 12/4/07 11:21 4.47351
3 12.348 1196796232 2017.9 0.55682 0.51028 0.5535 0.54352 0.60669 0.52857 0.55017 0.59007 0.59505 0.60336 0.56513 12/4/07 11:23 0 4.47490
4 12.357 1196796352 2018.6 0.55514 0.51027 0.55348 0.54351 0.60501 0.52855 0.55016 0.59005 0.59504 0.60501 0.56512 12/4/07 11:25 0 4.47628
5 12.354 1196796472 2017.6 0.55514 0.51026 0.55181 0.5435 0.60334 0.52855 0.54849 0.59004 0.59503 0.60334 0.56511 12/4/07 11:27 0 4.47767
6 12.334 1196796592 2018.3 0.55347 0.51026 0.55015 0.5435 0.60333 0.52854 0.54682 0.59004 0.59502 0.605 0.56511 12/4/07 11:29 0 4.47906
7 12.34 1196796712 2018.5 0.55014 0.50859 0.55014 0.54349 0.60332 0.53019 0.54349 0.59003 0.59501 0.60498 0.56676 12/4/07 11:31 0 4.48045
8 12.337 1196796832 2017.8 0.55013 0.50692 0.55013 0.54348 0.60332 0.53019 0.54182 0.59002 0.59501 0.60498 0.56675 12/4/07 11:33 0 4.48184
9 12.328 1196796952 2017.5 0.5468 0.50691 0.5468 0.54347 0.60331 0.53018 0.53849 0.59001 0.595 0.60497 0.56674 12/4/07 11:35 0 4.48323
10 12.323 1196797072 2017 0.54679 0.50524 0.54679 0.54347 0.59998 0.53017 0.53682 0.59 0.59499 0.60496 0.56674 12/4/07 11:37 0 4.48462
11 12.328 1196797192 2018.9 0.54679 0.50191 0.54512 0.5418 0.59665 0.53017 0.53349 0.59 0.59498 0.60496 0.56673 12/4/07 11:39 0 4.48601
12 12.319 1196797312 2017.7 0.54345 0.49857 0.54178 0.54178 0.59663 0.53015 0.53015 0.58998 0.5933 0.60327 0.56671 12/4/07 11:41 0 4.48740
13 12.311 1196797432 2017.3 0.54343 0.4969 0.54011 0.54177 0.59661 0.53014 0.52848 0.58997 0.59329 0.6016 0.5667 12/4/07 11:43 0 4.48878
14 12.316 1196797552 2018.6 0.5401 0.49357 0.53678 0.54176 0.59328 0.53013 0.5268 0.58995 0.59328 0.60325 0.56669 12/4/07 11:45 0 4.49017
15 12.31 1196797672 2016.8 0.53844 0.4919 0.53511 0.54176 0.59494 0.53013 0.52514 0.58995 0.59328 0.60325 0.56503 12/4/07 11:47 0 4.49156
16 12.31 1196797792 2017.1 0.53676 0.48856 0.53343 0.54174 0.59326 0.53011 0.5218 0.58993 0.59326 0.60323 0.56501 12/4/07 11:49 0 4.49295
17 12.31 1196797912 2017.1 0.53342 0.48523 0.5301 0.54173 0.59324 0.5301 0.51846 0.58826 0.59324 0.60321 0.56499 12/4/07 11:51 0 4.49434
18 12.301 1196798031 2017.5 0.53174 0.48521 0.52842 0.53839 0.59156 0.53008 0.51845 0.58824 0.59323 0.6032 0.56498 12/4/07 11:53 0 4.49573
19 12.301 1196798151 2016.3 0.53007 0.48188 0.52509 0.53838 0.59155 0.53007 0.51512 0.58823 0.59321 0.60152 0.5633 12/4/07 11:55 0 4.49712
20 12.303 1196798271 2016.6 0.5284 0.47855 0.52175 0.53837 0.59154 0.5284 0.5151 0.58821 0.59154 0.60151 0.56163 12/4/07 11:57 0 4.49851

manzanita_sapflow_12-5-07_to_7-7-08.xls
instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root growth and CO2 production.

Datum: “0.59998”
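
A bare datum such as 0.59998 says little until it is tied back to its row and column. The following minimal Python sketch pairs the row with sbid 10 from the table above with the column names shown on the slide and retrieves the datum by name. The function name `parse_row`, the assumption that the timestamp spans two whitespace-separated tokens, and the reading of the `datetime` column as Unix epoch seconds are illustrative assumptions, not part of the original dataset or its tooling.

```python
from datetime import datetime, timezone

# Column names as given on the slide (the spreadsheet's own header).
COLUMNS = [
    "sbid", "battery", "datetime", "heater_voltage",
    "Manz1Sap1", "Manz1Sap2", "Manz1Sap3", "Manz1Sap4",
    "Manz2Sap5", "Manz2Sap6", "Manz2Sap7",
    "Manz3Sap10", "Manz3Sap8", "Manz3Sap9", "Manz4Sap11",
    "timestamp", "Datagap", "Julian",
]

# The row with sbid 10, copied from the slide.
ROW_10 = ("10 12.323 1196797072 2017 0.54679 0.50524 0.54679 0.54347 "
          "0.59998 0.53017 0.53682 0.59 0.59499 0.60496 0.56674 "
          "12/4/07 11:37 0 4.48462")


def parse_row(line):
    """Map one whitespace-separated row onto COLUMNS.

    Assumption: the timestamp column spans two tokens (date and time),
    so a complete row has len(COLUMNS) + 1 tokens.
    """
    tokens = line.split()
    if len(tokens) != len(COLUMNS) + 1:
        raise ValueError("unexpected number of fields")
    i = COLUMNS.index("timestamp")
    values = tokens[:i] + [" ".join(tokens[i:i + 2])] + tokens[i + 2:]
    return dict(zip(COLUMNS, values))


record = parse_row(ROW_10)
datum = float(record["Manz2Sap5"])  # the datum singled out on the slide

# Assumption: the "datetime" column holds Unix epoch seconds.
logged_utc = datetime.fromtimestamp(int(record["datetime"]), tz=timezone.utc)
print(datum, record["timestamp"], logged_utc.isoformat())
```

Looking the value up by column name rather than by position keeps the link between the number and its description explicit, which is the point the slide makes about a datum needing its context.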

Page 11: "What does 'Full Life-Cycle' Data Management Mean ?"

DATASETS

some examples

with “native metadata”

2-d_soil_temps.csv
surface, and sub-surface soil temperatures (at 2cm and 8cm depths) measured at one location for a few days in order to calibrate a model of temperature propagation. Surface temperature was measured with an infrared thermometer, subsurface temperatures with a thermocouple.

----------------------------
5-minute_light_data_for_4_continuous_days_plus_reference.xls
PPF (photosynthetic photon flux = photosynthetically active radiation 400-700nm) measured with an array of photodiodes calibrated to a Licor sensor, along a linear transect for a few days. used to get an idea of how much light plants along the transect are receiving.

----------------------------
CO2_of_air_at_different_heights_July_9.xls
concentration of CO2 in the air during the evening for one day, measured with a Licor infrared gas analyzer and a series of relays and tubes with a pump. used to examine the gradient of CO2 coming from the soil when the air is still during the evening.

----------------------------
Fern_light_response.xls
Light response curves for bracken ferns, measured with a Licor photosynthesis system. Fronds are exposed to different light levels and their instantaneous photosynthesis and conductance is measured. used in conjunction with the induction data (below) for physiological characterization of the ferns.

----------------------------
La_Selva_species_photosyntheis_table.xls
incomplete data set on instantaneous photosynthesis rates for various tropical understory and epiphytic species grown in a shade house in Costa Rica.

----------------------------
manzanita_sapflow_12-5-07_to_7-7-08.xls
instantaneous sap flow data (as temperature differences on a constant temperature heat dissipation probe) for multiple branches of Manzanita, collected with a datalogger. used to correlate physiological activity with below-ground measures of root growth and CO2 production.

----------------------------
moisture_release_curves.xls
percentage of water content, water potential (in MegaPascals) and temperature of soil samples, measured in the laboratory for calibration of water content with water potential. soil is from the James Reserve in California.

----------------------------
Photosynthetic_induction.xls
a time-course of photosynthetic induction for a leaf over 35 minutes. instantaneous photosynthesis measured as mol CO2 m/2/s and light level is probably 1000 micromoles. used to determine physiological characteristics of bracken ferns.

----------------------------
run_2_24-h_data_for_mesh.xls
measurements of micrometeorological parameters on a moving shuttle, going from a clearing across a forest edge and into the forest for about 30 meters. Pyranometers facing up and down, pyrgeometer facing up and down, PAR, air temperature, relative humidity. Also data from a station fixed in the clearing and some derived variables calculated. used for examining edge effects in forests.

----------------------------
Segment_of_wallflower_compare_colorspaces_blur.xls
pixel counts from images of wallflowers that were segmented into flower/not-flower under different color spaces. segmentation was made using a probability matrix of hand-segmented images. used to automatically count flowers in images collected after this training data was collected (and used to determine the best color space for this task).

Page 12: "What does 'Full Life-Cycle' Data Management Mean ?"

Data Development: “Data Reduction - Processing Level Definitions” (an example)

Report of the EOS Data Panel Vol IIA, NASA, 1986 (Tech Memorandum 87777)

http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19860021622_1986021622.pdf

Tom Moritz, OPM “Big Data” July, 2012

Page 13: "What does 'Full Life-Cycle' Data Management Mean ?"

Data in Public Service

The Federal government manages data in satisfaction of three primary requirements:

1) To account transparently for government operations

2) To provide citizen access to the products of government activities

3) To fulfill mandated tasks for which the government has no original data (this requires data acquisition)

Page 14: "What does 'Full Life-Cycle' Data Management Mean ?"
Page 15: "What does 'Full Life-Cycle' Data Management Mean ?"

The basic goal is to make all data held by the US government fully reliable and “audit-worthy”.

All data and all derived data products should be able to withstand exacting examination and testing.

All descriptive information required for auditing should be fully disclosed, readily available and easily accessible in standard reporting formats.

Page 17: "What does 'Full Life-Cycle' Data Management Mean ?"

VAQUITA STAKEHOLDERS

• AGS Alto Golfo Sustentable
• ASM American Society of Mammalogists
• CEC Commission for Environmental Cooperation
• CEDO Intercultural Center for the Study of Deserts and Oceans
• CI Conservation International
• CIRVA International Committee for the Recovery of the Vaquita
• CICESE Centro de Investigación Científica y Educación Superior de Ensenada
• CILA International Boundary and Water Commission
• CITES Convention on International Trade in Endangered Species of Wild Fauna and Flora
• Conagua National Water Commission
• Conanp National Commission for Protected Natural Areas, Semarnat (Comisión Nacional de Áreas Naturales Protegidas, Semarnat)
• Conapesca National Fisheries and Aquaculture Commission, Sagarpa (Comisión Nacional de Pesca y Acuacultura, Sagarpa)
• Profepa Federal Attorney for Environmental Protection
• Secretariat of Agriculture, Livestock, Rural Development, Fisheries, and Food (Mexico)
• Salud Secretariat of Health (Mexico)
• COSEWIC Committee on the Status of Endangered Wildlife in Canada
• Department of Fisheries and Oceans (Canada)
• United States Department of the Interior
• European Cetacean Society
• US Environmental Protection Agency
• US Food and Drug Administration
• GEF Global Environmental
• IBWC International Boundary and Water Commission
• National Institute of Ecology, Semarnat
• Inapesca National Fisheries Institute, Sagarpa
• IUCN World Conservation Union
• International Whaling Commission
• Local Economic and Employment Development program
• United States Marine Mammal Commission

Page 18: "What does 'Full Life-Cycle' Data Management Mean ?"

• Marine Stewardship Council
• NAMPAN North American Marine Protected Areas Network (CEC)
• US National Academy of Sciences
• North American Wildlife Enforcement Group (CEC)
• US National Marine Fisheries Service, NOAA, Department of Commerce
• US National Oceanic and Atmospheric Administration, Department of Commerce
• United States National Ocean Service (NOAA)
• PACE Species Conservation Action Programs, Conanp
• PGR Attorney General Office (Mexico)
• POEMGC Marine Ecological Planning of the Gulf of California Program, Semarnat
• Procer Conservation Program for Species at Risk
• Secretariat of Economy (Mexico)
• Sectur Secretariat of Tourism (Mexico)
• Sedesol Secretariat for Social Development (Mexico)
• Semar Secretariat of the Navy
• Semarnat Secretariat of the Environment and Natural Resources
• Society for Marine Mammalogy
• Solamac Latin American Society for Aquatic Mammals
• Somemma Mexican Society for Marine Mammalogy
• SWFSC Southwest Fisheries Science Center (US NMFS, NOAA)
• The Nature Conservancy
• Universidad Autónoma de Baja California Sur
• University of California
• United Nations
• United States Coast Guard
• United States Fish and Wildlife Service
• World Wildlife Fund

Page 19: "What does 'Full Life-Cycle' Data Management Mean ?"

Values: “Data Quality” ???

In the most general colloquial terms, “Data Quality” is the fundamental issue of concern to scientists, policy makers, managers/decision makers and the general public.

“Quality” can be considered in terms of three primary values:

• Validity: logical in terms of intended hypothesis to be tested (all potential types of data that could be chosen should be weighed for probative value…)

• Competence (Reliability): consideration of the proper choice of expert staff, methods, apparatus/gear, calibration, deployment and operation

• Integrity: the maintenance of original integrity of data as well as tracking and documenting of all transformations and sequences of transformation of data
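
One way to picture the “Integrity” value is a log that records every transformation together with a fingerprint of the data before and after it. The sketch below is a minimal illustration in Python using SHA-256 hashes from the standard library. The class `TrackedData`, the step names, and the choice of a JSON encoding for hashing are all hypothetical; nothing here is prescribed by the presentation.

```python
import hashlib
import json


def sha256_of(values):
    """Hash a list of numbers via a compact JSON encoding."""
    encoded = json.dumps(values, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class TrackedData:
    """Data plus an append-only log of the transformations applied to it."""

    def __init__(self, values):
        self.values = list(values)
        self.original_hash = sha256_of(self.values)  # fingerprint of the original data
        self.log = []

    def apply(self, name, func):
        """Apply func to every value and record input/output fingerprints."""
        before = sha256_of(self.values)
        self.values = [func(v) for v in self.values]
        self.log.append(
            {"step": name, "input": before, "output": sha256_of(self.values)}
        )
        return self

    def chain_is_unbroken(self):
        """True if each step starts from the previous output and the
        current data still matches the last recorded output."""
        expected = self.original_hash
        for entry in self.log:
            if entry["input"] != expected:
                return False
            expected = entry["output"]
        return expected == sha256_of(self.values)


readings = TrackedData([0.54679, 0.50524, 0.59998])
readings.apply("scale_x1000", lambda v: v * 1000)
readings.apply("round_1dp", lambda v: round(v, 1))
print(readings.chain_is_unbroken())
```

An auditor holding the original fingerprint and the log can confirm that the current values are the product of the documented sequence of transformations and nothing else; an edit made outside `apply` breaks the chain.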

Page 20: "What does 'Full Life-Cycle' Data Management Mean ?"

Auditing – A Case History “InterAcademy Council Names IPCC Review Committee”

“AMSTERDAM, Netherlands – The InterAcademy Council (IAC), an organization of the world’s science academies, announced today that Harold T. Shapiro, an economist and former president of Princeton University and the University of Michigan, will chair a 12-member committee to conduct an independent review of the procedures and processes of the Intergovernmental Panel on Climate Change (IPCC). The review was requested in March by U.N. Secretary-General Ban Ki-moon and IPCC Chair Rajendra K. Pachauri.

“The committee will review IPCC procedures for preparing its assessment reports. Among the issues to be reviewed are data quality assurance and control; the type of literature that may be cited in IPCC reports; expert and government review of IPCC materials; handling of the full range of scientific views; and the correction of errors that are identified after a report has been completed. The committee also will review overall IPCC processes, including management functions and communication strategies (the full statement of task is available at www.interacademycouncil.net/ipccreview).”

http://reviewipcc.interacademycouncil.net/IACNamesIPCCReviewCommittee.html

Page 21: "What does 'Full Life-Cycle' Data Management Mean ?"

Climate Change Assessments: Review of the Processes and Procedures of the IPCC (InterAcademy Council)

U.N. Press Conference Aug. 30, 2010

“Opening Statement” by Harold T. Shapiro, President Emeritus and Professor of Economics and Public Affairs, Princeton University and Chair, InterAcademy Council Committee to Review the IPCC

http://reviewipcc.interacademycouncil.net/OpeningStatement.html

Page 22: "What does 'Full Life-Cycle' Data Management Mean ?"

US BLM Manual 1283 “Data Administration and Management”

“Every employee is responsible for the quality, integrity, relevancy, accuracy, and currency of the data that is created, collected, or maintained, whether the data are in manual (paper copy) or electronic format. Managers will employ good data management practices to manage the data collected and maintained by their program specialists. The program specialist who uses, manages, and distributes the data must ensure that data are collected according to established standards and maintained to ensure accuracy and integrity. This section identifies specific responsibilities in support of the data management program.”

Rel. No. 1-1742 Supersedes Rel. No. 1-1678 Date: 7/10/2012

http://www.blm.gov/pgdata/etc/medialib/blm/wo/Information_Resources_Management/policy/blm_manual.Par.77674.File.dat/BLM_1283_manual_final.pdf

Page 23: "What does 'Full Life-Cycle' Data Management Mean ?"

A Gallery of Efforts to Depict Full Life Cycle Data Management

Page 24: "What does 'Full Life-Cycle' Data Management Mean ?"

Source: DDI Structural Reform Group. “DDI Version 3.0 Conceptual Model.” DDI Alliance. 2004. Accessed on 11 August 2008.

http://www.icpsr.umich.edu/DDI/committee-info/Concept-Model-WD.pdf

Page 25: "What does 'Full Life-Cycle' Data Management Mean ?"

US NSF “DataNet” Program

“the full data preservation and access lifecycle”

• “acquisition”
• “documentation”
• “protection”
• “access”
• “analysis and dissemination”
• “migration”
• “disposition”

“Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation” NSF 07-601 US National Science Foundation Office of Cyberinfrastructure Directorate for Computer & Information Science & Engineering

Page 26: "What does 'Full Life-Cycle' Data Management Mean ?"

www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf

Page 27: "What does 'Full Life-Cycle' Data Management Mean ?"

“JISC DCC Curation Lifecycle Model”

http://www.dcc.ac.uk/docs/publications/DCCLifecycle.pdf

Page 28: "What does 'Full Life-Cycle' Data Management Mean ?"

http://wiki.esipfed.org/images/c/c4/IWGDD.ppt

Interagency Working Group on Digital Data

Page 29: "What does 'Full Life-Cycle' Data Management Mean ?"

IWGDD “DIGITAL DATA LIFE CYCLE”

Exhibit B-2. Life Cycle Functions for Digital Data*

• Plan
-- Determine what data need to be created or collected to support a research agenda or a mission function
-- Identify and evaluate existing sources of needed data
-- Identify standards for data and metadata format and quality
-- Specify actions and responsibilities for managing the data over their life cycle

• Create
-- Produce or acquire data for intended purposes
-- Deposit data where they will be kept, managed and accessed for as long as needed to support their intended purpose
-- Produce derived products in support of intended purposes; e.g., data summaries, data aggregations, reports, publications

• Keep
-- Organize and store data to support intended purposes
-- Integrate updates and additions into existing collections
-- Ensure the data survive intact for as long as needed

• Acquire and implement technology
-- Refresh technology to overcome obsolescence and to improve performance
-- Expand storage and processing capacity as needed
-- Implement new technologies to support evolving needs for ingesting, processing, analysis, searching and accessing data

• Disposition
-- Exit Strategy: plan for transferring data to another entity should the current repository no longer be able to keep it
-- Once intended purposes are satisfied, determine whether to destroy data or transfer to another organization suited to addressing other needs or opportunities

http://www.nitrd.gov/about/harnessing_power_web.pdf

Page 31: "What does 'Full Life-Cycle' Data Management Mean ?"

DataONE: The Data Life Cycle: An Overview

The data life cycle has eight components:

Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime

Collect: observations are made either by hand or with sensors or other instruments and the data are placed into digital form

Assure: the quality of the data are assured through checks and inspections

Describe: data are accurately and thoroughly described using the appropriate metadata standards

Preserve: data are submitted to an appropriate long-term archive (i.e. data center)

Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)

Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed

Analyze: data are analyzed

DataONE Best Practices Primer: http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf

Page 32: "What does 'Full Life-Cycle' Data Management Mean ?"

W. K. Michener “Meta-information concepts for ecological data management” Ecological Informatics 1 (2006) 3-7

http://tinyurl.com/d49f3vm

Page 33: "What does 'Full Life-Cycle' Data Management Mean ?"

Federal Geographic Data Committee

“Stages of the Geospatial Data Lifecycle pursuant to OMB Circular A–16, sections 8(e)(d), 8(e)(f), and 8(e)(g)”

http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf

Page 34: "What does 'Full Life-Cycle' Data Management Mean ?"

“The Geospatial Data Lifecycle is not intended to be rigidly sequential or linear. The quality assurance and (or) quality control (QA/QC) functions for the data should be included at every stage of the Geospatial Data Lifecycle.”

[emphasis added]

-- “Stages of the Geospatial Data Lifecycle pursuant to OMB Circular A–16, sections 8(e)(d), 8(e)(f), and 8(e)(g)”

http://www.fgdc.gov/policyandplanning/a-16/stages-of-geospatial-data-lifecycle-a16.pdf
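
The idea that QA/QC belongs at every stage, rather than as a single final step, can be sketched as a pipeline in which the same check runs after each stage. The Python below is a minimal illustration only: the stage names (`collect`, `clean`, `summarise`), the range rule in `qc_check`, and the helper `run_with_qc` are hypothetical and are not taken from the FGDC document.

```python
def qc_check(stage, records):
    """Hypothetical QA/QC rule: every record must be a float in [0, 1]."""
    bad = [r for r in records if not (isinstance(r, float) and 0.0 <= r <= 1.0)]
    if bad:
        raise ValueError(f"QC failed after stage '{stage}': {bad}")
    return records


def run_with_qc(records, stages):
    """Run each (name, function) stage and apply the QC check after every one."""
    passed = []
    for name, func in stages:
        records = qc_check(name, func(records))
        passed.append(name)
    return records, passed


# Illustrative stages; a real life cycle would define its own.
stages = [
    ("collect", lambda rs: [float(r) for r in rs]),
    ("clean", lambda rs: [r for r in rs if r > 0.0]),
    ("summarise", lambda rs: [sum(rs) / len(rs)]),
]

result, passed = run_with_qc(["0.5", "0.0", "0.7"], stages)
print(result, passed)
```

Because the check runs after each stage, a bad value is reported against the stage that produced it instead of surfacing only in the final product.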

Page 35: "What does 'Full Life-Cycle' Data Management Mean ?"

Interagency Science Working Group National Archives and Records Administration

http://www.archives.gov/records-mgmt/toolkit/pdf/ID373.pdf

“Establishing Trustworthy Digital Repositories: A Discussion Guide Based on the ISO Open Archival Information System (OAIS) Standard Reference Model January 19, 2011”

Page 36: "What does 'Full Life-Cycle' Data Management Mean ?"

“Sustainable data curation”

“There are several main elements necessary to sustain data curation:

“Robust data storage facilities (hardware and software) that are capable of accurately handling data migration across generations of media.

“Backup plans, that are tested, so irreplaceable data are not at risk. Unintended data loss can occur for many reasons: some major causes are: poor stewardship leading to the loss of metadata to understand where the data is located and documentation to understand the content, physical facility and equipment failure (fire, flood, irrecoverable hardware crashes), accidental data overwrite or deletion.

“Science-educated staff with knowledge to match the data discipline is important for checking data integrity, choosing archive organization, creating adequate metadata, consulting with users, and designing access systems that meet user expectations. Staff responsible for stewardship and curation must understand the digital data content and potential scientific uses.”

C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, page 10.

www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]
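
The quoted requirement that backup plans be tested can be illustrated with a simple fixity check: copy a file, then confirm by checksum that the copy is bit-identical, and re-check later to detect silent corruption. This is a minimal Python sketch using only the standard library; the file name `observations.csv`, the directory layout, and the helper names are hypothetical and are not drawn from the Jacobs and Worley paper.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy source into backup_dir and report whether the copy matches."""
    target = Path(backup_dir) / Path(source).name
    shutil.copy2(source, target)
    return target, file_sha256(source) == file_sha256(target)


with tempfile.TemporaryDirectory() as work:
    work = Path(work)
    original = work / "observations.csv"  # hypothetical file name
    original.write_text("sbid,battery\n10,12.323\n")
    (work / "backup").mkdir()

    copy, ok = backup_and_verify(original, work / "backup")

    # Simulate silent corruption of the backup, then re-run the check.
    copy.write_text("sbid,battery\n10,12.999\n")
    still_ok = file_sha256(original) == file_sha256(copy)

print(ok, still_ok)
```

A check of this kind, run on a schedule, turns "we have backups" into "we have backups that were verified on a known date".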

Page 37: "What does 'Full Life-Cycle' Data Management Mean ?"

Sustainable data curation (cont.)

“Non-proprietary data formats that will ensure data access capability for many decades and will help avoid data losses resulting from software incompatibilities…

“Consistent staffing levels and people dedicated to best practices in archiving, access, and stewardship…

“National and International partnerships and interactions greatly aids in shared achievements for broad scale user benefits, e.g. reanalyses, TIGGE…

“Stable funding not focused on specific projects, but data management in general…”

C.A. Jacobs, S. J. Worley, “Data Curation in Climate and Weather: Transforming our ability to improve predictions through global knowledge sharing,” from the 4th International Digital Curation Conference, December 2008, pages 10-11.

www.dcc.ac.uk/events/dcc-2008/programme/papers/Data%20Curation%20in%20Climate%20and%20Weather.pdf [03 02 09]

Page 38: "What does 'Full Life-Cycle' Data Management Mean ?"

Database Lifecycle Management

“The Database Lifecycle Management covers the entire lifecycle of the databases, including:
• Discovery and Inventory tracking: the ability to discover your assets, and track them
• Initial provisioning, the ability to rollout databases in minutes
• Ongoing Change Management, End-to-end management of patches, upgrades, schema and data changes
• Configuration Management, track inventory, configuration drift and detailed configuration search
• Compliance Management, reporting and management of industry and regulatory compliance standards
• Site level Disaster Protection Automation”

http://www.oracle.com/technetwork/oem/pdf/511949.pdf

Page 39: "What does 'Full Life-Cycle' Data Management Mean ?"

Design, Define, Conceptualise, Plan, Produce, Create, Acquire, Receive, Collect, Preserve, Protect, Curate, Maintain, Archive, Appraise, Select, Analyze, Distribute, Access, Use, Reuse, Store, Discover, Dispose, Transform, Describe, Repurpose, Metadata standards, Add Metadata, Assure

Page 40: "What does 'Full Life-Cycle' Data Management Mean ?"

“Data Quality” ???

“In the most general colloquial terms, ‘Data Quality’ is the fundamental issue of concern to scientists, policy makers, managers/decision makers and the general public.

‘Data Quality’ can be considered in terms of three primary values:

• Validity: logical in terms of intended hypothesis to be tested (all potential types of data that could be chosen should be weighed for probative value…)

• Competence (Reliability): consideration of the proper choice of expert staff, methods, apparatus/gear, calibration, deployment and operation

• Integrity: the maintenance of original integrity of data as well as tracking and documenting of all recording, migration, transformations and sequences of transformation of data”

Tom Moritz, OPM “Big Data” July, 2012

Page 41: "What does 'Full Life-Cycle' Data Management Mean ?"

“…the “validation” of any scientific hypotheses rests upon the sum integrity of all original data and of all sequences of data transformation to which original data have been subject.”

– Tom Moritz, “The Burden of Proof”

http://imsgbif.gbif.org/CMS_NEW/get_file.php?FILE=2b032cf8212d19a720f21465df0686

Page 42: "What does 'Full Life-Cycle' Data Management Mean ?"

Tom Moritz, Los Angeles

[email protected] 963 0199

http://www.linkedin.com/in/tmoritz