Fran Berman, Data and Society, CSCI 4967/6963
Data and Society Lecture 3: Data and Health
2/3/17
PDB, NIH, Precision Medicine
Announcements 2/3
• No Wednesday class February 8
• Op-Ed draft due February 10 – instructions in Lecture 1 and Syllabus
• Research paper instructions next week (Friday, February 10). No discussion article.
Today (2/3/17)
• Lecture 3: Data and Health
– Data and Community: PDB
– Data and Government: NIH BD2K
– Data and the Individual: Precision Medicine
• Discussion
• Break
• 4 Student Presentations
Wednesday section | Friday date | Friday lecture (first half of class; second half of class) | Assignments
January 18: NO class | January 20 | L1: Class Intro + Logistics / Survey / Digital Data in the 21st Century; Presentation Model / Op-Ed Instructions | -
January 25: NO class | January 27 | L2: Big data applications / Data and the election; Data and Target; Discussion; 4 Presentations | -
February 1: 6 presentations | February 3 | L3: Data and Health / PDB, Precision Medicine; Discussion; 4 Presentations | -
February 8: NO class | February 10 | L4: Data and Science / Earthquakes, LHC; Paper Instructions; 4 Presentations | Op-Ed Draft Due
February 15: 6 presentations | February 17 | L5: Data Cyberinfrastructure; Discussion; 4 Presentations | Op-Ed Draft Back
February 22: 6 presentations | February 24 | L6: Data Stewardship and Data Preservation; Discussion; 4 presentations | Op-Ed Final Due
March 1: NO class | March 3 | NO class | -
March 8: 6 presentations | March 10 | L7: Data Futures – Internet of Things; Discussion; 4 presentations | Paper Draft Due
March 15: Spring Break | March 17 | Spring Break | -
March 22: NO class | March 24 | L8: Data rights and policy / U.S. and EU; Discussion; 4 presentations | -
March 29: 6 presentations | March 31 | Op-Ed Pecha-Kucha | Paper Draft Back
April 5: NO class | April 7 | NO class | -
April 12: 4 presentations | April 14 | Hilary Mason Guest Lecture; 4 presentations | Final Paper Due
April 19: 4 presentations | April 21 | L9: Data and Ethics; Discussion; 4 presentations | -
April 26: 6 presentations | April 28 | Paper Pecha-Kucha | -
Lecture 3: Data and Health
Information technologies have revolutionized the Health Sciences
• Electronic Medical Records
– Greater data accessibility, analysis, potential for effective use
• Better disease diagnosis and treatment
– Disease modeling and analysis, IT-guided surgery, electronic monitoring, etc.
• Better understanding of biological structure and function and fundamental science
– Mapping of the human genome, characterization of biological function and disease trajectory
• Personalized / precision medicine
– Monitoring, early prediction of risk and diagnosis, customized treatment
• Public health
– Better prediction of and response to epidemics
– Better characterization of health risk and mitigation
• Etc., etc., etc.
Health Sciences Research: Data a fundamental driver for greater discovery
• What genes are associated with cancer?
• What parts of the brain are responsible for Alzheimer’s?
• How do coral reefs evolve over time?
• Who is at greatest risk for sickle cell anemia?
• …
[Figure: Integration across multiple scales in the biosciences. Scale labels: Organisms, Organs, Cells, Organelles, Biopolymers, Atoms. Discipline labels: Cell Biology, Anatomy, Physiology, Proteomics, Medicinal Chemistry, Genomics. Other labels: Disciplinary Databases, Users, Data Access and Use, Data Integration. Image courtesy of Mark Miller.]
Potential of data-driven discovery coupled with many technical and societal challenges
Regulation and policy
• Who owns health information? Who has a right to see it?
• Can health information be used to make non-health decisions (insurance premiums, job eligibility)?
• What regulations should apply to mobile health applications?
Culture and practice
• How should clinicians team effectively with technology for better diagnosis, treatment and care?
• How and by whom should personalized health information be used? How do we protect the privacy of patients while promoting the public good?
• When should health-related information be a “competitive advantage”?
Potential of data-driven discovery coupled with many technical and societal challenges
Ethics
• Under what conditions is stem cell research, cloning, euthanasia, health monitoring, etc. acceptable? Does the availability of relevant data change this?
• Who is responsible for a misleading / incorrect data-driven diagnosis?
• How should sensitive data be used?
Enabling Infrastructure
• How do we make information most useful?
• How should we support data access and sharing?
• How can we ensure reproducibility of data-driven results?
Lecture Outline: Data-driven efforts in the Health Sciences
• Data and Community: PDB
• Data and Government: NIH BD2K
• Data and the Individual: Precision Medicine
Data and Community: The Protein Data Bank
Some information in this section courtesy of Phil Bourne and Helen Berman
What is the Protein Data Bank (PDB)?
• International repository and archival database for information about the 3D structure of large biological molecules (such as proteins and nucleic acids)
• Most major scientific journals and some funding agencies (including the NIH) now require scientists to submit their structure data to the PDB.
• Provides free worldwide public access 24/7 to accurate protein data
About PDB
• PDB data downloaded > 500,000,000 times as of 2016
• PDB supports the development of standards for the representation, annotation, and validation of structural data collected by different experimental methods.
• Worldwide PDB (wwPDB.org) is a consortium of groups that host deposition, annotation, and distribution centers for PDB data and collaborate on PDB projects:
– RCSB [Research Collaboratory for Structural Bioinformatics] PDB (US: Rutgers and SDSC/UCSD)
– PDB Europe (PDBe, UK)
– PDB Japan
– BioMagResBank (US)
• U.S. RCSB team includes computer scientists, biologists, chemists, educators.
Outcomes: Broad use, new innovation and discovery
PDB has enabled
• Safe storage of protein data
• Molecular replacement models for structure determination
• “Parts list” for modeling
• Structure based drug design
• Protein structure classification
• Protein structure prediction
PDB community ~325,000 unique users per month from ~190 countries [2016]
From the 2013 PDB Annual Report http://www.rcsb.org/pdb/general_information/news_publications/annual_reports/annual_report_year_2013.pdf
Global, multi-sector resource: 113,000+ PDB searchable structures in 2016
[Figure: labels include Electron Microscopy and Nuclear Magnetic Resonance]
From http://ac.els-cdn.com/S0959440X1630077X/1-s2.0-S0959440X1630077X-main.pdf?_tid=49d80f9c-d0f1-11e6-949c-00000aab0f6b&acdnat=1483364722_a5b81cad4469cedae4b5b5f293ce4913
Considerable human and technology infrastructure needed: Deposition and Annotation Pipelines
From http://ac.els-cdn.com/S0959440X1630077X/1-s2.0-S0959440X1630077X-main.pdf?_tid=49d80f9c-d0f1-11e6-949c-00000aab0f6b&acdnat=1483364722_a5b81cad4469cedae4b5b5f293ce4913
PDB History
1970s
• Community discussions about how to establish an archive of protein structures
• Cold Spring Harbor meeting in protein crystallography
• PDB established at Brookhaven (Oct 1971; 7 structures)
1980s
• Number of structures increases as technology improves
• Community discussions about requiring depositions
• IUCr guidelines established
• Number of structures deposited increases
1990s
• Structural genomics begins
• PDB moves to RCSB PDB
2000s
• WWPDB formed
• 50,000th structure released (April 2008)
2010s
• 40th Anniversary of PDB (2011)
• 10th Anniversary of WWPDB (2013)
• 2013 / rcsb.org:
– ~286,000 unique visitors per month from 190 countries
– 1,000,000 downloads of data from PDB archive per day
– 1.3 TB per month transferred
– 10,000 downloads of mobile app
Information courtesy of Helen Berman and 2013 PDB Annual Report
PDB and Data Sharing
“A very important factor in the growth of the PDB has been the change in attitudes regarding data sharing.
In 1971, the incentives to deposit data in the PDB were very practical: by putting data in the archive, depositors would ensure that the data would not get lost. The task of distributing data resident on magnetic tapes to interested parties located around the world became the job of the PDB. In spite of these conveniences, it was not the norm to deposit data.
It wasn't until the 1980s that several community groups began to establish guidelines for data sharing. Once published, the funding agencies and the journals began to adopt these guidelines.
Today, structure deposition into the PDB is a prerequisite for publication in virtually every journal. These scientific, technological, and cultural changes have driven the continual growth of the PDB.”
From: “The Future of the Protein Data Bank” http://onlinelibrary.wiley.com/doi/10.1002/bip.22132/full
PDB Business Model
• Funding models for PDB vary for each PDB center in WWPDB
– Different funding cycles
– Different funding criteria
– Multiple agencies involved
• Costs
– Cost for structures in 2013 is roughly $750M (~$75K/structure)
– Cost for archiving is roughly $10,000,000
• Sustainability: Multiple models explored
– Current models: multi-agency, multinational funds
– Potential models:
• Journal model – pay per structure
• Congressional appropriation (NCBI)
• International funding structure with strong community oversight, rolling tenure
– Charitable WWPDB Foundation being explored to support education, outreach, continued collaboration with respect to standards, and community meetings
[Figure: funding agencies, grouped as on the slide: Wellcome Trust, EU, CCP4, BBSRC, MRC, EMBL; BIRD-JST, MEXT; NSF, NIGMS, DOE, NLM, NCI, NINDS, NIDDK; NLM]
Information courtesy of Helen Berman and Phil Bourne.
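The cost figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the rounded numbers quoted above; the variable names are illustrative, and the ratio it prints is a rough comparison that assumes both figures cover a comparable period, which the slide does not state.

```python
# Rounded figures quoted on the slide (2013, approximate).
total_structure_cost_usd = 750_000_000   # "roughly $750M"
cost_per_structure_usd = 75_000          # "~$75K/structure"
archiving_cost_usd = 10_000_000          # "roughly $10,000,000"

# Number of structures implied by the two structure-cost figures.
implied_structures = total_structure_cost_usd / cost_per_structure_usd

# Archiving cost as a fraction of the cost of producing the structures
# (assumption: the two figures are comparable in scope and period).
archiving_share = archiving_cost_usd / total_structure_cost_usd

print(f"Implied number of structures: {implied_structures:,.0f}")
print(f"Archiving cost relative to structure cost: {archiving_share:.1%}")
```

Under these assumptions, archiving is a small fraction of what it costs to produce the structures in the first place, which is the point behind the sustainability discussion.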
Data and Government: NIH Biomedical Data Initiatives
• Open Science and Open Data
• BD2K
• ADDS and NLM
• U.S. Precision Medicine Initiative
Open Science: NIH’s PubMed and PubMed Central (“open papers”)
• PubMed is a search engine provided free to the public by NIH (National Library of Medicine).
– PubMed searches primarily the MEDLINE DB of references and abstracts on life sciences and biomed research publications
– Some but not all PubMed searches include full papers
– Some PubMed articles hosted in PubMed Central
• PubMed Central is a free digital repository that archives publicly accessible full-text scholarly articles.
• NIH Public Access Policy: Papers on research funded by the National Institutes of Health must be available to the public for free through PubMed Central within 12 months of publication.
Open Data: NIH successfully used its “bully pulpit” to drive broader biomedical community data sharing
Big Data to Knowledge (BD2K) Initiative
• Focus: Development of innovative and transforming approaches as well as tools for making Big Data and data science a more prominent component of biomedical research
• Broad development of underlying data ecosystem for biomedical researchers:
– Standards, methods, tools, SW, new algorithms and approaches to further biomedical research
– Access to shareable biomedical data through technologies, approaches and policies that enable and facilitate data sharing, discoverability, management, curation and meaningful re-use
– Workforce development and training to support data-driven biomedical research
Focused BD2K Funding
• Targeted SW development
– Funding for SW and methods development that impacts data management, data compression, data provenance, data visualization, data privacy, metadata, etc.
• Resource Indexing
– Focus on data and resource discovery, citation and access for data and software for biomedical researchers.
• Enhancing Training
– Workforce training and education. Focus is on development of materials, MOOCs, short courses, career development, training coordination center.
• Centers of Excellence
– 11 Centers of Excellence for Big Data Computing and one for Data Coordination and Integration. Centers develop approaches, SW tools, resources and provide training to advance data-driven biomedical research.
Data Science at NIH
• 2014 - 2017: NIH created the position of Associate Director for Data Science (ADDS) and a budget to support data science at NIH
• Key foci of “ADDS” Office included:
– Sustainability of digital assets
– Training of data-savvy biomedical workforce
– Innovation in the biomedical data science ecosystem
– Refining of NIH internal processes to make the most of data and data technologies
– Communication within and outside of the NIH
• ADDS projects included BD2K projects, development of an NIH data commons, internal NIH initiatives, speaker series, etc.
• Dr. Phil Bourne served as ADDS office Director.
National Library of Medicine Strategic Planning
• 2017: National Library of Medicine now continuing / evolving ADDS work in data and data science.
• Focus of current strategic planning is to increase impact in the:
– Role of NLM in advancing biomedical discovery and translational science
– Role of NLM in supporting the public’s health (e.g. clinical systems, public health systems and services, personal health)
– Role of NLM in advancing data science, open science, and biomedical informatics
– Role of NLM in building collections to support discovery and health in the 21st century
Relevant topics:
• Standards
• Computing
• Physical plant
• Research
• Education
• Workforce
• User communities
• Partnerships
• International engagement
• Public health / health disparities, etc.
U.S. Precision Medicine Initiative
• Announced in 2015 State of the Union
• Involves NIH, NCI, FDA, Veterans Affairs, DoD, and other govt. agencies
• Launched with $215M investment in 2016 budget
• More info at https://www.whitehouse.gov/precision-medicine
Precision Medicine
• Precision medicine is an emerging approach for disease treatment and prevention that takes into account individual variability in environment, lifestyle and genes for each person.
• Precision medicine is not new (e.g., transfusion patients are matched with donors according to blood type), but new sources of data, the cost-efficiency of sequencing genomic data, and the potential for more refined and successful treatments are driving a huge expansion of the area.
• Precision medicine is powered by patient data, so all the potential and challenges inherent in data are involved:
– Effective infrastructure and workflows
– Privacy and access
– Interoperability
– Accuracy and interpretation
Data and the Patient -- Larry Smarr at TedMed (16 min)
http://www.tedmed.com/talks/show?id=18018
Lecture Materials
• 2013 Protein Data Bank Annual Report http://www.rcsb.org/pdb/general_information/news_publications/annual_reports/annual_report_year_2013.pdf
• RCSB Protein Data Bank: A Resource for Chemical, Biochemical, and Structural Explorations of Large and Small Biomolecules, Journal of Chemical Education, http://pubs.acs.org/doi/pdf/10.1021/acs.jchemed.5b00404
• The archiving and dissemination of biological structure data, ScienceDirect, http://ac.els-cdn.com/S0959440X1630077X/1-s2.0-S0959440X1630077X-main.pdf?_tid=49d80f9c-d0f1-11e6-949c-00000aab0f6b&acdnat=1483364722_a5b81cad4469cedae4b5b5f293ce4913
• The Protein Data Bank, Nucleic Acids Research, 2000, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC102472/
• The Future of the Protein Data Bank, Biopolymers, volume 99, http://onlinelibrary.wiley.com/doi/10.1002/bip.22132/full
• “The Protein Data Bank and lessons in Data Management,” Briefings in Bioinformatics, http://www.sdsc.edu/pb/papers/briefings04.pdf
• PDB: www.rcsb.org
• NIH websites: https://www.nih.gov/, https://datascience.nih.gov/bd2k, https://www.nlm.nih.gov/pubs/plan/strategic_planning.html
• Precision Medicine, Nature, http://www.nature.com/nature/journal/v537/n7619_supp/full/537S49a.html
• Precision Medicine Initiative, White House, https://www.whitehouse.gov/precision-medicine
Discussion
• “I had my DNA Picture Taken with Varying Results”, New York Times, http://www.nytimes.com/2013/12/31/science/i-had-my-dna-picture-taken-with-varying-results.html
Presentation Articles February 10
• February 10: (data and science)
– “Crowdsourcing: For the Birds”, NY Times, http://www.nytimes.com/2013/08/20/science/earth/crowdsourcing-for-the-birds.html?pagewanted=2&contentCollection=Science&action=click&region=EndOfArticle&module=RelatedCoverage&pgtype=article [Dan S]
– “Digital Keys for Unlocking the Humanities’ Riches”, New York Times, http://www.nytimes.com/2010/11/17/arts/17digital.html?pagewanted=all&_r=0 [Bobby M]
– “African Elephant Numbers Plummet 30 Percent, Landmark Study Finds,” National Geographic, http://news.nationalgeographic.com/2016/08/wildlife-african-elephants-population-decrease-great-elephant-census/ [Deborah A]
– “Astronomers Characterize Wolf 1061 Planetary System”, Sci News, http://www.sci-news.com/astronomy/wolf-1061-planetary-system-04552.html [Eric L]
Presentation Articles for February 15
• February 15 (data and science)
– “Big Data: Astronomical or Genomical?” PLOS Biology, http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195 [Patrick C]
– “Big data shows how what we buy affects endangered species,” Science Daily, https://www.sciencedaily.com/releases/2017/01/170104103604.htm [Priyanka K]
– “Neuroscience: Big brain, big data”, Scientific American, https://www.scientificamerican.com/article/neuroscience-big-brain-big-data/ [Stephen N]
– “The Australian Square Kilometre Array Pathfinder Finally Hits the Big Data Highway”, Phys. Org., https://phys.org/news/2017-01-australian-square-kilometre-array-pathfinder.html [James N]
– “Eat, Sleep, Repeat: Crowdsourcing the data of a baby’s typical day”, Discover, http://blogs.discovermagazine.com/citizen-science-salon/2016/12/14/eat-sleep-repeat-crowdsourcing-the-data-of-a-babys-typical-day/#.WHL2wFMrKpo [Aditya S]
‒ “Computer Science Technique Helps Astronomers Explore the Universe” Inside Science, https://www.insidescience.org/news/computer-science-technique-helps-astronomers-explore-universe [Brandon T]
Presentation Articles February 17
• February 17 (data and cyberinfrastructure)
– “Innovation in Asthma Research: Using Ethnography to Study a Global Health Problem”, Ethnography Matters, http://ethnographymatters.net/blog/2012/10/27/the-asthma-files/ [Lee C]
– “Biology: the big challenges of big data”, Nature, http://www.nature.com/nature/journal/v498/n7453/full/498255a.html [Grigory A]
– “Precision Medicine in the million genome era”, Genetic Engineering and Biotechnology News, http://www.genengnews.com/gen-articles/precision-medicine-research-in-the-million-genome-era/5944 [Noah W]
– “New Technologies Bring Marine Archaeology Treasures to Light”, The Guardian, https://www.theguardian.com/science/2016/dec/29/new-technologies-bring-marine-archaeology-treasures-to-light [Rachel K]
No class Wednesday. Class next Friday.
• Next Friday: Data and Science ; Presentations
• No Discussion Article. Research paper instructions next Friday
Break
Presentations
Presentation Articles for February 3
• February 3: (data and health)
– “The 21st Century Cures Act: FDA Reforms Aim to Spur Innovation in the Pharmaceutical, Medical Device and Health Research Sectors”, Lexology, http://www.lexology.com/library/detail.aspx?g=fa622c15-f2c4-4397-9d64-6cd341fcaf3f [Andrea L]
– “Four Steps to Precision Public Health”, Nature, http://www.nature.com/news/four-steps-to-precision-public-health-1.21089 [Kusuma B]
‒ “Can IBM’s Watson do it all?”, Fast Company, https://www.fastcompany.com/3065339/mind-and-machine/can-ibms-watson-do-it-all [Erica B]
‒ “mHealth’s Year in Review: From Texting to Wearables to Telehealth’s Tricks (and Treats)”, mHealth Intelligence, http://mhealthintelligence.com/news/mhealths-year-in-review-from-texting-to-wearables-to-telehealths-tricks-and [Tim T]