Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

17
Personalized Medicine Big Data “IT” in Health and Life Sciences Ketan Paranjape Worldwide Director, Health and Life Sciences Intel Corp. 1 www.intel.com/healthcare/bigdata

description

Talk at Stanford Big Data in Biomedicine Conference by Ketan Paranjape, WW Director Health and Life Sciences

Transcript of Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Page 1: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Personalized

MedicineBig Data “IT” in Health and Life Sciences

Ketan ParanjapeWorldwide Director, Health and Life SciencesIntel Corp.

1

www.intel.com/healthcare/bigdata

Page 2: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Personalized Medicine = Big Data Analytics in Health and Life Sciences

Page 3: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Life Sciences :: Key Industry Challenges and Solutions

• Many (most) applications are single-threaded, single address space

Intel is delivering optimizations working with open source community, developing NGS+HPC curriculum

• Some algorithms scale quadratically with the size of the problem. Large data sets exceed available memory and storage

Innovations in acceleration, compute, storage, networking, security, and *-as-a-service.

• International collaboration is an imperative, bioinformatics expertise is scarce

• Intel is working closely with the ecosystem to address enterprise to cloud transmission of terabyte payloads

• Databases are distributed, data is siloed and will likely stay that way

Tools like Hadoop, Lustre, Graphlab, In-Memory Analytics, Security etc.

Need for Balanced Compute Infrastructure

*Other names and brands may be claimed as the property of others.

Page 4: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Recent Collaborations

4

Page 5: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Profiling: Single Instance Run of GATKGATK: Genome Analysis Toolkit

• # of Machines = 1• # of cores/Machine = 24• Temporary Storage – RAID0 2x4TB HDD• Input Dataset: G15512.HCC1954.1, coverage: 65x

Average CPU utilization is very low. Most cores not being usedAverage I/O bandwidth is very low. Application not I/O bound

Average memory footprint is small. Application not using memory available in newer systems

There is a lot of room to improve*Other names and brands may be claimed as the property of others.

Page 6: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Improvements in GATK 3.0• Pair HMM Acceleration using Intel® AVX

resulted in 970x speedup

− Computation kernel and bottleneck in GATK Haplotype Caller

− AVX enables 8 floating point SIMD operations in parallel

6*Other names and brands may be claimed as the property of others.

Page 7: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Applications and Workloads Optimized on Intel Architecture

• Focus on improving genomics, molecular dynamics pipelines

• Optimize individual applications (node and cluster); Work with code authors to release optimizations

DOMAIN ApplicationsIntel® Architecture

Target

Genomics

Bowtie 1*, Bowtie 2* Xeon® processor

BWA* Xeon® processor

BLAST* Xeon® processor

GATK* Xeon® processor

HMMER*Xeon® processor

Xeon® Phi™ coprocessor

Abyss* Xeon® processor

Velvet* Xeon® processor

*Other names and brands may be claimed as the property of others.

DOMAIN ApplicationsIntel® Architecture

Targets

MolecularDynamics/Chemistry

AMBER*

Xeon® processorXeon® Phi™ coprocessor

NAMD*

GROMACS*

GAMESS*

Quantum Espresso*

Gaussian*

VASP*

CP2K*

QBOX*

CPMD*

LAMMPS*

Page 8: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

• Challenge: Ayasdi Cure™ analyzes highly complex, large data sets and relies on fast computation times to provide real-time output.

• Solution:

− Intel® AVX instructions - four double-precision floating-point operations in parallel vs. one.

− Intel® MKL library - accelerate filter computations

• Benefits: 400% performance increase in distance computation.

Scripps DNA Sequencing Pipeline

• Challenge: Processing times, Logistical Delays, Cluster complexity

• Solution: Intel® Xeon® E7-4800 series using SSDs

• Benefits: ~4x Improvement on processing times

8

4x

*Other names and brands may be claimed as the property of others.

Page 9: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Ultra High-Speed Networking Optimizations

• Challenge: Improving big data transfer to and from the backend data center

• Solution:

− Optimize ultra high-speed (10+ Gbps) data transfer solutions built on Aspera’s FASP ™ technology

− Intel® Xeon® E5-2600 (DDIO, SR-IOV)

• Benefits:

− 300% improvement in transfer throughput

− Physical or virtual, LAN or WAN – same transfer speeds

High Performance Scale-out Storage Challenge:

• Challenge: 10-15TB data added weekly, small fraction of overall storage capacity and need a system to scale, be flexible and efficient

• Solution: HPC-class storage, powered by Intel®

Enterprise Edition for Lustre* software

• Benefits:

− Openess, global namespace

− Performance of upwards of 1 TB/s

− Virtually unlimited file system and per file sizes, and management simplicity

9*Other names and brands may be claimed as the property of others.

Page 10: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

HPC Appliances for Life Sciences• Challenge: Experiment processing takes 7 days with current infrastructure.

Delays treatment for sick patients

• Solution: Dell Next Generation Sequencing Appliance

− Single Rack Solution; 9 Teraflops, Lustre File Storage; Intel SW tools

• Benefits: RNA-Seq processing reduced to 4 hour

• Includes everything you need for NGS - compute, storage, software, networking, infrastructure, installation, deployment, training, service & support

Dell HSS (Lustre)(up to 360TB)

Dell NSS (NFS)(up to 180TB)

Infrastructure: Dell PE, PC & F10

M420 (Compute)(up to 32 nodes)

2U Plenum

Actual placement in racks may vary.

NSS-HA Pair

NSS User Data

HSS Metadata Pair

HSS OSS Pair

HSS User Data

** 2-socket Intel(R) Xeon(R) CPU E5-2687W / 3.1 GHz

*Other names and brands may be claimed as the property of others. *Other names and brands may be claimed as the property of others.

Page 11: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Genomics & Clinical Analytics Appliances

11

2U Plenum

Actual placement in racks may vary.

NSS-HA Pair

NSS User Data

HSS Metadata Pair

HSS OSS Pair

HSS User Data

*Other names and brands may be claimed as the property of others.

Page 12: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Value

• Enable researchers to discover biomarkers and drug targets by correlating genomic data sets

• 90% gain in throughput; 6X data compression

Analytics

• Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)

• Provide APIs for applications to combine and analyze public and private data sets

• Hive + Hadoop for query/search; Intel® Xeon® + 10 GbE

Genomics Data Discovery using Hadoop

*Other names and brands may be claimed as the property of others.

Page 13: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Charite “Real-time” Cancer Analysis – Matching proper therapies to patients using in-memory techniques

• Challenge: Real-time analysis of cancer patients using in-memory SAP HANA Oncolyzer database running on Intel® Xeon® family infrastructure. (3.5M Data points per Patient, Up to 20 TB of data/patient)

• Solution: Using structured and unstructured data to collect and analyze tables used to take up to two days -- now takes seconds

• Benefits: Improves medical quality in disruptive way for Patient, Doctor, Hospital, Research

http://moss.ger.ith.intel.com/sites/SAP/SAP%20account%20team%20documents/Marketing/SAP%20HANA/SAPHANA_Charite_case_study_HI.PDF *Other names and brands may be claimed as the property of others.

Page 14: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

High Throughput Science: Embracing Cloud-based Analytics for Computational Chemistry Simulation

• Challenge: Sustaining 50000+ compute cores for large scale simulations, for less than a week; CapEX v. OpX

• Solution: Novartis leveraged software from AWS partner, Cycle Computing, and MolSoft to provision a fully secured cluster of 30,000 CPUs, powered by the Intel® Xeon® processor E5 family.

− Completed screening of 3.2 million compounds in approximately 9 hrs, compared to 4 -14 days on existing resources.

Powerof60.com*Other names and brands may be claimed as the property of others.

Page 15: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Regional Health Information Network, RHINChina (Jinzhou, Pop 3M)

• Challenge: RHIN has challenges with

scalability, performance and maintenance. Data storage is expensive

• Solution: EMR data and healthcare

services running on Intel HadoopDistribution and Xeon E5 servers.

• Benefits: High performance and

scalability demonstrated via POC and stress testing. Significantly reduced storage cost

• 1/5 Reduction in Response Time; 5x Concurrent Users

Data processing flow of RHIN platform

http://hadoop.intel.com/pdfs/IntelChinaHealthyCityAnalyticsCaseStudy.pdf*Other names and brands may be claimed as the property of others.

Page 16: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Policy – United States, European UnionSnapshot of US, EU Recommendations

Develop an ICT-enabled European Strategy for Personalised

Medicine

2014-2020

Driving research to unleash the potential of ICT at the point-of-care

EU R&D initiatives must address:

Interoperability of technical standards for managing and sharing sequence data in research and clinical samples;

Development of hardware, software and workflow algorithms to accelerate cost efficient analysis of genetic abnormalities that cause cancer and other complex diseases;

Research to ensure convergence of Big Data and Cloud Computing infrastructure to meet the requirements of High Performance Computing and data throughout the life sciences and healthcare value chains

The eHealth Action Plan 2020 should include Personalised Medicine as a

priority

Gain knowledge of the challenges and barriers (technical, organizational, legal and political) to the adoption of ICT in support of Personalised Medicine leveraged by genomic information;

Evaluate how to change workflows and education requirements to facilitate adoption of ICT mediated personalized medicine in clinical practice;

Expand collaboration with other regions of the world in matters of common interest, e.g. by leveraging the eHealth MoU with the United States of America;

Study, evaluate and disseminate technology neutral risk assessment frameworks for data privacy and security, covering the entire ICT enabled Personalised Medicine delivery chain;

Develop effective methods for enabling the use of medical information for public health and research

Page 17: Intel life sciences_personalizedmedicine_stanford biomed 052214 dist

Health & Life Sciences at IntelWhere information and care meet

Let us all make Personalized Medicine

mainstream by 2020 ..

..You focus on the Science, let us focus on

“IT”

• www.intel.com/healthcare/bigdata

[email protected]