NGS Informatics and Interpretation - Hardware Considerations by Michael McManus

© 2014 Knome, Inc. NGS Data Hardware Requirements and Considerations Presenter: Michael J. McManus, PhD, SVP of Operations Date: September 26, 2014

Transcript of NGS Informatics and Interpretation - Hardware Considerations by Michael McManus

NGS Data Hardware Requirements and Considerations

Presenter: Michael J. McManus, PhD, SVP of Operations
Date: September 26, 2014

Questions

If you have any questions during the webinar, please enter them in the GoToWebinar pane. We will answer as many as possible at the end.


During this webinar we will discuss four questions:

1.  Why purchase hardware when you can process NGS data on the cloud?

2.  What sort of hardware should be considered?

3.  What hardware specifications are needed for conducting align + call versus interpretation?

4.  How do I compare systems apples-to-apples?

NGS informatics and interpretation infrastructure

1.  Flexible, fast bioinformatics

2.  Comprehensive, customizable annotation

3.  Indication-specific filtering, prioritization, and interpretation

Workflow steps: Align, Call, Annotate, Filter, Report, Classify

Users: Bioinformaticians & Technologists; Geneticists, Clinicians, & Genetic Counselors

Why internal vs. using the cloud?

§  Knome’s customers have expressed a strong preference for an internally installed solution over a cloud solution. Why?

§  Three reasons:
    1. Security
    2. Software Version Control
    3. File Transfer Time
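The third reason comes down to simple arithmetic: transfer time is data size divided by link speed. The sketch below is illustrative only. The 100 GB data set and both link speeds are assumed values, not figures from the webinar, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb (decimal gigabytes) over a link of link_mbps megabits/s."""
    bits = size_gb * 8e9                 # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)   # megabits/s -> bits/s
    return seconds / 3600


# Assumed inputs (not from the slides): a 100 GB data set, a 100 Mb/s uplink
# to a cloud provider, and a 10 Gb/s internal Ethernet link.
wan_hours = transfer_hours(100, 100)
lan_hours = transfer_hours(100, 10_000)

print(f"100 Mb/s uplink:  {wan_hours:.2f} hours")
print(f"10 Gb/s internal: {lan_hours * 60:.2f} minutes")
```

Under these assumptions the same data set takes a little over two hours on the uplink and under two minutes on the internal link. Real links rarely sustain their nominal rate, so actual times would be longer.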


What type of hardware should be considered?

§  To process NGS data you need to understand many issues:

Elements for NGS informatics

Five elements must be balanced:

1. Compute
    •  Multiple nodes
    •  Grid Computing

2. Database

3. Storage
    •  Shared File System

4. Networks
    •  Storage
    •  Communications
    •  File upload/download

5. Software
    •  Operating System
    •  Virtualization
    •  Open Source Tools
    •  Web Server

knoSYS state diagram - node view

[Diagram: Application node, Grid node, Database node, Data nodes, File System Manager]

Shared File System

§  All files are stored in one place, not on separate nodes
§  Failure tolerance is a requirement
    –  RAID 6 protection is required
    –  A minimum of 2 drive failures should be tolerated
    –  One “hot spare” should be provided per array
    –  Good array reliability rates (>90%)

§  Performance is a key need
    –  A file system that supports “striping” files across the storage array is desired
    –  A file system that gets faster as more disks are added to the storage array
    –  A minimum sustained I/O rate of 1 gigabyte per second
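The checklist above can be written as a small validation routine. This is a minimal sketch, not part of any Knome tooling: the `StorageArray` type, its field names, and both sample arrays are hypothetical, and the thresholds are taken directly from the bullets.

```python
from dataclasses import dataclass


@dataclass
class StorageArray:
    name: str
    raid_level: int
    hot_spares: int
    reliability: float        # fraction, e.g. 0.95 for 95%
    sustained_io_gb_s: float  # sustained I/O rate in GB/s


def unmet_requirements(array: StorageArray) -> list[str]:
    """Return the slide's requirements that the array does not meet."""
    problems = []
    if array.raid_level != 6:
        # RAID 6 is what provides tolerance of two drive failures.
        problems.append("RAID 6 protection is required")
    if array.hot_spares < 1:
        problems.append("one hot spare per array")
    if array.reliability <= 0.90:
        problems.append("array reliability above 90%")
    if array.sustained_io_gb_s < 1.0:
        problems.append("sustained I/O of at least 1 GB/s")
    return problems


# Hypothetical arrays, for illustration only.
candidate = StorageArray("candidate", raid_level=6, hot_spares=1,
                         reliability=0.95, sustained_io_gb_s=2.0)
weak = StorageArray("weak", raid_level=5, hot_spares=0,
                    reliability=0.85, sustained_io_gb_s=0.4)

print(unmet_requirements(candidate))
print(unmet_requirements(weak))
```

An empty list means every listed requirement is satisfied. The striping and scale-out criteria are properties of the file system itself and are not modeled here.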


What hardware is needed for align/call vs. interpretation?

§  Aligning & Calling:
    – Aligning starts with a FASTQ, produces a BAM
    – Calling takes the BAM and produces a VCF
    – These processes require large amounts of RAM, disk space, and CPU cores

§  Interpretation:
    – Starts with a VCF file
    – The annotation and interpretation processes also benefit from ample amounts of RAM, disk space, and CPU cores, but can be done with far less.
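The file-type chain described above (FASTQ to BAM to VCF) determines which stages a lab still has to run, and therefore how much hardware it needs. The sketch below encodes that chain. The stage names, the "report" output label, and the `stages_from` helper are illustrative assumptions, not Knome software.

```python
# (stage name, input file type, output file type), in the order given on the slide.
# "report" as the output of interpretation is an assumed label.
STAGES = [
    ("align", "FASTQ", "BAM"),
    ("call", "BAM", "VCF"),
    ("interpret", "VCF", "report"),
]


def stages_from(input_type: str) -> list[str]:
    """Return the stages still to run when starting from the given file type."""
    wanted = input_type.upper()
    for index, (_, needs, _) in enumerate(STAGES):
        if needs == wanted:
            return [name for name, _, _ in STAGES[index:]]
    raise ValueError(f"unsupported input type: {input_type}")


print(stages_from("FASTQ"))  # all three stages
print(stages_from("VCF"))    # interpretation only
```

A lab that receives VCF files from a sequencing provider skips the two resource-heavy stages entirely, which is why interpretation can run on far less hardware.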

The knoSYS® system overview

§  End-to-end: reads to report

§  Flexible, fast, secure

§  Supports a multi-disciplinary team

§  Ideal for translational and clinical labs

§  Multiple configuration options

k100 model – for align/call, whole genomes

§  The knoSYS k100 model will efficiently process large numbers of whole genomes and exomes.

k25 model – for interpretation

§  The knoSYS k25 model is designed to efficiently process panels, as well as smaller volumes of genomes and exomes.

Specs and Throughput

k25 Specs

Server    # Nodes  CPU        # CPU  # Cores  RAM (GB)  Storage (TB)  1 GbE + card  10GbE card  IB  UPS
Compute   1        E5-2640v2  2      16       256       -             Yes           Yes         No  No
Database  -        -          -      -        -         -             -             -           -   -
Storage   -        -          -      -        -         24            -             -           -   -
Total     1        -          2      16       256       24            -             -           -   -

k25 Monthly Throughput

Input           FASTQ       VCF-Only
Sequence Type   Align/Call  Annotation
Genomes (37x)   12          360
Exomes (100x)   54          3,240
Panels (300x)   720         16,200

k100 Monthly Throughput

Input           FASTQ       VCF-Only
Sequence Type   Align/Call  Annotation
Genomes (37x)   60          1,440
Exomes (100x)   270         12,960
Panels (300x)   3,600       64,800
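Dividing the k100 figures by the k25 figures shows how throughput scales between the two models. The sketch below uses only the monthly numbers from the two throughput tables. The dictionary layout and the `throughput_ratios` helper are illustrative.

```python
# Monthly throughput as (align/call from FASTQ, annotation from VCF), from the tables above.
K25 = {
    "Genomes (37x)": (12, 360),
    "Exomes (100x)": (54, 3240),
    "Panels (300x)": (720, 16200),
}
K100 = {
    "Genomes (37x)": (60, 1440),
    "Exomes (100x)": (270, 12960),
    "Panels (300x)": (3600, 64800),
}


def throughput_ratios(small: dict, large: dict) -> dict:
    """Ratio of large-system to small-system throughput for each sequence type."""
    return {
        seq_type: (large[seq_type][0] / small[seq_type][0],
                   large[seq_type][1] / small[seq_type][1])
        for seq_type in small
    }


for seq_type, (align_call, annotation) in throughput_ratios(K25, K100).items():
    print(f"{seq_type}: align/call x{align_call:.0f}, annotation x{annotation:.0f}")
```

For every sequence type the k100 delivers 5 times the align/call throughput and 4 times the annotation throughput of the k25.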

k100 Specs

Server    # Nodes  CPU        # CPU  # Cores  RAM (GB)  Storage (TB)  1 GbE + switch  10 GbE card  IB + switch  UPS
Compute   4        E5-2560v2  8      64       512       16            Yes             Yes          Yes          Yes
Database  1        E5-2640v2  2      16       128       4             No
Storage   3        E5-2609    3      18       48        60            No
Total     8        -          13     98       688       80            -               -            -            -
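When comparing quotes it helps to confirm that the Total row really is the sum of the component rows. The sketch below does this for the numeric k100 columns. The values come from the spec table. The key names and the `column_sums` helper are illustrative.

```python
# Numeric columns of the k100 spec table, one entry per server type.
K100_ROWS = {
    "Compute":  {"nodes": 4, "cpus": 8, "cores": 64, "ram_gb": 512, "storage_tb": 16},
    "Database": {"nodes": 1, "cpus": 2, "cores": 16, "ram_gb": 128, "storage_tb": 4},
    "Storage":  {"nodes": 3, "cpus": 3, "cores": 18, "ram_gb": 48,  "storage_tb": 60},
}
# The Total row as printed in the table.
K100_TOTAL = {"nodes": 8, "cpus": 13, "cores": 98, "ram_gb": 688, "storage_tb": 80}


def column_sums(rows: dict) -> dict:
    """Sum each numeric column across all server rows."""
    totals: dict = {}
    for row in rows.values():
        for key, value in row.items():
            totals[key] = totals.get(key, 0) + value
    return totals


computed = column_sums(K100_ROWS)
print(computed)
print("matches Total row:", computed == K100_TOTAL)
```

The computed sums agree with the Total row in every column.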

Lustre® Shared File System for the k100

§  Two configurations:
    –  60TB and 180TB
        •  60TB has 1 SSU
        •  180TB has 1 SSU and 2 ESUs

§  Specs:
    –  RAID 6 configuration
    –  20 x 4TB drives, plus 1 x 4TB hot spare for each SSU and each ESU
    –  Max I/O
        •  60TB array ≈ 2.5 GB/sec
        •  180TB array ≈ 7.0 GB/sec
        •  Matches Infiniband peak I/O rate of 7GB/sec
    –  Array Reliability of 96.6%

[Figure: knoSYS k100 ClusterStor 1+0, total 80TB / usable 60TB (4U): SSU 0 with two OSTs of 4TB drives. knoSYS k100 ClusterStor 1+2, total 240TB / usable 180TB (12U): SSU 0, ESU 1, and ESU 2, each with two OSTs of 4TB drives. Legend: Storage, Parity.]
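The Max I/O figures for the two Lustre configurations show how close the array comes to the "gets faster as more disks are added" goal. The sketch below compares the 180TB figure against perfect linear scaling from the 60TB figure. The `scaling_efficiency` helper is illustrative, and treating one SSU or ESU as one scaling unit is an assumption.

```python
def scaling_efficiency(base_gb_s: float, base_units: int,
                       scaled_gb_s: float, scaled_units: int) -> float:
    """Measured throughput as a fraction of ideal linear scaling."""
    ideal = base_gb_s / base_units * scaled_units
    return scaled_gb_s / ideal


# 60TB array: 1 unit at ~2.5 GB/s. 180TB array: 3 units at ~7.0 GB/s.
efficiency = scaling_efficiency(2.5, 1, 7.0, 3)
print(f"ideal for 3 units: {2.5 * 3:.1f} GB/s")
print(f"scaling efficiency: {efficiency:.1%}")
```

Three units would deliver 7.5 GB/s if scaling were perfectly linear, so the quoted 7.0 GB/s is about 93% of ideal. The slide also notes that 7 GB/s matches the Infiniband peak rate, which would cap the figure in any case.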

RAID File System for the k25

§  One configuration
    –  24TB usable / 32TB raw

§  Specs:
    –  RAID 6 configuration
    –  8 x 4TB drives
        •  6 x 4TB drives for storage
    –  Max I/O
        •  ≈ 900MB/sec
    –  Array Reliability of 94.3%

[Figure: eight 4TB drives. Legend: Storage, Parity.]
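The k25 capacity figures follow from how RAID 6 works: two drives' worth of space in each group holds parity, and the rest holds data. A minimal sketch of that arithmetic, with `raid6_capacity_tb` as a hypothetical helper that ignores file system overhead:

```python
def raid6_capacity_tb(drives: int, drive_tb: float) -> tuple[float, float]:
    """Return (raw, usable) capacity in TB for a single RAID 6 group."""
    if drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    raw = drives * drive_tb
    usable = (drives - 2) * drive_tb  # two drives' worth of space goes to parity
    return raw, usable


raw_tb, usable_tb = raid6_capacity_tb(drives=8, drive_tb=4)
print(f"raw: {raw_tb} TB, usable: {usable_tb} TB")
```

For 8 x 4TB drives this gives 32TB raw and 24TB usable, matching the slide's "6 x 4TB drives for storage".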


How do I compare systems apples-to-apples?

§  All hardware sounds similar, but the benefit of the Knome solution is in:
    1. The unique combination of the various hardware elements
    2. The price-performance that Knome provides for its solution

§  5 Elements:
    – Compute
    – Database
    – Storage
    – Network
    – Software

knoSYS architecture – hardware

§  High Performance Computing Server
    •  GRID NODES (3) to align, call, annotate, compare genomes, exomes, and panels
    •  APPLICATION NODE (1) for web-based GUI

§  QDR/FDR Infiniband Switch (STORAGE TRAFFIC)
    •  Switch to manage and direct storage traffic

§  Gigabit Ethernet Switch (NETWORK TRAFFIC)
    •  Switch to manage and direct network traffic

§  ClusterStor Management Unit
    •  Houses Metadata Server (MDS) and Management Server (MGS)

§  Scalable Storage Unit (SSU), 30TB or 60TB usable
    •  SHARED FILE SYSTEM for storage of genomes, exomes, panels; projects, analyses, etc.

§  Expanded Storage Unit 1 (ESU), 30TB or 60TB usable
    •  Adds more capacity. Can use 2TB, 3TB or 4TB drives.

§  Expanded Storage Unit 2 (ESU), 30TB or 60TB usable
    •  Adds more capacity. Can use 2TB, 3TB or 4TB drives.

§  Database Server
    •  RDBMS for managing storage of projects, sequences, etc. PostgreSQL running on Lustre FS

§  Back-up Power Supply
    •  BACK-UP POWER in case of power failure
    •  CONDITIONS incoming power to prevent spikes/dips

knoSYS elements for NGS informatics - solution

Compute and Database
    Compute nodes
        Model k100: 4 physical nodes (3 compute nodes, 1 application node)
        Model k25:  1 physical node with 3 virtual nodes (2 compute nodes, 1 application node)
    Grid Computing
        Both models: Open Grid Engine / Open Grid Scheduler
    Database
        Model k100: PostgreSQL node (physical)
        Model k25:  PostgreSQL node (virtual)

Storage
    Shared File System
        Model k100: Lustre
        Model k25:  RAID 6 disk array

Network
    Storage Network
        Model k100: QDR/FDR Infiniband
        Model k25:  No network, uses SAS
    Communications Network
        Both models: 1Gb/s Ethernet for server-to-server communication; 10Gb/s Ethernet for file uploading and downloading

Software
    Web Server
        Both models: Tomcat (server-side), Java and Chrome (client-side)
    Operating System
        Both models: CentOS 6.3 or higher
    Virtualization
        Model k100: N/A
        Model k25:  VMware vSphere ESXi
    Open Source Tools
        Both models: Many open source tools

Conclusions

§  The cloud has great potential, but for today’s genomics needs, the focus is on an in-house solution

§  There is more to the decision than hardware alone. You need to consider both the hardware and the software when making your decision

§  There are many questions to be answered before you can decide on your hardware purchase

§  Hardware is fairly similar, but there are ways to combine hardware elements that maximize performance at a reasonable price.

What’s Next?

§  A recording of this webinar and the slides will be available on our website on Monday.

www.knome.com
twitter.com/knome
[email protected]
facebook.com/knomeinc
linkedin.com/company/knome-inc
617-715-1000
