Big Data and Health Care

Post on 14-Jul-2015

BIG DATA DIGITAL HEALTH REVOLUTION

Alex A0135681

Henri A0135487

Zheng A0121892

Pham A0095804

Yin A0119974

Kavitha A0110143

For information on other technologies, see http://www.slideshare.net/Funk98/presentations

HAVE YOU EVER VISITED A DOCTOR?

ONE SIZE FITS ALL

FOOD FOR THOUGHT

40,000+ PATIENTS DIE IN US EACH

CONTENT

DATA COLLECTION: SENSORS

DATA PROCESSING: HARDWARE

DATA ANALYZING: ALGORITHMS

SUSTAINABLE HEALTHCARE SYSTEM

TODAY

IN FUTURE

SENSORS TODAY

iBGStar

iHealth wireless pulse oximeter

Jawbone

Withings smart body analyser

CALORIES

EATING HABITS

SLEEP

BODY TEMPERATURE

HEART RATE

BLOOD SUGAR

DETECTION

ANALYSIS

DIAGNOSTICS

CELL CULTURE

DRUG DELIVERY

THERAPEUTICS

SENSORS IN FUTURE

Continuous Glucose Monitoring

MicroCHIPS

Google lens

MIT batteryless power source

Parathyroid hormone microchip injection

SENSORS IN FUTURE

Sensor-Laden Transdermal patch

SENSORS IN FUTURE - BioMEMS and Microsystems

SIZE

POWER

COMMUNICATION

SENSORS IN FUTURE - Micro supercapacitors

Laser-scribed graphene micro-supercapacitors

SENSORS IN FUTURE - Reduction in MOSFET size

SENSORS IN FUTURE - External communication

SENSORS IN FUTURE - The trend in shrinking cells

SENSORS IN FUTURE - BioMEMS and Microsystems

● Size decrease

● Better and smaller communication chips and algorithms

● Micro-supercapacitors

● These will facilitate the arrival of new implantable chips

● Allows for unobtrusive personal medicine

● Allows for more tailored medicine

● It will require more data analysis and more processing power

Introduction

SSD vs HDD

Data Protection

The storage medium used now gets more focus than the quantity of storage used. It is no longer one-size-fits-all.

The “data deluge” is fundamentally changing the way that storage is approached.

HARDWARE - Introduction

● Provide real-time or near-real-time responses.

● Handle huge, rapidly growing data volumes.

Key Characteristics of Big Data Infrastructure:

● High processing/IOPS performance.

● Very large capacity.

HARDWARE - What’s Key to Efficient Data Processing?

KEY DIFFERENTIATOR

● Big data is largely unstructured.

● Unstructured data is immutable.

● Traditional file systems have built-in functions to handle insert/update.

● This creates a lot of overhead in terms of performance, the I/Os required to access data, and the ability to scale.

HARDWARE - WHY DO WE NEED A DIFFERENT APPROACH?

FIG. GROWTH OF UNSTRUCTURED DATA ANNUALLY

● Objects are kept in one large, scalable pool of storage

● Stores metadata (information about the object)

● An object ID is stored to locate the data

● Objects are immutable

● No file system hierarchy

Products:

● Scality’s RING architecture

● Dell DX

● EMC’s Atmos

HARDWARE - OBJECT STORAGE – Choice of Storage
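The object storage properties listed above (one flat pool, metadata kept with each object, an object ID used to locate the data, immutability, no hierarchy) can be illustrated with a minimal in-memory sketch. It does not describe how Scality RING, Dell DX, or EMC Atmos work internally; the names `ObjectStore`, `StoredObject`, `put`, and `get` are hypothetical, and deriving the object ID from a SHA-256 hash of the content is an assumption made only for this example.

```python
import hashlib
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class StoredObject:
    """An immutable object: payload plus read-only metadata."""
    data: bytes
    metadata: MappingProxyType


class ObjectStore:
    """Hypothetical flat object pool keyed by object ID (no directory tree)."""

    def __init__(self):
        self._pool = {}  # object ID -> StoredObject

    def put(self, data: bytes, **metadata) -> str:
        # Assumption for this sketch: the object ID is the SHA-256 of the content.
        object_id = hashlib.sha256(data).hexdigest()
        # Objects are never updated in place; re-putting identical data is a no-op.
        self._pool.setdefault(
            object_id, StoredObject(bytes(data), MappingProxyType(dict(metadata)))
        )
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._pool[object_id]


store = ObjectStore()
oid = store.put(b"heart rate: 72 bpm", patient="hypothetical-001", kind="vitals")
print(store.get(oid).metadata["kind"])  # vitals
```

Because there is no insert/update path, the store avoids the bookkeeping a traditional file system carries for in-place modification; changed data is simply written as a new object with a new ID.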

● Access Times

SSDs exhibit virtually no access time

● Random I/O Performance

SSDs deliver at least 6000 IO/s, 15 times faster than HDDs (400 IO/s)

● Reliability

SSDs are 4-10 times more reliable

HARDWARE - Storage Medium: Solid-State Drive (SSD) or Hard Disk Drive (HDD)

SSD

HDD
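The "15 times faster" figure follows directly from the two random I/O rates quoted on the slide. The short sketch below reproduces that arithmetic and applies it to a workload of one million random reads; the workload size is a hypothetical value chosen for illustration, and real drives will vary with queue depth and block size.

```python
SSD_IOPS = 6000  # random I/O operations per second, from the slide
HDD_IOPS = 400   # random I/O operations per second, from the slide


def seconds_for_random_reads(n_reads: int, iops: float) -> float:
    """Time to serve n_reads random reads at a sustained IOPS rate."""
    return n_reads / iops


speedup = SSD_IOPS / HDD_IOPS
n = 1_000_000  # hypothetical workload size
print(f"SSD is {speedup:.0f}x faster")                         # SSD is 15x faster
print(f"HDD: {seconds_for_random_reads(n, HDD_IOPS):.0f} s")   # HDD: 2500 s
print(f"SSD: {seconds_for_random_reads(n, SSD_IOPS):.0f} s")   # SSD: 167 s
```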

REAL TIME APPLICATIONS OF SSD

● Read-intensive video-on-demand (VOD) and image-retrieval applications.

● Emerging applications (Big Data/Hadoop/Cloud)

HARDWARE - COMPARISON OF BOOT TIMES USING SSD & HDD

2011: Throughput 250 MB/s, capacity 512 GB

2014: Throughput 1000 MB/s, capacity 4 TB

Standard 2.5 inch form factor

Further scale-down of flash lithography leads to continued performance gains and greater capacity points.

HARDWARE - Solid-State Drives (SSDs) & Moore’s Law

Fig 1. HDD areal density follows Moore’s Law

Fig 2. Avg. price comparison of SSD vs. HDD
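As a rough check on the Moore's-Law comparison, the 2011 and 2014 SSD figures can be turned into doubling times. This is only a sketch: it assumes steady exponential growth between two data points and takes 4 TB as 4096 GB, and the function name `doubling_time_years` is introduced here for illustration.

```python
import math


def doubling_time_years(start: float, end: float, years: float) -> float:
    """Years needed to double, assuming steady exponential growth."""
    return years / math.log2(end / start)


span = 2014 - 2011
throughput_doubling = doubling_time_years(250, 1000, span)  # MB/s
capacity_doubling = doubling_time_years(512, 4096, span)    # GB, taking 4 TB as 4096 GB

print(f"Throughput doubled every {throughput_doubling:.1f} years")  # 1.5
print(f"Capacity doubled every {capacity_doubling:.1f} years")      # 1.0
```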

HARDWARE - DATA PROTECTION – WHY DIFFER FROM TRADITIONAL APPROACHES?

RAID (REDUNDANT ARRAY OF INDEPENDENT DISKS)

● Originally designed for small-capacity disks.

● Restoring a failed drive takes longer as capacity increases.

● To shorten long rebuild cycles, RAID systems ship with faster processors, leading to high energy consumption.

REPLICATION

● Copies add costs: typically 133% or more additional storage is needed for each additional copy.

● The storage system gets more expensive as the amount of data increases.

HARDWARE - Limitations of Traditional Approaches
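The RAID rebuild point above can be made concrete with a back-of-the-envelope estimate: a rebuild has to rewrite the whole replacement drive, so its best-case duration grows linearly with capacity. The sustained rebuild rate of 100 MB/s below is a hypothetical figure, not taken from the slides, and real rebuilds are usually slower because the array keeps serving requests.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Best-case hours to rewrite one failed drive end to end."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return capacity_mb / rebuild_mb_per_s / 3600


REBUILD_RATE = 100.0  # MB/s, hypothetical sustained rebuild rate

for tb in (0.5, 1, 4, 8):
    print(f"{tb} TB drive: {rebuild_hours(tb, REBUILD_RATE):.1f} h")
```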

How Does it Work?

● Information Dispersal Algorithms (IDAs) separate data into unrecognizable slices of information.

● The slices are then dispersed to storage nodes in disparate storage locations.

● It can be implemented locally or distributed.

● Only a pre-defined subset of the slices from the dispersed storage nodes is needed to fully retrieve all of the data.

HARDWARE - Information Dispersal - Better Approach?
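The "any pre-defined subset of slices rebuilds the data" property can be sketched with a small polynomial code over the prime field GF(257), in the spirit of Rabin-style information dispersal. This is an illustrative toy, not the algorithm of any particular product: the function names `disperse` and `recover` are hypothetical, slices are kept as lists of integers (a value can be 256, so it does not fit in one byte), and nothing here is encrypted.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def disperse(data: bytes, n: int, k: int) -> list:
    """Split data into n slices such that any k of them rebuild it."""
    padded = data + bytes(-len(data) % k)  # pad to a multiple of k
    slices = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        coeffs = padded[start:start + k]  # k bytes = polynomial coefficients
        for i in range(n):
            x, y = i + 1, 0
            for c in reversed(coeffs):  # Horner evaluation at x
                y = (y * x + c) % P
            slices[i].append(y)
    return slices


def _invert(matrix: list) -> list:
    """Invert a square matrix over GF(P) by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [row[:] + [int(r == c) for c in range(k)] for r, row in enumerate(matrix)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse (Python 3.8+)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def recover(available: dict, k: int, length: int) -> bytes:
    """Rebuild the data from any k slices, given as {slice index: slice}."""
    idx = sorted(available)[:k]
    vandermonde = [[pow(i + 1, j, P) for j in range(k)] for i in idx]
    inverse = _invert(vandermonde)
    out = bytearray()
    for pos in range(len(available[idx[0]])):
        ys = [available[i][pos] for i in idx]
        for row in inverse:  # one recovered coefficient (byte) per row
            out.append(sum(a * y for a, y in zip(row, ys)) % P)
    return bytes(out[:length])


record = b"blood sugar 5.4 mmol/L"
slices = disperse(record, n=5, k=3)
# Pretend storage nodes 0 and 3 are unreachable; nodes 1, 2 and 4 remain.
surviving = {i: slices[i] for i in (1, 2, 4)}
assert recover(surviving, k=3, length=len(record)) == record
```

Each slice holds one value per group of `k` input bytes, so total storage is roughly `n / k` times the original size rather than one full copy per node.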

● It is resilient against natural disasters and technological failures such as drive failures, system crashes and network failures.

● Data can still be accessed in real time even if there are multiple simultaneous failures across a string of hosting devices, servers or networks.

● Five nines or more of reliability are guaranteed with overhead as low as 20%, as opposed to 3 copies requiring 200% overhead.

HARDWARE - Benefits of Information Dispersal
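The overhead comparison above is simple arithmetic. Keeping full copies costs one extra data set per additional copy, while dispersal into n slices, any k of which rebuild the data, costs (n - k) / k extra, assuming each slice is 1/k of the original size. The 12-slice, 10-needed configuration below is a hypothetical choice that happens to give the slide's 20% figure.

```python
def replication_overhead(copies: int) -> float:
    """Extra storage beyond the original data when keeping full copies."""
    return float(copies - 1)


def dispersal_overhead(n: int, k: int) -> float:
    """Extra storage when data is cut into n slices and any k rebuild it."""
    return (n - k) / k


print(f"3 copies: {replication_overhead(3):.0%}")                      # 200%
print(f"12 slices, any 10 rebuild: {dispersal_overhead(12, 10):.0%}")  # 20%
```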

HARDWARE - Cost Savings from IDA in Petabyte Storage over RAID and Replication

When looking at the number of years without data loss at a 99.99999% confidence level, information dispersal doesn’t even appear on the chart: even for a large storage amount like 524K terabytes, the expected time without data loss is longer than anyone’s lifetime (theoretically over 79 million years).

Deal with huge data

Machine learning

How do we make the huge dataset match ICD 10?

ALGORITHMS - Deal with the huge data

ICD 10 Clinical Modifications (ICD CM Dataset): 69823 codes

• 3-7 characters

• Character 1 is alpha

• Character 2 is numeric

• Characters 3-7 can be alpha or numeric

ICD 10 Procedure Coding System (ICD 10 PCS Dataset): 76000 codes

• 7 characters

• Each one can be alpha or numeric

• Numbers 0-9; letters A-H, J-N, P-Z

ALGORITHMS - ICD 10 introduction
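The two code formats described above translate directly into regular expressions. The sketch below checks only the shape of a code as stated on the slide, not whether the code actually exists in either dataset. The helper names are hypothetical, and stripping a dot after the third character (as in the printed form E11.9) is an assumption added for convenience.

```python
import re

# ICD-10-CM per the slide: 3-7 characters; a letter, a digit, then 1-5 alphanumerics.
ICD10_CM = re.compile(r"[A-Z][0-9][A-Z0-9]{1,5}")
# ICD-10-PCS per the slide: exactly 7 characters from 0-9, A-H, J-N, P-Z (no I or O).
ICD10_PCS = re.compile(r"[0-9A-HJ-NP-Z]{7}")


def is_icd10_cm(code: str) -> bool:
    # Assumption: printed codes may carry a dot after the third character (E11.9).
    return ICD10_CM.fullmatch(code.upper().replace(".", "")) is not None


def is_icd10_pcs(code: str) -> bool:
    return ICD10_PCS.fullmatch(code.upper()) is not None


print(is_icd10_cm("E11.9"))     # True
print(is_icd10_pcs("0DTJ4ZZ"))  # True
print(is_icd10_pcs("0DTI4ZZ"))  # False: the letter I is not allowed
```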

Analytics Algorithms

Machine Learning

Image Retrieval system

Huge Nonstandard Data Source (4V): Volume, Velocity, Variety, Veracity

Data Feature Selection

Huge multiple characters mapping databases

Data Analytics

ALGORITHMS - Why we need big data

Diagnosis is a relatively straightforward machine learning problem. Clinical decision making is highly suited to rule-based systems because of the nature of the data, such as ICD-10 codes, medications, etc.

ALGORITHMS - Machine Learning in medical diagnosis
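To show why coded data such as ICD-10 codes and medication lists lends itself to rule-based systems, here is a minimal sketch of a rule engine. Everything in it is hypothetical and for illustration only: the `Patient` and `Rule` classes, the two rules, and their thresholds are assumptions, not clinical guidance and not something taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Patient:
    icd10_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)


@dataclass
class Rule:
    name: str
    condition: Callable[[Patient], bool]
    message: str


# Hypothetical rules for illustration only; not clinical guidance.
RULES = [
    Rule(
        "diabetes-no-medication",
        lambda p: any(c.startswith("E11") for c in p.icd10_codes) and not p.medications,
        "Type 2 diabetes code (E11.*) recorded but no medication on file",
    ),
    Rule(
        "many-medications",
        lambda p: len(p.medications) >= 5,
        "Five or more medications on file; review the list",
    ),
]


def evaluate(patient: Patient) -> list:
    """Return the message of every rule whose condition matches."""
    return [rule.message for rule in RULES if rule.condition(patient)]


print(evaluate(Patient(icd10_codes={"E11.9"})))
```

Each rule is a plain predicate over structured fields, which is what makes the decisions easy to audit compared with a learned model.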

ALGORITHMS - Popular Imaging Modalities in Healthcare Domain

ALGORITHMS - Medical Image Retrieval System

*ImageCLEF medical – competition on Medical Image Processing

Two main tasks:

● Image-based retrieval

● Case-based retrieval

source: http://www.imageclef.org/

(Chart: # of images)

ALGORITHMS - Database of the ImageCLEF medical competition

• This is the classic medical retrieval task.

• Similar to Query by Image Example.

• Given the query image, find the most similar images.

http://www.imageclef.org/

(Chart: performance)

ALGORITHMS - Image-based retrieval Algorithm

Performance = Difficulty * Accuracy

(Chart: # of images, mean average precision)
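Query by image example can be sketched as ranking stored images by the similarity of their feature vectors to the query, then scoring the ranking with average precision (the per-query quantity behind mean average precision). The three-number feature vectors and image names below are hypothetical stand-ins for real image descriptors, and cosine similarity is one common choice rather than the method used by ImageCLEF participants.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list, database: dict, top_k: int = 3) -> list:
    """Return the names of the top_k images most similar to the query."""
    ranked = sorted(
        database,
        key=lambda name: cosine_similarity(query, database[name]),
        reverse=True,
    )
    return ranked[:top_k]


def average_precision(ranked: list, relevant: set) -> float:
    """Mean of precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical feature vectors; a real system would extract these from pixels.
database = {
    "xray_a": [0.9, 0.1, 0.0],
    "xray_b": [0.8, 0.2, 0.0],
    "mri_a": [0.1, 0.1, 0.8],
    "ct_a": [0.0, 0.9, 0.1],
}
query = [1.0, 0.1, 0.0]

ranking = retrieve(query, database, top_k=4)
print(ranking[:2])                                       # ['xray_a', 'xray_b']
print(average_precision(ranking, {"xray_a", "xray_b"}))  # 1.0
```

Mean average precision is then the mean of `average_precision` over all queries in the benchmark.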

• This is a more complex task; it is closer to the clinical workflow.

• A case description, with patient demographics, limited symptoms and test results including imaging studies, is provided (but not the final diagnosis).

• The goal is to retrieve cases including images that might best suit the provided case description.

http://www.imageclef.org/

ALGORITHMS - Case-based retrieval Algorithm

                   Manual calculation    Software and algorithm

Speed              Slow                  Fast

Accuracy           Hard to maintain      Precise

Level to study     Quite hard            Easy to learn

Solution level     Shallow               Deep

Machine learning   No                    Yes

Result             Hard to explain       Perspective visualization

ALGORITHMS - Manual Calculation vs. Software and Algorithm

Technological fusion

TECHNOLOGICAL FUSION

BioMEMS

Hardware

Object Storage

Information Dispersal

Machine Learning

More data can be gathered to identify patterns and interactions

Doctors will use it for diagnosis and decision-making

Health care costs will decrease

Individual patient care will improve

TECHNOLOGICAL FUSION CONCLUSION

THANK YOU

Q&A