IBM Research Data Centric Systems - ICPP 2014icpp.cs.umn.edu/agerwala.pdf · © 2014 IBM...

IBM Research

Data Centric Systems The Next Paradigm in Computing

Dr. Tilak Agerwala VP, Data Centric Systems, IBM Research Dr. Michael Perrone, Research Staff Member

IBM Research

Principle 2: Time-to-market

Principle 3: Communication is critical

Principle 6: High Availability

Principle 7: Single-System Image Flexibility

Principle 4: Standard UNIX

Principle 5: High-performance services

Principle 1: “Ride the technology curve”

SP Design Principles & Impact STOCKPILE STEWARDSHIP

§  Government –  Science Based Stockpile Stewardship (SBSS, 1994) –  Dramatic new level of simulation accuracy

§  Industry –  Drove parallel database adaption: DB2, SAP, Oracle –  Aerospace, Automotive, Chemistry, Database,

Electronics, Finance, Geophysics, Information Processing, Manufacturing, Mechanics, Pharmaceuticals, Telecom, Transportation, etc.

IBM Research

The Motivation for Parallelism: Power Savings

Parallel Serial

Amdahl’s Law Acceleration by frequency scaling

Acceleration by parallelism

Total run time

P =CV 2 f P = cf α α > 2

P = NP0

If the parallel section is large enough, it is more power efficient to use parallelism.

TSerial +TParallelN

%&−1

Speed up factor:

Total time: T = TSerial +TParallel

IBM Research

Blue Gene Design Principles – Optimized for power efficiency Principle 1: Trade clock speed for lower power consumption

Principle 3: Focus on network performance

Principle 5: Application and hardware Co-Design

Principle 4: Reduce OS jitter

Principle 2: Use integration to lower power AWARDS § Top500 § Green500 § Graph500

IBM Research Data-Centric Systems: Application Domains

Business Intelligence

Climate & Environment

Life Sciences

Watson Health Care Analytics

Technical Computing

Business Analytics Social Analytics

DOD All-Source Intelligence

Financial Analytics

Oil and Gas Science

DOE NNSA and Office of Science

Key Domain Characteristics: Big Data, Complex Analytics, Scale and Time to Solution Requirements Overlapping Requirements in HPC and HPA enable an converged solution

System G Big Insights

Integrated Wide Azimuth Imaging and Interpretation

Production Weather

Engineering Design, Prototyping, Analysis,

Optimization

Integrated Trading and VaR

DCS = HPC + HPA = HPE (High Performance Environments)

Modeling, Simulation

Complex analytics

IBM Research

DCS Workflows: Mixed compute capabilities required

All Source Analy0cs

Oil and Gas

Science

Financial Analy0cs

4-‐40 Racks

2-‐20 Racks

1-‐5 Racks

Labs: 100-‐400 Racks Univ: 1-‐20 Racks

Analytics

Reservoir

Graph Analytics

Throughput

Seismic

Value At Risk

Image Analysis

Capability

Massively Parallel

Compute Capability

•  Simple kernels •  Ops dominated (e.g.

DGEMM, Linpack) •  Simple data access

patterns •  Can be

preplanned for high performance

Analytics Capability

•  Complex code •  Data Dependent

Code Paths / Computation

•  Lots of indirection / pointer chasing

•  Often Memory System Latency Dependent

•  C++ templated codes

•  Limited opportunity for vectorization

•  Limited scalability •  Limited threading

opportunity

IBM Research

1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15"

FR"="2"

FR"="3"

FR"="4"

FR"="5"

FR"="6"

FR"="7"

FR"="8"

FR"="9"

FR"="10" 1"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15"

FR"="2"

FR"="3"

FR"="4"

FR"="5"

FR"="6"

FR"="7"

FR"="8"

FR"="9"

FR"="10"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15"

FR"="2"

FR"="3"

FR"="4"

FR"="5"

FR"="6"

FR"="7"

FR"="8"

FR"="9"

FR"="10" 1"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15"

FR"="2"

FR"="3"

FR"="4"

FR"="5"

FR"="6"

FR"="7"

FR"="8"

FR"="9"

FR"="10"

§  Optimal system design depends on frequencies and Serial/Parallel (S:P) split

§  Today static – Tomorrow dynamic

Heterogeneity Is Important: Power Per Unit Speed Up Factor

S:P = 1:99 S:P = 1:9

S:P = 1:999 S:P = 1:9999

log2(N) log2(N)

N = # of weak cores / # of strong cores FR = Strong core frequency / Weak core

frequency

IBM Research

IBM Data-Centric Design Principles

Principle 3: Modularity – Balanced, composable architecture for Big

Data analytics, modeling and simulation – Modular and upgradeable design,

scalable from sub rack to 100’s of racks

Principle 2: Enable compute in all levels of the systems hierarchy

–  Introduce “active” system elements, including network, memory, storage, etc.

– HW & SW innovations to support / enable compute in data

Principle 4: Application-driven design – Use real workloads/workflows to drive

design points – Co-design for customer value

Principle 1: Minimize data motion – Data motion is expensive – Hardware and software to support &

enable compute in data –  Allow workloads to run where they run best

Principle 5: Leverage OpenPOWER to accelerate innovation and broaden diversity for clients

IBM Research

Data CentricSystems – Systems Built Around Data

§  Integration of massive data management and compute with complex analytics

§  Optimized workflow components (compute and dataflow) across the system

§  Data centric systems move computation to the data

Main Memory

Storage Class

Memory

Storage

Processor

Processing in Memory

Data-Centric Computing

IBM Research Data Centric System Design: Addresses Latency!

Main Memory

Storage

Processor

Traditional Computing Silicon Technology, Frequency Scaling

Si Tech + Parallelism Amdahls Law, Density Scaling

Data Centric Computing Si Tech + Parallel + Systems

IBM Research

MISSION: The OpenPOWER Consortium’s mission is to create an open ecosystem, using the POWER Architecture to share expertise, investment and validated and compliant server-class IP to serve the evolving needs of customers.

–  Opening the architecture to give the industry the ability to innovate across the full Hardware and Software stack

•  Includes SOC design, Bus Specifications, Reference Designs, FW OS and Hypervisor Open Source

–  Driving an expansion of enterprise class Hardware and Software stack for the data center

–  Building a vibrant and mutually beneficial ecosystem for POWER

POWER CPU

Tesla GPU

Example:

9 Gold Members

Platinum Members

OpenPOWER Foundation

16 Silver Members

IBM Research

Welcoming new members in all areas of the ecosystem 100+ inquiries and numerous active dialogues underway

35 members and groing

Boards / Systems

I/O / Storage / Acceleration

Chip / SOC

System / Software / Services

Implementation / HPC / Research

Building collaboration and innovation at all levels

IBM Research

Data Centric Systems: Activities § Co-design

–  Optimize system capability, trading off within constraints, e.g., power, cost, etc. –  Arrive at system design points that are driven by real workflows

§ System Architecture –  Heterogeneous nodes and memory, e.g., near-memory processing, accelerators, etc. –  Active Communications / Processing-in-Network to reduce software path length and data

movement –  Active Storage: Low latency storage model for working set and efficient check pointing –  Continuous workload rebalancing and optimization

§ Resilience § System-wide power management § Software § Performance

IBM Research

Power Efficiency § Need significant improvement over what we can get from technology alone

§ Workflow efficiency – Remapping workflows to data centric elements – Data motion is expensive – Cost/Performance benefits

§ Architectural efficiency – Increase workflow parallelism to leverage low-power cores

§ Engineering efficiency – Improved dynamic power management

•  Power only what’s being used •  Vary voltage dynamically

– Minimize power losses •  New power device technology, power conversion techniques and dense packaging

– E.g., Reduce electrical current conversion loss from 30% (today) to 10% (future) 14

IBM Research

Resilience

§ Need 10-100x improvement in fault resilience

§ Fault detection – Expose all hardware faults

•  Spend more transistors on error detection •  “Silent errors” – e.g., Cosmic ray in a multiplier is expensive to protect against

§ Fault handling options – Hardware faults recover in hardware (e.g., Error Correction Code) – Recover in software

•  e.g., reset to a previous checkpoint –  Identify “don't care” states

•  <1:10 of the time data was not used an fault was irrelevant •  E.g., unused portions of cachelines & pages; stale variables, etc.

IBM Research

Systems Software Stack

§ Workflow driven data-centric execution model – Computation occurring at different levels of the memory and storage hierarchy – Compute, data and communication equal partners – Late binding to heterogeneous hardware element – Dynamic optimization: Increasingly automated and self-optimizing – Hardware support for productivity

§ Programming model – Encompass all aspects of the data and computation management – Enable new system functionality while minimizing the impact on programmers

MPI, OpenMP and OpenACC extensions – Co-existence with lower level programming models

IBM Research

▪ SYSTEMS ▪ Consistent formal data/system/execution objects & abstractions for efficient reasoning about

the system ▪ Systems API’s for Power Management, Active networks, Active storage, Active memory,

Continuous workload rebalancing and optimization

▪ PROGRAMMING MODELS AND RUNTIMES ▪ Heterogeneous massively multithreaded model ▪ Enable peer-to-peer heterogeneous distributed compute ▪ Late binding of 100’s of millions of threads on millions of elements ▪ Dynamic management of time-varying ensembles of workloads

▪ RESILIENCE ▪ Full transparency and instrumentation to handle software errors ▪ Anomalous pattern detection ▪ API’s for Resilience

Some Research Areas

IBM Research

The Future

§ A time of significant disruption – industries are digitizing aggressively - Data is emerging as the “critical” natural resource of this century.

§ Data is joining theory, practice and computation to drive discovery in research and industrial / commercial impact.

§  Integrating compute with data from multiple sources will drive enormous innovation over the next decade!

§  We must address the data explosion and make efficient data management our number one design parameter

§ The Era of Cognitive Supercomputers

§  Quantify the uncertainty associated with the behavior of complex systems-of-systems and predict outcomes

§  Learn and refine underlying models based on constant monitoring and past outcomes

§  Accommodate “what if” questions in real-time

§  Provide real-time interactive visualization

IBM Research

IBM Research Data Centric Systems - ICPP 2014icpp.cs.umn.edu/agerwala.pdf · © 2014 IBM...

Documents

Transcript of IBM Research Data Centric Systems - ICPP 2014icpp.cs.umn.edu/agerwala.pdf · © 2014 IBM...

Itransition Channel Partner Program (ICPP)

ICPP 2016 Program Draft VIII - philopraxis.ch · ICPP 2016 – Program Draft VIII July 2016 Here you find the name index and at page 2 the table of contents . The program begins at

© 2006 IBM Corporation IBM Business Centric SOA Event SOA on your terms and our expertise Practical Experiences : Delivering the value from SOA Paul Campbell-Kelly.

© 2005 IBM Corporation IBM Business-Centric SOA Event SOA on your terms and our expertise Operational Efficiency Achieved through People and SOA Martin.

A Presentation for the Enterprise Architect © 2006 IBM Corporation Business Centric SOA Event SOA Governance.

IBM Research - uniroma1.itdegiacom/didattica/software-services/aa2009-10... · IBM Research © 2009 IBM Corporation Process-Centric Approach ! Business Data is –NOT the primary

Interpreting the Case of the Refugees ICPP 2015

A Data-Centric Approach to Synchronization · 4 A Data-Centric Approach to Synchronization JULIAN DOLBY, IBM T.J. Watson Research Center CHRISTIAN HAMMER, Saarland University DANIEL

IBM SmartCloudpublic.dhe.ibm.com/software/dw/cloud/techtalks/IBM... · Data-Centric Security Is An Issue BIG DATA, GLOBAL COMPLIANCE, CLOUD ADOPTION, DATA BREACHES 1. Global State

Service innovation in product-centric firms: a …706638/...Consider IBM, a product-centric firm frequently used as an example of a manufacturer that has transitioned to service provision:

IBM DataCenter Expert Conference 2014 IBM Client Center ... · PDF fileStakeholders per Appl. / APKG Server-Centric Process Status Tracking Old Infrastructure ... 1 V fU 2 Mv Sub p

Energy Aware Network Computing - icpp-conf.org

true user-centric identity management · 2006-12-05 · true user-centric identity management Dr Jan Camenisch Technical Leader PRIME Project Leader Cryptography IBM Research Zurich

© 2009 IBM Corporation Georgia Tech What is an Activity? Appropriating an Activity-Centric System Svetlana Yarosh Georgia Institute of Technology, Georgia,

Scheduling End-to-end with z-centric Capabilities€¦ · IBM ®Tivoli Workload Scheduler for z/OS Scheduling End-to-end with z-centric Capabilities describes how to set up IBM Tivoli

© 2006 IBM Corporation IBM Business-Centric SOA Event SOA in an Enterprise Architecture Richard Whyte IT Integration Architect MBCS, CITP.

Evolution of HPC to Data Centric Systems€¦ · 02/02/2016 · IBM, Mellanox, and NVIDIA awarded $325M U.S. Department of Energy’s CORAL Supercomputers IBM & UK’s STFC Partner

IBM Software Group · Analytics – Generate insight on master data Other MDM vendors focus on the symptom (the data) and deliver data-centric tools. IBM is the only vendor who delivers

Rack Centric - High Voltage Distribution - IBM WWW Page-+malik+-+high+voltage+distribution+bus+rev… · Data Center Application Rack Centric - High Voltage Distribution Randy Malik

© 2007 IBM Corporation User Focused :: Business Centric Innovation That Matters Web 2.0: the next generation of Internet capabilities Christopher Perrien,