Page 1

HOW DO WE HANDLE MANY CORE SYSTEM ON CHIP?

Rabi Mahapatra
Department of Computer Science & Engineering
Texas A&M University

Page 2

What is Many Core SoC?

Embedded Systems and Codesign Laboratory

[Figure: lots of cores on a board vs. a single chip]

A Many Core SoC has hundreds of IP cores on a single chip.

A Multi Core SoC has a handful of IP cores on a single chip.

Page 3

Why do we need many core SoC?

• Performance demand is ever increasing

• Frequency is not increasing

• More transistors available on a single die
– 50 billion transistors soon!

[Figure: core frequency (GHz) by year, 1990 to 2010]

[Figure: transistors on chip (millions) by year, 1970 to 2020]

Page 4

Issues in Many Core SoC

• Power and Thermal Management
• Testing
• Operating System Design
• Modeling and Benchmarks
• Programming Model
• Memory Bandwidth
• Fault Tolerance
• Virtualization Support

Page 5

Key Research Challenges

Page 6

Power & Energy

Challenge #1

Page 7

Power Issue in Many Core SoC

[Figure: chip power breakdown by core power, communication, power supply, and clock]

Communication accounts for significant power: 35%

Various Levels of Power Management

Page 8

Many Core SoC with Networks on Chip (NoC)

• As the number of cores per chip scales, on-chip buses will no longer meet performance needs

• Route packets, not wires

• NoC Components
– routers
– core-network interfaces (CNI)
– links

[Figure: An Example NoC with Mesh Topology. A 4x4 grid of IP cores, each attached to a router (R) through a core-network interface (CNI)]
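A minimal sketch of the 4x4 mesh in the example figure, to make "route packets, not wires" concrete. The slides do not name a routing algorithm; dimension-ordered (XY) routing is assumed here only because it is a common baseline, and `build_mesh` and `xy_route` are hypothetical helper names.

```python
from itertools import product


def build_mesh(rows, cols):
    """Return {router: [neighbour routers]} for a rows x cols 2D mesh."""
    links = {}
    for r, c in product(range(rows), range(cols)):
        candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        # Keep only neighbours that fall inside the grid (a mesh has no wrap-around).
        links[(r, c)] = [(nr, nc) for nr, nc in candidates
                         if 0 <= nr < rows and 0 <= nc < cols]
    return links


def xy_route(src, dst):
    """Assumed XY routing: travel along the row first, then along the column."""
    r, c = src
    path = [src]
    while c != dst[1]:
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    while r != dst[0]:
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path


mesh = build_mesh(4, 4)            # 16 routers, one per IP core
route = xy_route((0, 0), (3, 2))   # list of routers a packet visits
hops = len(route) - 1
```

Each hop in `route` crosses one link between adjacent routers, so the hop count grows with distance in the grid instead of with the total number of cores sharing a bus.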

Page 9

Network on Chip for Low Power Communication

What Does NoC Have to Offer?

• High Bandwidth
• Less sensitive to Wire Delay
• Versatile Infrastructure
• Scalable Communication

Challenges

• Router Architecture?
• Power Management?
• Routing Algorithm
• Quality of Service

Page 10

Low Power Router Architecture

• Routers are the primary component in a NoC

• Buffers consume the most power in a router: up to 79% of power consumption

[Figure: four routers (R) connecting CPU 1, CPU 2, MEM 1, and MEM 2]

Efficient buffer management can save power

Page 11

How to Reduce Buffer Power?

• Reduce Active Buffers: Dynamic Buffer Management
– Buffers are organized in blocks
– Flows are monitored
– Excess blocks are powered down based on traffic flow

• Use Energy Efficient Storage Encoding
– SRAMs can be an efficient buffer solution
– Storing 0 ≠ Storing 1 (the two bit values cost different energy)
– Encode the bits in a packet to conserve energy

20% Energy Savings can be achieved!
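A minimal sketch of the storage-encoding idea above. The slides do not give the encoding; this example assumes that storing a 1 costs more than storing a 0 and uses a simple invert-with-flag scheme, so a word is stored inverted whenever that leaves fewer 1s. `encode_word`, `decode_word`, and `ones_stored` are hypothetical names.

```python
def encode_word(word, width=8):
    """Invert the word when more than half its bits are 1; return (stored, flag)."""
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    if ones > width // 2:
        return (~word) & mask, 1   # flag = 1 marks an inverted word
    return word, 0


def decode_word(stored, flag, width=8):
    """Undo encode_word using the stored flag bit."""
    mask = (1 << width) - 1
    return (~stored) & mask if flag else stored & mask


def ones_stored(words, width=8):
    """Count the 1 bits actually written to the buffer, flag bits included."""
    total = 0
    for w in words:
        stored, flag = encode_word(w, width)
        total += bin(stored).count("1") + flag
    return total


packet = [0xFF, 0x01]
cost = ones_stored(packet)
```

If the cheaper bit value in a given SRAM is 1 rather than 0, the same scheme applies with the comparison reversed. The flag costs one extra bit per word, which is the price of the encoding.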

Page 12

System Level Dynamic Power Management

Dynamic Peak Power Budget Satisfaction

• Local power consumption is computed
• Neighbors' power is shared
• A non-deterministic algorithm is used to calculate the available budget

25% Performance Improvement

Page 13

System Level Dynamic Power Management

Intelligent Power budget Distribution

• Ant System inspired power budget distribution approach

• Power ants are sent from surplus region

• Beggar ants are sent from starving region

• Power is shared from surplus to starving region

[Figure: sharing path between a source and a sink]

20% Improvement in Power Budget Utilization
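A minimal sketch of the end result of budget sharing: unused budget moves from surplus regions to starving regions. This is not the ant-system algorithm from the slides, which is not specified here. It is a plain greedy redistribution under assumed integer budgets (for example milliwatts), and `share_budget` is a hypothetical name.

```python
def share_budget(budget, demand):
    """Greedy redistribution; returns (new budgets, list of (src, dst, amount))."""
    budget = list(budget)
    n = len(budget)
    # Regions holding more budget than they need, and regions holding less.
    surplus = [(i, budget[i] - demand[i]) for i in range(n) if budget[i] > demand[i]]
    starving = [(i, demand[i] - budget[i]) for i in range(n) if budget[i] < demand[i]]
    transfers = []
    s = 0
    for j, need in starving:
        while need > 0 and s < len(surplus):
            i, spare = surplus[s]
            amount = min(spare, need)
            budget[i] -= amount
            budget[j] += amount
            transfers.append((i, j, amount))
            need -= amount
            spare -= amount
            surplus[s] = (i, spare)
            if spare == 0:
                s += 1   # this surplus region is exhausted, move to the next one
    return budget, transfers


new_budget, transfers = share_budget([10, 10, 10, 10], [4, 14, 12, 10])
```

The total budget is conserved, so the chip-level peak power limit still holds after sharing. A distributed scheme would discover the surplus and starving regions through messages over the NoC instead of a global list.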

Page 14

Reliability Aware Low Power Scheduling

• Pfair is an optimal scheduling algorithm for multiprocessor task scheduling

• Integrated DVFS (Dynamic voltage and frequency scaling) into the Pfair scheduling algorithm

• Feedback controller based allocation of additional job copies to manage reliability.

[Figure: block diagram with task set, DVS-enabled Pfair scheduler, feedback control, observed reliability, and additional job copies]

• Reduced Failure rate compared to Pfair

• Up to 50% savings in energy

Page 15

Low Power Real Time Scheduling in Hardware

• Implemented the Pfair scheduling algorithm (for MPSoC) in hardware

• Transformed floating point computations to integer domain.

[Figure: hardware Pfair scheduler and processors P1 to P8; labels: interrupt, save context, load task, transfer control, context switch, task running]

Exponential savings in energy consumption compared to a software-based scheduler
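A minimal sketch of what moving Pfair arithmetic to the integer domain can look like. In the standard Pfair formulation, a task with weight e/p is split into unit subtasks, and subtask i (i ≥ 1) must run inside the window from floor((i-1)/(e/p)) to ceil(i/(e/p)). The slides do not say which computations were transformed; the example assumes it is this window calculation, and `pfair_window` is a hypothetical name.

```python
def pfair_window(i, e, p):
    """Window [release, deadline) of the i-th subtask of a task with weight e/p.

    Uses only integer multiply, add, and divide, so no floating point is needed.
    """
    release = ((i - 1) * p) // e       # floor((i - 1) / (e / p))
    deadline = (i * p + e - 1) // e    # ceil(i / (e / p))
    return release, deadline


# A task that needs 3 time slots in every period of 7 slots.
windows = [pfair_window(i, 3, 7) for i in range(1, 4)]
```

For the task above the windows are (0, 3), (2, 5), and (4, 7): each subtask gets its own interval, and consecutive intervals overlap by at most one slot.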

Page 16

Temperature Aware Scheduling

Temperature aware energy management (TA-DVS) at run time using novel slack reclamation

[Figure: block diagram with task set, EDF scheduler, feedback control, temperature estimation, slack, and temperature-constrained slack]

Page 17

Power Management: Summary

What We Addressed

• A Low Power Router Buffer Architecture

• Peak Power Management Heuristic

• Intelligent Dynamic Power Budget Distribution

• Reliability and Temperature aware task scheduling

Other Research Challenges

• Novel flow control to reduce power consumption further

• Context/Application aware power management

• System wide power policy management

Page 18

Test and Reliability

Challenge #2

Page 19

Many Core SoC Reliability Overview

Many Core SoC present us with challenges and advantages in achieving reliable computing

• Challenges
– Shrinking feature sizes
– Power density and temperature

• Advantages
– NoC can be used as test delivery platform
– Redundancy

Reduced operational lifetime: the “bathtub” curve is getting shallower and narrower

Enhanced testing is necessary to meet application reliability requirements

Graceful degradation

Adaptable architectures

Page 20

How to Improve Reliability? - Test Infrastructure IP (TI-IP)

• Many Core SoC will contain:
– Processing, Memory, and I/O IP
– Infrastructure IP
  • For Debug, Yield, and Testing (TI-IP)

• TI-IP provides reliability and availability
– No longer necessary to take chip off-line to test
– Allows for fast, high-coverage testing
– Already being deployed in commercial automotive SoC (Freescale MPC564xL)

[Figure: a NoC connecting IP cores (with BIST) and TI-IP blocks, each through a CNI]

Page 21

Distributed TI-IP – The Solution for Many Core SoC

• A single TI-IP becomes limited by scalability for Many Core
– Many TI-IPs distributed across the SoC

• Each SoC Tile can have TI-IP and Test Vectors
– Test vectors can be optimally divided across SoC based on NoC topology
– This solves the problem of testing deeply embedded cores

TI-IP composed of:
• Test Controller
• Test Vector Memory

[Figure: 5x5 2D-Torus Example, with test vector sets 1 to 5 distributed across the tiles]
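A minimal sketch of one way five test vector sets could be divided across a 5x5 2D-torus. The slides do not state the assignment rule; the rule below, set = (x + 2y) mod 5, is an assumption chosen because it places every set on the tile itself or on one of its four torus neighbours. Sets are numbered 0 to 4 here, and `vector_set` and `sets_within_one_hop` are hypothetical names.

```python
def vector_set(x, y, n=5):
    """Assumed rule: which test vector set is stored at tile (x, y)."""
    return (x + 2 * y) % n


def sets_within_one_hop(x, y, n=5):
    """Sets reachable from tile (x, y) in at most one hop on the torus."""
    tiles = [(x, y),
             ((x + 1) % n, y), ((x - 1) % n, y),   # east / west, with wrap-around
             (x, (y + 1) % n), (x, (y - 1) % n)]   # north / south, with wrap-around
    return {vector_set(tx, ty, n) for tx, ty in tiles}


n = 5
covered = all(sets_within_one_hop(x, y, n) == set(range(n))
              for x in range(n) for y in range(n))
```

Under this assumed rule each tile stores only one fifth of the vectors, yet every core can fetch the full set from itself and its direct neighbours, which keeps test traffic local regardless of where the core sits. The rule as written relies on the torus width being 5.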

Page 22

Test Vector Storage: The Benefits

• Experimental Analysis:
– Observe time to test SoC with and without distributed TI-IP
– Measured over a variety of SoC sizes for scalability
– Test time independent of SoC size for distributed TI-IP

[Figure: test latency (µs) vs. 2D-torus size (5x5, 8x8, 10x10, 13x13) for distributed TI-IP and single-source TI-IP, annotated with 85% and 94% reductions in test time]

Page 23

SoC Test and Reliability: Summary

What We Addressed

• TI-IP: On-Line Testing
• Test Vector Storage
• Test Scheduling

Open Problems

• Lifetime Reliability?
• Diagnosis & Recovery?
• Fault Resilience?

Page 24

Security

Challenge #3

Page 25

Security in Many Core SoC

Page 26

Why Is Security Important in SoC?

• Weakness in SoC
– Heterogeneous system: IP from different vendors
– Tradeoff between security and performance
– Decreased visibility and control
– Improved attack techniques

• Sample Attacks
– Denial of Service (DoS)
– Bandwidth reduction
– Draining or Sleep Deprivation
– Extraction of secret info
– Hijacking of programmable components

Page 27

State of the Art in SoC Security

• Divide the system into secure and unsecure areas
– Secure area (ASIC)
– Unsecure area (FPGA etc.)

• Secure Bus Design
– Extend conventional arbitration using Trojan Detection and Access Control

• Security IP Based Solution
– Transaction Monitoring
– Sandboxing

Page 28

IP Based SoC Security Approach

• Central Security Core
• Dedicated communication channel for security protocol
• Secure agents at each Core
• Distributed Response protocol is necessary

Page 29

Immune System Inspired Attack Response

Page 30

SoC Security: Summary

State of the Art

• Bio Inspired Modeling of Threat Response Mechanism
• Security Monitor Core
• Core Level Anomaly Detector

Open Problems

• Attack Model Development
• Security Infrastructure Development
• Protocol Design
• Design of Security Core/IP

Page 31

Development Platform

Challenge #4

Page 32

Need for a Development Platform

• Major roadblock to the advent of many core SoC
• Need a better platform simulator and debugger
• Rapid development cycle
• Suitable benchmarks to effectively evaluate many core SoC

Page 33

The NoCBench Platform

Full System SoC Simulation Platform

Built using SystemC

Generic and Extensible

[Figure: NoCBench simulation flow. Blocks: design, XML representation, manual configuration, component library, core library, network generation, graph, SystemC simulation (simulation engine), benchmarks, report, an OK? decision with YES and NO branches, and done]

Page 34

SoC Model in NoCBench

[Figure: NoCBench System Model]

Components of NoCBench

• System Kernel
– Provides scheduling
– Task management

• Core Library
– Processor cores
– Memory core
– Other IP

• Network On Chip
– NoC backbone with routers and CNIs

Page 35

NoCBench Configuration and Results

Configuration Parameters

• Network on Chip
– Router Details
– Topology
– Injection Limit
– Power Model

• System
– Scheduler
– Task Configurations
– Core Types

Reported Metrics

• Network
– Throughput
– Latency
– Power

• Application
– Execution Time
– Cycles
– Power

Page 36

Virtual Platform Using Carbon Design Tools

VPNoC

• Scalable architecture
• Supports 16 to 100 cores
• Runs applications using traces
• Suitable for data analysis accelerators
• Supports performance, power, protocol, and security analysis

Challenges

• Slow when using a large number of nodes
• A fast model is essential

Page 37

What About Benchmarking?

• Benchmarking is a Challenge
– Application Benchmark
– Communication Benchmark
– Large optimization problems
– Social media data analysis
– Many other large data analytic problems

• What kind of setup?
– What will future “many core SoC” look like?

Page 38

Development Platform: Summary

What We Have

• VPNoC
• SoC simulation
• Micro Kernel
• Simple Scheduler
• Basic core library
• Limited IPC

What Do We Need

• More core support
• Thread Library Support
• Application Benchmarks

Page 39

Conclusion

Page 40

Lots to Explore!

• Many Core SoC is the future

• Challenges
– Performance & Power
– Test, Reliability & Security
– Benchmarking

• More Information
– http://codesign.cs.tamu.edu/index.php/research/soc-and-noc
– http://codesign.cs.tamu.edu/index.php/research/real-time-systems

Page 41

Thank You!