Dme presentation-dec2012-rev13-1


Page 1: Dme presentation-dec2012-rev13-1

Elsip

Adam Edström CEO

Bengt Edlund Sales Director

December 2012

© Elsip 2012. Elsip non-confidential

Page 2: Dme presentation-dec2012-rev13-1

Needs and Solutions

Page 3: Dme presentation-dec2012-rev13-1

Our Sweet Spot

Software Defined Data Management

for Many-core SoC Designs

Page 4: Dme presentation-dec2012-rev13-1

Why we're needed

Many-core

Reconfigurability

Complexity

Page 5: Dme presentation-dec2012-rev13-1

Many-core instead of MHz

• Clock frequencies no longer rise.

The figure shows the clock frequencies of processors presented at ISSCC between 1993 and 2011. After a long period of steady increase, the top frequency has leveled off since 2005 at 2-3 GHz. Source: ISSCC, 2011.

Page 6: Dme presentation-dec2012-rev13-1

Entering the Many-core Era

More parallelism is the only way to higher performance.

Sequential programs limit multithreading to ~10 instructions per cycle.

Higher degrees of parallelism have to be extracted at the process and application levels.

=> Hundreds of cores in a few years!

Page 7: Dme presentation-dec2012-rev13-1

Microprocessor Challenges

[Figure: microprocessor performance (log scale, 1 to 100,000,000) versus year, 1980 to 2020. Phases shown: early days, 20%/year; single-core tornado, 50%/year; multi-core days, 20%/year; saturation, 10%/year; many-core days, 25%/year. Challenges in moving from serial to parallel: memory bandwidth, programming, scaling. Annotations: "More than 100x improvement"; "10-100x improvement, 3D SoC memory".]

Page 8: Dme presentation-dec2012-rev13-1

Makimoto extended

Page 9: Dme presentation-dec2012-rev13-1

Reconfigurability

Products are increasingly defined as flexible platforms. Standardization is pushed by the fact that future products will include more embedded processing, more communication and more interconnect.

=> Heterogeneous IC architectures, with flexible reconfigurable processing cores, and interface components configurable for standardized communication and interaction protocols.

Page 10: Dme presentation-dec2012-rev13-1

Design Complexity

[Figure: computational energy efficiency (operations/sec/Joule, low to high) versus architecture complexity, for four architectures:]

• Std single-core CPU with cache and main memory
• Multi-core with distributed shared cache and common main memory
• Homogeneous many-core with distributed shared cache and main memory
• Heterogeneous many-core with distributed shared cache and main memory

Page 11: Dme presentation-dec2012-rev13-1

Design Complexity

[Figure: computational energy efficiency (operations/sec/Joule, low to high) versus architecture complexity, for four architectures. Additional label in this build: 3D Stack Die.]

• Std single-core CPU with cache and main memory
• Multi-core with distributed shared cache and common main memory
• Homogeneous many-core with distributed shared cache and main memory
• Heterogeneous many-core with distributed shared cache and main memory

Page 12: Dme presentation-dec2012-rev13-1

Design Complexity

[Figure: computational energy efficiency (operations/sec/Joule, low to high) versus architecture complexity, for four architectures. Additional labels in this build: 3D Stack Die; Elsip's target market.]

• Std single-core CPU with cache and main memory
• Multi-core with distributed shared cache and common main memory
• Homogeneous many-core with distributed shared cache and main memory
• Heterogeneous many-core with distributed shared cache and main memory

Page 13: Dme presentation-dec2012-rev13-1

Memory architecture matters

Beyond a certain level of parallelization, any gain in computation time is offset by the overhead of memory access and synchronization.

For the matrix and FFT operations this means that performance on a 64-node centralized-memory architecture is in fact lower than on 16 nodes.

The performance advantage of distributed shared memory (DSM) increases with the number of cores.

Performance of multi-core architectures with centralized and distributed memory organization. Both use a cache, so the observed difference is only due to the delay in accessing uncached data. Source: Elsip.
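The trade-off on this slide can be illustrated with a toy model. This is only a sketch and not Elsip's measurement or simulation: the function name `speedup`, the linear overhead term, and all constants are assumptions chosen for illustration. Compute time shrinks as the work is divided across nodes, while memory-access and synchronization overhead grows with the node count.

```python
def speedup(nodes, work=1000.0, overhead_per_node=0.0):
    """Toy model (assumed, not from the deck): speedup over ideal serial time.

    Compute time falls as work / nodes, while memory-access and
    synchronization overhead is assumed to grow linearly with node count.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    parallel_time = work / nodes + overhead_per_node * nodes
    return work / parallel_time


# Assumed overhead constants: high for centralized memory, low for distributed.
CENTRAL_OVERHEAD = 1.0
DISTRIBUTED_OVERHEAD = 0.05


def dsm_advantage(nodes):
    """Ratio of distributed-memory speedup to centralized-memory speedup."""
    return (speedup(nodes, overhead_per_node=DISTRIBUTED_OVERHEAD)
            / speedup(nodes, overhead_per_node=CENTRAL_OVERHEAD))
```

With these made-up constants, the centralized case is slightly slower on 64 nodes than on 16, and `dsm_advantage` grows with the node count, which mirrors the qualitative claims above. Real behaviour depends on the workload and the interconnect.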

Page 14: Dme presentation-dec2012-rev13-1

Distributed memory needs

A distributed memory architecture needs a data management mechanism supporting:

• Distributed memory access
• Flexible private/shared memory space management
• Synchronization for memory consistency
• Virtual address space management
• Scalability
• Flexibility
• Transaction ordering (memory consistency)
• Data movement (DMA) functions
• Message passing
• Cache coherence
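Two of the needs listed here, private/shared memory space management and virtual address space management, can be sketched as an address translation step. The code below is a hypothetical illustration only: `NodeMemory`, `shared_base` and `translate` are invented names, and the deck does not describe how the DME actually lays out or translates addresses. Each node keeps a private region at the bottom of its local memory and exposes the rest as part of one global shared address space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeMemory:
    """Local memory of one node (hypothetical layout)."""
    size: int         # total words of local memory
    shared_base: int  # local offset where the shared region starts;
                      # everything below it is private to the node


def translate(global_addr, nodes):
    """Map a global shared address to (node index, local offset).

    Shared regions are concatenated in node order to form the global space.
    """
    if global_addr < 0:
        raise ValueError("address must be non-negative")
    remaining = global_addr
    for index, node in enumerate(nodes):
        shared_size = node.size - node.shared_base
        if remaining < shared_size:
            return index, node.shared_base + remaining
        remaining -= shared_size
    raise ValueError("address outside the shared address space")
```

Moving `shared_base` changes the private/shared split of a node without touching the others, which is one simple way to read "flexible" in the list above.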

Page 15: Dme presentation-dec2012-rev13-1

Introducing DME

Elsip's DME (Data Management Engine) is a microprogrammable IP block for on-chip data management. Microprograms in the DME realize the different data management functions. The microprograms can also be downloaded dynamically, giving applications flexibility to adapt the DME to specific needs.

For higher-performance and/or power-critical applications the DME can be hard coded (replaced by a state machine).

=> The DME is a software-defined MPSoC data management IP block
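The idea of data management functions realized as replaceable microprograms can be pictured with a small software analogy. This is a loose sketch, not the DME's microcode format or programming interface, neither of which is described in this deck. `ToyEngine`, `load`, `execute` and the three routines are hypothetical names.

```python
class ToyEngine:
    """Hypothetical stand-in for a programmable data management block."""

    def __init__(self):
        self.memory = {}          # address -> value
        self.microprograms = {}   # command name -> routine

    def load(self, opcode, program):
        """Install or replace the routine that handles one command."""
        self.microprograms[opcode] = program

    def execute(self, opcode, *args):
        """Run the routine currently loaded for a command."""
        if opcode not in self.microprograms:
            raise KeyError(f"no microprogram loaded for {opcode!r}")
        return self.microprograms[opcode](self.memory, *args)


def write(memory, addr, value):
    memory[addr] = value


def read(memory, addr):
    return memory.get(addr, 0)


def block_copy(memory, src, dst, length):
    """DMA-style copy routine that an application could add at run time."""
    for i in range(length):
        memory[dst + i] = memory.get(src + i, 0)
```

An application would call `load` at start-up for the functions it needs and again later to swap in a different routine, which corresponds to the dynamic download described above. The hard-coded variant would amount to fixing the table at design time.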

Page 16: Dme presentation-dec2012-rev13-1

Applications

The DME is useful for many-core SoCs in:

• Video, signal and network processing
• Cloud computing
• Industrial automation
• Set-top boxes
• Scientific computing
• Solid state disks
• High-end personal mobile devices
• Other high-end embedded applications

Page 17: Dme presentation-dec2012-rev13-1

Video and Data Packet processors are drivers for faster memory access today

- Graphics
- Mobile Video
- Network Processor
- FPGA

David McCann GF Snr Dir

Page 18: Dme presentation-dec2012-rev13-1

SSD

Page 19: Dme presentation-dec2012-rev13-1

Memory

Page 20: Dme presentation-dec2012-rev13-1

Portable

Page 21: Dme presentation-dec2012-rev13-1

The DME provides

• Programmability => the DME can be optimized for any particular application.
• Lower design risk, allowing late design changes without need for a re-spin.
• Customization => different hardware versions can be generated for different platform instances.
• Dynamic programmability => facilitates use of customized functions in different parts or phases of an application.
• Efficiency => speed and power on par with custom hardware.
• Separation => offloading computing cores, giving a higher degree of parallelism.

The DME complies with several standard interfaces, e.g. AHB, APB and AXI, with configurable data bus widths.

Page 22: Dme presentation-dec2012-rev13-1

DME Features

Note: Perceived value is based on early customer input, and is application dependent.

Page 23: Dme presentation-dec2012-rev13-1

DME Products

Page 24: Dme presentation-dec2012-rev13-1

The DME architecture

Page 25: Dme presentation-dec2012-rev13-1

Application example: SSD Node

• Interface: PCI-e
• CPU for flash write-read-remove scheduling and buffer management
• Power budget for the SSD board is 13 W; for the MCU it is 5 W.

Page 26: Dme presentation-dec2012-rev13-1

Application example: SSD Node

[Figure: SSD node block diagram with three candidate positions marked "DME?"]

The CPU needs complex functionality and perhaps an OS. The DME is not a good candidate to replace the CPU.

Depending on the precise functionality, the DME could be optimized for buffer management.

The DME could implement the FTL (Flash Translation Layer)
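For readers unfamiliar with the term, a flash translation layer maps logical page numbers to physical flash pages, because flash pages cannot be overwritten in place. The sketch below shows a generic page-level mapping scheme. It is not Elsip's implementation, and `ToyFTL` and its fields are hypothetical names; garbage collection and wear leveling are left out.

```python
class ToyFTL:
    """Generic page-level flash translation layer (illustrative only)."""

    def __init__(self, physical_pages):
        self.flash = [None] * physical_pages     # contents of each physical page
        self.mapping = {}                        # logical page -> physical page
        self.free = list(range(physical_pages))  # pages not yet written
        self.stale = set()                       # superseded pages awaiting erase

    def write(self, logical, data):
        """Write to a fresh physical page and remap the logical page to it."""
        if not self.free:
            raise RuntimeError("no free pages; garbage collection needed")
        physical = self.free.pop(0)
        self.flash[physical] = data
        old = self.mapping.get(logical)
        if old is not None:
            self.stale.add(old)  # old copy stays until its block is erased
        self.mapping[logical] = physical

    def read(self, logical):
        """Return the latest data written to a logical page."""
        return self.flash[self.mapping[logical]]
```

Every write is a table lookup plus a table update, which is the kind of data management bookkeeping the slide suggests moving off the CPU.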

Page 27: Dme presentation-dec2012-rev13-1

Application example 2: SSD Array Design

• Star-ring topology instead of tree
• From the rack perspective, it is a star topology
• Intra-cluster and inter-cluster nodes are rings.

Page 28: Dme presentation-dec2012-rev13-1

Application example 2: SSD Array Design

DME + ELSIP in-house switch can be optimized for managing large SSD arrays.

[Figure: SSD array diagram with five blocks labeled "DME + Switch"]

Page 29: Dme presentation-dec2012-rev13-1

Evaluating the DME

For evaluation of the DME, Elsip offers:

• Introduction Booklet
• DME Application Development Package, with API libraries
• C++ Model
• Compiled IP Model
• User manual
• Demonstrator
• On-site and off-site support

Page 30: Dme presentation-dec2012-rev13-1

The founders

Axel Jantsch, CTO. Professor, KTH Electronic Systems since 2002. 20+ years of research, primarily within NoC and SoC. 200+ scientific papers published. Visiting professor at Fudan University, PRC, and Cantabria University, Spain.

Ahmed Hemani. Professor, KTH, focus on high-level system integration, design automation, NoC, asynchronous circuit, configurable system. Industrial experience from NSC, NXP/Philips, ABB, Ericsson, Newlogic, Synthesia and Spirea (co-founder).

Zhonghai Lu. Professor, KTH, expert in SoC and NoC. Reviewer for 14 international periodicals. Principal investigator for an Intel project on future many-core processor chip architectures.

Page 31: Dme presentation-dec2012-rev13-1

Management Team

Adam Edström, CEO. 20+ years as editor and editor-in-chief of Elektroniktidningen, Sweden's major electronics publication. Visiting editor at Fortune Magazine in NYC. VP International affairs at SICS, Swedish Institute of Computer Science. Founded three companies prior to Elsip.

Bengt Edlund, Sales Director. 30+ years of semiconductor sales, marketing and new technology business development at National Semiconductor and Hewlett Packard. Served as European director of business development, marketing and global sales.

Page 32: Dme presentation-dec2012-rev13-1

Some ELSIP Milestones

• Founded by professors Axel Jantsch, Ahmed Hemani and Zhonghai Lu at the Royal Institute of Technology in Stockholm, 2011
• Received initial funding from Vinnova
• Commercial launch when Adam Edström (CEO) and Bengt Edlund (Sales Dir) joined the company, Sept 2012
• Established subsidiary Memcom in Shanghai, PRC, March 2012, with Zhonghai Lu as CTO and Zhuo Zou as CEO. Received initial funding from Wuxi government.
• Cooperation with Fudan-Wuxi Institute, Shanghai, PRC
• Selected by SICS, the Swedish Institute of Computer Science, as member of SICS Startup Accelerator

Page 33: Dme presentation-dec2012-rev13-1

Roadmap

Looking into the future, other IP we’re working on includes:

Circuit-switched NoC (faster than today’s NoC for telecom/datacom applications)

CGRA - Coarse Grain Reconfigurable Architecture (reconfigurable on bus level, better silicon usage than FPGA)

Page 34: Dme presentation-dec2012-rev13-1

Contact

Sales Director Bengt Edlund
Mail: [email protected]
+46 708 722 800

CEO Adam Edström
Mail: [email protected]
+46 702 579 734

Address: c/o SICS, PO Box 1263, SE16429, Kista, Sweden

Page 35: Dme presentation-dec2012-rev13-1

Thank you!