INSTITUTE OF COMPUTING TECHNOLOGY
Exploiting the Produce-Consume Relationship in DMA to Improve I/O Performance
Dan Tang, Yungang Bao, Yunji Chen, Weiwu Hu, Mingyu Chen
Institute of Computing Technology, Chinese Academy of Sciences
2009.2.15
Workshop on The Influence of I/O on Microprocessor Architecture (IOM-2009)
A Brief Intro to ICT, CAS
• ICT built the fastest HPC system in China, Dawning 5000, which delivers 233.5 TFlops and ranks 10th in the Top500.
• ICT developed the Loongson CPU.
Overview
• Background
• Nature of DMA Mechanism
• DMA Cache Scheme
• Research Methodology
• Evaluations
• Conclusions and Ongoing Work
Importance of I/O operations
• I/O is ubiquitous
  • Loading binary files: Disk → Memory
  • Web browsing, media streaming: Network → Memory
  • …
• I/O is important
  • Many commercial applications are I/O intensive: databases, Internet applications, etc.
State-of-the-Art I/O Technologies
• I/O Buses: 20GB/s
  • PCI-Express 2.0
  • HyperTransport 3.0
  • QuickPath Interconnect
• I/O Devices
  • RAID: 400MB/s
  • 10GE: 1.25GB/s
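The device figures above can be checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead; the helper name `gbit_to_mbyte_per_s` is ours, not from the slides.

```python
def gbit_to_mbyte_per_s(gbit_per_s: int) -> int:
    """Convert a link rate in Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s * 1000 // 8


BUS_MB_S = 20_000                       # 20GB/s I/O bus, from the slide
RAID_MB_S = 400                         # RAID throughput, from the slide
TEN_GE_MB_S = gbit_to_mbyte_per_s(10)   # 10 Gigabit Ethernet line rate

print(TEN_GE_MB_S)              # 10GE in MB/s
print(BUS_MB_S // TEN_GE_MB_S)  # how many 10GE links one bus could feed
print(BUS_MB_S // RAID_MB_S)    # how many such RAID arrays one bus could feed
```

Under these assumptions the 10GE line rate matches the 1.25GB/s on the slide, and the bus has more than an order of magnitude of headroom over either device.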
A Typical Computer Architecture
[Figure: block diagram of a typical computer architecture, including the NIC]
Direct Memory Access (DMA)
• DMA is an essential feature of I/O operation in all modern computers
  • DMA allows I/O subsystems to access system memory for reading and/or writing independently of the CPU.
• Many I/O devices use DMA
  • Including disk drive controllers, graphics cards, network cards, sound cards and GPUs
DMA in Computer Architecture
[Figure: block diagram showing where DMA sits in the computer architecture, including the NIC]
An Example of Disk Read: DMA Receiving Operation
[Figure: DMA engine, CPU, and memory holding the descriptor, driver buffer, kernel buffer, and user buffer; steps ① to ⑤ annotate the receive operation]
• Cache Access Latency: ~20 Cycles
• Memory Access Latency: ~200 Cycles
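The two latencies above suggest why it matters where the CPU finds I/O data. The following back-of-envelope sketch assumes a 64-byte cache line, a 4 KB buffer, and one non-overlapped access per line; those three values are illustrative assumptions, not figures from the slides.

```python
CACHE_LATENCY = 20     # cycles, approximate value from the slide
MEMORY_LATENCY = 200   # cycles, approximate value from the slide
LINE_SIZE = 64         # bytes; assumed, not stated in the slides


def read_cycles(buffer_bytes: int, latency: int, line_size: int = LINE_SIZE) -> int:
    """Cycles to touch every line of a buffer once, with no overlap between accesses."""
    lines = -(-buffer_bytes // line_size)  # ceiling division
    return lines * latency


buffer_bytes = 4096  # assumed 4 KB I/O buffer
from_memory = read_cycles(buffer_bytes, MEMORY_LATENCY)
from_cache = read_cycles(buffer_bytes, CACHE_LATENCY)
print(from_memory, from_cache, from_memory // from_cache)
```

In this simplified model, reading the buffer from memory costs ten times as many cycles as reading it from a cache, which is the gap a cache for I/O data aims to close.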
Potential Improvement of DMA
[Figure: same diagram as the disk read example: DMA engine, CPU, and memory holding the descriptor, driver buffer, kernel buffer, and user buffer, with steps ① to ⑤]
• This is a typical Shared-Cache Scheme
Problems of Shared-Cache Scheme
• Cache Pollution
• Cache Thrashing
• Performance degrades when DMA requests are large (>100KB), as observed for the “Oracle + TPC-H” application
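Cache pollution by large DMA requests can be illustrated with a toy model. The sketch below is not the authors' simulator: it assumes a fully associative LRU cache, a CPU hot set that fits in it, and a DMA stream that allocates into the same cache. All sizes and names (`LRUCache`, `cpu_hits_after_dma`) are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with LRU replacement (toy model)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr) -> bool:
        """Touch a line; return True on a hit. A miss allocates and may evict the LRU line."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[addr] = True
        return hit


def cpu_hits_after_dma(cache_lines: int, hot_lines: int, dma_lines: int) -> int:
    """CPU hits on its hot set after a DMA stream has passed through the shared cache."""
    cache = LRUCache(cache_lines)
    hot = [("cpu", i) for i in range(hot_lines)]
    for addr in hot:                 # warm up the CPU working set
        cache.access(addr)
    for i in range(dma_lines):       # DMA writes allocate in the same cache
        cache.access(("dma", i))
    return sum(cache.access(addr) for addr in hot)


small_dma = cpu_hits_after_dma(cache_lines=64, hot_lines=32, dma_lines=16)
large_dma = cpu_hits_after_dma(cache_lines=64, hot_lines=32, dma_lines=128)
print(small_dma, large_dma)
```

With the small request the hot set survives; with the request larger than the cache, every hot line has been evicted by the time the CPU returns to it.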
Rethink DMA Mechanism
• The Nature of DMA
  • There is a producer-consumer relationship between the CPU and the DMA engine
  • Memory serves as a transient place for I/O data transferred between the processor and the I/O device
• Corollaries
  • Once I/O data is produced, it will be consumed
  • I/O data within a DMA buffer will be used only once in most cases (i.e. almost no reuse)
  • The characteristics of I/O data differ from those of CPU data
  • It may not be appropriate to store I/O data and CPU data together
DMA Cache Proposal
• A Dedicated Cache
  • Storing I/O data
  • Capable of exchanging data with the processor’s last level cache (LLC)
  • Reduces the overhead of I/O data movement
DMA Cache Design Issues
• Cache Coherence
• Data Path
• Replacement Policy
• Write Policy
• Prefetching
[Figure: CPU Cache State Diagram and DMA Cache State Diagram]
• The DMA cache state diagram is similar to that of a CPU cache in a uniprocessor system
• We are researching multiprocessor platforms…
DMA Cache Design Issues: Data Path
• Additional data paths and data access ports for the LLC are not required, because data migration between the DMA cache and the LLC can share the existing data paths and ports of the snooping mechanism
Data Path: CPU Read
[Figure: CPU read (cmd/data) over the system bus. Components: Last Level Cache with its Cache Ctrl and Snoop Ctrl, Mem Ctrl, Memory, DMA Ctrl, I/O Device, and the DMA Cache with its own Cache Ctrl and Snoop Ctrl. The snoop asks “Hit in DMA cache?”; the case shown is a miss in the LLC and a hit in the DMA cache.]
Data Path: DMA Read
[Figure: DMA read (cmd/data) over the system bus. Components: Last Level Cache with its Cache Ctrl and Snoop Ctrl, Mem Ctrl, Memory, DMA Ctrl, I/O Device, and the DMA Cache with its own Cache Ctrl and Snoop Ctrl. The snoop asks “Hit in LLC?”; the case shown is a miss in the DMA cache and a hit in the LLC.]
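The lookup order in the two data-path slides can be sketched as follows. This is a simplified model under our own assumptions: caches and memory are plain dictionaries, coherence state changes and data migration are not modeled, and the function names `cpu_read` and `dma_read` are hypothetical.

```python
def cpu_read(addr, llc, dma_cache, memory):
    """CPU read: look in the LLC first, then snoop the DMA cache, then go to memory."""
    if addr in llc:
        return llc[addr], "LLC"
    if addr in dma_cache:            # miss in LLC and hit in DMA cache
        return dma_cache[addr], "DMA cache"
    return memory[addr], "memory"


def dma_read(addr, llc, dma_cache, memory):
    """DMA read: look in the DMA cache first, then snoop the LLC, then go to memory."""
    if addr in dma_cache:
        return dma_cache[addr], "DMA cache"
    if addr in llc:                  # miss in DMA cache and hit in LLC
        return llc[addr], "LLC"
    return memory[addr], "memory"


llc = {0x10: "cpu-data"}
dma_cache = {0x20: "io-data"}
memory = {0x10: "stale", 0x20: "stale", 0x30: "mem-data"}

print(cpu_read(0x20, llc, dma_cache, memory))
print(dma_read(0x10, llc, dma_cache, memory))
print(cpu_read(0x30, llc, dma_cache, memory))
```

In both directions the requester avoids a memory access whenever the other cache holds the line, which is the point of letting the DMA cache and the LLC exchange data through the snooping mechanism.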
DMA Cache Design Issues: Replacement Policy
• An LRU-like replacement policy; the victim is chosen in this order:
  1. Invalid Block
  2. Clean Block
  3. Dirty Block
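The priority order above can be sketched as a victim-selection function for one cache set. The slides give only the three classes; breaking ties by least recent use within a class is our assumption, as are the `Block` fields and the name `pick_victim`.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    valid: bool
    dirty: bool
    last_used: int  # larger means more recently used


def pick_victim(blocks) -> int:
    """Index of the block to replace: invalid first, then clean, then dirty.

    Within a class the least recently used block is chosen (assumed tie-break).
    """
    def rank(item):
        _, block = item
        if not block.valid:
            priority = 0
        elif not block.dirty:
            priority = 1
        else:
            priority = 2
        return (priority, block.last_used)

    index, _ = min(enumerate(blocks), key=rank)
    return index


cache_set = [
    Block(tag=0, valid=True, dirty=True, last_used=1),
    Block(tag=1, valid=True, dirty=False, last_used=5),
    Block(tag=2, valid=True, dirty=False, last_used=3),
    Block(tag=3, valid=True, dirty=True, last_used=0),
]
print(pick_victim(cache_set))
```

Preferring clean blocks over dirty ones avoids a write-back to memory on replacement, even when a dirty block is older.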
DMA Cache Design Issues: Write Policy
• Adopt a write-allocate policy
• Both write-back and write-through policies are available
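The difference between the two write policies, both combined with write-allocate, can be sketched with a toy model. The class `DMACacheModel` and its dictionary-based storage are illustrative assumptions, not the authors' design.

```python
class DMACacheModel:
    """Toy write-allocate cache with a selectable write-back or write-through policy."""

    def __init__(self, write_back: bool = True):
        self.write_back = write_back
        self.lines = {}    # addr -> (data, dirty)
        self.memory = {}   # backing store

    def write(self, addr, data):
        """Write-allocate: the line is placed in the cache even on a write miss."""
        if self.write_back:
            self.lines[addr] = (data, True)    # memory is updated later, on eviction
        else:
            self.lines[addr] = (data, False)   # write-through: memory updated now
            self.memory[addr] = data

    def evict(self, addr):
        """Remove a line; a dirty line is written back to memory first."""
        data, dirty = self.lines.pop(addr)
        if dirty:
            self.memory[addr] = data


wb = DMACacheModel(write_back=True)
wb.write(0x40, "payload")
memory_before_evict = dict(wb.memory)
wb.evict(0x40)

wt = DMACacheModel(write_back=False)
wt.write(0x40, "payload")

print(memory_before_evict, wb.memory, wt.memory)
```

With write-back, memory sees the data only when the line is evicted; with write-through, memory is updated on every write and evictions are free.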
DMA Cache Design Issues: Prefetching
• Adopt straightforward sequential prefetching
• Prefetching is triggered by a cache miss
• Fetch 4 blocks at a time
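Miss-triggered sequential prefetching can be sketched as below. The sketch assumes the four prefetched blocks are the ones following the missed block (the slides do not say whether the missed block counts among the four) and an unbounded cache; `count_misses` is a hypothetical helper.

```python
PREFETCH_DEGREE = 4  # blocks fetched per trigger, from the slide


def count_misses(block_addresses, degree: int = PREFETCH_DEGREE) -> int:
    """Count demand misses for a block-address stream with miss-triggered prefetching."""
    cache = set()   # unbounded cache, a simplifying assumption
    misses = 0
    for block in block_addresses:
        if block not in cache:
            misses += 1
            cache.add(block)
            # A miss triggers a prefetch of the next `degree` sequential blocks.
            cache.update(range(block + 1, block + 1 + degree))
    return misses


sequential_stream = range(20)
with_prefetch = count_misses(sequential_stream)
without_prefetch = count_misses(sequential_stream, degree=0)
print(with_prefetch, without_prefetch)
```

On a purely sequential stream only every fifth block misses under these assumptions, which is why such a simple prefetcher suits regular I/O access patterns.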
Memory Trace Collection
• Hyper Memory Trace Tool (HMTT)
  • Capable of collecting all memory requests
  • Provides APIs for injecting tags into the memory trace to identify high-level system operations
FPGA Emulation
• L2 Cache from Godson-2F
• DDR2 Memory Controller from Godson-2F
• DDR2 DIMM model from Micron Technology
• Xtreme system from Cadence
[Figure: emulation platform driven by the memory trace: L2 Cache, DMA Cache, MemCtrl, DDR2 DRAM]
Experimental Setup
• Machine
  • AMD Opteron
  • 2GB Memory
  • 1 GE NIC
  • IDE disk
• Benchmark
  • File Copy
  • TPC-H
  • SPECWeb2005
• Configurations
  • Snoop Cache (2MB)
  • Shared Cache (2MB)
  • DMA Cache
    • 256KB + prefetch
    • 256KB w/o prefetch
    • 128KB + prefetch
    • 128KB w/o prefetch
    • 64KB + prefetch
    • 64KB w/o prefetch
    • 32KB + prefetch
    • 32KB w/o prefetch
Characterization of DMA
• The proportion of DMA memory references varies depending on the application
• The sizes of DMA requests vary depending on the application
Normalized Speedup
• The baseline is the snoop cache scheme
• The DMA cache schemes exhibit better performance than the others
DMA Write & CPU Read Hit Rate
• Both the shared cache and the DMA cache exhibit high hit rates
• So where do the cycles go in the shared cache scheme?
Breakdown of Normalized Total Cycles
% of DMA Writes causing Dirty Block Replacement
• Those DMA writes cause the cache pollution and thrashing problems
• The 256KB DMA cache is able to eliminate most of these phenomena
% of Valid Prefetched Blocks
• DMA caches exhibit impressively high prefetching accuracy
• This is because I/O data has a very regular access pattern.
Conclusions and Ongoing Work
• The Nature of DMA
  • There is a producer-consumer relationship between the CPU and the DMA engine
  • Memory serves as a transient place for I/O data transferred between the processor and the I/O device
• We propose a DMA cache scheme and discuss its design issues.
• Experimental results show that the DMA cache can significantly improve I/O performance.
• Ongoing Work
  • The impact of multiprocessors and multiple DMA channels on the DMA cache
  • In theory, a shared cache with an intelligent replacement policy can achieve the effect of the DMA cache scheme.
  • Godson-3 has integrated a dedicated cache management policy for I/O data.
Thanks! Q&A?