
Memory Optimizations Research at UNT

Krishna Kavi

Professor
Director of NSF Industry/University Cooperative Center
for Net-Centric Software and Systems (Net-Centric IUCRC)

Computer Science and Engineering
The University of North Texas
Denton, Texas 76203, USA

[email protected]
http://csrl.unt.edu/~kavi


Motivation

The memory subsystem plays a key role in achieving performance on multi-core processors

The memory subsystem contributes a significant portion of the energy consumed

Pin limitations restrict bandwidth to off-chip memories

Shared caches may have non-uniform access behaviors

Shared caches may encounter inter-core conflicts and coherency misses

Different data types exhibit different locality and reuse behaviors

Different applications need different memory optimizations


Our Research Focus

Cache Memory optimizations

software and hardware solutions

primarily at L-1

some ideas at L-2

Memory Management

Intelligent allocation and user defined layouts

Hardware supported allocation and garbage collection


Non-Uniformity of Cache Accesses

Non-uniform access to cache sets: some sets are accessed 100,000 times more often than others, causing more misses while other sets go unused

(Chart: Non-Uniform Cache Accesses for Parser)


Non-Uniformity of Cache Accesses

But, not all applications exhibit “bad” access behavior

(Chart: Non-Uniform Cache Accesses for Selected Benchmarks)

Need different solutions for different applications


Improving Uniformity of Cache Accesses

Possible solutions

• Using Fully associative caches with perfect replacement policies

• Selecting optimal addressing schemes

• Dynamically re-mapping addresses to new cache lines

• Partitioning caches into smaller portions

• Each partition used by a different data object

• Using Multiple address decoders

• Static or dynamic data mapping and relocation


Associative Caches Improve Uniformity

(Figures: access distribution for a Direct Mapped Cache vs. a 16-Way Associative Cache)


Data Memory Characteristics

• Different object types exhibit different access behaviors
    - Arrays exhibit spatial locality
    - Linked lists and pointer data types are difficult to prefetch
    - Statics and scalars may exhibit temporal locality

• Custom memory allocators and custom run-time support can be used to improve locality of dynamically allocated objects
    - Pool Allocators (U of Illinois)
    - Regular expressions to improve on Pool Allocators (Korea)
    - Profiling and reallocating objects (UNT)
    - Hardware support for intelligent memory management (UNT and Iowa State)


ABC’s of Cache Memories

Multiple levels of memory – memory hierarchy

CPU and Registers → L1 Instruction Cache / L1 Data Cache → L2 Cache (combined Data and Instructions) → DRAM (Main memory) → DISK


ABC’s of Cache Memories

Consider a direct mapped Cache

An address can only be in a fixed cache line as specified by the 6-bit line number of the address
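To make the decomposition concrete, here is a sketch in C. The 6-bit line number matches the slide; the 32-byte line size (and hence the 5-bit byte offset) is our assumption for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 64 lines (6-bit index) of 32 bytes (5-bit offset). */
#define OFFSET_BITS 5
#define INDEX_BITS  6

/* Byte offset within the cache line. */
static uint32_t byte_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1);
}

/* 6-bit line number: the only line this address can occupy. */
static uint32_t line_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}

/* Remaining high bits form the tag stored alongside the line. */
static uint32_t tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

Two addresses that share the same 6-bit line number but differ in tag evict each other, which is exactly the conflict behavior the following slides address.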


ABC’s of Cache Memories

Consider a 2-way set associative cache

An address is located in a fixed set of the cache, but it can occupy either of the 2 lines of that set.

We extend this idea to 4-way, 8-way,.. fully associative caches


ABC’s of Cache Memories

Consider a fully associative cache

An address is located in any line

Or, there is only one set in the cache.

Very expensive since we need to compare the address tag with each line tag.

Also need a good replacement strategy.

Can lead to more uniform access to cache lines

(Address format: Tag | Byte offset)


Programmable Associativity

Can we provide higher associativity only when we need it? Consider a simple idea:

Heavily accessed cache lines are provided with alternate locations, as indicated by a "partner index"
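The partner-index idea can be sketched as follows; this toy one-line-per-set cache and its `partner` table are our illustration of the concept, not the exact hardware proposed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSETS 64

/* Toy direct-mapped cache: valid bit + tag per set, plus a partner
   index that points heavily conflicting sets at an alternate set. */
typedef struct {
    bool     valid[NSETS];
    uint32_t tag[NSETS];
    int      partner[NSETS];   /* -1 means no partner assigned */
} toy_cache;

/* Probe the home set first; on a miss, try the partner location.
   A hot set thus behaves as if it had 2-way associativity. */
static bool lookup(const toy_cache *c, uint32_t set, uint32_t tag) {
    if (c->valid[set] && c->tag[set] == tag) return true;
    int p = c->partner[set];
    return p >= 0 && c->valid[p] && c->tag[p] == tag;
}
```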


Programmable Associativity

Peir's adaptive cache uses two tables:
    Set-reference History Table (SHT) – tracks heavily used cache lines
    Out-of-position directory (OUT) – tracks alternate locations

[Peir 98] J. Peir, Y. Lee, and W. Hsu, "Capturing Dynamic Memory Reference Behavior with Adaptive Cache Topology," in Proc. of the 8th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, 1998, pp. 240–250.

[Zhang 06] C. Zhang, "Balanced cache: Reducing conflict misses of direct-mapped caches," ISCA, pp. 155–166, June 2006.

Zhang's programmable associativity (B-Cache): the cache index is divided into a programmable index and a non-programmable index (NPI); the NPI allows for varying associativities.


Programmable Associativity

(Chart: % Reduction in Miss-Rate on MiBench benchmarks (adpcm, basicmath, bitcount, crc, dijkstra, fft, patricia, qsort, rijndael, sha, susan, Average) for Adaptive_Cache, B_Cache, and Column_associative.)


Programmable Associativity

(Chart: % Reduction in AMAT on MiBench benchmarks (adpcm, basicmath, bitcount, crc, dijkstra, fft, patricia, qsort, rijndael, sha, susan, Average) for Adaptive_Cache, B_Cache, and Column_associative.)


Multiple Decoders

(Figure: multiple address decoders over a single tag/data array, each splitting the address into Tag | Set Index | Byte offset at a different point.)

Different decoders may use different associativities


Multiple Decoders

But how to select index bits?


Index Selection Techniques

Different approaches have been studied:
    Givargis' quality bits
    XOR some tag bits with the index bits
    Add a multiple of the tag to the index
    Use prime modulo indexing

[Givargis 03] T. Givargis, "Improved Indexing for Cache Miss Reduction in Embedded Systems," in Proc. of the Design Automation Conference, 2003.

[Kharbutli 04] M. Kharbutli, K. Irwin, Y. Solihin, and J. Lee, "Using Prime Numbers for Cache Indexing to Eliminate Conflict Misses," in Proc. Int'l Symp. on High Performance Computer Architecture, 2004.
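The schemes listed above can be sketched as alternative index functions; the bit widths, the odd multiplier, and the prime 61 are assumptions chosen for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define INDEX_BITS 6
#define NSETS (1u << INDEX_BITS)

/* Conventional index: low bits above the 5-bit byte offset. */
static uint32_t plain_index(uint32_t addr) { return (addr >> 5) & (NSETS - 1); }
static uint32_t addr_tag(uint32_t addr)    { return addr >> (5 + INDEX_BITS); }

/* XOR some tag bits into the index to spread conflicting addresses. */
static uint32_t xor_index(uint32_t addr) {
    return (plain_index(addr) ^ addr_tag(addr)) & (NSETS - 1);
}

/* Odd-multiplier method: add an odd multiple of the tag to the index. */
static uint32_t odd_mult_index(uint32_t addr, uint32_t odd_mult) {
    return (plain_index(addr) + odd_mult * addr_tag(addr)) & (NSETS - 1);
}

/* Prime-modulo indexing: map into a prime number of sets (here 61),
   leaving a few sets unused in exchange for fewer conflicts. */
static uint32_t prime_index(uint32_t addr) {
    return (addr >> 5) % 61;
}
```

Addresses that collide under `plain_index` generally land in different sets under the hashed variants, which is the source of the miss-rate reductions shown later.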


Multiple Decoders

Odd multiplier method

Different multipliers for each thread

(Chart: % Reduction in Miss-Rate on multi-threaded benchmarks: bitcount_adpcm, bzip2_libquantum, fft_susan, gromacs_namd, milc_namd, qsort_basicmath, qsort_patricia, fft_basicmath_patricia_susan, susan_bitcount_adpcm_patricia, Average.)


Multiple Decoders

(Chart: % Improvement in AMAT on multi-threaded applications: bitcount_adpcm, fft_susan, qsort_basicmath, qsort_fft, qsort_patricia, libquantum_milc, milc_namd, gromacs_namd, bzip2_libquantum, fft_basicmath_patricia_susan, susan_bitcount_adpcm_patricia, Average.)

Here we split the cache into segments, one per thread

But we used Adaptive Cache techniques to "donate" underutilized sets to other threads


Other Cache Memory Research at UNT

Use of a single data cache can lead to unnecessary cache misses

Arrays exhibit higher spatial locality while scalars may exhibit higher temporal locality

May benefit from different cache organizations (associativity, block size)

If using separate instruction and data caches, why not different data caches -- either statically or dynamically partitioned

And if separate array and scalar caches are included, how to further improve their performance

Optimize the sizes of array and scalar caches for each application


Reconfigurable Caches

(Figure: CPU connected to separate Array Cache and Scalar Cache, with a Secondary Cache and Main Memory behind them.)


Percentage reduction of power, area and cycles for data cache

(Chart: percentage reduction in power, area, and cycles for benchmarks bc, qs, dj, bf, sh, ri, ss, ad, cr, ff, and avg.)

Conventional cache configuration: 8K direct-mapped data cache; 32K 4-way unified level-2 cache
Scalar cache configuration: size variable, direct-mapped with a 2-line victim cache
Array cache configuration: size variable, direct-mapped


Summarizing

For the instruction cache:
    85% (average 62%) reduction in cache size
    72% (average 37%) reduction in cache access time
    75% (average 47%) reduction in energy consumption

For the data cache:
    78% (average 49%) reduction in cache size
    36% (average 21%) reduction in cache access time
    67% (average 52%) reduction in energy consumption

when compared with an 8KB L-1 instruction cache and an 8KB L-1 data cache backed by a 32KB unified level-2 cache


Generalization

Why not extend Array/Scalar split caches to more than 2 partitions?

Each partition customized to a specific object type

Partitioning can be achieved using multiple decoders with a single cache resource (virtual partitioning)

Reconfigurable partitions are possible with programmable decoders
Each decoder accesses a portion of the cache
    either physically restricted to a segment of the cache
    or virtually limited in the number of lines accessed by a decoder

Scratchpad Memories can be viewed as cache partitions
    Dedicate a segment of the cache for scratchpad use


Scratch Pad Memories

They are viewed as compiler-controlled memories: as fast as L-1 caches, but not managed as caches

The compiler decides which data will reside in scratch pad memory

A new paper from Maryland proposes a way of compiling programs for unknown-sized scratch pad memories

Only stack data (static and global variables) are placed in the SPM
The compiler views the stack as two stacks:
    a potential SPM data stack
    a DRAM data stack


Current and Future Research

Extensive study of using Multiple Decoders

Separate decoders for different data structures
    partitioning of L-1 caches

Separate decoders for different threads and cores
    at L-2 or Last Level Caches
    minimize conflicts
    minimize coherency related misses
    minimize loss due to non-uniform memory access delays

Investigate additional indexing or programmable associativity ideas
    Cooperative L-2 caches using adaptive caches


Program Analysis Tool

We need tools to profile and analyze
• Data layout at various levels of the memory hierarchy
• Data access patterns

• Existing tools (Valgrind, Pin) do not provide fine-grained information
• We want to relate each memory access back to a source-level construct

Source variable name, function/thread that caused the access


Gleipnir

Our tool is built on top of Valgrind

Can be used with any architecture supported by Valgrind: x86, PPC, MIPS, and ARM


Gleipnir

How can we use Gleipnir? Explore different data layouts and their impact on cache accesses


Gleipnir

Standard layout


Gleipnir

Tiled matrices
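"Tiled matrices" refers to classic loop tiling; as a sketch of what the tiled layout looks like in code (the matrix size N and tile size B are arbitrary illustrative choices, not the sizes used in the Gleipnir experiments):

```c
#include <assert.h>

#define N 8
#define B 4   /* tile (block) size, chosen so B divides N */

/* C = A * Bm, with all three loops tiled so that each BxB working
   set of A, Bm, and C can stay resident in the cache while used. */
static void matmul_tiled(double A[N][N], double Bm[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0.0;
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                /* multiply one BxB tile pair into the C tile */
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++)
                        for (int k = kk; k < kk + B; k++)
                            C[i][j] += A[i][k] * Bm[k][j];
}
```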


Gleipnir

Matrices A and C combined


Further Research

• Restructuring memory allocation – currently in progress
    - Analyze cache set conflicts and relate them to data objects
    - Modify data placement of these objects
    - Reorder variables, include dummy variables, …

• Restructure code to improve data access patterns (SLO tool)
    - Loop Fusion – combine loops that use the same data
    - Loop Tiling – split loops into smaller loops to limit the data accessed
    - Similar techniques to assure "common" data resides in L-2 (shared caches)
    - Similar techniques such that data is transferred to GPUs infrequently


Loop Tiling Idea

Too much data accessed in the loop


Code Refactoring

double sum(...) {
    ...
    for (int i = 0; i < len; i++)
        result += X[i];     /* all cache misses occur here */
    ...
}


Code Refactoring

Loop Fusion Idea

double inproduct(...) {
    ...
    for (int i = 0; i < len; i++)
        result += X[i] * Y[i];      /* previous use of X[i] occurs here */
    ...
}

double sum(...) {
    ...
    for (int i = 0; i < len; i++)
        result += X[i];             /* all cache misses occur here */
    ...
}


SLO Tool

double inproduct(...) {
    ...
    for (int i = 0; i < len; i++)
        result += X[i] * Y[i];
    ...
}

double sum(...) {
    ...
    for (int i = 0; i < len; i++)
        result += X[i];
    ...
}
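A fused version of the two loops above (our sketch of the refactoring SLO's output suggests) computes both results in a single pass, so each X[i] is reused while it is still in the cache:

```c
#include <assert.h>

/* Fusion of inproduct() and sum(): one traversal of X serves both,
   shrinking the reuse distance of X[i] from an entire array scan
   to a single adjacent statement. */
static void inproduct_and_sum(const double *X, const double *Y, int len,
                              double *inprod, double *sum) {
    double p = 0.0, s = 0.0;
    for (int i = 0; i < len; i++) {
        p += X[i] * Y[i];   /* first use of X[i] ... */
        s += X[i];          /* ... immediately reused here */
    }
    *inprod = p;
    *sum = s;
}
```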


Extensions Planned

Key Factors Influencing Code and Data Refactoring

Reuse Distance – reducing distance improves data utilization

Can be used with CPU-GPU configurations

Fuse loops so that all computations using the “same” data are grouped

Conflict sets and conflict distances

The set of variables that fall to the same cache line (or group of lines)

Conflict between pairs of conflicting variables

Increase conflict distance


Further Research

We are currently investigating several of these ideas

Using architectural simulators like SimICS

explore multiple decoders with multiple threads, cores or for different data types

Further extend Gleipnir

and explore using Gleipnir with compilers

and Gleipnir with other tools like SLO,

evaluate the effectiveness of custom allocators

Some hardware implementations of memory management using FPGAs

And we welcome collaborations


The End

Questions?

More information and papers at http://csrl.cse.unt.edu/~kavi


Custom Memory Allocators

Consider a typical pointer-chasing program

node {
    int key;
    ...  data;      /* complex data part */
    node *next;
}

We will explore two possibilities:
    pool allocation
    split structures


Custom Memory Allocators

• Pool Allocator (Illinois)

(Figure: a single heap with Data type A and Data type B objects interleaved, versus separate per-type pools where all Data type A objects are grouped together and all Data type B objects are grouped together.)
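A minimal bump-pointer pool in the spirit of the figure (our sketch, not the Illinois implementation): each data type allocates from its own contiguous chunk, so same-type objects end up adjacent in memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One pool per data type: objects are carved sequentially
   from a single contiguous chunk. */
typedef struct {
    char  *base;
    size_t obj_size, capacity, used;
} pool;

static void pool_init(pool *p, size_t obj_size, size_t nobjs) {
    p->base = malloc(obj_size * nobjs);
    p->obj_size = obj_size;
    p->capacity = nobjs;
    p->used = 0;
}

/* Bump allocation: consecutive allocations of the same type are
   physically consecutive, improving spatial locality. */
static void *pool_alloc(pool *p) {
    if (p->used == p->capacity) return NULL;
    return p->base + p->obj_size * p->used++;
}
```

A real pool allocator also supports freeing and growing pools; this sketch only shows why same-type objects become cache-line neighbors.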


Custom Memory Allocators

Further Optimization
Consider a typical pointer-chasing program

node {
    int key;
    ...  data;      /* complex data part */
    node *next;
}

The data part is accessed only if the key matches:

while (...) {
    if (h->key == k) return h->data;
    h = h->next;
}

Consider a different definition of the data:

node {
    int key;
    node *next;
    data_node *data_ptr;
}

(Figure: a list of small nodes holding only key, *next, and *data_ptr, each pointing to a separately allocated Data_node.)
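Completing the split-structure example with a search routine (our sketch): the scan touches only the small key/next/pointer nodes, so more keys fit per cache line, and the large data record is fetched only on a match:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int payload; } data_node;   /* cold, rarely-touched part */

typedef struct node {
    int          key;        /* hot field, examined on every iteration */
    struct node *next;
    data_node   *data_ptr;   /* cold data kept out of the scanned nodes */
} node;

/* Only key/next are dereferenced until the key matches. */
static data_node *find(node *h, int k) {
    while (h) {
        if (h->key == k) return h->data_ptr;
        h = h->next;
    }
    return NULL;
}
```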


Custom Memory Allocators

Profiling (UNT)
Using data profiling, "flatten" dynamic data into consecutive blocks
Make linked lists look like arrays!
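One way to realize "make linked lists look like arrays" (our sketch, assuming profiling has already identified the hot list and its length): relocate the nodes, in traversal order, into one contiguous block:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node { int key; struct node *next; } node;

/* Copy a list of n nodes into consecutive memory, preserving
   traversal order; afterwards, walking the list strides linearly
   through one array, like an array scan. */
static node *flatten(node *head, size_t n) {
    node *arr = malloc(n * sizeof *arr);
    for (size_t i = 0; i < n; i++, head = head->next) {
        arr[i].key  = head->key;
        arr[i].next = (i + 1 < n) ? &arr[i + 1] : NULL;
    }
    return arr;
}
```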


Cache Based Side-Channel Attacks

Encryption algorithms use keys (or blocks of the key) as index into tables containing constants used in the algorithm

By observing which table entries caused cache misses, an attacker can find the address of the table entry, and from it the value of the key that was used

Z. Wang and R. Lee. “New cache designs for thwarting software cache based side channel attacks”, ISCA 2007, pp 494-505

Two solutions: 1. Lock cache lines (cannot be displaced) when using encryption

2. Use a random replacement policy in selecting which line of a set is replaced
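The second solution can be sketched as a victim-selection policy; `rand()` here stands in for whatever hardware entropy source a real design would use:

```c
#include <assert.h>
#include <stdlib.h>

/* Random replacement: the evicted way no longer depends on the
   access history, so observed misses leak less information about
   which table entries displaced which lines. */
static int pick_victim_random(int ways) {
    return rand() % ways;
}
```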


Offloading Memory Management Functions

1. Dynamic memory management is the management of main memory for use by programs during runtime

2. Dynamic memory management accounts for a significant amount of execution time – 42% for 197.parser (from the SPEC 2000 benchmarks)

3. If the CPU performs memory management, the CPU cache performs poorly due to switching between user functions and memory management functions

4. With separate hardware and a separate cache for memory management, CPU cache performance can be improved dramatically


Offloading Memory Management Functions

(Figure: CPU, with its instruction and data caches and a bus interface unit, connected over the system bus to a separate Memory Processor with its own MP instruction and data caches and a second-level cache; numbered arrows show allocation requests, "Allocation Ready", and "De-All Completion" signals.)


Improved Performance

• Object-oriented and linked-data-structure applications exhibit poor locality: cache pollution caused by memory management functions

• Memory management functions do not use user data caches: on average, about 40% of cache misses eliminated

• The memory manager does not need large data caches


Improved Execution Performance

Benchmark   | % cycles spent on malloc | Instructions (conventional) | Instructions (separate hardware) | % perf. increase (separate HW) | % perf. increase (fastest separate HW)
255.vortex  | 0.59  | 13,020,462,240 | 12,983,022,203 | 2.81  | 2.90
164.gzip    | 0.04  | 4,540,660      | 4,539,765      | 0.031 | 0.0346
197.parser  | 17.37 | 2,070,861,403  | 1,616,890,742  | 3.19  | 18.8
espresso    |       |                |                |       |
Cfrac       | 31.17 | 599,365        | 364,679        | 19.03 | 39.99
bisort      | 2.08  | 620,560,644    | 607,122,284    | 10.03 | 12.76


Other Uses of Hardware Memory Manager

Dynamic relocation of objects to improve locality

The hardware manager can track object usage and relocate objects without the CPU's knowledge

New and innovative allocation/garbage collection methods

    Estranged Buddy Allocator
    Contaminated Garbage Collector
    Predictive allocation to achieve "one-cycle" allocation
    Allocator bookkeeping data kept separate from objects