Why the Address Translation Scheme Matters?


Page 1: Why the Address Translation Scheme Matters?

Why the Address Translation Scheme Matters?

Jiaqing Du

Page 2: Why the Address Translation Scheme Matters?

Address Translation/Mapping

• Where is 0x1f344000?
• DRAM Devices
  – A multi-dimensional array
  – Inside a DIMM: Rank, Bank, Row, Column
  – Among DIMMs: Memory Controller, Channel
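To make the "where is 0x1f344000?" question concrete, here is a minimal sketch that splits a 32-bit physical address into DRAM coordinates. The field order and widths in `FIELDS` are purely hypothetical, chosen only so the widths add up to 32 bits; they are not the mapping uncovered in this talk, and real memory controllers differ.

```python
from collections import namedtuple

DramLocation = namedtuple("DramLocation", "channel rank bank row column")

# Hypothetical layout, lowest bits first: (field name, width in bits).
# This is an assumption for illustration, not a real controller's mapping.
FIELDS = [("column", 10), ("channel", 2), ("bank", 3), ("rank", 1), ("row", 16)]


def decode(addr):
    """Split a 32-bit physical address into DRAM coordinates."""
    parts = {}
    for name, width in FIELDS:
        parts[name] = addr & ((1 << width) - 1)  # take the low `width` bits
        addr >>= width                           # move on to the next field
    return DramLocation(**parts)


loc = decode(0x1F344000)
print(loc)
```

Under this assumed layout the address lands in channel 0, rank 0, bank 4, row 0x1f34, column 0. A different layout would give a different answer, which is exactly why the real mapping has to be discovered.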

Page 3: Why the Address Translation Scheme Matters?

Inside Memory Controller

• Accesses to Different Parts == High Parallelism == High Throughput

• Designed around the locality property
  – Logically adjacent addresses are placed physically far apart
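The effect of placing adjacent addresses far apart can be sketched by comparing two toy mappings. Both functions, the 64-byte line size, the count of 8 units, and the 1 MiB region size are assumptions made for this illustration only.

```python
LINE = 64   # assumed cache-line size in bytes
UNITS = 8   # hypothetical number of independent units (e.g. banks)


def unit_interleaved(addr):
    """Adjacent cache lines rotate through the units."""
    return (addr // LINE) % UNITS


def unit_blocked(addr, region=1 << 20):
    """Contrast case: each unit owns one contiguous 1 MiB region."""
    return (addr // region) % UNITS


base = 0x1F344000
lines = [base + i * LINE for i in range(UNITS)]  # 8 consecutive cache lines

interleaved = {unit_interleaved(a) for a in lines}
blocked = {unit_blocked(a) for a in lines}
print(len(interleaved), len(blocked))
```

With the interleaved mapping, eight consecutive lines touch eight different units and can be served in parallel; with the blocked mapping they all queue up at one unit.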

Page 4: Why the Address Translation Scheme Matters?

Agenda

• A Scalable Software Router
• Performance of A Commodity Server
• Memory Translation Disclosure
  – Experiment Design
  – Experiment Result
• Understanding the Imbalance
• Possible Solutions
• Conclusion

Page 5: Why the Address Translation Scheme Matters?

A Scalable Software Router

• A Valiant Load-Balanced Mesh
• Aggregated Throughput: N x R (bps)

[Figure: mesh of nodes 1, 2, 3, 4, …, N with links labeled R and 2R/N]
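The two rates on this slide can be worked through with a few lines of arithmetic. The function name `vlb_mesh_rates` is hypothetical, and reading the figure's 2R/N label as the rate of each internal mesh link is an assumption about the diagram.

```python
def vlb_mesh_rates(n_nodes, port_rate_bps):
    """Return (aggregate throughput, assumed internal link rate) in bps."""
    aggregate = n_nodes * port_rate_bps          # N x R, as stated on the slide
    internal_link = 2 * port_rate_bps / n_nodes  # 2R/N, assumed to be per internal link
    return aggregate, internal_link


aggregate, internal = vlb_mesh_rates(8, 1e9)
print(aggregate, internal)
```

For example, 8 nodes with 1 Gbps external ports give 8 Gbps in aggregate while each internal link carries only 250 Mbps under this reading.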

Page 6: Why the Address Translation Scheme Matters?

Performance of A Commodity PC

• Experiment Environment
  – 2 Xeon 1.6GHz sockets, 4 cores/socket
  – Every 2 cores share an 8MB L2 cache
  – 1 GHz FSB, 8GB DDR2 667MHz
  – 2 MCs manage 4 channels
  – 4 quad-port 1Gbps NICs (16 ports)
  – Click 1.6.0 on Linux 2.6.19
• Simple “Point-to-Point” Forwarding
• A Chipset Monitoring Tool (Emon)

Page 7: Why the Address Translation Scheme Matters?

Performance of A Commodity PC

• Maximum Loss-free Forwarding Rate
  – 16Gbps input

Page 8: Why the Address Translation Scheme Matters?

Performance of A Commodity PC

• Memory Load Distribution

• My work is to dig further
  – Explain the imbalance
  – But we don’t know how an address is mapped :(

[Figure legend: stream benchmark, 1024B, 64B]

Page 9: Why the Address Translation Scheme Matters?

Disclose Address Translation

• What Do We Want?
  – Which bits select channel, rank, bank, and …
  – What parallelism really gives us
• What Do We Have?
  – Emon: tells us throughput and load distribution
• What Do We Need?
  – Enough traffic to one single memory location
  – Enough traffic to two memory locations, e.g., 0x1f344000 and 0x1f34100

Page 10: Why the Address Translation Scheme Matters?

Disclose Address Translation

• Artificial Memory Access Pattern
  – One writing flow to ADDR1
  – Two writing flows: one to ADDR1, the other to ADDR1+2^b (b = 0, 1, …, 31)
• Utilize the Cache
  – Cache coherency protocol (MESI)
  – Bind two threads to two cores that don’t share an L2 cache and force them to keep writing to one location
  – A write to an invalid cache line goes directly to memory
  – Two threads generate one writing flow
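The set of address pairs used by the access pattern above can be sketched as follows. The helper names `probe_pairs` and `changed_bits` are hypothetical, and 0x1f344000 is borrowed from the earlier slide only as a sample ADDR1. The sketch generates addresses only; it does not produce the memory traffic itself.

```python
def probe_pairs(addr1, bits=32):
    """Return (b, ADDR1, ADDR1 + 2^b) for b = 0 .. bits-1."""
    return [(b, addr1, addr1 + (1 << b)) for b in range(bits)]


def changed_bits(a, b):
    """List the bit positions in which two addresses differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


addr1 = 0x1F344000  # sample ADDR1, an assumption for this sketch
pairs = probe_pairs(addr1)

for b, first, second in pairs[:3]:
    print(b, hex(first), hex(second), changed_bits(first, second))

# Bit 14 of this ADDR1 is already set, so adding 2^14 carries into bit 15.
print(changed_bits(addr1, addr1 + (1 << 14)))
```

One detail the sketch makes visible: adding 2^b changes exactly one bit only when bit b of ADDR1 is zero, so choosing an ADDR1 with the probed bits cleared keeps each probe easy to interpret.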

Page 11: Why the Address Translation Scheme Matters?

Disclose Address Translation

• Experiment Result
  – ADDR1, ADDR1+2^b
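The inference step behind the ADDR1 / ADDR1+2^b experiment can be sketched in simulation. Here a hypothetical `decode` function stands in for the hardware, and comparing its outputs stands in for reading Emon's throughput and load distribution; the layout in `FIELDS` is an assumption and is not the result reported in the talk.

```python
# Hypothetical layout, lowest bits first: (field name, width in bits).
FIELDS = [("column", 10), ("channel", 2), ("bank", 3), ("rank", 1), ("row", 16)]


def decode(addr):
    """Simulated mapping: split an address into DRAM coordinates."""
    parts = {}
    for name, width in FIELDS:
        parts[name] = addr & ((1 << width) - 1)
        addr >>= width
    return parts


def bit_roles(addr1, bits=32):
    """For each b, list which coordinates differ between ADDR1 and ADDR1 + 2^b."""
    base = decode(addr1)
    roles = {}
    for b in range(bits):
        other = decode(addr1 + (1 << b))
        roles[b] = [name for name in base if base[name] != other[name]]
    return roles


# ADDR1 = 0 is assumed so that each addition flips exactly one bit.
roles = bit_roles(0)
for b in (0, 10, 12, 15, 16):
    print(b, roles[b])
```

In the real experiment the "which coordinate changed" signal would come from how the measured load shifts between channels, ranks, and banks as b varies.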

Page 12: Why the Address Translation Scheme Matters?

Understand the Imbalance

• Memory Management
  – Pre-allocated 2KB socket buffer
  – Reclaimed & reallocated by the kernel
• A limited number of buffers serve all packets.
• A 2KB buffer spans the entire rank-bank grid.
• Large Packets (1024B)
  – Cover at least half of the grid (high parallelism)
• Small Packets (64B)
  – Hit some elements with high probability (poor parallelism)
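The coverage argument above can be checked with a small model. The grid size of 8 cells and the resulting 256-byte stripe per cell are assumptions made for this sketch; only the 2KB buffer size and the two packet sizes come from the slides.

```python
BUFFER_SIZE = 2048              # 2KB socket buffer, from the slide
CELLS = 8                       # hypothetical number of rank-bank grid cells
STRIPE = BUFFER_SIZE // CELLS   # bytes of each buffer that fall in one cell


def cells_touched(offset, length):
    """Grid cells covered by `length` bytes starting at `offset` in a buffer."""
    first = offset // STRIPE
    last = (offset + length - 1) // STRIPE
    return {c % CELLS for c in range(first, last + 1)}


large = cells_touched(0, 1024)  # a 1024B packet at the start of the buffer
small = cells_touched(0, 64)    # a 64B packet at the start of the buffer
print(sorted(large), sorted(small))
```

In this model a 1024B packet covers 4 of the 8 cells (5 if it starts at an unaligned offset), while every 64B packet written at the start of a buffer lands in the same single cell.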

Page 13: Why the Address Translation Scheme Matters?

Understand the Imbalance

• In the real world, it is even worse.

[Figure: Memory Pool mapped onto the Mapped Grid (cells 0 to 7), shown for 1024B and 64B packets]

Page 14: Why the Address Translation Scheme Matters?

What Can We Do?

• Hack Network Adaptor Driver
  – Introduce random offset

[Figure: Memory Pool and Mapped Grid (cells 0 to 7)]
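A rough simulation of the random-offset idea, under the same assumed model of 8 grid cells with a 256-byte stripe per cell. The function `load`, the 64-byte offset granularity, and the fixed seed are choices made for this sketch and say nothing about how a real driver would pick offsets.

```python
import random
from collections import Counter

BUFFER_SIZE = 2048   # 2KB socket buffer, from the slides
CELLS = 8            # hypothetical grid size
STRIPE = 256         # hypothetical bytes per grid cell
PACKET = 64          # small-packet size, from the slides


def first_cell(offset):
    """Grid cell that holds the byte at `offset` within a buffer."""
    return (offset // STRIPE) % CELLS


def load(n_packets, randomize, seed=0):
    """Count how many 64B packets land in each grid cell."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_packets):
        # Offsets are multiples of 64, so a packet never straddles two cells.
        offset = rng.randrange(0, BUFFER_SIZE - PACKET + 1, PACKET) if randomize else 0
        counts[first_cell(offset)] += 1
    return counts


fixed = load(10_000, randomize=False)
spread = load(10_000, randomize=True)
print(dict(fixed))
print(dict(sorted(spread.items())))
```

Without the offset every small packet hits cell 0; with it, the packets spread over all 8 cells in roughly equal shares.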

Page 15: Why the Address Translation Scheme Matters?

What Can We Do?

• Hack Slab Allocator and kmalloc()
  – Maintain a special slab
  – Provide access through kmalloc()

[Figure: Memory Pool and Mapped Grid (cells 0 to 7)]

Page 16: Why the Address Translation Scheme Matters?

What Can We Do?

• Maintain buffers with various sizes
  – NIC supports multiple descriptor rings
  – A hardware feature

[Figure: Memory Pool and Mapped Grid (cells 0 to 7)]
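One way to picture the various-sizes idea is a set of buffer size classes, each served by its own pool. The size classes, the function names, and the assumption that buffers of one class are packed back to back are all hypothetical; the slides only state that the NIC supports multiple descriptor rings.

```python
SIZE_CLASSES = (256, 512, 1024, 2048)  # hypothetical buffer sizes in bytes


def pick_class(packet_len):
    """Smallest buffer size class that fits the packet."""
    for size in SIZE_CLASSES:
        if packet_len <= size:
            return size
    raise ValueError("packet larger than the biggest buffer")


def start_cells(packet_len, n_buffers, cells=8, stripe=256):
    """Grid cell at which each of n back-to-back buffers of the chosen class begins."""
    size = pick_class(packet_len)
    return [(i * size // stripe) % cells for i in range(n_buffers)]


small_cells = start_cells(64, 8)     # 64B packets use 256B buffers
large_cells = start_cells(1500, 4)   # 1500B packets still use 2KB buffers
print(small_cells, large_cells)
```

In this model, consecutive 64B packets start in consecutive grid cells instead of all starting in cell 0, while large packets keep the 2KB buffers that already span the grid.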

Page 17: Why the Address Translation Scheme Matters?

Conclusion

• Figured out memory translation scheme
• Explained memory load imbalance
• Proposed two possible solutions