Scalable Directory Protocols for 1000s of Cores
Dominic DiTomaso
EE 6633
Outline
• Introduction
• Background
• ATAC
• SPATL
• Cuckoo Directory
• SCD
• Conclusions
Directory Protocols
• Snoopy (broadcast) -> Directory (multicast)
• Large Directory Overhead
  • Overhead = P*M
  • P - # of processors
  • M - # of memory blocks
  • 64 nodes: 12.7% overhead
  • 256 nodes: 50% overhead
  • 1024 nodes: 200% overhead
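The overhead figures above can be sanity-checked with a short sketch. It assumes a full bit-vector directory (one presence bit per processor for every memory block) and 64-byte blocks; neither assumption is stated on the slide, and `full_map_overhead` is an illustrative name. Under those assumptions the ratio is P / 512, which gives 50% at 256 nodes and 200% at 1024 nodes. The 64-node case comes out slightly below the slide's figure, which may include per-entry state bits that are not modeled here.

```python
def full_map_overhead(num_processors, block_bytes=64):
    """Directory bits per data bit for a full bit-vector directory.

    Assumption: each block needs one presence bit per processor, and a
    block holds block_bytes * 8 data bits. State bits are ignored.
    """
    return num_processors / (block_bytes * 8)


if __name__ == "__main__":
    for p in (64, 256, 1024):
        print(f"{p} nodes: {full_map_overhead(p):.1%} directory overhead")
```

Because the presence vector grows linearly with P while the block size stays fixed, the relative overhead grows linearly with the core count.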
[Figure: directory storage with dimensions P and M]
Directory Protocols
• Requirements
  • Small area, energy, and latency overheads
  • Accurate sharer information
  • Limited directory-induced invalidations
• Duplicate Tags
  • Area-efficient
  • High associativity -> high power
• Sparse Directory
  • Power-efficient
  • Large capacity -> large area
• Coarse-grain vectors, Hierarchical, etc.
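To illustrate why coarse-grain vectors trade storage for sharer accuracy, here is a minimal sketch. The class and method names are hypothetical, and the grouping (consecutive core IDs share one bit) is an assumption chosen for simplicity.

```python
class CoarseVector:
    """Sharer vector with one bit per group of cores instead of one per core."""

    def __init__(self, num_cores, cores_per_bit):
        self.num_cores = num_cores
        self.cores_per_bit = cores_per_bit
        self.bits = 0  # bit g is set if any core in group g may share the block

    def add_sharer(self, core):
        self.bits |= 1 << (core // self.cores_per_bit)

    def invalidation_targets(self):
        # Every core in a marked group must be invalidated, since the
        # vector cannot tell which core of the group holds the block.
        return [
            core
            for core in range(self.num_cores)
            if (self.bits >> (core // self.cores_per_bit)) & 1
        ]


if __name__ == "__main__":
    vec = CoarseVector(num_cores=16, cores_per_bit=4)
    vec.add_sharer(5)
    print(vec.invalidation_targets())  # cores 4..7 all receive the invalidation
```

The vector shrinks by a factor of `cores_per_bit`, but a single sharer now causes invalidations to its whole group, which conflicts with the "accurate sharer information" requirement.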
ATAC
• Optical Broadcast Network
SPATL
• Tagless
• Bloom Filters
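The idea of a tagless directory built on Bloom filters can be sketched as follows. This is a simplified illustration, not SPATL's actual organization: it assumes one filter per core, uses SHA-256 as a stand-in hash, and all names (`BloomFilter`, `TaglessDirectory`, `record_fill`, `possible_sharers`) are hypothetical. Removal on eviction is not modeled, since a plain Bloom filter cannot delete entries.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over block addresses (no tags stored)."""

    def __init__(self, num_bits=256, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, block_addr):
        # Derive num_hashes bit positions from the address.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{block_addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, block_addr):
        for pos in self._positions(block_addr):
            self.bits |= 1 << pos

    def may_contain(self, block_addr):
        # True can be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(block_addr))


class TaglessDirectory:
    """One Bloom filter per core; a lookup returns a superset of the sharers."""

    def __init__(self, num_cores):
        self.filters = [BloomFilter() for _ in range(num_cores)]

    def record_fill(self, core, block_addr):
        self.filters[core].add(block_addr)

    def possible_sharers(self, block_addr):
        return {c for c, f in enumerate(self.filters) if f.may_contain(block_addr)}


if __name__ == "__main__":
    directory = TaglessDirectory(num_cores=4)
    directory.record_fill(1, 0x1000)
    directory.record_fill(3, 0x1000)
    print(directory.possible_sharers(0x1000))
```

The key property is that the filter never misses a real sharer, so coherence stays correct; false positives only cost extra invalidation messages.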
Cuckoo Directory
• N-ary Cuckoo Hash Table
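A minimal sketch of N-ary cuckoo insertion, to show how displacement avoids conflict evictions. The per-way hash function, the table sizes, and the displacement limit are all illustrative assumptions, and `CuckooTable` is a hypothetical name; this is not the design from the Cuckoo directory paper.

```python
class CuckooTable:
    """N-ary cuckoo hash table mapping block addresses to sharer sets."""

    def __init__(self, num_ways=4, sets_per_way=64, max_displacements=32):
        self.num_ways = num_ways
        self.sets_per_way = sets_per_way
        self.max_displacements = max_displacements
        self.ways = [[None] * sets_per_way for _ in range(num_ways)]

    def _index(self, way, block_addr):
        # Hypothetical per-way hash: a different odd multiplier for each way.
        return ((block_addr * (2 * way + 3) * 0x9E3779B1) >> 7) % self.sets_per_way

    def lookup(self, block_addr):
        for way in range(self.num_ways):
            entry = self.ways[way][self._index(way, block_addr)]
            if entry is not None and entry[0] == block_addr:
                return entry[1]
        return None

    def insert(self, block_addr, sharers):
        """Insert or update; returns an evicted (addr, sharers) entry or None."""
        entry = (block_addr, sharers)
        for way in range(self.num_ways):  # update in place if already present
            idx = self._index(way, block_addr)
            current = self.ways[way][idx]
            if current is not None and current[0] == block_addr:
                self.ways[way][idx] = entry
                return None
        victim_way = 0
        for _ in range(self.max_displacements):
            for way in range(self.num_ways):  # take any empty candidate slot
                idx = self._index(way, entry[0])
                if self.ways[way][idx] is None:
                    self.ways[way][idx] = entry
                    return None
            # All candidates are full: swap with one occupant and re-place it.
            idx = self._index(victim_way, entry[0])
            entry, self.ways[victim_way][idx] = self.ways[victim_way][idx], entry
            victim_way = (victim_way + 1) % self.num_ways
        return entry  # gave up; a directory would have to invalidate this block


if __name__ == "__main__":
    table = CuckooTable()
    table.insert(0x40, {2, 7})
    print(table.lookup(0x40))
```

Each entry has one candidate slot per way, so a displaced entry can usually move to one of its other candidates instead of being evicted, which is what limits directory-induced invalidations.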
SCD
• Variable directory tags
Conclusions
• Large directory overhead at 1000s of cores
• Solutions
  • Optics – ATAC
  • Tagless – SPATL
  • Hash Tables – Cuckoo
  • Variable Tags – SCD
References
• [1] George Kurian, Jason E. Miller, James Psota, Jonathan Eastep, Jifeng Liu, Jurgen Michel, Lionel C. Kimerling, and Anant Agarwal, “ATAC: a 1000-core cache-coherent processor with on-chip optical network,” In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT '10), 2010.
• [2] Daniel Sanchez and Christos Kozyrakis, “SCD: A scalable coherence directory with flexible sharer set encoding,” In Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture (HPCA '12), 2012.
• [3] H. Zhao, A. Shriraman, S. Dwarkadas, and V. Srinivasan, “SPATL: Honey, I Shrunk the Coherence Directory,” In Proceedings of the 20th International Conference on Parallel Architectures and Compilation Techniques (PACT '11), 2011.
• [4] M. Ferdman, P. Lotfi-Kamran, K. Balet, and B. Falsafi, “Cuckoo directory: A scalable directory for many-core systems,” In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA '11), pp. 169-180, Feb. 2011.