Post on 19-Jul-2020
Lecture 19: Parallel Sorting
Abhinav Bhatele, Department of Computer Science
High Performance Computing Systems (CMSC714)
Abhinav Bhatele, CMSC714
Summary of last lecture
• I/O can become a bottleneck when other portions of the code scale well
• Reading input datasets, writing numerical/scientific output, checkpointing
• Parallel file system required for high performance
• Different approaches
  • One process per file, shared file, shared files for subsets of processes
• Contention for metadata server and OSTs/disks
Parallel Sorting
• Sorting is used in many HPC codes
  • For example, figuring out which particles/atoms are within a cutoff radius
• Two broad categories of parallel sorting algorithms:
  • Merge-based
  • Splitter-based
Review Bitonic Sort
• Merge-based algorithm: sort by merging bitonic sequences
• Bitonic sequence: increases monotonically then decreases monotonically
• At each step, merge a bitonic sequence
9 6 1 5 14 7 15 11 2 12 13 4 16 8 3 10
6 9 5 1 7 14 15 11 2 12 13 4 8 16 10 3
1 5 6 9 15 14 11 7 2 4 12 13 16 10 8 3
1 5 6 7 9 11 14 15 16 13 12 10 8 4 3 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
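The rows above can be reproduced with a short sequential simulation of a bitonic sorting network. This is a minimal sketch, not a parallel implementation: the function name `bitonic_sort_stages` and the choice to record the list after every merge stage are assumptions made for illustration, and the input length is assumed to be a power of two.

```python
def bitonic_sort_stages(keys):
    """Sort a power-of-two-length list with a bitonic network.

    Returns the list as it looks initially and after each merge stage.
    """
    a = list(keys)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    stages = [list(a)]
    size = 2                          # length of the sorted runs this stage produces
    while size <= n:
        dist = size // 2              # distance between compared elements
        while dist >= 1:
            for i in range(n):
                j = i ^ dist          # partner index in the compare-exchange
                if j > i:
                    # Runs alternate between ascending and descending so that
                    # neighbouring runs together form a bitonic sequence.
                    ascending = (i & size) == 0
                    if (a[i] > a[j]) == ascending:
                        a[i], a[j] = a[j], a[i]
            dist //= 2
        stages.append(list(a))
        size *= 2
    return stages


if __name__ == "__main__":
    for row in bitonic_sort_stages([9, 6, 1, 5, 14, 7, 15, 11, 2, 12, 13, 4, 16, 8, 3, 10]):
        print(" ".join(str(k) for k in row))
```

In a parallel setting each compare-exchange at distance `dist` would become an exchange between a pair of processes; here every step runs in one loop purely to show the data movement.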
Review QuickSort
• Choose a pivot element from the unsorted list
• Move all elements < pivot before the pivot and all elements > pivot after the pivot
• Recursively apply this to the sublists before and after pivot
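The three steps above can be sketched as follows. This is a simple out-of-place version for illustration only; picking the middle element as the pivot and grouping keys equal to the pivot into their own list are assumptions, not something the slide prescribes.

```python
def quicksort(keys):
    """Return a sorted copy of keys using a simple recursive quicksort."""
    if len(keys) <= 1:
        return list(keys)
    pivot = keys[len(keys) // 2]                # assumed pivot choice: middle element
    smaller = [k for k in keys if k < pivot]    # elements that go before the pivot
    equal = [k for k in keys if k == pivot]     # the pivot and any duplicates of it
    larger = [k for k in keys if k > pivot]     # elements that go after the pivot
    # Recursively sort the sublists on either side of the pivot.
    return quicksort(smaller) + equal + quicksort(larger)


if __name__ == "__main__":
    print(quicksort([9, 6, 1, 5, 14, 7, 15, 11]))
```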
Parallel Sample Sort
• Instead of selecting one pivot, we select p-1 samples (if there are p processors)
• This provides us with p-1 “splitters”
• These p-1 splitters create p buckets
• Keys are then sent to the appropriate bucket
• Why is it called sample sort? Sample s keys randomly from each processor, sort the resulting s*p keys, and select p-1 splitters from this sorted sample
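The splitter selection and bucketing described above can be simulated in a single process. In this hedged sketch each inner list stands in for the keys held by one processor; the function name `sample_sort`, the `seed` parameter, and the evenly spaced choice of splitters from the sorted sample are assumptions made for illustration. A real implementation would replace the bucketing loop with all-to-all communication.

```python
import bisect
import random


def sample_sort(local_keys, s, seed=0):
    """Simulate sample sort; local_keys is a list of p lists, one per processor.

    Returns p sorted buckets whose concatenation is the globally sorted data.
    """
    rng = random.Random(seed)
    p = len(local_keys)
    # Each processor contributes s randomly chosen keys (fewer if it holds fewer).
    sample = []
    for keys in local_keys:
        sample.extend(rng.sample(keys, min(s, len(keys))))
    sample.sort()
    # Pick p-1 evenly spaced splitters from the sorted sample.
    splitters = [sample[(i * len(sample)) // p] for i in range(1, p)] if sample else []
    # The splitters define p buckets; send each key to its bucket.
    buckets = [[] for _ in range(p)]
    for keys in local_keys:
        for k in keys:
            buckets[bisect.bisect_right(splitters, k)].append(k)
    # Each processor then sorts its own bucket locally.
    return [sorted(b) for b in buckets]


if __name__ == "__main__":
    data = [[9, 6, 1, 5], [14, 7, 15, 11], [2, 12, 13, 4], [16, 8, 3, 10]]
    for rank, bucket in enumerate(sample_sort(data, s=2)):
        print(rank, bucket)
```

Larger values of `s` tend to give more evenly sized buckets at the cost of sorting a larger sample.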
Parallel Radix Sort
• Instead of comparing keys, radix sort examines k bits of each key in every step
• A k-bit radix sort looks at k bits in one step
• Moves from the least significant to the most significant bits
• Looking at k bits places keys into 2^k buckets in each step
• Parallel version:
  • These buckets are assigned to p processes and key movement leads to all-to-all communication
  • To balance buckets across processes: use histograms to decide assignment of buckets to processes
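A single-process sketch of a k-bit least-significant-digit radix sort may help make the bucket and histogram steps concrete. It assumes non-negative integer keys that fit in `key_bits` bits; the names `exclusive_plus_scan` and `radix_sort` and the default parameters are illustrative choices. The exclusive prefix sum (a "plus scan") over the histogram gives the starting offset of each bucket, which is the same information a parallel version could use to decide where keys are sent.

```python
def exclusive_plus_scan(counts):
    """Return the exclusive prefix sum: offsets[i] = sum of counts[0..i-1]."""
    offsets, running = [], 0
    for c in counts:
        offsets.append(running)
        running += c
    return offsets


def radix_sort(keys, k=4, key_bits=32):
    """Sort non-negative integers below 2**key_bits, k bits per step."""
    keys = list(keys)
    mask = (1 << k) - 1
    # Move from the least significant to the most significant bits.
    for shift in range(0, key_bits, k):
        # Histogram: how many keys fall into each of the 2**k buckets.
        hist = [0] * (1 << k)
        for key in keys:
            hist[(key >> shift) & mask] += 1
        # The plus scan turns bucket counts into starting positions.
        offsets = exclusive_plus_scan(hist)
        # Stable scatter: keys keep their relative order within a bucket.
        out = [0] * len(keys)
        for key in keys:
            d = (key >> shift) & mask
            out[offsets[d]] = key
            offsets[d] += 1
        keys = out
    return keys


if __name__ == "__main__":
    print(exclusive_plus_scan([3, 0, 2, 1]))
    print(radix_sort([9, 6, 1, 5, 14, 7, 15, 11, 2, 12, 13, 4, 16, 8, 3, 10]))
```

The scatter step must be stable, otherwise the ordering established by earlier (less significant) steps would be lost.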
Questions
• Can we talk about how the “plus scan” works as described in the “Scanning the Histogram” section? The paper essentially just states that it’s something that happens.
• The sending of data clearly dominates the run time of the described radix sort. Is this always the case with parallel sorting?
• What does it mean to pipeline operations?
• How do GPUs fare in these sorting schemes?
An Improved Supercomputer Sorting Benchmark
Questions
• Are there commonly used libraries that implement these versions of parallel sorting algorithms? Are there other commonly used parallel sorting libraries?
• Can we go over the different sorting algorithms? The paper was confusing.
• When is it important for sorting algorithms to be stable?
A Comparison of Sorting Algorithms for the Connection Machine CM-2
Abhinav Bhatele
5218 Brendan Iribe Center (IRB) / College Park, MD 20742
phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu
Questions?