An Efficient External Sorting Algorithm for Flash Memory Embedded Devices Tyler Cossentine - M.Sc. Thesis Defense



Overview

• Introduction

• Previous work

• Flash MinSort

• Experimental Results

• Conclusions


Introduction

• Embedded systems are devices that perform a few simple functions.

• Embedded devices typically have limited power, memory and computational resources.

• Many embedded systems applications involve storing and querying large datasets.

• Sorting algorithms are commonly used in query processing.


Embedded Devices

• Not designed to be general-purpose devices.
  o Wireless sensor networks, smart cards, etc.

• Can communicate with other devices through wired or wireless interfaces.

• Hardware constraints:
  o Battery powered
  o Low-power microcontroller
  o Limited memory (as little as 1 KB)
  o Small amount of local storage (flash or EEPROM)


Sensor Networks

• Sensor networks are used in military, environmental, agricultural and industrial applications.

• A wireless sensor node contains a microcontroller, sensing system, local storage, battery and wireless radio.

• Devices may process data locally or send it to a common collection point (sink) for processing.

• On-device data storage and query processing has the potential to reduce communication and energy use [6][8].


Flash Memory

• A type of EEPROM
  o Available in higher capacities
  o Organized as pages of data
  o A page is erased before it is written
  o Erase unit is typically a block of pages

• Two types: NOR and NAND
  o NOR memory supports byte-level reads
  o NAND requires error-correcting code (ECC)

• Unique performance characteristics
  o Asymmetric read and write costs (reads are 10 to 100 times faster)
  o Low-cost random reads
  o Memory wear


Flash Memory

Memory Array [1]


Flash Memory

Block Diagram [1]


Relation


Sorting Algorithms

• Sorting is a fundamental class of algorithms because it allows for efficient ordering of results, joins, grouping and aggregation.

• An in-memory sort can be performed when the entire dataset fits into memory:
  o Merge sort
  o Quicksort

• External sorting:
  o Uses external memory (hard disk) to sort the dataset
  o External merge sort is the standard in databases


Previous Work: One Key Scan

• The most memory-efficient external sorting algorithm is one key scan [2].
  o Performs D+1 scans, where D is the number of distinct sort key values.
  o Keeps track of:
    • current: the sort key value that is being output in this scan.
    • split: the next smallest sort key value encountered.
  o The algorithm needs an initial scan to determine the values of current and split.
  o Requires enough memory to store two sort key values.
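The scan loop described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation from [2]: the `scan` callable is a hypothetical stand-in for one sequential pass over the relation in flash, and tuples are modelled as `(key, payload)` pairs.

```python
import math


def one_key_scan(scan):
    """Yield (key, payload) tuples in key order using two key variables.

    `scan` is a hypothetical callable that returns a fresh iterator over the
    stored tuples; each call stands in for one sequential pass over flash.
    """
    # Initial scan: find the smallest sort key in the relation.
    current = min((key for key, _ in scan()), default=None)
    while current is not None:
        split = math.inf  # next smallest key seen during this scan
        for key, payload in scan():
            if key == current:
                yield key, payload      # output every tuple matching current
            elif current < key < split:
                split = key             # remember the next key to output
        current = None if split == math.inf else split
```

With three distinct keys the sketch calls `scan` four times: one initial scan plus one per distinct value, which is the D+1 behaviour listed above.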


Previous Work: Heap Sort

• A heap sort algorithm, called FAST(1) [7], uses a binary heap of N tuples to store the next smallest tuples encountered during a scan.
  o Performs T/N scans, where T is the number of tuples and N is the number of tuples that fit into memory.
  o Requires at least enough memory to store one tuple.
  o May be slower than one key scan if there are few distinct sort key values, the tuple size is large or the dataset is large.
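The multi-scan idea can be sketched as follows. This is a simplified illustration of a bounded-heap scan, not the FAST(1) algorithm from [7] itself: the `scan` callable is hypothetical, keys are assumed to be numeric, and breaking ties by storage position is an assumption made here so that duplicate keys are output exactly once.

```python
import heapq


def heap_scan_sort(scan, n):
    """Yield (key, payload) in key order, holding at most n tuples in memory.

    `scan` is a hypothetical callable returning a fresh iterator over the
    stored (key, payload) tuples; each call stands in for one pass over flash.
    """
    last = None  # (key, position) of the last tuple output so far
    while True:
        heap = []  # max-heap (via negation) of the n smallest candidates
        for pos, (key, payload) in enumerate(scan()):
            if last is not None and (key, pos) <= last:
                continue  # already output in an earlier scan
            item = (-key, -pos, payload)
            if len(heap) < n:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)  # evict the largest candidate
        batch = sorted((-nk, -npos, payload) for nk, npos, payload in heap)
        for key, _, payload in batch:
            yield key, payload
        if len(batch) < n:
            return  # fewer than n candidates remained, so we are done
        last = batch[-1][:2]
```

Each pass outputs up to n tuples, so the number of passes grows with T/N, which is why a small memory budget makes this approach expensive.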


Previous Work: External Merge Sort

• The external merge sort [5] algorithm is the standard sorting algorithm used in databases.
  o An initial read pass constructs sorted sublists the size of the amount of RAM allocated to the operator.
  o The merge phase can consist of multiple passes.
  o Each pass buffers one page from each of the sublists, performs a merge and writes a temporary result to flash.
  o The algorithm requires at least three pages of memory.
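To see how the number of merge passes grows when memory is small, the textbook pass count can be computed directly. The sketch below assumes the standard formulation (run generation fills all available pages, and each merge pass combines `memory_pages - 1` runs while one page buffers the output); the function name and parameters are illustrative, not taken from the talk.

```python
import math


def merge_passes(pages, memory_pages):
    """Return (initial_runs, merge_passes) for a textbook external merge sort."""
    if memory_pages < 3:
        raise ValueError("external merge sort needs at least three pages")
    initial_runs = math.ceil(pages / memory_pages)  # run generation pass
    fan_in = memory_pages - 1                       # one page buffers output
    runs, passes = initial_runs, 0
    while runs > 1:
        runs = math.ceil(runs / fan_in)             # one merge pass
        passes += 1
    return initial_runs, passes
```

Under these assumptions, a 313-page relation sorted with three pages of memory produces 105 initial runs and needs 7 two-way merge passes, each of which reads and rewrites the whole relation.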


Previous Work: Summary

• External merge sort requires writes to flash and a significant amount of memory, which makes it infeasible in certain embedded applications.

• Existing sorting algorithms for datasets stored in flash memory favor reads over writes.

• Existing sorting algorithms do not take advantage of low-cost random reads.

• Performance depends on the properties of the input dataset.

• Data collected in applications such as sensor networks is often clustered spatially and temporally.


Flash MinSort: Overview

• Flash MinSort [3] uses low-cost random reads to retrieve only required pages during a scan of the relation.

• It builds a dynamic index over the relation that stores the minimum value in each region.

• A region represents one or more pages of data.

• The algorithm maintains a current minimum value and next minimum value.

• During a pass, only pages located in a region that has a minimum value equal to the current minimum are read.


Flash MinSort: Overview (continued)

• The algorithm keeps track of the next smallest value in a region as it is being read (next), along with the position of the next tuple to output (nextIdx).

• After a region has been read, its minimum value in the index is updated.

• Adapts to the size of the input relation and caches pages when given additional memory.


Flash MinSort: Example

Dataset and index (one page per region, four tuples per page):

Page   Data       Min
1      1 9 9 1    1
2      9 9 9 9    9
3      9 8 9 9    8
4      8 8 7 7    7
5      6 6 6 5    5
6      4 4 3 2    2
7      2 1 2 1    1
8      1 1 1 1    1
9      2 3 4 5    2
10     6 7 8 9    6
11     9 8 9 8    8
12     8 9 9 9    8

Steps:

Output #1: Scan the Min index. Find 1 in region #1. Search page #1. Output tuple #1. next = 9, nextIdx = 4.

Output #2: Output tuple #4. Region Min set to 9.

Output #3: Find 1 in region #7. Search page #7. Output tuple #2. next = 2, nextIdx = 4.

Output #4: Output tuple #4. Region Min set to 2.

Output #5: Find 1 in region #8. Search page #8. Output tuple #1. next = ∞, nextIdx = 2.

Output #6: Output tuple #2. next = ∞, nextIdx = 3.

Output #7: Output tuple #3. next = ∞, nextIdx = 4.

Page buffer contents during these steps: 1 9 9 1 (page 1), 2 1 2 1 (page 7), 1 1 1 1 (page 8).

Output:

1 (from pg. 1, tuple 1)
1 (from pg. 1, tuple 4)
1 (from pg. 7, tuple 2)
1 (from pg. 7, tuple 4)
1 (from pg. 8, tuple 1)
1 (from pg. 8, tuple 2)
1 (from pg. 8, tuple 3)
1 (from pg. 8, tuple 4)
2 (from pg. 6, tuple 4)
2 (from pg. 7, tuple 1)
2 (from pg. 7, tuple 3)
...
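The walkthrough can be reproduced with a short Python sketch. It is a simplified illustration of the idea and not the implementation from [3]: pages are modelled as in-memory lists of integer keys, indexing `pages[p]` stands in for a random page read from flash, and the generator replaces the tuple-at-a-time iterator (and its nextIdx bookkeeping) used on the device. The names `flash_minsort`, `pages_per_region` and `EXAMPLE_PAGES` are hypothetical.

```python
import math

# The 12-page example dataset, four sort keys per page.
EXAMPLE_PAGES = [
    [1, 9, 9, 1], [9, 9, 9, 9], [9, 8, 9, 9], [8, 8, 7, 7],
    [6, 6, 6, 5], [4, 4, 3, 2], [2, 1, 2, 1], [1, 1, 1, 1],
    [2, 3, 4, 5], [6, 7, 8, 9], [9, 8, 9, 8], [8, 9, 9, 9],
]


def flash_minsort(pages, pages_per_region=1):
    """Yield (key, page_no, tuple_no) in key order; numbering is 1-based."""
    regions = [range(start, min(start + pages_per_region, len(pages)))
               for start in range(0, len(pages), pages_per_region)]
    # Initial scan: record the minimum key of every region.
    index = [min(key for p in region for key in pages[p])
             for region in regions]
    while True:
        current = min(index, default=math.inf)
        if current == math.inf:
            return  # every region has been fully output
        for r, region in enumerate(regions):
            if index[r] != current:
                continue  # region cannot contain current, so skip its pages
            next_min = math.inf  # smallest key above current in this region
            for p in region:
                for t, key in enumerate(pages[p]):
                    if key == current:
                        yield key, p + 1, t + 1
                    elif current < key < next_min:
                        next_min = key
            index[r] = next_min  # update the region minimum after the read
```

Running `list(flash_minsort(EXAMPLE_PAGES))` starts with the two 1s from page 1, then the 1s from pages 7 and 8, followed by the 2 from page 6, in the same order as the output list above. Passing a larger `pages_per_region` shrinks the index at the cost of reading more pages per pass.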


Flash MinSort: Performance

• In the ideal case, each region represents a single page.

• The amount of memory required to store the minimum value of each page is LK * P, where LK is the size of the sort key and P is the number of pages.

• If there is not enough memory, each region represents two or more adjacent pages.

• The minimum amount of memory required is 4*LK for two regions.
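The trade-off between memory and region size can be sketched as a small planning function. The accounting here is an assumption: the budget is taken to hold one minimum per region plus two extra keys (the current and next minimum), which is one reading consistent with the 4*LK floor for two regions. The function and parameter names are hypothetical.

```python
import math


def plan_regions(memory_bytes, key_bytes, num_pages):
    """Return (num_regions, pages_per_region) for a given memory budget.

    Assumes the budget stores one key per region plus two working keys
    (current and next minimum).
    """
    slots = memory_bytes // key_bytes - 2  # keys left for the region index
    if slots < 2:
        raise ValueError("need at least 4 * key_bytes of memory")
    pages_per_region = math.ceil(num_pages / min(slots, num_pages))
    num_regions = math.ceil(num_pages / pages_per_region)
    return num_regions, pages_per_region
```

For example, with 2-byte keys and 313 pages, a budget of 8 bytes yields two regions of up to 157 pages each, while any budget of at least 630 bytes allows one page per region.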


Flash MinSort: Direct Reads

• If the flash chip supports direct byte reads, Flash MinSort is even more efficient as it only needs to read the sort key values.

• Performance:
  o P = number of pages, T = number of tuples, NP = number of pages in a region
  o DR = average number of distinct values in a region, R = number of regions
  o LK = size of key in bytes, LT = size of tuple in bytes


Flash MinSort: Comparison

• Considering only page reads, Flash MinSort is:
  o Faster than one key sort in all cases.
  o Faster than heap sort unless the input size is only a small multiple of the memory size (e.g. 2 to 5).
  o Faster than external merge sort for a large spectrum of the possible configurations, even while using less memory and performing no writes.

Algorithm                        Page I/Os           Notes
Flash MinSort                    P * (1 + DR)
One Key Sort                     P * (1 + D)         Performs a scan for each distinct key.
Heap Sort                        P * (T * LT) / M    Number of scans is based on the number of tuples; M is the memory size in bytes.
External Merge Sort (two pass)   P * (2 + X)         X is the write-to-read ratio, as the algorithm must write as an intermediate step. Two pass is not likely for small memory sizes.
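The formulas in the comparison table are easy to evaluate for a concrete configuration. The helper below simply transcribes them; the function name is hypothetical, and any values passed in (especially DR, which depends on how the data is clustered) are the caller's assumptions rather than measurements.

```python
def page_ios(P, D, DR, T, LT, M, X):
    """Evaluate the page I/O estimates from the comparison table.

    P: pages, D: distinct keys, DR: average distinct keys per region,
    T: tuples, LT: tuple size in bytes, M: memory in bytes,
    X: write-to-read ratio.
    """
    return {
        "Flash MinSort": P * (1 + DR),
        "One Key Sort": P * (1 + D),
        "Heap Sort": P * (T * LT) / M,
        "External Merge Sort (two pass)": P * (2 + X),
    }
```

A call such as `page_ios(P=313, D=43, DR=2, T=10000, LT=16, M=2048, X=4.7)` uses an illustrative DR of 2. Because DR can never exceed D, the Flash MinSort estimate is never larger than the one key sort estimate.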


Experimental Evaluation

• Experimental evaluation compares Flash MinSort, one key sort, heap sort, and external merge sort.

• 2 KB of memory available to operators.

• Sensor node hardware:
  o Atmel Mega644p (8 MHz)
  o 4 KB SRAM
  o 2 MB Atmel AT45DB161D serial flash (512-byte page size)
  o Node design was used for field measurement of soil moisture for use with an automated irrigation controller [4].

• Dataset:
  o Three months of live soil sensing data, plus generated ordered and random data sets. The real data set has 10,000 records (160 KB) and 43 distinct values.
  o Record size is 16 bytes. Sort key is a 2-byte integer.


Raw Device Performance

• Time to read 50,000 tuples: 5.3 seconds

• Time to write 50,000 tuples: 23 seconds

• Write-to-read ratio: 4.7

• Time to scan 50,000 sort keys: 2.1 seconds

• Notes:
  o Buffering a page in processor memory is more efficient than using on-chip buffers due to bus communication and latency.
  o Bus speeds affect the write-to-read ratio. Even though writing is considerably slower on the chip, this was masked by the speed of the processor and bus.


Real Data

• Heap sort is not shown as its time is orders of magnitude longer:
  o 100 bytes (5 tuples): 10,000 passes, 3,377 seconds
  o 1200 bytes (74 tuples): 302 seconds

• MinSortDR is a direct read version of MinSort.

• External merge: 1536 bytes (3 pages): 7 passes, 76 seconds


Random Data

• Data set with 10,000 records and 500 distinct values (1 to 500).

• Heap sort performs the same number of passes regardless of the data set (random, real, or ordered).

• External merge sort took 78 seconds, as the sorting during initial run generation took slightly more time.

Ordered Data

• Sorted, real data set with 10,000 tuples and 43 distinct values.

• MinSort does not detect sorted regions but still benefits from detecting duplicates of the same value in a region.

• External merge sort took 75 seconds.

Results Summary

• MinSort is faster than one key sort and heap sort with or without using direct byte reads from the device.
  o Especially good for sensor data that exhibits temporal clustering.
  o MinSort is a generalization of one key sort, and performance of both algorithms depends on the number of distinct values.

• Heap sort is not competitive for small memory sizes.
  o The ratio of available RAM to dataset size is key.


Results Summary

• External merge sort performs well, but requires at least three pages (1,536 bytes) of memory.
  o For the real data set on this platform, external merge sort will never be faster assuming at least two passes.
  o For wireless sensing applications, dealing with the additional space and wear leveling complicates system design and performance.


Solid State Drives: Experimental Setup

• Solid state drives (SSDs) have sophisticated controllers that support wear leveling, address translation and buffer management.

• Test system:
  o AMD Opteron 2.1 GHz
  o 32 GB DDR3
  o Intel X25 SSD (1.6 write-to-read ratio)

• Data:
  o 5,000,000 tuples (80 MB)
  o 16 B tuples


Solid State Drives: Real Data

43 distinct sort key values


Solid State Drives: Random Data

500 distinct sort key values


Conclusion

• Flash MinSort is a sorting algorithm designed for datasets stored in flash memory on computationally constrained embedded devices.

• It outperforms existing algorithms by exploiting low-cost random reads.

• Depending on the properties of the dataset, Flash MinSort can outperform External Merge Sort on SSDs.


References

[1] Atmel. Atmel Flash AT45DB161D Data Sheet, 2010.

[2] N. Anciaux, L. Bouganim, and P. Pucheral. Memory Requirements for Query Execution in Highly Constrained Devices. In VLDB, pages 694–705, 2003.

[3] T. Cossentine and R. Lawrence. Fast Sorting on Flash Memory Sensor Nodes. In IDEAS 2010, pages 105–113, 2010.

[4] S. Fazackerley and R. Lawrence. Reducing Turfgrass Water Consumption Using Sensor Nodes and an Adaptive Irrigation Controller. In Sensors Applications Symposium, Limerick, Ireland, 2010.

[5] H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall Press, Upper Saddle River, NJ, USA, 1st edition, 2002.

[6] G. Mathur, P. Desnoyers, D. Ganesan, and P. Shenoy. Ultra-Low Power Data Storage for Sensor Networks. In Proceedings of the 5th International Conference on Information Processing in Sensor Networks, IPSN ’06, pages 374–381, New York, NY, USA, 2006. ACM.


[7] H. Park and K. Shim. FAST: Flash-Aware External Sorting for Mobile Database Systems. Journal of Systems and Software, 82(8):1298–1312, 2009.

[8] G. J. Pottie and W. J. Kaiser. Wireless Integrated Network Sensors. Communications of the ACM, 43:51–58, May 2000.