Exploiting Multithreaded Architectures to Improve Data Management Operations

Exploiting Multithreaded Architectures to Improve Data Management Operations Layali Rashid The Advanced Computer Architecture Group @ U of C (ACAG) Department of Electrical and Computer Engineering University of Calgary


Transcript of Exploiting Multithreaded Architectures to Improve Data Management Operations

Page 1: Exploiting Multithreaded Architectures to Improve Data Management Operations

Exploiting Multithreaded Architectures to Improve Data Management Operations

Layali Rashid
The Advanced Computer Architecture Group @ U of C (ACAG)
Department of Electrical and Computer Engineering

University of Calgary

Page 2: Exploiting Multithreaded Architectures to Improve Data Management Operations

2

Outline
The SMT and the CMP Architectures
Join (Hash Join): Motivation, Algorithm, Results
Sort (Radix and Quick Sorts): Motivation, Algorithms, Results
Index (CSB+-Tree): Motivation, Algorithm, Results
Conclusions

Page 3: Exploiting Multithreaded Architectures to Improve Data Management Operations

3

The SMT and the CMP Architectures

Simultaneous Multithreading (SMT): multiple threads run simultaneously on a single processor.

Chip Multiprocessor (CMP): more than one processor is integrated on a single chip.
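In both cases the extra hardware contexts appear to software as additional logical processors. As a rough illustration (a Python sketch; `parallel_map` is a hypothetical helper, and `os.cpu_count()` counts SMT contexts and CMP cores alike), a program can size its worker pool to whatever contexts the chip exposes:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, items):
    """Apply fn to items using one worker per hardware context.
    SMT siblings and CMP cores both show up in os.cpu_count()."""
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

The same code runs unchanged on an SMT, a CMP, or a combined machine; only the reported context count differs.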

Page 4: Exploiting Multithreaded Architectures to Improve Data Management Operations

4

Hash Join Motivation

[Chart: the L2 cache load miss rate (0%–70%) vs. tuple size (20–140 bytes)]

Hash join is one of the most important and most commonly used operations in current commercial DBMSs.

The L2 cache load miss rate is a critical factor in main-memory hash join performance.

Our goal is to increase the level of parallelism in hash join.

[Charts: the L1 data cache load miss rate (4.4%–5.4%) and the trace cache miss rate (0.00%–0.16%) vs. tuple size (20–140 bytes)]

Page 5: Exploiting Multithreaded Architectures to Improve Data Management Operations

5

Architecture-Aware Hash Join (AA_HJ)

Build Index Partition Phase: tuples are divided equally between threads; each thread has its own set of L2-cache-sized clusters.

The Build and Probe Index Partition Phase: one thread builds a hash table from each key range, while the other threads index-partition the probe relation as in the previous phase.

Probe Phase: see figure.
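A minimal single-process sketch of this phase structure (Python; the hash-based clustering, the cluster count, and the thread pool are simplifying assumptions standing in for the key-range, cache-sized partitioning described above):

```python
from concurrent.futures import ThreadPoolExecutor

def index_partition(tuples, n_clusters, key):
    """Partition tuples into cache-sized clusters (hash-based here,
    a simplification of AA_HJ's key-range index partitioning)."""
    clusters = [[] for _ in range(n_clusters)]
    for t in tuples:
        clusters[hash(key(t)) % n_clusters].append(t)
    return clusters

def aa_hj(R, S, n_clusters=8, n_threads=2, key=lambda t: t[0]):
    # Phase 1: index-partition the build relation R into clusters
    # (in the real algorithm, threads take equal shares of the tuples).
    r_clusters = index_partition(R, n_clusters, key)
    # Phase 2: build one hash table per cluster; concurrently, the
    # other threads would index-partition the probe relation S.
    tables = []
    for cluster in r_clusters:
        table = {}
        for t in cluster:
            table.setdefault(key(t), []).append(t)
        tables.append(table)
    s_clusters = index_partition(S, n_clusters, key)
    # Phase 3: probe matching clusters in parallel.
    def probe(i):
        return [(r, s) for s in s_clusters[i]
                for r in tables[i].get(key(s), [])]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(probe, range(n_clusters)))
    return [pair for part in parts for pair in part]
```

Because a build cluster and its matching probe cluster cover the same keys, each probe thread touches only one cache-sized hash table at a time.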

Page 6: Exploiting Multithreaded Architectures to Improve Data Management Operations

6

AA_HJ Results

[Chart: execution time (seconds) vs. tuple size (20–140 bytes) for PT, NPT, Index PT, and AA_HJ with 2–16 threads]

We achieve speedups ranging from 2 to 4.6 compared to PT on a quad Intel Xeon dual-core server.

Speedups for the Pentium 4 with HT range from 2.1 to 2.9 compared to PT.

Page 7: Exploiting Multithreaded Architectures to Improve Data Management Operations

7

Memory-Analysis for Multithreaded AA_HJ

[Chart: the L2 load miss rate vs. tuple size (20–140 bytes) for NPT and AA_HJ with 2–16 threads]

The decrease in the L2 load miss rate is due to cache-sized index partitioning, constructive cache sharing, and group prefetching. There is a minor increase in the L1 data cache load miss rate, from 1.5% to 4%.

[Chart: the L1 load miss rate (3%–10%) vs. tuple size (20–140 bytes) for NPT and AA_HJ with 2–16 threads]

Page 8: Exploiting Multithreaded Architectures to Improve Data Management Operations

8

The Sort Motivation

Some researchers find that sort algorithms suffer from high level-two cache miss rates, whereas others point out that radix sort has high TLB miss rates.

In addition, the fact that most sort algorithms are sequential makes it hard to generate efficient parallel sort algorithms.

In our work we target radix sort (a distribution-based sort) and quicksort (a comparison-based sort).

Page 9: Exploiting Multithreaded Architectures to Improve Data Management Operations

9

Our Parallel Sorts

Radix Sort
A hybrid radix sort between Partition Parallel Radix Sort and Cache-Conscious Radix Sort.
Large destination buckets are repartitioned only when they are significantly larger than the L2 cache.

Quick Sort
Based on Fast Parallel Quick Sort.
Dynamically balances the load across threads.
Improves thread parallelism during the sequential clean-up sorting.
Stops the recursive partitioning when the subarray is roughly the size of the largest cache.

Page 10: Exploiting Multithreaded Architectures to Improve Data Management Operations

10

The Sort Timing for the Random Datasets on the SMT Architecture

Radix sort and quicksort show low L1 and L2 cache miss rates on our machines, but radix sort has a DTLB store miss rate of up to 26%.

Radix sort achieves only a slight speedup (no more than 3%) on SMT architectures, due to its CPU-intensive nature.

Improvements in execution time for quicksort are about 25% to 30%.

[Charts: quicksort and radix sort timing (seconds) vs. number of keys (1e7–6e7) for 1 and 2 threads, with the LSB baseline for radix sort]

Page 11: Exploiting Multithreaded Architectures to Improve Data Management Operations

11

The Sort Timing for the Random Datasets on the CMP Architecture

[Charts: radix sort and quicksort timing (seconds) vs. number of keys (1e7–6e7) for 1–16 threads, with the LSB baseline for radix sort]

Our speedups for radix sort range from 54% with two threads up to 300% with two to eight threads. Our speedups for quicksort range from 34% to 417%.

Page 12: Exploiting Multithreaded Architectures to Improve Data Management Operations

12

The Index Motivation

Despite the fact that the CSB+-tree shows a significant speedup over B+-trees, experiments show that a large fraction of its execution time is still spent waiting for data.

The L2 load miss rate for single-threaded CSB+-tree is as high as 42%.

Page 13: Exploiting Multithreaded Architectures to Improve Data Management Operations

13

Dual-threaded CSB+-Tree

One CSB+-Tree. Single thread for the

bulkloading. Two threads for

probing. Unlike inserts and

deletes, search needs no synchronization since it involves reads only.
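Because probes are read-only, both threads can walk one shared index with no locks at all. A minimal sketch (Python, with a sorted array standing in for the CSB+-tree structure, which is omitted here):

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

def lookup(index_keys, q):
    """Read-only search in a shared sorted index (a stand-in for
    descending a CSB+-tree); no synchronization is needed."""
    i = bisect.bisect_left(index_keys, q)
    return i < len(index_keys) and index_keys[i] == q

def dual_threaded_probe(index_keys, queries):
    # Split the probe stream in half; each thread searches the same
    # shared index concurrently, which is safe because probes only read.
    half = len(queries) // 2
    chunks = [queries[:half], queries[half:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(
            lambda c: [lookup(index_keys, q) for q in c], chunks))
    return results[0] + results[1]
```
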

Page 14: Exploiting Multithreaded Architectures to Improve Data Management Operations

14

Index Results

[Charts: timing (seconds) and the L2 load miss rate vs. number of keys (1e2–1e7) for the single- and dual-threaded CSB+-tree]

Speedups for dual-threaded CSB+-tree range from 19% to 68% compared to single-threaded CSB+-tree.

Running two threads for memory-bound operations offers more chances to keep the functional units busy.

Sharing one CSB+-tree between both threads results in constructive cache behaviour and a 6%–8% reduction in the L2 miss rate.

Page 15: Exploiting Multithreaded Architectures to Improve Data Management Operations

15

Conclusions

State-of-the-art parallel architectures (SMT and CMP) have opened opportunities for improving software operations to better utilize the underlying hardware resources.

It is essential to have efficient implementations of database operations.

We propose architecture-aware multithreaded algorithms for the most important database operations (joins, sorts, and indexes).

We characterize the timing and memory behaviour of these database operations.

Page 16: Exploiting Multithreaded Architectures to Improve Data Management Operations

16

The End

Page 17: Exploiting Multithreaded Architectures to Improve Data Management Operations

17

Backup Slides

Page 18: Exploiting Multithreaded Architectures to Improve Data Management Operations

18

Figure 1‑1: The SMT Architecture

Page 19: Exploiting Multithreaded Architectures to Improve Data Management Operations

19

Figure 1‑2: Comparison between the SMT and the Dual Core Architectures

Page 20: Exploiting Multithreaded Architectures to Improve Data Management Operations

20

Figure 1‑3: Combining the SMT and the CMP Architectures

Page 21: Exploiting Multithreaded Architectures to Improve Data Management Operations

21

Figure 2‑1: The L1 Data Cache Load Miss Rate for Hash Join


Page 22: Exploiting Multithreaded Architectures to Improve Data Management Operations

22

Figure 2‑2: The L2 Cache Load Miss Rate for Hash Join


Page 23: Exploiting Multithreaded Architectures to Improve Data Management Operations

23

Figure 2‑3: The Trace Cache Miss Rate for Hash Join


Page 24: Exploiting Multithreaded Architectures to Improve Data Management Operations

24

Figure 2‑4: Typical Relational Table in RDBMS

Page 25: Exploiting Multithreaded Architectures to Improve Data Management Operations

25

Figure 2‑5: Database Join

Page 26: Exploiting Multithreaded Architectures to Improve Data Management Operations

26

Figure 2‑6: Hash Equi-join Process

Page 27: Exploiting Multithreaded Architectures to Improve Data Management Operations

27

Figure 2‑7: Hash Table Structure

Page 28: Exploiting Multithreaded Architectures to Improve Data Management Operations

28

Figure 2‑8: Hash Join Base Algorithm

partition R into R0, R1, …, Rn-1
partition S into S0, S1, …, Sn-1
for i = 0 until i = n-1
    use Ri to build hash-table i
for i = 0 until i = n-1
    probe Si using hash-table i
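The base algorithm above can be written out as a small runnable sketch (Python; the partition count and tuple layout are assumptions):

```python
def hash_join(R, S, n=4, key=lambda t: t[0]):
    """Base hash join: partition both relations on the join key,
    then build and probe one hash table per partition pair."""
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in R:                        # partition R into R0..Rn-1
        r_parts[hash(key(t)) % n].append(t)
    for t in S:                        # partition S into S0..Sn-1
        s_parts[hash(key(t)) % n].append(t)
    out = []
    for i in range(n):
        table = {}                     # use Ri to build hash-table i
        for t in r_parts[i]:
            table.setdefault(key(t), []).append(t)
        for s in s_parts[i]:           # probe Si using hash-table i
            for r in table.get(key(s), []):
                out.append((r, s))
    return out
```
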

Page 29: Exploiting Multithreaded Architectures to Improve Data Management Operations

29

Figure 2‑9: AA_HJ Build Phase Executed by one Thread

Page 30: Exploiting Multithreaded Architectures to Improve Data Management Operations

30

Figure 2‑10: AA_HJ Probe Index Partitioning Phase Executed by one Thread

Page 31: Exploiting Multithreaded Architectures to Improve Data Management Operations

31

Figure 2‑11: AA_HJ S-Relation Partitioning and Probing Phases

Page 32: Exploiting Multithreaded Architectures to Improve Data Management Operations

32

Figure 2‑12: AA_HJ Multithreaded Probing Algorithm

Page 33: Exploiting Multithreaded Architectures to Improve Data Management Operations

33

Table 2‑1: Machines Specifications

Page 34: Exploiting Multithreaded Architectures to Improve Data Management Operations

34

Table 2‑2: Number of Tuples for Machine 1

Page 35: Exploiting Multithreaded Architectures to Improve Data Management Operations

35

Table 2‑3: Number of Tuples for Machine 2

Page 36: Exploiting Multithreaded Architectures to Improve Data Management Operations

36

Figure 2‑13: Timing for three Hash Join Partitioning Techniques


Page 37: Exploiting Multithreaded Architectures to Improve Data Management Operations

37

Figure 2‑14: Memory Usage for three Hash Join Partitioning Techniques


Page 38: Exploiting Multithreaded Architectures to Improve Data Management Operations

38

Figure 2‑15: Timing for Dual-threaded Hash Join


Page 39: Exploiting Multithreaded Architectures to Improve Data Management Operations

39

Figure 2‑16: Memory Usage for Dual-threaded Hash Join


Page 40: Exploiting Multithreaded Architectures to Improve Data Management Operations

40

Figure 2‑17: Timing Comparison of all Hash Join Algorithms


Page 41: Exploiting Multithreaded Architectures to Improve Data Management Operations

41

Figure 2‑18: Memory Usage Comparison of all Hash Join Algorithms


Page 42: Exploiting Multithreaded Architectures to Improve Data Management Operations

42

Figure 2‑19: Speedups due to the AA_HJ+SMT and the AA_HJ+GP+SMT Algorithms


Page 43: Exploiting Multithreaded Architectures to Improve Data Management Operations

43

Figure 2‑20: Varying Number of Clusters for the AA_HJ+GP+SMT


Page 44: Exploiting Multithreaded Architectures to Improve Data Management Operations

44

Figure 2‑21: Varying the Selectivity for Tuple Size = 100Bytes


Page 45: Exploiting Multithreaded Architectures to Improve Data Management Operations

45

Figure 2‑22: Time Breakdown Comparison for the Hash Join Algorithms for tuple sizes 20Bytes and 100Bytes


Page 46: Exploiting Multithreaded Architectures to Improve Data Management Operations

46

Figure 2‑23: Timing for the Multi-threaded Architecture-Aware Hash Join


Page 47: Exploiting Multithreaded Architectures to Improve Data Management Operations

47

Figure 2‑24: Speedups for the Multi-Threaded Architecture-Aware Hash Join


Page 48: Exploiting Multithreaded Architectures to Improve Data Management Operations

48

Figure 2‑25: Memory Usage for the Multi-Threaded Architecture-Aware Hash Join

Page 49: Exploiting Multithreaded Architectures to Improve Data Management Operations

49

Figure 2‑26: Time Breakdown Comparison for Hash Join Algorithms


Page 50: Exploiting Multithreaded Architectures to Improve Data Management Operations

50

Figure 2‑27: The L1 Data Cache Load Miss Rate for NPT and AA_HJ


Page 51: Exploiting Multithreaded Architectures to Improve Data Management Operations

51

Figure 2‑28: Number of Loads for NPT and AA_HJ


Page 52: Exploiting Multithreaded Architectures to Improve Data Management Operations

52

Figure 2‑29: The L2 Cache Load Miss Rate for NPT and AA_HJ


Page 53: Exploiting Multithreaded Architectures to Improve Data Management Operations

53

Figure 2‑30: The Trace Cache Miss Rate for NPT and AA_HJ


Page 54: Exploiting Multithreaded Architectures to Improve Data Management Operations

54

Figure 2‑31: The DTLB Load Miss Rate for NPT and AA_HJ


Page 55: Exploiting Multithreaded Architectures to Improve Data Management Operations

55

Figure 3‑1: The LSD Radix Sort

for (i = 0; i < number_of_digits; i++)
    sort source-array based on digit i;

Page 56: Exploiting Multithreaded Architectures to Improve Data Management Operations

56

Figure 3‑2: The Counting LSD Radix Sort Algorithm

Page 57: Exploiting Multithreaded Architectures to Improve Data Management Operations

57

Figure 3‑3: Parallel Radix Sort Algorithm

Page 58: Exploiting Multithreaded Architectures to Improve Data Management Operations

58

Table 3‑1: Memory Characterization for LSD Radix Sort with Different Datasets

Page 59: Exploiting Multithreaded Architectures to Improve Data Management Operations

59

Figure 3‑4: Radix Sort Timing for the Random Datasets on Machine 2


Page 60: Exploiting Multithreaded Architectures to Improve Data Management Operations

60

Figure 3‑5: Radix Sort Timing for the Gaussian Datasets on Machine 2


Page 61: Exploiting Multithreaded Architectures to Improve Data Management Operations

61

Figure 3‑6: Radix Sort Timing for Zero Datasets on Machine 2


Page 62: Exploiting Multithreaded Architectures to Improve Data Management Operations

62

Figure 3‑7: Radix Sort Timing for the Random Datasets on Machine 1


Page 63: Exploiting Multithreaded Architectures to Improve Data Management Operations

63

Figure 3‑8: Radix Sort Timing for the Gaussian Datasets on Machine 1


Page 64: Exploiting Multithreaded Architectures to Improve Data Management Operations

64

Figure 3‑9: Radix Sort Timing for the Zero Datasets on Machine 1


Page 65: Exploiting Multithreaded Architectures to Improve Data Management Operations

65

Figure 3‑10: The DTLB Stores Miss Rate for the Radix Sort on Machine 2 (Random Datasets)


Page 66: Exploiting Multithreaded Architectures to Improve Data Management Operations

66

Figure 3‑11: The L1 Data Cache Load Miss Rate for the Radix Sort on Machine 2 (Random Datasets)


Page 67: Exploiting Multithreaded Architectures to Improve Data Management Operations

67

Table 3‑2: Memory Characterization for Memory-Tuned Quick Sort with Different Datasets

Page 68: Exploiting Multithreaded Architectures to Improve Data Management Operations

68

Figure 3‑12: Quicksort Timing for the Random Datasets on Machine 2


Page 69: Exploiting Multithreaded Architectures to Improve Data Management Operations

69

Figure 3‑13: Quicksort Timing for the Random Dataset on Machine 1


Page 70: Exploiting Multithreaded Architectures to Improve Data Management Operations

70

Figure 3‑14: Quicksort Timing for the Gaussian Datasets on Machine 2


Page 71: Exploiting Multithreaded Architectures to Improve Data Management Operations

71

Figure 3‑15: Quicksort Timing for the Gaussian Dataset on Machine 1


Page 72: Exploiting Multithreaded Architectures to Improve Data Management Operations

72

Figure 3‑16: Quicksort Timing for the Zero Datasets on Machine 2


Page 73: Exploiting Multithreaded Architectures to Improve Data Management Operations

73

Figure 3‑17: Quicksort Timing for the Zero Dataset on Machine 1


Page 74: Exploiting Multithreaded Architectures to Improve Data Management Operations

74

Table 3‑3: The Sort Results for Machine 1

Page 75: Exploiting Multithreaded Architectures to Improve Data Management Operations

75

Table 3‑4: The Sort Results for Machine 2

Page 76: Exploiting Multithreaded Architectures to Improve Data Management Operations

76

Figure 4‑1: Search Operation on an Index Tree

Page 77: Exploiting Multithreaded Architectures to Improve Data Management Operations

77

Figure 4‑2: Differences between the B+-Tree and the CSB+-Tree

Page 78: Exploiting Multithreaded Architectures to Improve Data Management Operations

78

Figure 4‑3: Dual-Threaded CSB+-Tree for the SMT Architectures

Page 79: Exploiting Multithreaded Architectures to Improve Data Management Operations

79

Figure 4‑4: Timing for the Single and Dual-Threaded CSB+-Tree


Page 80: Exploiting Multithreaded Architectures to Improve Data Management Operations

80

Figure 4‑5: The L1 Data Cache Load Miss Rate for the Single and Dual-Threaded CSB+-Tree


Page 81: Exploiting Multithreaded Architectures to Improve Data Management Operations

81

Figure 4‑6: The Trace Cache Miss Rate for the Single and Dual-Threaded CSB+-Tree


Page 82: Exploiting Multithreaded Architectures to Improve Data Management Operations

82

Figure 4‑7: The L2 Load Miss Rate for the Single and Dual-Threaded CSB+-Tree


Page 83: Exploiting Multithreaded Architectures to Improve Data Management Operations

83

Figure 4‑8: The DTLB Load Miss Rate for the Single and Dual-Threaded CSB+-Tree


Page 84: Exploiting Multithreaded Architectures to Improve Data Management Operations

84

Figure 4‑9: The ITLB Load Miss Rate for the Single and Dual-Threaded CSB+-Tree
