Cache Conscious Indexes

Presented by: Sumit Lole, M. Tech. (CS), 1st Sem, School of Computer Science & IT. Guided by: Mrs. Shraddha Masih. Cache Sensitive B+ Tree for Large Data Set

description

One way to improve the mining of huge datasets is to improve the performance of the algorithm's search process.

Transcript of Cache Conscious Indexes

Page 1: Cache Conscious Indexes

Presented by: Sumit Lole, M. Tech. (CS), 1st Sem

School of Computer Science & IT

Guided by:

Mrs. Shraddha Masih

Cache Sensitive B+ Tree for Large Data Set

Page 2: Cache Conscious Indexes

OUTLINE

1. Introduction

2. Related Work

3. Cache Sensitive B+-Trees

4. Conclusion

Page 3: Cache Conscious Indexes

INTRODUCTION

A principal requirement of any mining algorithm is to cope with the influx of huge data, which poses a real challenge to its space and time requirements. Unless data are arranged in a compact and efficient way, algorithms with limited primary storage fail to produce output within a reasonable time.

Page 4: Cache Conscious Indexes

Challenges in Data Mining

A fundamental challenge is to extend data mining to large data sets.

The size of the processed transactions will be larger than the size of main memory.

The most basic approach is to manipulate the data until it fits into memory.

Scaling data mining algorithms.

Page 5: Cache Conscious Indexes

RELATED WORK

Computational and programming models that are used in high performance computing and that reduce the cost of computing.

The programming models used in high performance computing fall into two basic categories:
◦ Data Parallelism
◦ Task Parallelism
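The distinction can be illustrated with a small sketch. The transaction data and the two helper functions are hypothetical, and a thread pool is used only to keep the example short; it is not meant as a high performance setup.

```python
from concurrent.futures import ThreadPoolExecutor

def total_length(transactions):
    """Total number of items over all transactions."""
    return sum(len(t) for t in transactions)

def distinct_items(transactions):
    """Number of distinct items over all transactions."""
    return len({item for t in transactions for item in t})

# Hypothetical toy data set: each transaction is a list of items
transactions = [["a", "b"], ["b", "c", "d"], ["a"], ["d", "e"]]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Data parallelism: split the data and run the SAME function on each part
    halves = [transactions[:2], transactions[2:]]
    data_parallel_total = sum(pool.map(total_length, halves))

    # Task parallelism: run DIFFERENT functions concurrently on the same data
    f1 = pool.submit(total_length, transactions)
    f2 = pool.submit(distinct_items, transactions)
    task_parallel_results = (f1.result(), f2.result())
```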

Page 6: Cache Conscious Indexes

Basic Approaches for Scaling Data Intensive Computing

Manipulate the data so that it fits into memory

Reduce the time to access out of memory data

Use several processors

Precomputing

Page 7: Cache Conscious Indexes

Reduce the Time to Access Out-of-Memory Data

Efficient disk access improves the performance of core algorithms, so that they can be scaled to large data sets.

One way is to use specialized data structures to access data on disk

Page 8: Cache Conscious Indexes

Comparison between B+-Trees and CSS-Trees

B+-Tree
◦ Full pointers
◦ More cache accesses and more cache misses
◦ Efficient for update operations, e.g. insertion and deletion

CSS-Tree
◦ No pointers
◦ Fewer cache accesses and fewer cache misses
◦ Acceptable for static data updated in batches
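To show what "no pointers" can mean in practice, here is a minimal sketch of a pointer-free search tree stored in a flat array, where child positions are computed by arithmetic. It is a simplified binary stand-in chosen for brevity, not the CSS-Tree layout itself, and the function names are illustrative only.

```python
def build_implicit_tree(sorted_keys):
    """Lay sorted keys out in breadth-first order so that the children of
    slot i sit at slots 2*i + 1 and 2*i + 2 (no pointers are stored)."""
    tree = [None] * len(sorted_keys)
    it = iter(sorted_keys)

    def fill(i):
        if i < len(tree):
            fill(2 * i + 1)      # left subtree receives the smaller keys
            tree[i] = next(it)   # in-order visit assigns keys in sorted order
            fill(2 * i + 2)      # right subtree receives the larger keys

    fill(0)
    return tree

def contains(tree, key):
    """Search by computing child positions instead of following pointers."""
    i = 0
    while i < len(tree):
        if key == tree[i]:
            return True
        i = 2 * i + 1 if key < tree[i] else 2 * i + 2
    return False

tree = build_implicit_tree([10, 20, 30, 40, 50, 60, 70])
```

Because the layout depends on the number and order of all keys, inserting a key generally means rebuilding the array, which fits the remark above that such structures suit static data updated in batches.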

Page 9: Cache Conscious Indexes

CACHE SENSITIVE B+ TREES

• Cache Sensitive B+-Trees with One Child Pointer

• Full CSB+-Trees

Page 10: Cache Conscious Indexes

Cache Sensitive B+-Trees with One Pointer

Similar to a B+-Tree

All the child nodes of any given node are put into a node group, and the parent keeps one pointer to that group

Nodes within a node group are stored contiguously and can be accessed using an offset from the first node in the group
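The node group idea can be sketched as follows. This is an illustration under assumptions: a flat Python list stands in for contiguous memory, an integer index stands in for the single child pointer, and the names `Node`, `allocate_group` and `child` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    keys: List[int]
    # Index of the first node of the child node group; None for a leaf
    first_child: Optional[int] = None

# One flat list stands in for memory: nodes of a group occupy adjacent slots
nodes: List[Node] = []

def allocate_group(group: List[Node]) -> int:
    """Store a node group contiguously and return the index of its first node."""
    start = len(nodes)
    nodes.extend(group)
    return start

def child(parent: Node, i: int) -> Node:
    """The i-th child is the first node of the group plus an offset of i."""
    return nodes[parent.first_child + i]

leaves = [Node([1, 2]), Node([3, 4]), Node([5, 6])]
root = Node(keys=[2, 4], first_child=allocate_group(leaves))
```

Here `child(root, 1)` reaches the second leaf without the root storing a separate pointer for it.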

Page 11: Cache Conscious Indexes

Cache Sensitive B+-Trees with One Pointer (cont'd)

Cache misses are reduced because a cache line can hold more keys than in a B+-Tree, so one cache line can satisfy one more level of comparisons.
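A rough calculation makes the gain concrete. The sizes used here are assumptions for illustration (a 64-byte node, 4-byte keys, 4-byte pointers and a 4-byte key counter); other configurations give different numbers.

```python
def keys_per_node(node_bytes, key_bytes, ptr_bytes, count_bytes, one_pointer):
    """Largest n such that n keys, the child pointers and a key counter fit."""
    if one_pointer:
        # CSB+-Tree node: n keys + 1 pointer to the child node group
        return (node_bytes - count_bytes - ptr_bytes) // key_bytes
    # B+-Tree node: n keys + (n + 1) child pointers
    return (node_bytes - count_bytes - ptr_bytes) // (key_bytes + ptr_bytes)

# Assumed sizes: 64-byte node, 4-byte key, 4-byte pointer, 4-byte counter
bplus_keys = keys_per_node(64, 4, 4, 4, one_pointer=False)
csb_keys = keys_per_node(64, 4, 4, 4, one_pointer=True)
```

Under these assumptions the B+-Tree node holds 7 keys and the CSB+-Tree node holds 14, so each cache line fetched decides among roughly twice as many children.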

A CSB+-Tree can support incremental updates in a way similar to a B+-Tree

Page 12: Cache Conscious Indexes

Full CSB+-Tree

Motivation: reduce the split cost

Method:
◦ pre-allocate space for a full node group
◦ shift part of the node group along by one node when a node splits

Result:
◦ reduce the split cost, but increase the space complexity
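The split in a pre-allocated node group can be sketched like this. It is a simplified model under assumptions: a fixed-size Python list represents the pre-allocated group, each node is just a sorted list of keys, and `split_in_full_group` is a hypothetical name; updating the parent's keys is left out.

```python
def split_in_full_group(group, used, i):
    """Split node i of a pre-allocated node group in place.

    group: fixed-size list of slots; the first `used` slots hold nodes
           (each a sorted list of keys), the remaining slots are None.
    Returns the new number of used slots.
    """
    if used == len(group):
        raise ValueError("node group is full; the parent would have to split")
    # Shift the nodes to the right of i along by one slot, opening slot i + 1
    group[i + 2:used + 1] = group[i + 1:used]
    node = group[i]
    mid = len(node) // 2
    # The lower half stays in slot i, the upper half moves to the freed slot
    group[i], group[i + 1] = node[:mid], node[mid:]
    return used + 1

group = [[1, 2, 3, 4], [5, 6], [7, 8], None]  # capacity 4, 3 slots in use
used = split_in_full_group(group, 3, 0)
```

No new group is allocated and no nodes are copied elsewhere; the price is the unused slot that was reserved in advance.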

Page 13: Cache Conscious Indexes

Operations on CSB+-Tree: Search

• Determine the rightmost key K in the node that is smaller than the search key

• Get the address of the child node

• Repeat from the first step until the search key is found or there are no more nodes to check
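The steps above can be sketched as follows. This assumes a tiny hand-built tree in which all keys live in the leaves, as in a B+-Tree, each separator key equals the largest key of the child to its left, and list indexes stand in for addresses. The tuple layout and the name `search` are illustrative only.

```python
from bisect import bisect_left

# Each node is (keys, first_child). first_child is the index in `nodes` of the
# first node of its child node group, or None for a leaf.
nodes = [
    ([20, 40], 1),       # 0: root, its children are nodes 1, 2 and 3
    ([10, 20], None),    # 1: leaf
    ([30, 40], None),    # 2: leaf
    ([50, 60], None),    # 3: leaf
]

def search(nodes, key, root=0):
    i = root
    while True:
        keys, first_child = nodes[i]
        # Step 1: count the keys smaller than the search key, which gives the
        # position just to the right of the rightmost such key K
        pos = bisect_left(keys, key)
        if first_child is None:
            # Leaf reached: the key is present only if it sits at this position
            return pos < len(keys) and keys[pos] == key
        # Step 2: child address = start of the node group + offset
        i = first_child + pos
```

For example, `search(nodes, 30)` follows offset 1 from the root's node group to the second leaf and finds the key there.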

Page 14: Cache Conscious Indexes

CONCLUSION

Using specialized data structures to access data on disk can greatly improve the efficiency of mining algorithms.

CSB+-Trees are more cache conscious than B+-Trees because of partial pointer elimination

CSB+-Trees support efficient incremental updates

Cache conscious index structures such as Cache Sensitive Search Trees (CSS-Trees) perform lookups much faster than binary search and T-Trees
