Cache Conscious Indexes
Presented by: Sumit Lole, M. Tech. (CS) 1st Sem
School of Computer Science & IT
Guided by:
Mrs. Shraddha Masih
Cache Sensitive B+ Tree for Large Data Set
OUTLINE
1. Introduction
2. Related Work
3. Cache Sensitive B+-Trees
4. Conclusion
INTRODUCTION
The principal requirement for any mining algorithm is to cope with the influx of huge volumes of data, which poses a real challenge to space and time requirements. Unless the data are arranged compactly and efficiently, algorithms running with limited primary storage fail to produce output within a reasonable time.
Challenges in Data Mining
A fundamental challenge is to extend data mining to large data sets.
The size of the processed transactions will be larger than the size of main memory.
The most basic approach is to manipulate the data until it fits into memory.
Scaling data mining algorithms.
RELATED WORK
Computational and programming models used in high performance computing help reduce the cost of computing.
The programming models used in high performance computing fall into two basic categories:
◦ Data Parallelism
◦ Task Parallelism
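The difference between the two models can be illustrated with a small sketch. The transaction data, the `count_item` helper, and the use of a thread pool are assumptions made for illustration only; threads in CPython show the structure of each model rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def count_item(chunk, item):
    """Count the transactions in `chunk` that contain `item`."""
    return sum(1 for t in chunk if item in t)

# Hypothetical transaction data, split into two partitions.
transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]
chunks = [transactions[:2], transactions[2:]]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Data parallelism: the same function runs on different partitions.
    partial = list(pool.map(lambda c: count_item(c, "a"), chunks))
    data_parallel_total = sum(partial)

    # Task parallelism: different functions run concurrently on the same data.
    f1 = pool.submit(count_item, transactions, "b")
    f2 = pool.submit(len, transactions)
    task_results = (f1.result(), f2.result())
```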
Basic Approaches for Scaling Data Intensive Computing
Manipulate the data so that it fits into memory
Reduce the time to access out of memory data
Use several processors
Precomputing
Reduce the Time to Access out of Memory data
Accessing the disk efficiently improves the performance of core algorithms, which can then be scaled to large data sets.
One way is to use specialized data structures to access data on disk.
Comparison between B+-Trees and CSS-Trees
B+ tree
◦ full pointers
◦ more cache accesses and more cache misses
◦ efficient for update operations, e.g. insertion and deletion
CSS tree
◦ no pointers
◦ fewer cache accesses and fewer cache misses
◦ acceptable for static data updated in batches
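The "no pointer" idea can be sketched as a directory built over a sorted array, where the child of an entry is found by index arithmetic instead of a stored pointer. This is a simplified illustration, not the exact CSS-Tree layout; the names `build_levels` and `search` and the node size `m` are assumptions.

```python
from bisect import bisect_left

def build_levels(sorted_keys, m):
    """Return levels, bottom-up. Level 0 is the sorted data; each higher
    level stores the largest key of every m-entry node of the level below."""
    levels = [list(sorted_keys)]
    while len(levels[-1]) > m:
        below = levels[-1]
        levels.append([below[min(i + m, len(below)) - 1]
                       for i in range(0, len(below), m)])
    return levels

def search(levels, m, key):
    """Return the position of `key` in the sorted data, or -1 if absent."""
    node = 0  # the top level is a single node
    for level in reversed(levels):
        lo = node * m
        hi = min(lo + m, len(level))
        pos = bisect_left(level, key, lo, hi)  # first entry >= key in this node
        if pos == hi:
            return -1  # key is larger than everything in this node
        node = pos     # entry index doubles as the child node index: no pointer
    return node if levels[0][node] == key else -1
```

Because the structure is laid out once over a fixed array, inserting a key means rebuilding the directory, which matches the note that this style of index suits static data updated in batches.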
CACHE SENSITIVE B+ TREES
• Cache Sensitive B+-Trees with One Child Pointer
• Full CSB+-Trees
Cache Sensitive B+-Trees with One Pointer
Similar to a B+-Tree
All the child nodes of any given node are put into a node group, which is referenced by one pointer
Nodes within a node group are stored contiguously and can be accessed using an offset from the first node in the group
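A minimal sketch of this addressing scheme, assuming a flat Python list stands in for memory and list indices stand in for addresses. The `Node` class, the `first_child` field, and `child_slot` are illustrative names, not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Node:
    keys: list
    first_child: int = -1  # slot of the first node in the child group; -1 for a leaf

# The root's three children form one node group stored in adjacent slots 1..3.
nodes = [
    Node(keys=[10, 20], first_child=1),
    Node(keys=[3, 10]),
    Node(keys=[12, 20]),
    Node(keys=[25, 30]),
]

def child_slot(node, i):
    """Slot of the i-th child: one stored pointer plus an offset."""
    return node.first_child + i
```

The parent keeps a single reference regardless of how many children it has, which is what frees space in the node for additional keys.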
Cache Sensitive B+-Trees with One Pointer (cont’d)
Cache misses are reduced because a cache line can hold more keys than in a B+-Tree, so a single node can satisfy one more level of comparisons.
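The gain in keys per cache line can be checked with a little arithmetic. The sizes below (a 64-byte line, 4-byte keys, 4-byte pointers, and a 4-byte key counter per node) are assumptions chosen for illustration.

```python
def bplus_keys_per_node(line_bytes=64, key_bytes=4, ptr_bytes=4, count_bytes=4):
    # n keys + (n + 1) child pointers + the key counter must fit in one line.
    return (line_bytes - count_bytes - ptr_bytes) // (key_bytes + ptr_bytes)

def csb_keys_per_node(line_bytes=64, key_bytes=4, ptr_bytes=4, count_bytes=4):
    # n keys + a single child pointer + the key counter must fit in one line.
    return (line_bytes - count_bytes - ptr_bytes) // key_bytes

print(bplus_keys_per_node())  # 7 keys, so a fanout of 8
print(csb_keys_per_node())    # 14 keys, so a fanout of 15
```

Under these assumptions the node with one pointer holds twice as many keys, so each cache line fetched narrows the search range further.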
CSB+-Tree can support incremental updates in a way similar to B+-Tree
Full CSB+-Tree
Motivation: reduce the split cost
Method:
◦ pre-allocate space for a full node group
◦ shift part of the node group along by one node when a node splits
Result:
◦ reduce the split cost, but increase the space complexity
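The shifting step can be sketched as follows, assuming a node group is a fixed-capacity list whose first `used` slots hold nodes (each node a sorted list of keys). The function name is hypothetical, and the update of the parent's separator keys is omitted.

```python
def split_in_group(group, used, i):
    """Split node i of a pre-allocated group in place.
    Returns the new number of used slots."""
    if used >= len(group):
        raise OverflowError("node group is full; the group itself must be split")
    node = group[i]
    mid = len(node) // 2
    # Shift the nodes after i along by one slot; no new group is allocated.
    group[i + 2:used + 1] = group[i + 1:used]
    group[i], group[i + 1] = node[:mid], node[mid:]
    return used + 1

group = [[1, 2, 3, 4], [5, 6], [7, 8], None]  # capacity 4, three slots used
used = split_in_group(group, 3, 0)
```

The unused slot is the extra space paid for in advance; in exchange a split only moves the nodes to the right of the split point.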
Operations on CSB+-Tree: Search
• Determine the rightmost key K in the node that is smaller than the search key
• Get the address of the child node
• Repeat from the first step until the search key is found or there are no more nodes to check
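The steps above can be sketched as a short search routine. It assumes nodes live in a flat list, each internal node stores the slot of its first child, and every separator key is an upper bound for its left child; the names `Node`, `first_child`, and `search` are illustrative.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass
class Node:
    keys: list
    first_child: int = -1  # slot of the first child in the node group; -1 for a leaf

def search(nodes, key, root=0):
    """Return True if `key` is stored in a leaf of the tree."""
    node = nodes[root]
    while node.first_child != -1:
        # The number of keys smaller than `key` is the offset that follows
        # the rightmost smaller key K.
        offset = bisect_left(node.keys, key)
        node = nodes[node.first_child + offset]  # address = first child + offset
    i = bisect_left(node.keys, key)
    return i < len(node.keys) and node.keys[i] == key

tree = [
    Node(keys=[10, 20], first_child=1),
    Node(keys=[3, 10]),
    Node(keys=[12, 20]),
    Node(keys=[25, 30]),
]
```

For example, `search(tree, 12)` finds one key in the root smaller than 12, moves to the child at offset 1, and locates 12 in that leaf.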
CONCLUSION
Using specialized data structures to access data on disk can greatly improve the efficiency of mining algorithms.
CSB+-Trees are more cache conscious than B+-Tree because of partial pointer elimination
CSB+-Trees support efficient incremental updates
Cache conscious index structures such as Cache Sensitive Search Trees (CSS-Trees) perform lookups much faster
REFERENCES
Hankins, R. A., & Patel, J. M. Effect of Node Size on the Performance of Cache-Conscious Indices. Extended Report,
Jun Rao and Kenneth A. Ross. Cache conscious indexing for decision-support in main memory. In Proceedings of the 25th VLDB Conference, 1999.
Ramakrishnan, Raghu (1997). Database Management Systems. McGraw-Hill, New York.
Alan J. Smith. Cache memories. ACM Computing Surveys, 14(3):473-530, 1982.