Cache Hierarchy Inspired Compression: A Novel Architecture for Data Streams

Department of Computer Science, University of Waikato, New Zealand

Geoffrey Holmes, Bernhard Pfahringer and Richard Kirkby

Traditional machine learning model
Data stream requirements
Scaling up existing methods
Network caching
CHIC architecture
Experimental Design
Results
Discussion
Further Work


Traditional Machine Learning
Create train/test splits of the data (possibly via cross-validation)
Load ALL the training data into main memory
Compute a model from the training data (may involve multiple passes)
Load all the test data into main memory
Compute accuracy measures from the test data

Consequences
Data is processed one instance at a time
Very few incremental methods – none used seriously in practice
Many existing techniques don’t scale
Machine Learning is perceived as a small to medium data set field of study
Larger data sets are tackled through sampling or building several models on separate portions and combining their predictions

Data Streams
Takes the “stream” view: the source may be finite or infinite
The concept of train/test is less well defined: could train for a while, then test for a while – but what is the definition of “a while”?
Whatever you do, you can be sure that ALL the data will NOT fit in main memory

Data Stream Constraints
Cannot store instances (not all, anyway)
Cannot use more than available memory – no swapping to disk
Cannot dwell too long over an instance – must keep up with the incoming rate of instances
Cannot wait to make predictions – need to be ready to make predictions at any time

Scaling up existing methods
Could learn models using existing methods in batches and then merge models
Could merge instances (meta-instances)
Could use a cache model where we keep a set of models and update the cache over time, e.g. with least-recently-used or least-frequently-used strategies
Could do the above but use performance measures to decide the make-up of the cache

Caching in Data Communications
Web proxy caches provide a good model for what we need to satisfy stream constraints
Real caches are hierarchical (Squid)
The hierarchy provides a mechanism for sharing the load and increasing the likelihood of a page hit
When full, a cache needs a replacement policy
To replicate this system we need to design a hierarchy, fill it (with models) and implement a model replacement policy

General CHIC Architecture
Idea: Build a hierarchy of levels (N) as follows:
  Level Zero: data buffer from stream
  Level One: Build models from data at level zero
  Level Two to N-1: Fill with “best” models from lower levels
  Level N: Adopt models from level N-1 but also discard models so that new ones can be entered
For prediction use all models in hierarchy and vote
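The prediction step can be sketched in a few lines. This is only an illustration under stated assumptions: each model is represented as a plain callable that returns a class label, the hierarchy is a list of levels (level one first), and the name `vote` is hypothetical rather than taken from the slides. Ties are broken in favour of the label that was counted first, which is an arbitrary choice.

```python
from collections import Counter


def vote(hierarchy, x):
    """Majority vote over every model stored at any level of the hierarchy.

    `hierarchy` is a list of levels; each level is a list of callables
    mapping an instance to a class label (an assumed representation).
    """
    ballots = Counter(model(x) for level in hierarchy for model in level)
    if not ballots:
        return None  # no models built yet, so no prediction is possible
    return ballots.most_common(1)[0][0]


# Toy hierarchy: constant "models" stand in for real classifiers.
hierarchy = [
    [lambda x: "a", lambda x: "b"],  # level 1
    [lambda x: "a"],                 # level 2
    [],                              # level 3 is still empty
]
prediction = vote(hierarchy, x=None)
print(prediction)
```

Every model gets one equal vote regardless of its level; a weighted vote would be a different design from the one described on the slide.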

Features
Can use any machine learning algorithm to build models
Implements a form of batch-incremental learning
Replacement policy can be performance based
As with the web cache, CHIC fills up initially and then keeps the best performing models
If a variety of models is used at the lower levels then it is possible to adapt to the source of the data

Experimental Design
Try to demonstrate adaptation to data source
  Learn a mixture of models at levels one and two and let the performance-based promotion mechanism take over
Evaluate: two issues
  Need performance measure (model promotion/deletion)
  Overall performance of hierarchy (adopt a first-test-then-train approach)

Data Sources
Random Naïve Bayes
  Weight attribute labels and class values (here we use 5 classes, 5 nominal and 5 numeric attributes)
Random Tree
  Choose a depth and randomly assign splitter nodes; here 5 nodes deep, leaves starting from depth 3, number of attributes/classes as above
Random RBF
  Random set of centers for each class; each center is weighted and has a standard deviation – here 50 centers, 10 numeric attributes and 5 classes
Real data
  Forest covertype (UCI repository), 7 classes, 500K instances
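As an illustration of the Random RBF source, the sketch below generates an endless stream from weighted centers. Several details are assumptions not stated on the slide: the 50 centers are treated as a single pool with randomly assigned class labels, center positions are drawn uniformly from the unit cube, weights and standard deviations are uniform random numbers, and each attribute is perturbed independently with Gaussian noise. The function names are hypothetical.

```python
import random
from itertools import islice


def make_centers(n_centers=50, n_attrs=10, n_classes=5, seed=1):
    """Create random centers, each with a class label, weight and std dev."""
    rng = random.Random(seed)
    return [
        {
            "mean": [rng.random() for _ in range(n_attrs)],
            "label": rng.randrange(n_classes),
            "weight": rng.random(),
            "std": rng.random(),
        }
        for _ in range(n_centers)
    ]


def rbf_stream(centers, seed=2):
    """Yield (attributes, label) pairs forever."""
    rng = random.Random(seed)
    weights = [c["weight"] for c in centers]
    while True:
        # Heavier centers are chosen more often.
        c = rng.choices(centers, weights=weights, k=1)[0]
        # Scatter the instance around the chosen center.
        x = [rng.gauss(m, c["std"]) for m in c["mean"]]
        yield x, c["label"]


centers = make_centers()
sample = list(islice(rbf_stream(centers), 1000))  # one buffer's worth
print(len(sample), len(sample[0][0]))
```

A generator fits the stream view: instances are produced on demand and nothing is stored unless the consumer chooses to buffer it.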

Specific CHIC Architecture
Six levels with 16 models per level
Data buffer of size 1000 instances
First level uses four algorithms to generate models
  Naïve Bayes (N), C4.5 (J), linear SVM (L), RBF kernel SVM (R)

Example
Read 1000 instances into buffer, build 4 models, repeat on next 3 buffers – leads to level one full
Read next 1000 and build 4 more models
Using the buffer data as test data, evaluate all models at level 1 – promote best 4 (groupwise) to level 2 and replace the worst 4 with the new ones
Continue in this manner (at 12,000 instances level 2 will be full)
Note:
  Levels 1 & 2 always have 4 models from the 4 groups
  From level 2 ONLY promote the best (adapt to source)
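The groupwise selection at level 1 might look like the sketch below. It assumes that "groupwise" means one model per algorithm group (N, J, L, R): the best of each group is promoted and the worst of each group is replaced. That reading, the function name `groupwise_extremes` and the accuracy values (invented purely for illustration, and using only two models per group to keep the example short) are not from the slides.

```python
from collections import defaultdict


def groupwise_extremes(models):
    """Return (best, worst) model names per algorithm group.

    `models` is a list of (group, name, accuracy) tuples, where accuracy
    was measured on the current buffer used as test data.
    """
    by_group = defaultdict(list)
    for group, name, accuracy in models:
        by_group[group].append((accuracy, name))
    best = {g: max(scored)[1] for g, scored in by_group.items()}
    worst = {g: min(scored)[1] for g, scored in by_group.items()}
    return best, worst


# Made-up accuracies for a toy level with two models per group.
level_one = [
    ("N", "N1", 0.61), ("N", "N2", 0.58),
    ("J", "J1", 0.70), ("J", "J2", 0.74),
    ("L", "L1", 0.66), ("L", "L2", 0.50),
    ("R", "R1", 0.69), ("R", "R2", 0.72),
]
best, worst = groupwise_extremes(level_one)
print("promote:", best)
print("replace:", worst)
```

Selecting per group keeps all four algorithms represented at the lower levels, whereas promotion from level 2 upward compares models across groups.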

Example Contd
Four models are promoted to level 3 and the 4 worst deleted, freeing 8 spaces
Levels 3, 4 & 5 work on the same basis. Level 6 simply deletes the worst 4 models to free up space
At prediction time all models have an equal vote and the classification decision is arrived at by majority
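The fill schedule in the example can be checked with a small occupancy simulation. It tracks only how many slots are filled at each level, not which models fill them, and it encodes one reading of the slides: when a full level receives 4 models it first promotes its best 4 (unless it is the top level), deletes its worst 4, and then admits the newcomers. The function names are hypothetical.

```python
def receive(counts, level, capacity=16, batch=4):
    """Deliver `batch` models to `level` (0-based), updating counts in place."""
    if counts[level] == capacity:
        if level + 1 < len(counts):
            # Full and not the top level: promote the best `batch` upward.
            receive(counts, level + 1, capacity, batch)
            counts[level] -= batch
        # Delete the worst `batch` models to free up space.
        counts[level] -= batch
    counts[level] += batch


def simulate(n_buffers, n_levels=6):
    """Return the occupancy of every level after each 1000-instance buffer."""
    counts = [0] * n_levels
    history = []
    for _ in range(n_buffers):
        receive(counts, 0)  # each buffer yields 4 new level-1 models
        history.append(list(counts))
    return history


history = simulate(13)
for i, counts in enumerate(history, start=1):
    print(f"{i * 1000:>6}: {counts}")
```

Under these assumptions levels 1 and 2 are both full after 12,000 instances, and after 13,000 instances the counts are 12, 12 and 4 for levels 1 to 3, with level 3 having just received its first four models.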

Results
Contents of the hierarchy after each buffer of 1000 instances (one character per model slot; N, J, L, R denote the algorithm, "-" an empty slot):

Instances  Level 1           Level 2           Level 3
 1000      N---J---L---R---
 2000      NN--JJ--LL--RR--
 3000      NNN-JJJ-LLL-RRR-
 4000      NNNNJJJJLLLLRRRR
 5000      N-NNJJ-JL-LLR-RR  NJLR------------
 6000      NNNNJJJJLLLLRRRR  NJLR------------
 7000      NNN-J-JJL-LLR-RR  NJLRNJLR--------
 8000      NNNNJJJJLLLLRRRR  NJLRNJLR--------
 9000      NN-N-JJJLL-L-RRR  NJLRNJLRNJLR----
10000      NNNNJJJJLLLLRRRR  NJLRNJLRNJLR----
11000      -NNN-JJJLLL-RR-R  NJLRNJLRNJLRNJLR
12000      NNNNJJJJLLLLRRRR  NJLRNJLRNJLRNJLR
13000      NN-NJJ-JLL-LRR-R  NNLJNLLRN-L-N-L-  JJJJ------------

Model Adaptation to Source

random tree        random naive Bayes
N-NN-JJJLL-LR-RR   -NNNJJ-JLLL-R-RR
NNNNLNNLJLRNNJLR   NJJJLLLLRNJJJLLR
JJJJJNNLJJ---JL-   NNNNNJJNJJ-NN---
JJJJJJJJJJJJJJJJ   NNNNNNNNNNNNNNNN
JJJJ--JJ-JJJJJ-J   NNNNNNNN---NNNN-
JJJJJJJJJJJJJJJJ   NNNNNNNNNNNNNNNN

Model Adaptation to Source (Continued)

random RBF         covertype
NNN--JJJLL-LRR-R   N-NN-JJJ-LLLR-RR
NNJJLRNJNNJJNJLR   NNJJJRJLNRJ---R
RRJRJJRRRRJRRJJJ   LJJLLJLJLJL--N--
RRRRRRRRRRRRRRRR   LLLLLLLLLLLLJLLJ
RRRRRRRRRRRRRRRR   LLLLLLLLJL-J-L--
RRRRRRRRRRRRRRRR   LLLLLLLLLLLLLLLL

[Figure: Learning Curve – Random Tree]

[Figure: Learning Curve – CoverType]

Conclusion
Novel architecture for data streams
Operates much like a web cache (hierarchy and replacement policy)
Provides scaling-up mechanism for arbitrary classifiers (batch-incremental)
Can be adapted to clustering, regression, association rule learning
Thousands of options still to explore!
