File Classification in self-* storage systems
Michael Mesnier, Eno Thereska, Gregory R. Ganger, Daniel Ellard, Margo Seltzer

Introduction
- Self-* infrastructures need information about:
  - Users
  - Applications
  - Policies
- This information is not readily provided, and we cannot depend on users or applications to provide it
- So it must be learned

Self-* storage systems
- A sub-problem of the self-* infrastructure
- Key: get hints from what creators associate with their files, such as:
  - File size
  - File names
  - Lifetimes
- Once intentions are determined, decisions can be made
- Results: better file organization and performance

Classifying Files
- Current practice: rule-of-thumb policy selection
  - Generic, not optimized
- Better: distinguish classes of files
  - Finer-grained policies
  - Ideally assigned at file creation
- Classes must be determined at creation, so the self-* system must learn this association from:
  1. Traces
  2. A running file system

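To make the learning step concrete, here is a minimal Python sketch of turning trace records into training examples: attributes known when a file is created are paired with a property observed later in the trace. The `TraceRecord` fields, the choice of lifetime as the target property, and the 1-second threshold are all hypothetical; real traces will have their own format.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceRecord:
    # Hypothetical trace record; the field names are assumptions, not a real trace format.
    name: str
    owner: str
    mode: int                  # permission bits, e.g. 0o644
    created: float             # creation time in seconds
    deleted: Optional[float]   # deletion time, or None if the file outlived the trace

def creation_attributes(rec: TraceRecord) -> dict:
    """Attributes that are already known when the file is created."""
    _, ext = os.path.splitext(rec.name)
    return {
        "ext": ext or "<none>",
        "owner": rec.owner,
        "mode": oct(rec.mode & 0o777),
    }

def lifetime_class(rec: TraceRecord, threshold: float = 1.0) -> str:
    """Property only observed later in the trace; the threshold is an arbitrary example."""
    if rec.deleted is not None and rec.deleted - rec.created < threshold:
        return "short-lived"
    return "long-lived"

def training_examples(trace):
    """Pair creation-time attributes with the class observed later."""
    return [(creation_attributes(r), lifetime_class(r)) for r in trace]

trace = [
    TraceRecord("a.tmp", "alice", 0o600, 10.0, 10.2),
    TraceRecord("paper.tex", "bob", 0o644, 11.0, None),
]
for attrs, label in training_examples(trace):
    print(attrs, label)
```

The same function could be fed from a live file system instead of a trace, with labels filled in as files are deleted or survive.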
So, how?
- Create a model that classifies files based on (some of) their attributes:
  - Name
  - Owner
  - Permissions
- Irrelevant attributes must be filtered out; the classifier must learn rules to do so
- Rules are learned from a training set; inference (prediction) happens afterwards

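One common way for a classifier to learn which attributes matter is information gain, the split criterion used by ID3-style decision trees. The sketch below assumes that criterion (the slides do not name the one ABLE uses): an attribute whose values say nothing about the class scores zero gain and can be dropped. The `min_gain` cutoff is a hypothetical parameter.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attr):
    """Reduction in label entropy from splitting `examples` on attribute `attr`."""
    groups = defaultdict(list)
    for attrs, label in examples:
        groups[attrs[attr]].append(label)
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy([label for _, label in examples]) - remainder

def relevant_attributes(examples, min_gain=0.01):
    """Attributes ranked by gain, keeping only those above a hypothetical cutoff."""
    gains = {a: information_gain(examples, a) for a in examples[0][0]}
    ranked = sorted(gains, key=gains.get, reverse=True)
    return [a for a in ranked if gains[a] > min_gain]

examples = [
    ({"ext": ".tmp", "owner": "alice"}, "short-lived"),
    ({"ext": ".tmp", "owner": "bob"}, "short-lived"),
    ({"ext": ".c", "owner": "alice"}, "long-lived"),
    ({"ext": ".c", "owner": "bob"}, "long-lived"),
]
print(relevant_attributes(examples))  # ['ext']
```

Here the extension separates the classes perfectly, while the owner carries no information, so only `ext` survives.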
The right model
- The model must be:
  - Scalable
  - Dynamic
  - Cost-sensitive (accounts for the cost of mispredictions)
  - Interpretable (by humans)
- Model selected: decision trees

ABLE
- Attribute-Based Learning Environment:
  1. Obtain traces
  2. Build a decision tree
  3. Make predictions
- The tree is built top down until all attributes are used
  - The sample is split until the leaves contain files with similar attributes
- Once the tree has been created, querying begins

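The three ABLE steps can be approximated with a small ID3-style tree learner. This is a sketch under assumptions, not ABLE itself: splits are chosen by information gain, a node stops splitting when its labels agree or no attributes remain, and attribute values never seen in training fall back to the node's majority class. The `rules()` helper flattens the tree into if-then rules, which is what makes trees human-interpretable.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(examples, attr):
    groups = defaultdict(list)
    for attrs, label in examples:
        groups[attrs[attr]].append(label)
    remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return entropy([label for _, label in examples]) - remainder

def build_tree(examples, attributes):
    """Top-down induction: split on the best attribute until a node is pure
    or every attribute has been used."""
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return {"leaf": majority}
    best = max(attributes, key=lambda a: gain(examples, a))
    rest = [a for a in attributes if a != best]
    partitions = defaultdict(list)
    for attrs, label in examples:
        partitions[attrs[best]].append((attrs, label))
    return {
        "attr": best,
        "default": majority,  # used for attribute values never seen in training
        "children": {v: build_tree(part, rest) for v, part in partitions.items()},
    }

def predict(tree, attrs):
    """Query the tree with a new file's creation-time attributes."""
    while "leaf" not in tree:
        child = tree["children"].get(attrs.get(tree["attr"]))
        if child is None:
            return tree["default"]
        tree = child
    return tree["leaf"]

def rules(tree, path=()):
    """Flatten the tree into human-readable if-then rules."""
    if "leaf" in tree:
        cond = " and ".join(f"{a} == {v!r}" for a, v in path) or "always"
        return [f"if {cond}: {tree['leaf']}"]
    out = []
    for value, child in tree["children"].items():
        out.extend(rules(child, path + ((tree["attr"], value),)))
    return out

train = [
    ({"ext": ".tmp", "owner": "alice"}, "short-lived"),
    ({"ext": ".tmp", "owner": "bob"}, "short-lived"),
    ({"ext": ".c", "owner": "alice"}, "long-lived"),
    ({"ext": ".log", "owner": "bob"}, "long-lived"),
    ({"ext": ".c", "owner": "bob"}, "long-lived"),
]
tree = build_tree(train, ["ext", "owner"])
for rule in rules(tree):
    print(rule)
print(predict(tree, {"ext": ".tmp", "owner": "carol"}))  # short-lived
```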
Tests
- Traces from several systems are used to check that the approach is workload-independent:
  - DEAS03
  - EECS03
  - CAMPUS
  - LAB
- The control: the MODE algorithm, which places all files in a single cluster

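A MODE-style control can be sketched by assuming that, with every file in one cluster, the only prediction available is that cluster's most common class. `ModePredictor` and `accuracy` are hypothetical names; an attribute-based model is only useful if it beats this baseline on held-out data.

```python
from collections import Counter

class ModePredictor:
    """Baseline: treat all files as one cluster and always predict
    the most common class seen in training."""

    def fit(self, examples):
        self.mode = Counter(label for _, label in examples).most_common(1)[0][0]
        return self

    def predict(self, attrs):
        return self.mode  # attributes are ignored entirely

def accuracy(predict, examples):
    """Fraction of examples whose label matches the prediction."""
    hits = sum(1 for attrs, label in examples if predict(attrs) == label)
    return hits / len(examples)

train = [
    ({"ext": ".tmp"}, "short-lived"),
    ({"ext": ".c"}, "long-lived"),
    ({"ext": ".h"}, "long-lived"),
]
test = [
    ({"ext": ".tmp"}, "short-lived"),
    ({"ext": ".c"}, "long-lived"),
]
baseline = ModePredictor().fit(train)
print(accuracy(baseline.predict, test))  # 0.5
```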
Results
- Prediction accuracy is quite good: 90% to 100% is claimed
- Files cluster clearly by their attributes
- The authors predict that a model's ruleset will converge over time

Benefits of incremental learning
- Dynamically refines the model as samples become available
- Generally better than one-shot learners
  - One-shot learners sometimes perform poorly
- Incremental learners produce smaller rulesets

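Incremental learning can be illustrated with a model that updates counts as each sample arrives instead of retraining from scratch. For brevity this sketch keys on a single attribute (a hypothetical `ext` field) with a fallback to the overall majority; it is a counting model, not the incremental decision-tree learner the slides refer to.

```python
from collections import Counter, defaultdict

class IncrementalPredictor:
    """Refines its predictions one sample at a time, with no retraining pass.
    Illustrative counting model keyed on one attribute; not a decision tree."""

    def __init__(self, attr):
        self.attr = attr
        self.global_counts = Counter()        # class counts over all samples
        self.counts = defaultdict(Counter)    # class counts per attribute value

    def update(self, attrs, label):
        """Fold one newly labeled sample into the model."""
        self.global_counts[label] += 1
        self.counts[attrs.get(self.attr)][label] += 1

    def predict(self, attrs):
        """Use counts for this attribute value if seen, else the global majority."""
        per_value = self.counts.get(attrs.get(self.attr))
        source = per_value if per_value else self.global_counts
        if not source:
            return None  # nothing learned yet
        return source.most_common(1)[0][0]

model = IncrementalPredictor("ext")
stream = [
    ({"ext": ".tmp"}, "short-lived"),
    ({"ext": ".c"}, "long-lived"),
    ({"ext": ".c"}, "long-lived"),
]
predictions = []
for attrs, label in stream:
    predictions.append(model.predict(attrs))  # predict before the true label is known
    model.update(attrs, label)
print(predictions)  # [None, 'short-lived', 'long-lived']
```

Predicting before each update mirrors a running file system, where a file's class is needed at creation time and its true label only becomes known later.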
On accuracy
- More attributes increase the chance of over-fitting
- More rules lead to smaller compression ratios, losing the compression benefit
- Predictive models can make false predictions, which can hurt performance
  - e.g., data that should be in RAM is placed on disk instead
- Solution: cost functions
  - Penalize errors
  - Create a biased tree
  - System goals will need to be translated into the cost function

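One simple way to bias a tree with a cost function is to label each leaf with the class that minimizes expected misprediction cost rather than with the majority class. The slides do not say how ABLE applies its cost functions; the `min_cost_label` helper and the RAM/disk costs below are hypothetical.

```python
def min_cost_label(class_counts, cost):
    """Pick the label with the lowest expected misprediction cost.
    cost[predicted][actual] is the penalty for predicting `predicted`
    when the truth is `actual` (0 on the diagonal)."""
    total = sum(class_counts.values())

    def expected_cost(predicted):
        return sum(cost[predicted][actual] * n / total
                   for actual, n in class_counts.items())

    return min(cost, key=expected_cost)

# Hypothetical system goal: leaving a RAM-worthy file on disk is
# 10x worse than keeping a disk-worthy file in RAM.
cost = {
    "ram":  {"ram": 0,  "disk": 1},
    "disk": {"ram": 10, "disk": 0},
}
leaf = {"ram": 3, "disk": 7}            # training files that reached this leaf
majority = max(leaf, key=leaf.get)      # plain majority vote
biased = min_cost_label(leaf, cost)     # cost-sensitive choice
print(majority, biased)  # disk ram
```

With 7 of 10 files at this leaf belonging on disk, the majority vote says `disk`; because stranding a RAM-worthy file on disk is assumed to cost 10 times more, the expected cost of predicting `ram` (0.7) is lower than that of predicting `disk` (3.0).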
Conclusion
- These trees provide prediction accuracies in the 90% range
- Adaptable via incremental learning
- Continued work: integration into the self-* infrastructure

Questions?