How IOT, Analytics and ML unfolds in Storage Fabric · 2019 Storage Developer Conference India ©...

Post on 20-Jul-2020

2 views 0 download

Transcript of How IOT, Analytics and ML unfolds in Storage Fabric · 2019 Storage Developer Conference India ©...

2019 Storage Developer Conference India © All Rights Reserved. 1

How IOT, Analytics and ML unfolds in Storage Fabric

Sharath T SMicrochip

2019 Storage Developer Conference India © All Rights Reserved. 2

Our Map!

Know your sources

Collect the data

Prepare data

Machine Learning

ML based Storage Mgmt.

2019 Storage Developer Conference India © All Rights Reserved. 3

Our itinerary

r Effect of IoT to Data Centersr Effect of IoT in Data Centersr Collection of data from sourcesr Prepare datar Applying ML for different uses casesr Data visualization

2019 Storage Developer Conference India © All Rights Reserved. 4

Effect of IoT to Data Centers

2019 Storage Developer Conference India © All Rights Reserved. 5

Effect of IoT to Data Centers

r 26 B sensors by 2020 and 50 B connected devicesr 5G IoTr Edge computingr Detailed analysisr IoT impact on data-center management

2019 Storage Developer Conference India © All Rights Reserved. 6

Effect of IoT in Data CentersKnow your

sources

2019 Storage Developer Conference India © All Rights Reserved. 7

End points in Data Centers

r EMS (Environmental Monitoring Systems) & ASHRAE (American Society of Heating, Refrigeration, and Air-Conditioning Engineers)r Temperature (18 to 27 °C)r Humidity and water (RH 45% to 60%)r Air flow sensorsr Static electric sensorsr Server room and rack entryr Aisle conditions

2019 Storage Developer Conference India © All Rights Reserved. 8

End points in Data Centers

r Server r Storage controllerr Physical Drives (S.M.A.R.T) r Chassis

2019 Storage Developer Conference India © All Rights Reserved. 9

Collection of data from sources

Know your sources

Collect the data

2019 Storage Developer Conference India © All Rights Reserved. 10

What data to collect?

r SensorrTemperature, humidity, static electric charges,

intrusionr Storage

rSystem, Storage pools, Storage volumes, Drives and Chassis

2019 Storage Developer Conference India © All Rights Reserved. 11

Example Data Collection

r System Informationr CPU Utilizationr Network Utilizationr Memory Utilizationr OS detailsr Uptime

r Storage Poolr Statusr Interfacer Total Sizer Unused Sizer Spare Rebuild moder Volume countr Drive countr Type

r Storage Volumer Statusr Interfacer Total Sizer Unused Sizer Block Sizer RAID Levelr Drive countr Protected by Hot-

Sparer Write-cacher Read-cacher Acceleration method

r Driver Manufacturerr Typer Statusr Interfacer Total Sizer Unused Sizer Reserved Sizer Block Sizer Transfer speedr SMART statsr Bad Blocks

r Storage Controller Information

r Statusr Moder Interfacer Temperature

2019 Storage Developer Conference India © All Rights Reserved. 12

When to collect data?

r Periodic intervalrTime Series analysis and Forecasting

r Event BasedrUser initiated, System initiated

2019 Storage Developer Conference India © All Rights Reserved. 13

How to collect data?r Push mechanism

r Source / System generatedr Breach of any threshold valuesr Listener is required to read and store value

r Pull mechanismr Application / User requestedr On demand / Periodicr Programmatically

2019 Storage Developer Conference India © All Rights Reserved. 14

Prepare data

Know your sources

Collect the data

Prepare data

2019 Storage Developer Conference India © All Rights Reserved. 15

r Types of source datar Unstructuredr Semi-structuredr Structured

r On-Line Analytical Processing (OLAP) of prepared datar Cubesr Dimensions

Data

2019 Storage Developer Conference India © All Rights Reserved. 16

Introduction to Machine Learning

Know your sources

Collect the data

Prepare data

Machine Learning

2019 Storage Developer Conference India © All Rights Reserved. 17

Introduction to Machine Learningr Introduction r Flow

Collecting data

Preparing data

Training the model

Evaluating the model

Improve performanceEvolution

EvolutionDigital Journal

2019 Storage Developer Conference India © All Rights Reserved. 18

Data Feature Selection

2019 Storage Developer Conference India © All Rights Reserved. 19

Data Feature Selection

r Filter method

r Ex: Chi-Square, LDA, ANOVA

Set of all Features

Selecting the best subset

Learning Algorithm Performance

2019 Storage Developer Conference India © All Rights Reserved. 20

Data Feature Selection…

r Wrapper method

r Ex: Forward selection, Backward elimination, Recursive feature elimination, etc.

Set of all Features

Generate subset

Learning Algorithm Performance

Selecting best of Subset

2019 Storage Developer Conference India © All Rights Reserved. 21

Data Feature Selection…

r Embedded method

r Ex: LASSOS, Ridge Regression

Set of all Features

Generate subset

Learning Algorithm Performance

Selecting best of Subset

2019 Storage Developer Conference India © All Rights Reserved. 22

Apply ML for multiple use cases

Know your sources

Collect the data

Prepare data

Machine Learning

ML based Storage Mgmt.

2019 Storage Developer Conference India © All Rights Reserved. 23

Apply ML for multiple use cases…

r Case study - Data center managementr Drive failure predictionr Storage tiering suggestionr Storage usage trendr Storage requirement prediction

2019 Storage Developer Conference India © All Rights Reserved. 24

Drive Failure Prediction

2019 Storage Developer Conference India © All Rights Reserved. 25

Workflow

r What is S.M.A.R.T?r Data setr Periodic collectionr Selection features / attributes r Applying Support-vector-machine to predict drive failurer Outcome

2019 Storage Developer Conference India © All Rights Reserved. 26

Data set (Features)r List of S.M.A.R.T attributes

r Read Error Rater Reallocated Sectors

Countr Spin Retry Countr End-to-End errorr Temperaturer Command Timeoutr Reallocation Event Countr Uncorrectable Sector

Countr Soft ECC Correctionr G-Sense Error Rater Loaded Hours

q Sampling data (Periodic collection)Hours Temp1 . . . . . . ReadErr Servo4

2633 58 6 2944

2635 57 13 2688

2637 56 36 5189

2639 57 0 4032

2641 56 0 8384

. . . .

. . . .

2855 58 14 3322

2857 59 20 2624

2019 Storage Developer Conference India © All Rights Reserved. 27

Prepared Datar Feature selection

Attribute % Good % Failed

Temp1 11.8 48.2

Temp3 35.2 45.3

Temp4 8.8 59.2

Glist 0.5 8.8

ReadError1 0.4 0.8

WriteError 0.8 2.3

Reallocated sector 5.8 30.2

Uncorrectable sector 4.8 34.5

Spin-up time 5.2 14.2

Command timeout 6.2 29.8

2019 Storage Developer Conference India © All Rights Reserved. 28

Machine Learning Modelr Support-vector-machine (supervised learning)

r A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples.

2019 Storage Developer Conference India © All Rights Reserved. 29

X X

Y Y

2019 Storage Developer Conference India © All Rights Reserved. 30

X

ZY Y

X X

2019 Storage Developer Conference India © All Rights Reserved. 31

X X X

Y Y Y

2019 Storage Developer Conference India © All Rights Reserved. 32

X

Y

X

Y

2019 Storage Developer Conference India © All Rights Reserved. 33

Machine Learningr Train the model (80%)

Good Drive

Failed DriveSVM Model

r Test the model (20%)

SVM Model

Unlabeled Drive

Performance

Tune parameters

2019 Storage Developer Conference India © All Rights Reserved. 34

Outcome

r Reduce false alarm of failurer Automated policy implementation

2019 Storage Developer Conference India © All Rights Reserved. 35

Storage Tiering Suggestion

2019 Storage Developer Conference India © All Rights Reserved. 36

Storage Tiering

r Physically partitioned into multiple distinct classes based on price, performance or other attributesr Swordfish – ClassOfService

r Data may be dynamically moved among classes within a tiered storage implementation

2019 Storage Developer Conference India © All Rights Reserved. 37

Storage Class (SNIA)r Media class

r High performance SSD/Cache

r High performance HDDr High capacity HDDr Tape

r Data classr Mission criticalr Hotr Cold

r Pricing classr Networked storager DASr Cloud

2019 Storage Developer Conference India © All Rights Reserved. 38

Feature Collectionr Data/Block access frequencyr Last accessedr Last modification timer Size of objectr Encryptionr Drive type (HDD, SSD)r Drive interface typer Drive temperaturer Caching informationr …

2019 Storage Developer Conference India © All Rights Reserved. 39

Machine Learning Model

r k-means (unsupervised learning)r clusteringr centroidr aggregation

2019 Storage Developer Conference India © All Rights Reserved. 40

Machine Learning

Block

Acce

ss fr

eq.

On Cache On SSD On HDD

BlockAc

cess

freq

.Block

Acce

ss fr

eq.

2019 Storage Developer Conference India © All Rights Reserved. 41

Storage Usage Trendr How Storage is used in a data center over a time period.r Time Series Algorithmsr Ex of trends

r Which class of service is more used in futurer Which media class will be more used in future

2019 Storage Developer Conference India © All Rights Reserved. 42

Data Visualization

Know your sources

Collect the data

Prepare data

Machine Learning

ML based Storage Mgmt.

2019 Storage Developer Conference India © All Rights Reserved. 43

Data Visualizationr Tools available

r Power BI, Tableau, Data Dog – commercial r Dash (personal favorite!) – opensource

r Dashr Pure python based frameworkr Abstracts away all technology and protocolr Ideal for building data visualization appsr https://plot.ly/products/dash/

2019 Storage Developer Conference India © All Rights Reserved. 44

An image of Halley's Comet taken in 1986. (Image: © NASA)

2019 Storage Developer Conference India © All Rights Reserved. 45

Fraunhofer lines, in astronomical spectroscopy

2019 Storage Developer Conference India © All Rights Reserved. 46

Thank You!sharath.ts@microchip.com

https://www.linkedin.com/in/sharath-ts-5720a520