Realistic Synthetic Generation Allows Secure Development
Transcript of Realistic Synthetic Generation Allows Secure Development
![Page 1: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/1.jpg)
© 2014 MapR Technologies
Realistic Synthetic Data Allows Secure Development
Ted Dunning
June 11, 2015
![Page 2: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/2.jpg)
Who am I?
Ted Dunning, Chief Applications Architect, MapR Technologies
Email [email protected] [email protected]
Twitter @Ted_Dunning
![Page 3: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/3.jpg)
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 and 2015
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
![Page 4: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/4.jpg)
![Page 5: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/5.jpg)
The basic idea
![Page 6: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/6.jpg)
Anomaly Detection and Fraud Analytics
• Financial customer wants to identify zero-day attacks
• And advanced persistent threats
• By sophisticated adversaries who don’t use known vectors
• Must keep logs and other data secret
– But must also collaborate on detection algorithms
![Page 7: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/7.jpg)
Secure Development is Hard
![Page 8: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/8.jpg)
Secure Development is Hard
Outside collaborators are outside the security perimeter
They can’t see the data and they can’t tune new algorithms to fit reality
![Page 9: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/9.jpg)
How To Make Realistic Data
![Page 10: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/10.jpg)
Parametric Simulation
Parametric matching of failure signatures allows emulation of complex data properties
Matching on KPIs and failure modes guarantees practical fidelity
![Page 11: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/11.jpg)
Do’s and Don’ts
• Do match the KPIs and failure modes
– Speed
– Score distribution
– False positive rates versus score
• Don’t try to match the actual data distribution precisely
– Good enough is good enough and we want to imitate failures, not create new life forms
– Probably impossible to do precisely
– Even if possible, it is vastly harder to match distributions
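A minimal sketch, in R, of what checking a KPI rather than the raw distribution might look like. Only the idea of comparing false positive rate versus score comes from the slide. The helper name `fp_rate_by_threshold`, the beta-distributed scores, and the 1% fraud rate are all made up for illustration.

```r
# Hypothetical helper: share of non-fraud records scoring at or above each threshold.
fp_rate_by_threshold <- function(score, is_fraud, thresholds) {
  negatives <- score[!is_fraud]
  sapply(thresholds, function(t) mean(negatives >= t))
}

set.seed(1)
thresholds <- seq(0, 1, by = 0.1)

# Stand-ins for detector scores on real and on synthetic data (both invented here).
real_score  <- rbeta(5000, 2, 5)
real_fraud  <- runif(5000) < 0.01
synth_score <- rbeta(5000, 2.1, 5)
synth_fraud <- runif(5000) < 0.01

real_fp  <- fp_rate_by_threshold(real_score, real_fraud, thresholds)
synth_fp <- fp_rate_by_threshold(synth_score, synth_fraud, thresholds)

# The quantity to watch is the gap between the two curves,
# not whether the underlying records look alike.
max_gap <- max(abs(real_fp - synth_fp))
```

If the gap stays small across thresholds, the synthetic data is arguably good enough for tuning, even when the raw distributions differ.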
![Page 12: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/12.jpg)
Methods for Generating Numbers
• Well-known distributions
– Uniform, normal, gamma, Poisson
– Truncations
• Cumulations
– Random walk v1
• Mixture distributions
• Hyper-parameters
– Random walk v2
![Page 13: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/13.jpg)
Normal
data = data.frame(x=rnorm(10000), y=rnorm(10000))
![Page 14: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/14.jpg)
Mixture of Normals
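The transcript carries no code for this slide, so the following is only a sketch of one common way to draw from a mixture of normals in R. The three components and their weights, centers, and spreads are assumptions chosen for illustration.

```r
set.seed(42)
n <- 10000

# Assumed mixture: three components with made-up weights, centers, and spreads.
weights <- c(0.5, 0.3, 0.2)
centers <- data.frame(x = c(0, 4, -3), y = c(0, 3, 2))
spreads <- c(1, 0.5, 1.5)

# Pick a component for each point, then draw from that component's normal.
k <- sample(seq_along(weights), n, replace = TRUE, prob = weights)
data <- data.frame(
  x = rnorm(n, mean = centers$x[k], sd = spreads[k]),
  y = rnorm(n, mean = centers$y[k], sd = spreads[k]),
  component = k
)
```

Keeping the `component` column makes it easy to check later that each cluster was sampled in roughly the intended proportion.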
![Page 15: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/15.jpg)
Random Walk
y = cumsum(rnorm(10000))
![Page 16: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/16.jpg)
Pick Mean from Multinomial
![Page 17: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/17.jpg)
Random Walk with Variable Standard Deviation
y = cumsum(rt(10000, df=0.9))
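The slide title and the `rt` call connect through a standard identity: a Student's t variate is a normal variate whose standard deviation is itself random. The sketch below spells that out in R. The variable names are illustrative, and it is an equivalent construction rather than the code from the talk.

```r
set.seed(7)
n <- 10000
df <- 0.9

# Each step gets its own standard deviation, drawn as sqrt(df / chi-squared(df)).
step_sd <- sqrt(df / rchisq(n, df = df))

# A normal step with that random scale follows Student's t with df degrees of freedom.
steps <- rnorm(n, mean = 0, sd = step_sd)
y <- cumsum(steps)
```

With `df` below 1 the steps are heavier-tailed than a Cauchy distribution, so a few very large jumps tend to dominate the walk.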
![Page 18: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/18.jpg)
Methods for Generating Symbols
• Symbols are really just integers with a dictionary
• Well-known distributions
– Multinomial
– Dirichlet processes
– Rich-get-richer, Pitman-Yor
• Mixture distributions
• Hyper-parameters
• Lookup tables!!!
– Simple tables
– Data table joins for correlated components
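As a rough illustration of "integers with a dictionary" combined with a rich-get-richer process, here is a small Chinese restaurant process sampler in R. The function name `sample_symbols`, the concentration parameter `alpha`, and the `sym_` dictionary entries are hypothetical. This is a sketch of the general technique, not how log-synth implements it.

```r
# Rich-get-richer sampler (Chinese restaurant process) over integer symbol ids.
# alpha is an assumed concentration parameter: larger values create new symbols more often.
sample_symbols <- function(n, alpha = 10) {
  counts <- integer(0)            # counts[i] = number of times symbol i has been drawn
  out <- integer(n)
  for (i in seq_len(n)) {
    total <- sum(counts)
    if (runif(1) < alpha / (alpha + total)) {
      counts <- c(counts, 1L)     # introduce a new symbol
      out[i] <- length(counts)
    } else {
      # reuse an existing symbol with probability proportional to its popularity
      s <- sample.int(length(counts), 1, prob = counts)
      counts[s] <- counts[s] + 1L
      out[i] <- s
    }
  }
  out
}

set.seed(3)
ids <- sample_symbols(5000)

# The dictionary maps integer ids to printable symbols; these names are made up.
dictionary <- paste0("sym_", seq_len(max(ids)))
symbols <- dictionary[ids]
```

Swapping the dictionary for a lookup table of real-looking names changes the appearance of the data without touching the sampling logic.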
![Page 19: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/19.jpg)
Skewed Integers
207 3
203 0
198 7
196 4
195 12
193 10
189 2
187 1
185 13
179 6
178 9
177 5
177 25
174 21
173 8
173 14
170 18
[ {"name":"x", "class":"int", "skew":1}]
![Page 20: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/20.jpg)
Methods for Generating Behaviors
• Use structured data!
– Generate user meta-data
– Generate list of transactions
• Only flatten if necessary
• See Apache Drill for post-processing
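The points above can be sketched in R as a nested record per user that is flattened only on demand. The generator `make_user`, the field names, and every distribution parameter here are assumptions made for the example.

```r
set.seed(11)

# Hypothetical generator: one user record with metadata plus a nested table of transactions.
make_user <- function(id) {
  n_tx <- rpois(1, lambda = 5) + 1            # every user gets at least one transaction
  list(
    id = id,
    segment = sample(c("retail", "business"), 1, prob = c(0.8, 0.2)),
    transactions = data.frame(
      t = sort(runif(n_tx, 0, 30)),           # day within an assumed 30-day window
      amount = round(rgamma(n_tx, shape = 2, scale = 25), 2)
    )
  )
}

users <- lapply(1:100, make_user)

# Flatten only when a downstream tool needs one row per transaction.
flat <- do.call(rbind, lapply(users, function(u) {
  data.frame(user_id = u$id, segment = u$segment, u$transactions)
}))
```

Keeping transactions nested under the user preserves per-user structure such as ordering and counts, which a flat table would have to reconstruct with a group-by.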
![Page 21: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/21.jpg)
Methods for Generating Databases
• Use integers (see previous) as foreign keys
• Normalized form implies (approximate) independence of tables
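A small R sketch of the foreign-key idea: two tables are generated independently and linked only by an integer key. Table names, columns, and the `1 / customer_id` weighting used to skew the keys are illustrative assumptions.

```r
set.seed(5)

# Two tables generated independently; the only link is an integer key.
customers <- data.frame(
  customer_id = 1:200,
  region = sample(c("north", "south", "east", "west"), 200, replace = TRUE)
)

orders <- data.frame(
  order_id = 1:1000,
  # Foreign key: integers drawn with made-up skewed weights so a few customers dominate.
  customer_id = sample(customers$customer_id, 1000, replace = TRUE,
                       prob = 1 / customers$customer_id),
  amount = round(rgamma(1000, shape = 2, scale = 40), 2)
)

# A join recovers the denormalized view when it is needed.
joined <- merge(orders, customers, by = "customer_id")
```

Because each table is sampled on its own, correlations between tables appear only through the key distribution, which is the approximate independence the slide refers to.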
![Page 22: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/22.jpg)
![Page 23: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/23.jpg)
Go get log-synth
https://github.com/tdunning/log-synth
![Page 24: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/24.jpg)
A worked example...
![Page 25: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/25.jpg)
Simulation Setup
![Page 26: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/26.jpg)
![Page 27: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/27.jpg)
![Page 28: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/28.jpg)
Questions?
![Page 29: Realistic Synthetic Generation Allows Secure Development](https://reader036.fdocuments.in/reader036/viewer/2022062710/55b6c68dbb61ebd2768b4649/html5/thumbnails/29.jpg)
Thank You
@mapr maprtech
[email protected]@apache.org
Ted Dunning, Chief Application Architect
MapRTechnologies
maprtech