On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types

Stephen Bush and Todd Hughes

SFI Workshop on Adaptive and Resilient Computing Security

Presenters: Enkh-Amgalan Baatarjav, Kalyan Pathapati Subbu, Satyajeet Nimgaonkar

Transcript of On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types

Page 1: On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types

Presenters: Enkh-Amgalan Baatarjav, Kalyan Pathapati Subbu, Satyajeet Nimgaonkar

SFI Workshop on Adaptive and Resilient Computing Security

Stephen Bush and Todd Hughes

Page 2

Overview

• Introduction
  • Innovation and security
  • Challenges
• Detecting variation in the complexity landscape
• Semantic Type Classification
• Framework and Experimental Test Set
• Discrimination Results
• Conclusion
• References

Page 3

Introduction

• A central problem in information systems is information assurance
• Main idea: complexity-based vulnerability analysis
  • Applying Kolmogorov complexity to estimate and predict previously unknown vulnerabilities
• Progress on experimental validation of the vulnerability analysis framework
• Kolmogorov Complexity Video

Page 4

Introduction

• The salient point of complexity-based vulnerability analysis
  • The better one understands a phenomenon, the more concisely the phenomenon can be described.
  • Goal of science: to develop theories that require minimum size to be fully described
• The objective of this paper
  • To determine whether estimates of complexity can be used to differentiate known types of data based on their complexity

Page 5

Intro: Benefit

• Motivating early work: the active network Java complexity probe toolkit.
• Tools based on Kolmogorov complexity do not require detailed a priori information about known attacks. Instead, they compute vulnerability from an inherent, underlying property of information itself, namely its Kolmogorov-Chaitin complexity.

Page 6

Intro: Innovation and Security

A method for vulnerability identification

1. Waiting for an information system to be attacked
2. Surviving the attack
3. Detecting the attack
4. Analyzing the attack
5. Adding the result to a knowledge base

Attackers and defenders of information systems are both capable of innovation

Page 7

Intro: Challenges

• Length of time required to obtain an accurate sample (performing the analysis in real time)
• A stream of data on a network link can be sampled at multiple protocol layers.
  • OSI model: physical, data link, network, transport, session, presentation, application
• Potential attackers target areas of both low complexity and high complexity
  • Low complexity: easier to observe and understand
  • High complexity: potentially a good place to hide activities

Page 8

Intro: Challenges

Page 9

Detecting variation in the complexity landscape

• For complexity map generation
  • The complexity landscape has sufficient variation
• Smallest descriptive length of different semantic types
  • Are they equal, or do they vastly differ?
• Approximation of the smallest descriptive length
  • Best descriptor
  • No redundant information
  • Unique essence of the entity remains
• Goal: maximize discrimination
  • Smallest representation of a sequence
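
Kolmogorov complexity itself is not computable, so in practice the smallest descriptive length is approximated from above, for example by the size of a compressed copy of the sequence. The sketch below illustrates that idea with Python's `zlib`. It is not the authors' estimator, and the helper name `compression_ratio` is hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (hypothetical helper).

    A lower ratio means more redundancy was removed, which is read here
    as a lower estimated complexity.
    """
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    redundant = b"ab" * 5000         # highly regular sequence
    random_like = os.urandom(10000)  # almost certainly incompressible
    print("redundant:  ", compression_ratio(redundant))
    print("random-like:", compression_ratio(random_like))
```

The regular sequence compresses to a small fraction of its size, while the random bytes do not shrink at all. That gap is the kind of variation a complexity map relies on.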

Page 10

Semantic Type Classification

• An input stream
  • Carries different kinds of information
  • Arrives at the complexity probe classifier
• The classifier
  • Uses a Kolmogorov complexity estimate of the input stream to categorize incoming data into different semantic types
  • Audio, MS Word Document, Executable, Image, ASCII Text, or Video

Page 11

Framework and Experimental Test Set

• Ten randomly chosen samples of each type of data
• Data filtered to extract header
• The complexity estimator
  • returns an estimate of its complexity.
• Mapper determines a semantic type
  • based upon the complexity estimate.
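
The filter, estimator, and mapper stages above can be sketched as three small functions. Everything here is an assumption made for illustration: the filter is taken to drop a fixed-size header (the slide does not say how the header is handled), the estimator is a `zlib` compression ratio, and the mapper picks the type whose mean estimate is nearest. All function names are hypothetical.

```python
import os
import zlib
from statistics import mean
from typing import Dict, List


def strip_header(data: bytes, header_len: int) -> bytes:
    """Filter stage (assumption): drop a fixed-size header."""
    return data[header_len:]


def estimate(data: bytes) -> float:
    """Estimator stage: zlib compression ratio as a complexity estimate."""
    return len(zlib.compress(data)) / len(data) if data else 0.0


def train_mapper(samples: Dict[str, List[bytes]]) -> Dict[str, float]:
    """Mean complexity estimate for each labelled semantic type."""
    return {label: mean(estimate(s) for s in items)
            for label, items in samples.items()}


def map_type(value: float, centers: Dict[str, float]) -> str:
    """Mapper stage: choose the type whose mean estimate is closest."""
    return min(centers, key=lambda label: abs(centers[label] - value))


if __name__ == "__main__":
    # Synthetic stand-ins, not the paper's test set.
    training = {
        "text-like": [b"hello world " * 200, b"the quick brown fox " * 100],
        "random-like": [os.urandom(2000), os.urandom(2000)],
    }
    centers = train_mapper(training)
    payload = strip_header(b"HDR!" + b"abc" * 500, header_len=4)
    print(map_type(estimate(payload), centers))
```

A single scalar estimate can only separate types whose means are far apart, which matches the slide's point that the mapper works from the complexity estimate alone.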

Page 12

Complexity Estimator Module

Estimation using bit streams

• Simple entropy estimator (H)
• Lempel-Ziv (LZ) compression
• Zip (Zip) compression
• bZip (bZip) compression
• Frequency-based FFT estimator technique (Psi)
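
As a rough illustration of how several estimators can be run side by side, the sketch below computes an empirical Shannon entropy (standing in for H) and two compression ratios. Python's `zlib` and `bz2` modules are used as stand-ins for the Zip and bZip estimators. The LZ and Psi estimators are omitted, and the function names are hypothetical rather than taken from the paper.

```python
import bz2
import math
import zlib
from collections import Counter
from typing import Dict


def entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram (0 to 8 bits/byte)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def estimate_all(data: bytes) -> Dict[str, float]:
    """Run each stand-in estimator; every value is scaled to roughly 0..1."""
    if not data:
        raise ValueError("data must be non-empty")
    return {
        "H": entropy_bits_per_byte(data) / 8.0,
        "Zip": len(zlib.compress(data)) / len(data),
        "bZip": len(bz2.compress(data)) / len(data),
    }


if __name__ == "__main__":
    for name, value in estimate_all(b"ab" * 5000).items():
        print(name, round(value, 4))
```

The byte-histogram entropy ignores ordering, so the alternating sequence `b"ab" * 5000` still scores one bit per byte under H even though both compressors reduce it to a tiny fraction of its size. Differences like this are one reason to compare several estimators.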

Page 13

Tunable parameters of the Complexity Probe

Parameters:

• specification of filters, sampling rate, window size, and the set of estimator algorithms enabled.

The output:

• a single semantic type to identify a file, or
• a vector of semantic types, one for each window
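
A minimal sketch of windowed estimation follows, assuming the window size is given in bytes and that the sampling rate can be modeled as a step between window starts. Both interpretations, and the names `windows` and `complexity_vector`, are assumptions. Mapping each estimate to a semantic type would then give the per-window vector of types.

```python
import os
import zlib
from typing import Iterator, List


def windows(data: bytes, window_size: int, step: int) -> Iterator[bytes]:
    """Yield full-size windows; `step` models the sampling interval."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    for start in range(0, len(data) - window_size + 1, step):
        yield data[start:start + window_size]


def complexity_vector(data: bytes, window_size: int, step: int) -> List[float]:
    """One zlib compression-ratio estimate per window."""
    return [len(zlib.compress(w)) / window_size
            for w in windows(data, window_size, step)]


if __name__ == "__main__":
    # Regular data followed by random data: the estimates jump mid-stream.
    stream = b"a" * 4096 + os.urandom(4096)
    print([round(v, 2) for v in complexity_vector(stream, 1024, 1024)])
```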

Page 14

Discrimination Results

Discriminant analysis

• Zip estimator
• Squared distances between semantic types are relatively large, except for the distances circled in red.
• These types are very close to one another, which yields a high error rate in discriminating among them.
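
To make the idea of squared distance between semantic types concrete, the sketch below computes the squared difference between per-type mean estimates and reports the closest pair. This is a simplification: a full discriminant analysis would also account for within-type variance. The function names are hypothetical, and the numbers in the demo are made-up placeholders, not results from the paper.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple


def squared_distances(
    estimates: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Squared difference between the mean estimates of every pair of types."""
    centers = {label: mean(values) for label, values in estimates.items()}
    return {(a, b): (centers[a] - centers[b]) ** 2
            for a, b in combinations(sorted(centers), 2)}


def closest_pair(estimates: Dict[str, List[float]]) -> Tuple[str, str]:
    """The pair of types most likely to be confused with one another."""
    distances = squared_distances(estimates)
    return min(distances, key=distances.get)


if __name__ == "__main__":
    # Placeholder values only, chosen to exercise the functions.
    toy = {
        "type_a": [0.10, 0.12],
        "type_b": [0.11, 0.13],
        "type_c": [0.90, 0.94],
    }
    print(closest_pair(toy))
```

Pairs with a small squared distance are the ones a single-estimator classifier would be expected to confuse.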

Page 15

Accuracy of the complexity-based system

• The histogram columns represent the percent of data from the experimental test set correctly classified
• Combination of entropy types
  • Audio and executables as a combined type
  • MS Word and text as a combined type
  • Images and video as a combined type

Page 16

Timing Profile

• For a complexity estimator, the actual complexity of the data and the window size have the greatest effect on timing.
• The figure shows the mean complexity for each estimator over the entire experimental test set.

Page 17

Time (ms) vs. Window Size (bytes)

• The figure shows the expected amount of time for each semantic type as a function of window size.
• In every case, a larger window size requires more time to estimate complexity.
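
A timing profile of this kind can be approximated with a simple measurement loop. The sketch below assumes a `zlib` compression call as the estimator and random bytes as input, so absolute numbers will differ from the figure and depend on the machine. The function name `time_per_window_ms` is hypothetical.

```python
import os
import time
import zlib
from typing import Dict, Iterable


def time_per_window_ms(window_sizes: Iterable[int],
                       repeats: int = 20) -> Dict[int, float]:
    """Average wall-clock time (ms) to compress one window of each size."""
    results = {}
    for size in window_sizes:
        window = os.urandom(size)  # synthetic input, not the paper's data
        start = time.perf_counter()
        for _ in range(repeats):
            zlib.compress(window)
        results[size] = (time.perf_counter() - start) * 1000.0 / repeats
    return results


if __name__ == "__main__":
    for size, ms in time_per_window_ms([1024, 8192, 65536]).items():
        print(f"{size:>6} bytes: {ms:.4f} ms")
```

Dividing the window size by the measured time gives a throughput figure in bytes per millisecond.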

Page 18

Time (ms) vs. Complexity (10 Video files)

• The figure shows the expected amount of time for each semantic type as a function of the complexity of the sequence in the window.
• Time to estimate decreases as complexity increases.

Page 19

Time vs. % Correct Discrimination

• Accuracy vs. Time
• Discrimination vs. Compression Ratio

Page 20

Throughput (b/ms) per Semantic Type

Throughput for Z & H/semantic Type

Throughput for Psi, LZ & BZ/semantic Type

Page 21

Conclusion

• The results in this paper examine whether estimates of complexity have the resolution required to differentiate known types of data based upon their complexity.
• The results indicate that data types can be identified by estimates of their complexity
• A map of complexity can identify suspicious types
  • Executable data embedded within passive data types

Page 22

References

• On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types. Stephen F. Bush, Senior Member, IEEE.
• Complexity as a Framework for Prediction, Optimization, and Assurance, Proceedings of the 2002 DARPA Active Networks Conference and Exposition (DANCE 2002), IEEE Computer Society Press, pp. 534-553, ISBN 0-7695-1564-9, May 29-30, 2002, San Francisco, California, USA.
• Bush, Stephen F., Extended Abstract: Complexity and Vulnerability Analysis, Complexity and Inference, June 2-5, 2003, DIMACS Center, Rutgers University, Piscataway, NJ, Organizers: Mark Hansen, Paul Vitányi, Bin Yu.
• Kirchherr W., Li M., and Vitányi P., The Miraculous Universal Distribution. The Mathematical Intelligencer, Springer-Verlag, New York, Vol. 19, No. 4, 1997.
• Ming Li and Paul Vitányi. Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, 1993. ISBN 0-387-94053-7.