Design of Policy Engine to Prevent Malware Propagation in ... · DLMS/COSEM protocol TCP/UDP, PLC,...
Transcript of Design of Policy Engine to Prevent Malware Propagation in ... · DLMS/COSEM protocol TCP/UDP, PLC,...
Design of Policy Engine to Prevent Malware Propagation in Advanced Metering Infrastructure
Younghee Park
Computer Engineering Department
San Jose State University
Introduction
2
Introduction
• Advanced Metering Infrastructure (AMI)
Data Concentrator Unit (DCU) Headend
NAN (Neighborhood Area Network)
WAN (Wide Area Network)
WiMax, Cellular, PLC WiFi, PLC, Zigbee or Radio
Smart Meters
Utility Data Center - Demand Response Center - Customers Information Systems - Meter Data Management
HAN
HAN HAN
3
Introduction
• Goals of Advanced Metering Infrastructure (AMI) – Reliability
• Smart metering
• Automated real-time monitoring
– Efficiency • Demand response and control
• Active energy management
• Remote connect/disconnect
– Security • Improved by monitoring and reliability
4
Cyber Attacks in AMI
• Network availability attack – Overwhelm the communication and
computational resources
• Data attacks – Insert, alter or delete data in the network traffic
• Privacy attacks – Learn users’ private information by analyzing
electricity usage data
• Device attacks – Compromised or control a device
5
Cyber Attacks in AMI
• Malware (Malicious Software)
– Stuxnet, Auora worm in industrial systems
– Warning from Blackhat, McAfee
– Control systems and disrupt operations
– Launch malicious commands
• Blackout, remote disconnects
– Infect other systems in AMI
6
Policy Engine
• What is Policy Engine?
– A set of policy rules
– Detection/Prevention of Malware carried by an application layer protocol
• Malware propagation during communications
– Monitor bi-directional communications
• Client request (DCU, Headend)
• Server response (Smart meters)
7
Policy Engine DCU Smart Meter
DLMS/COSEM protocol
TCP/UDP, PLC, HDLC
8
DLMS/COSEM Protocol
• An standard application protocol for AMI
• DLMS (Device Language Message Specification)
– define objects found in meters
• COSEM (Companion Specification for Energy Metering)
– Define communication between meters and devices needing meter information
Meter (DLMS “Server”)
DCU (DLMS “Client”) Service Response
Service Request
9
DLMS/COSEM Protocol
• Interface classes (Class ID)
– All meter data as objects
– Object Identification system (OBIS)
• OBIS code is called a logical name (1st attribute value)
• [EX] Reading energy value
– Register (class_id : 3)
– Obis code is “1.1.1.8.0.255”
– Reading second attribute value
– 70 standard interface classes
Atrribute-1
Method-1
Atrribute-n
Method-n
Read/Write
Action
Object
10
Motivation for Policy Engine
• Compromised applications or devices
• Propagate malware by hiding itself into DLMS/COSEM traffic
– Violation of Packet formats
• Specification of DLMS/COSEM application layer
– No metering data
– Containing malware features
11
System Architecture of Policy Engine
12
System Architecture of Policy Engine
• Data collection system
– OBIS, Command types, system configurations
• Policy Rule Monitoring system
– Policy rules from the standard protocol
– Policy rules from two statistical features
• Logging system
– Recording malicious data
13
Standard Protocol Analysis
• DLMS/COSEM protocol
– It creates application data
• APDU (application protocol data unit)
– Policy rules
• Syntactic policy rules
• Semantic policy rules
• Access policy rules
• Communication policy rules
14
Standard Protocol Analysis
• DLMS/COSEM Protocol
– Syntactic/Semantic/Access/Communications
• Malware overwrite the APDU
15
Statistical Analysis
• Two features of malware – Packed or encrypted binaries are around 90%
– Executable format
• Smart meters – Low-computational power
– Limited bandwidth and resourses
• Statistical analysis tools – Entropy analysis
– N-gram analysis
16
Statistical Analysis
• Entropy analysis
– Bintropy
• Shannon entropy
• Detection method for packing and encrypted binaries
• Packed and encrypted binaries have greater than 6.67 entropy
– Metering data have low entropy less than 6.
• Evaluating the frequency of each byte in fixed block size
17
Statistical Analysis
• N-gram analysis
– FilePrint
• Each file type has its own distinct 1-gram distribution
– 1-gram distribution of metering data is different from the one of malware
– Measuring dissimilarity(distance) between two distributions
• Manhattan distance and Mahalanobis distance
18
Statistical Analysis • N-gram analysis
– Trained Model for malware
• K-mean clustering & Manhattan distance
• EX) 10 malware files, K=2 (2 Trained models)
– Pick 2 random files -> evaluate distributions -> update distribution by K-mean clustering and manhattan distance
– Testing Model for metering data
• Mahalanobis distance to compare to the trained model
• Distance(testing file, K-model) > threshold
19
…
Experiments
• Data samples – Trained model for 200 malware samples – Real 100 metering data files from the TCIPG
testbed
• Policy engine implementation – Open source GuruX for DLMS/COSEM – Intercept API calls of GuruX
• Experiments – Entropy & 1-gram distribution & false positive
rate & overhead
20
Entropy for Metering Data
Entropy of Packed Binaries (Bintropy)
Entropy of Encrypted Binaries (Bintropy)
21
1-gram Distribution
One malware sample
One metering data file
22
False Positive Rate
• Metering data are recognized as malware
• O.17% performance overhead
Truncation 50 Bytes 100 Bytes 200 Bytes 500 Bytes 1 KB 2KB
K=1 0% 0% 0% 0% 0% 1%
K=2 0% 0% 0% 0% 1% 1%
K=5 0% 0% 0% 0% 1% 1%
k=10 0% 0% 0% 0% 1% 1%
k=20 0% 0% 0% 1% 1% 2%
23
Conclusion
• Policy Engine
– Monitor DMLS/COSEM traffic to detect malware
– Malware must try to hide as data, but we can ask many policy rules that reveal its presence
– Use the standard protocol specification and the statistical features of metering data
24
Current/Future Work
• Improve Accuracy of Malware Detection
– ARM Instruction Distribution
– “e” code testing accepted for SmartGridComm 2013
• Performance evaluation on embedded systems
– Beagle board, Rasperry PI, etc.
25
Q & A
26
Thank You!
Standard Protocol Analysis
27
DLMS/COSEM Protocol
28
Interface Class
DLMS User Association, COSEM Identification System and Interface Classes, Eighth Edition
DLMS User Association 2007-09-14 DLMS UA 1000-1 ed.8 15/168
4. COSEM Interface Classes
4.1 Basic principles
4.1.1 General
This subclause describes the basic principles on which the COSEM interface classes are built. It also gives a short overview on how interface objects (instantiations of the interface classes) are used for communication purposes. Data collection systems and metering equipment from different vendors, following these specifications, can exchange data in an interoperable way.
Object modelling: for specification purposes this standard uses the technique of object modelling. An object is a collection of attributes and methods.
The information of an object is organized in attributes. They represent the characteristics of an object by means of attribute values. The value of an attribute may affect the behaviour of an object. The first attribute in any object is the “logical_name”. It is one part of the identification of the object. An object may offer a number of methods to either examine or modify the values of the attributes.
Objects that share common characteristics are generalized as an interface class with a class_id. Within a specific class, the common characteristi cs (attributes and methods) are described once for all objects. Instantiations of an interface class are called COSEM interface objects.
Manufacturers may add proprietary methods or attributes to any object; see 4.1.2.
Figure 2 illustrates these terms by means of an example:
Register class_id=3
logical_name: octet-string
value: instance specific
...
value = 57
Total Positive
Reactive Energy: Register
logical_name = [1 1 1 8 0 255]
value = 1483
Total Positive
Active Energy: Register
Class Methods Object Attribute Values
class identifier Attributes Instantiation
reset
logical_name = [1 1 3 8 0 255]
Figure 2 – An interface class and its instances
The interface class “Register” is formed by combining the features necessary to model the behaviour of a generic register (containing measured or static information) as seen from the client (central unit, hand held terminal). The contents of the register are identified by the attribute
© Copyright 1997-2007 DLMS User Association
DLMS User Association, COSEM Identification System and Interface Classes, Eighth Edition
DLMS User Association 2007-09-14 DLMS UA 1000-1 ed.8 29/168 © Copyright 1997-2007 DLMS User Association
4.2.3 Register (class_id: 3)
A “Register” object stores a process value or a status value with its associated unit. The register object knows the nature of the process value or of the status value. It is described by the attribute logical_name.
Register 0...n class_id = 3, version = 0
Attribute(s) Data type Min. Max. Def. Short name
1. logical_name (static) octet-string x
2. value (dyn.) CHOICE x + 0x08
3. scaler_unit (static) scal_unit_type x + 0x10
Specific methods (if required) m/o
1. reset (data) o x + 0x28
Attribute description
logical_name Identifies the “Register” object instance. Identifiers are specified in Clauses 4.4 and 5.
value Contains the current process or status value.
CHOICE {
--simple data types null-data [0], bit-string [4], double-long [5], double-long-unsigned [6], octet-string [9], visible-string [10], integer [15], long [16], unsigned [17], long-unsigned [18], long64 [20] long64-unsigned [21], float32 [23], float64 [24]
}
The data type of the value depends on the instantiation defined by “logical_name” and possibly on the choice of the manufacturer. It has to be chosen so that, together with the logical_name, an unambiguous interpretation of the value is possible.
When instead of a “Data” object a “Register” object is used, (with the scaler_unit attribute not used or with scaler = 0, unit = 255) then the data types allowed for the Value attribute of the “Data” interface class are allowed.
scaler_unit Provides information on the unit and the scaler of the value.
scal_unit_type: structure {
scaler, unit
}
scaler: integer This is the exponent (to the base of 10) of the multiplication factor. REMARK If the value is not numerical, then the scaler shall be set to 0.
unit: enum Enumeration defining the physical unit; for details see Table 3 below.
Method description
29
Example of Communication
30
Result of Bintropy
31
FilePrint
• 1-gram distributions
– 256 distinct bytes from 00000000 to 11111111
• Mahalanobis distance
• Manhatan distance
32
Modeling • The K-means algorithm that computes multiple
centroids is briefly described as follows: 1) Randomly pick K files from the training data set. These K files
(their byte value frequency distribution) are the initial seeds for the first K centroids representing a cluster.
2) For each remaining file in the training set, compute the Manhattan Distance against the K selected centroids, and assign that file to the closest seed centroid.
3) Update the centroid byte value distribution with the distribution of the assigned file.
4) Repeat step 2 and 3 for all remaining files, until the centroids stabilize without any further substantive change.
33