Masters Project Defense Investigating Techniques For Identifying Thread Behavior and Evaluating...

1 Masters Project Defense Investigating Techniques For Identifying Thread Behavior and Evaluating Alternative Automatic Classification Methods in a Realistic Middleware System Robert C Broeckelmann Jr. 29 Nov 2007


Masters Project Defense

Investigating Techniques For Identifying Thread Behavior and Evaluating Alternative Automatic Classification Methods in a Realistic Middleware System

Robert C Broeckelmann Jr.
29 Nov 2007


Where We Started…

• Began meeting with Dr. Gill in September 2006.
• Officially started working in January 2007.
• Explored how an OS scheduler could be extended to determine, before a scheduling decision, whether a thread is
  – displaying an undesirable behavior
  – operating outside of a predefined range
• Can information available to an IDS be fed to an OS scheduler?[22,23,24,25,26]


Where We Went…

• What other information is available to make such decisions?
• How do we gather & process this information?
• How do we classify the high-level function of threads based upon this data?
• Practical use in industry.


Original Test Environment

• Spent Spring Semester building a test environment.
  – Several dead ends.
• Original test environment consisted of:
  – VMware Workstation 5.x[10]
  – Fedora Core 6[11]
  – Custom build of KURT Linux 2.6.18, .19, .20/STREAMs[12]
  – Custom Linux kernel build 2.6.18[13]
  – Java 1.5[1,3,5,8,9]
  – strace[14]
• KURT Linux was incapable of capturing system calls per LWP out of the box.
• Explored using strace on Linux, but ran into problems with high thread counts.


Final Test Environment

• Final test environment
  – 2 Dell PCs (2 CPU, 2GB memory)
  – OpenSolaris 2.11[28]
  – Java 1.6[4,5,6,8,9]
  – JBoss 3.2.8b[15]
  – MySQL v5.0 Community Server[29]
  – JMeter v2.2[31]
  – Java PetStore eCommerce application (J2EE Spec v1.3)[32]
  – PetStore configuration adaptation for JBoss[33]
• Had to move to the Java 1.6.0 that ships with OpenSolaris 2.11 in order to utilize plug points with DTrace.
• This Masters Project was completed using almost entirely Open Source tools.
  – Note: OpenSolaris and DTrace are released under the OpenSolaris Binary License & the CDDL (OSI-approved) license[34,35].
  – Java 1.6 is not Open Source. JDK 1.7 will be Open Source[36].


What information is available?

• Available information
  – System calls[41]
  – File descriptor, I/O syscall patterns[42]
  – CPU utilization
    • Traditional (User, Kernel, Idle, I/O Wait)[30]
    • Micro-State Accounting information[30]
• Other information is available but was left out to limit scope.
• Must be gathered with minimal overhead.


Gathering Information

• Each type of data has a tool of choice
  – System Calls -> DTrace/DTruss[7,27,37,38]
  – Traditional CPU Utilization -> vmstat, prstat[40,43,44]
  – Micro-State Accounting -> prstat[36,40]

• This project focuses on the use of System Call sequences (broken down per thread).


Practical Uses

• The techniques developed here could have a practical use in industry.

• For example, a System Administrator or Performance Engineer managing/monitoring a complex J2EE installation.
  – Such as BEA WebLogic, IBM WebSphere, or Red Hat JBoss[45,46,47]
  – Or a similar multithreaded middleware environment


Our Approach

• Original goal: build an OS scheduler with the ability to distinguish between a thread whose behavior is desirable and one whose behavior is undesirable.
• Chose to focus on a prerequisite.
  – What data do we gather?
  – Techniques for gathering and processing that data.
• Focused on the classification of threads within a multithreaded process by data that can be gathered
  – on a per-thread basis at run-time
  – efficiently

• First step towards building this enhanced OS Scheduler.


Previous Work

• Work in the areas of OS security research and intrusion detection systems has used system call analysis heavily[19,20,21,48,49,43,42].
  – SubDomain™: Parsimonious Server Security
  – Improving Host Security with System Call Policies
  – Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools
  – A Secure Environment for Untrusted Helper Applications: Confining the Wily Hacker
  – Ostia: A Delegating Architecture for Secure System Call Interposition


Classifying Thread Functions

• Using System Call information, I created a method to visually classify a thread’s function.

• Experimented with different machine-learning algorithms to try to accurately predict thread function.


The Method For Classifying Threads

• Basis for a new method that can be used to classify threads and determine if they are behaving correctly.

• Produces a visual fingerprint of a thread’s behavior.

• Produces a representation of run-time characteristics that would otherwise be difficult to analyze, visualize, & bring together.


Subject Of The Method – Modern Middleware

• Modern (especially Java-based) middleware involves one or more processes with many threads & moving pieces.

• Capturing the behavior of a single thread or interaction between the constituent pieces can be challenging.

• We used JBoss 3.2.8SP1 as a representative piece of modern middleware for this project.


JBoss Internals

• JMX Micro-Kernel Architecture
  – All the major J2EE subsystems are JMX beans.
• JBoss 3.x fully supports the J2EE 1.3 Spec.
  – 3.2.8SP1 was used because of its maturity and the Java PetStore application version used.
  – For more information, see [2].
• Note: JBoss 5.x is a complete architectural redesign.


High-Level Classification Of Threads In A JBoss J2EE Container


Data Gathering & Processing

• Tools used to gather data during a load test
  – DTrace[7]
  – DTrace Toolkit[37]
  – Dtruss[38]
  – Bash shell scripting[39]
  – GNU tools[18]
  – prstat[40]
• Tools used to process data after a load test
  – GNUPlot[17]
  – Bash shell & other GNU tools[18]
  – RapidMiner[16]


Thread groups I am studying.

• Thread Groups (collections of threads that perform similar functions)
  – HTTP Processor
  – JMS Thread(3)
  – JMS Session Workers
  – Connection Consumer
  – JBoss MQ Cache Reference Softener
  – Scanner Thread
  – Young GC Threads
  – Old Gen GC Thread
  – JIT Compiler
  – HSQLDB Timer
  – TimeOut Factory Thread
• Why were the other threads left out?
  – Couldn’t capture thread type via a Java Thread Dump.
  – Insufficient number of system calls made by the thread during the load test.


Result 1 – SysCall Graphs

• Hypothesis:
  – We can use OS data (such as system call usage) to build a graphical representation (histogram) that uniquely identifies each type of thread (Thread Group).
• Result:
  – For many thread types, yes. Classifying system call sequences using Thread Dumps shows that there is an identifiable pattern of system calls in many thread types.


Data Processing: SysCall Graphs

• Split into individual threads.
• Replace system call names with numbers.
• Produce frequency counts.
• Build GNUPlot files.
• Generate PNG images.
• Generate HTML page.
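The first three processing steps can be sketched in Python. This is only an illustration under stated assumptions: the slides do not show the trace format or the scripts actually used (the project used Bash and GNU tools), so the two-column "thread-id syscall-name" input lines and the function names `split_by_thread`, `number_syscalls`, and `frequency_counts` are hypothetical.

```python
from collections import Counter, defaultdict

def split_by_thread(trace_lines):
    """Group system call names by thread id, preserving call order.

    Assumes each line looks like "<thread id> <syscall name>" (hypothetical format).
    """
    per_thread = defaultdict(list)
    for line in trace_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        tid, syscall = parts
        per_thread[int(tid)].append(syscall)
    return dict(per_thread)

def number_syscalls(per_thread):
    """Assign each distinct system call name a stable integer id."""
    names = sorted({name for calls in per_thread.values() for name in calls})
    return {name: idx for idx, name in enumerate(names)}

def frequency_counts(per_thread, ids):
    """Count how often each numbered system call occurs in each thread."""
    return {tid: Counter(ids[name] for name in calls)
            for tid, calls in per_thread.items()}

# Invented sample trace, for illustration only.
trace = ["12 read", "12 write", "15 pollsys", "12 read", "15 pollsys", "15 lwp_park"]
threads = split_by_thread(trace)
ids = number_syscalls(threads)
counts = frequency_counts(threads, ids)
```

The per-thread counters are the kind of data a plotting step would consume; sorting the names before numbering keeps the numbering identical across runs, which matters if graphs from different runs are to be compared.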


How To Map Threads & Graphs

• The Sun JVM has the ability to pause all threads and print, for each thread, its
  – full call stack
  – thread description
  – native LWP ID

• Several thread dumps were captured during load tests.

• Matched LWPIDs to NIDs (Native IDs) in the Thread Dump.
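A minimal sketch of the matching step, assuming the HotSpot thread dump header format in which each thread line starts with the quoted thread name and carries a hexadecimal `nid=0x...` field. The sample dump text, the thread names in it, and the function name `nid_to_name` are invented for illustration; the slides do not show the actual script.

```python
import re

# A thread header line starts with the quoted thread name and contains nid=0x<hex>.
THREAD_LINE = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')

def nid_to_name(thread_dump_text):
    """Map each native id (converted to decimal) to its Java thread name."""
    mapping = {}
    for line in thread_dump_text.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            mapping[int(match.group("nid"), 16)] = match.group("name")
    return mapping

# Invented sample dump, for illustration only.
sample_dump = '''
"http-0.0.0.0-8080-Processor25" daemon prio=3 tid=0x08a3c800 nid=0x4e runnable [0xb1a5f000..0xb1a5fb60]
	at java.net.SocketInputStream.socketRead0(Native Method)
"JMS SessionPool Worker-0" daemon prio=3 tid=0x08b12400 nid=0x51 in Object.wait() [0xb19ae000..0xb19aebe0]
'''

names = nid_to_name(sample_dump)
# LWP ids as they might appear (in decimal) in the system call trace.
traced_lwpids = [78, 81, 99]
labels = {lwp: names.get(lwp, "unknown") for lwp in traced_lwpids}
```

The conversion from hexadecimal matters because tracing tools typically report LWP ids in decimal, while the dump prints the native id in hex.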


Graph Format

• 3-Dimensional
  – X: Time*
  – Y: System Call Type
  – Z: Frequency

• *Relative time-frames are not represented.


[Figure panels: Cache Reference Softener / Connection Consumer / Young GC Thread; HSQLDB Timer / HTTP Processor / TimeOutFactory]


[Figure panels: JIT Compiler / JMS Thread(3) / Session Worker; Old GC Thread / Scanner Thread]


Results

• Using these results, we were able to categorize many of the threads with the SysCall Graphs.
• From there, we were able to compare SysCall Graphs within a single run and between different runs.
• There is a visually recognizable pattern for each of the Thread Types that we are looking at.
  – This pattern holds for threads of the same type in each run.
  – This pattern holds for threads of the same type in different runs.


Comparisons between Runs: Connection Consumers / JMS Session Workers / HTTP Processor


Statistical Analysis of Data

• Tried Nearest Neighbor on the actual sequence using Euclidean & Nominal measures. Unsuccessful.
  – Different-length sequences
• Experimented with Hyper Planes. Unsuccessful.
• Experimented with 1st-Order Markov Chains. Unsuccessful.
• Tried Nearest Neighbor (NN) on the SysCall counts of a thread using the Euclidean measure.
  – Greatest success
  – Not perfect
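The count-based approach can be illustrated with a small 1-nearest-neighbor sketch. It is not the RapidMiner configuration used in the project; the count vectors below are invented, and the helper names `euclidean` and `nearest_neighbor` are hypothetical. Each thread is reduced to a fixed-length vector of per-system-call counts, which sidesteps the different-length problem that raw sequences have.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(training, query):
    """Return the label of the training vector closest to the query.

    training: list of (count_vector, label) pairs.
    """
    _, label = min(training, key=lambda item: euclidean(item[0], query))
    return label

# Invented counts for three system calls, for illustration only.
training = [
    ([900, 850, 10], "HTTP Processor"),
    ([5, 2, 400], "Scanner Thread"),
    ([300, 20, 60], "JMS Thread"),
]

busy_http = [880, 870, 12]   # similar volume to the HTTP training example
quiet_http = [40, 30, 15]    # same call mix, but far fewer calls overall

busy_prediction = nearest_neighbor(training, busy_http)
quiet_prediction = nearest_neighbor(training, quiet_http)
```

In this toy data the busy vector is labelled "HTTP Processor", while the low-volume vector lands nearer to "JMS Thread" because Euclidean distance on raw counts is dominated by overall call volume rather than by the mix of calls.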


Result 2 – Nearest Neighbor

• Hypothesis 2:
  – We can apply machine learning techniques to predict the different thread types using the data we have gathered.
• Result:
  – Using Nearest Neighbor on the system call counts, we can partially do this.


Data Processing (Result 2)

• RapidMiner Data Files[16]
  – Define an ARFF model definition file.
  – Define an AML test data definition file.
  – Put test data into a space-delimited file.
  – Define Nearest Neighbor XML file.
    • Produces a RapidMiner model file.
  – Define ModelLoader XML file.
    • Loads a model file and test data.
    • Forms predictions regarding test data.
  – Produces a data file that lists predictions and confidence values for each row in the data file.


Results

Thread Type                # of Threads   Run1 Correct   Run2 Correct   Run3 Correct   Run4 Correct   Run5 Correct
Cache Reference Softener   1              1              1              1              1              1
Connection Consumer        6              0              0              0              0              0
HSQLDB Timer               1              0              1              1              1              1
HTTP Processor             9              8              8              8              8              8
JIT Compiler               2              2              2              2              2              2
JMS Thread(3)              10             10             10             10             10             10
Old GC Thread              1              0              1              1              1              1
Scanner Thread             1              0              1              1              1              1
Session Worker             15             6              15             15             15             15
TimeOut Factory            1              0              0              0              0              1
Young GC Thread            2              2              2              2              2              2
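The slide reports per-type counts only. As a worked check, the overall per-run totals can be derived from the table; the short Python sketch below just sums the columns (the variable names are illustrative, and the figures in the dictionary are copied from the table above).

```python
# (number of threads, [correct in Run1..Run5]) per thread type, copied from the table.
results = {
    "Cache Reference Softener": (1, [1, 1, 1, 1, 1]),
    "Connection Consumer": (6, [0, 0, 0, 0, 0]),
    "HSQLDB Timer": (1, [0, 1, 1, 1, 1]),
    "HTTP Processor": (9, [8, 8, 8, 8, 8]),
    "JIT Compiler": (2, [2, 2, 2, 2, 2]),
    "JMS Thread(3)": (10, [10, 10, 10, 10, 10]),
    "Old GC Thread": (1, [0, 1, 1, 1, 1]),
    "Scanner Thread": (1, [0, 1, 1, 1, 1]),
    "Session Worker": (15, [6, 15, 15, 15, 15]),
    "TimeOut Factory": (1, [0, 0, 0, 0, 1]),
    "Young GC Thread": (2, [2, 2, 2, 2, 2]),
}

total_threads = sum(n for n, _ in results.values())
correct_per_run = [sum(runs[i] for _, runs in results.values()) for i in range(5)]
accuracy_per_run = [correct / total_threads for correct in correct_per_run]

for run, (correct, accuracy) in enumerate(zip(correct_per_run, accuracy_per_run), start=1):
    print(f"Run{run}: {correct}/{total_threads} correct ({accuracy:.1%})")
```

Summed this way, 29 of 49 threads are correct in Run1, 41 of 49 in Runs 2 through 4, and 42 of 49 in Run5; the six Connection Consumer threads account for most of the remaining errors in the later runs.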


What It Did NOT Accurately Predict

• No Connection Consumer threads were accurately predicted with Nearest Neighbor.
  – System call counts very similar to other threads.
• One HTTP Processor thread was mispredicted.
  – This thread handled very little traffic. As a result, its system call counts were significantly different.
• Shows shortcomings of the Nearest Neighbor (Euclidean Distance Measure) algorithm for our purposes.


Future Directions

• Rth-Level Markov Chain modeling of system call sequences to accurately predict Thread Functions[48,54].

• Using Micro-State Accounting data to fingerprint/predict thread types[36].
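The slides do not say how an Rth-level Markov chain model would be built, so the following Python sketch is only one plausible reading, under stated assumptions: one model per thread type, transition counts conditioned on the previous R system calls, add-one (Laplace) smoothing, and classification by the highest average log-likelihood per transition. The class name `MarkovModel`, the function `classify`, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class MarkovModel:
    """Order-R Markov chain over numbered system calls, with add-alpha smoothing."""

    def __init__(self, order, alphabet_size, alpha=1.0):
        self.order = order
        self.alphabet_size = alphabet_size
        self.alpha = alpha
        self.transitions = defaultdict(Counter)  # context tuple -> next-symbol counts

    def train(self, sequence):
        """Count which symbol follows each length-R context in the sequence."""
        for i in range(self.order, len(sequence)):
            context = tuple(sequence[i - self.order:i])
            self.transitions[context][sequence[i]] += 1

    def avg_log_likelihood(self, sequence):
        """Mean smoothed log-probability per transition (length-independent)."""
        steps = len(sequence) - self.order
        if steps <= 0:
            raise ValueError("sequence is shorter than the model order")
        total = 0.0
        for i in range(self.order, len(sequence)):
            context = tuple(sequence[i - self.order:i])
            counts = self.transitions.get(context, Counter())
            numerator = counts[sequence[i]] + self.alpha
            denominator = sum(counts.values()) + self.alpha * self.alphabet_size
            total += math.log(numerator / denominator)
        return total / steps

def classify(models, sequence):
    """Return the label whose model gives the sequence the highest score."""
    return max(models, key=lambda label: models[label].avg_log_likelihood(sequence))

# Invented toy sequences over 5 numbered system calls, for illustration only.
models = {"type A": MarkovModel(2, 5), "type B": MarkovModel(2, 5)}
models["type A"].train([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
models["type B"].train([3, 3, 4, 3, 3, 4, 3, 3, 4])

prediction_a = classify(models, [0, 1, 2, 0, 1, 2])
prediction_b = classify(models, [3, 4, 3, 3, 4, 3, 3])
```

Averaging the log-likelihood per transition is a design choice made here so that sequences of different lengths can be compared, which was the obstacle noted for sequence-based Nearest Neighbor.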


Questions?

• Thank you.


Reference

1. Java™ 2 Platform Standard Edition 5.0 API Specification. 29 Sept. 2004. Sun Microsystems Inc. 13 Jan. 2007 <http://java.sun.com/j2se/1.5.0/docs/api/>
2. Research Project: An Analysis of JBoss Architecture. Liu, Jenny. 29 Apr. 2002. School of Information Technologies, University of Sydney. 13 Jan. 2007 <http://www.huihoo.org/jboss/jboss.html.>
3. JDK™ 5.0 Documentation. 29 Sept. 2004. Sun Microsystems Inc. 13 Jan. 2007 <http://java.sun.com/j2se/1.5.0/docs/>
4. Java™ Platform, Standard Edition 6 API Specification. 12 Dec. 2006. Sun Microsystems Inc. 1 Apr. 2007 <http://java.sun.com/javase/6/docs/api/>
5. HotSpot Runtime Overview. OpenJDK Project. 15 Apr 2007. <https://openjdk.dev.java.net/hotspot/docs/RuntimeOverview.html>
6. JDK™ 6 Documentation. 12 Dec. 2006. Sun Microsystems Inc. 1 Apr. 2007 <http://java.sun.com/javase/6/docs/>
7. OpenSolaris Community: Dtrace. OpenSolaris. Sun Microsystems Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace/>
8. The Java Language Specification, Third Edition. 1 Jan 2005. Gosling, James. Joy, Bill. Steele, Guy. 13 Jan. 2007 <http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>
9. The Java™ Virtual Machine Specification Second Edition. 1999. Lindholm, Tim. Yellin, Frank. 13 Jan. 2007 <http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html>
10. Workstation 5 User’s Manual. 16 Sept. 2005. Vmware, Inc. 13 Jan. 2007. <http://www.vmware.com/pdf/ws5_manual.pdf>
11. Fedora Project – Fedora Core 6. 22 Oct 2006. RedHat, Inc. 13 Jan. 2007. <http://www.fedoraproject.org/>
12. KU System Programming. The University of Kansas. 13 Jan. 2007 <http://wiki.ittc.ku.edu/kusp_wiki/index.php/Main_Page>
13. The Linux Kernel Archive. 12 Jan 2007. Linux Kernel Organization, Inc. 13 Jan 2007 <http://www.kernel.org/>
14. Strace Project. 13 Jan 2007. Strace Project <http://sourceforge.net/projects/strace/>
15. JBoss Admin Development Guide. 2004. JBoss, Inc. 13 Jan 2007. <http://docs.jboss.org/jbossas/admindevel326/html/>


Reference

16. Mierswa, I. and Wurst, M. and Klinkenberg, R. and Scholz, M. and Euler, T., Yale (now: RapidMiner): Rapid Prototyping for Complex Data Mining Tasks. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), 2006.
17. gnuplot homepage. 15 Apr 2007. Williams, Thomas. Kelley, Colin. <http://www.gnuplot.info>
18. The GNU Operating system - the GNU project - Free Software Foundation - Free as in Freedom - GNU/Linux. 15 Apr 2007. Free Software Foundation. <http://www.gnu.org>
19. Design and Performance of Configurable Endsystem Scheduling Mechanisms
20. The Design, Modeling, and Implementation of Group Scheduling for Isolation of Computations from Adversarial Interference.
21. Group Scheduling in SELinux to Mitigate CPU-Focused Denial of Service Attacks.
22. SubDomain™: Parsimonious Server Security
23. Improving Host Security with System Call Policies
24. Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools
25. A Secure Environment for Untrusted Helper Applications: Confining the Wily Hacker
26. Ostia: A Delegating Architecture for Secure System Call Interposition
27. Solaris Dynamic Tracing Guide. 5 Sep. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/817-6223> OpenSolaris v2.11
28. Home at OpenSolaris.org. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/>
29. MySQL 5.0 Reference Manual. MySQL AB. 1 Apr 2007. <http://dev.mysql.com/doc/refman/5.0/en/manual-info.html>
30. Solaris Internals CPU/Processor. 15 July 2007. Solaris Internals. 1 Nov. 2007. <http://www.solarisinternals.com/wiki/index.php/CPU/Processor>


Reference

31. JMeter: Users Manual. 1 Jun. 2006. Apache Jakarta Project. 15 Apr 2007 <http://jakarta.apache.org/jmeter/usermanual/intro.html>
32. Java Pet Store Demo 1.3.2. 4 Aug. 2003. Sun Microsystems, Inc. 13 Jan 2007 <http://java.sun.com/blueprints/code/jps132/docs/index.html>
33. Java Petstore Tutorial. MobileFish. 13 Jan 2007 <http://www.mobilefish.com/tutorials/petstore_1_3_2/petstore_1_3_2_quickguide_jbossmysql.html>
34. OpenSolaris Binary License. 4 Nov. 2005. Sun MicroSystems. 1 Apr. 2007. <http://opensolaris.org/os/licensing/opensolaris_binary_license/>
35. COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL). 24 Jan 2004. Sun Microsystems, Inc. 15 Apr. 2007 <http://www.sun.com/cddl/cddl.html>
36. The GNU General Public License, Version 2. 1 Jun. 1991. Free Software Foundation. 1 Nov 2007 <http://www.fsf.org/licensing/licenses/info/GPLv2.html>
37. OpenSolaris Community: Dtrace. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace>
38. DTraceToolkit at OpenSolaris.org. 1 Jun 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace/dtracetoolkit>
39. Bash Reference Manual. 15 Jul. 2002. Free Software Foundation. 1 Apr 2007 <http://www.gnu.org/software/bash/manual/bashref.html>
40. prstat(1M). 4 Jan. 2001. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/816-0211/6m6nc673u?a=view>
41. man pages section 2: System Calls. 4 Oct 2005. Sun Microsystems, Inc. 1 Apr 2007. <http://docs.sun.com/app/docs/doc/816-5167?l=en>
42. S. Zanero, Unsupervised Learning Algorithms for Intrusion Detection, Ph.D. Thesis, DEI Politecnico di Milano, 2006
43. The Design, Modeling, and Implementation of Group Scheduling for Isolation of Computations from Adversarial Interference
44. vmstat(1M). 20 Dec. 2004. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/816-5166/6mbb1kqjv?a=view>
45. BEA Weblogic Server 10.0. 13 Dec. 2006. BEA Systems, Inc. 1 Nov 2007 <http://edocs.bea.com/wls/docs100/index.html>


Reference

46. WebSphere Application Server documentation. 29 May 2006. IBM Inc. 1 Nov 2007 <http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.base.doc/info/welcome_base.html>
47. JBoss.org: Community Documentation. 2004. Redhat, Inc. 13 Jan 2007 <http://labs.jboss.com/projects/docs/>
48. Markov Chain paper
49. Group Scheduling in SELinux to Mitigate CPU-Focused Denial of Service Attacks