www.recordedfuture.com|@RecordedFuture

Every day about 20 new cyber vulnerabilities are disclosed and reported, each related to some software implementation weakness. Hackers exploit these vulnerabilities to launch an attack, trigger a system failure, access sensitive information, or gain remote system access. Some vulnerabilities have a severe impact, while hackers show little or no interest in exploiting others. For an information security manager, keeping up and deciding which vulnerabilities to patch first can be a daunting task. The cyber threat landscape changes quickly, and it is vital for many companies to stay updated and work proactively to improve security. Many of the vulnerabilities are zero-days, which means that the vulnerability was exploited before the software vendor was aware of its existence. Generally, 90% of exploits are available within a week of the vulnerability disclosure, and the great majority within days. An early, automatic assessment can therefore help security managers discover possible threats in advance.

We use machine learning (ML) and data mining to examine correlations in vulnerability data and to see whether some vulnerability types are more likely to be exploited. With ML algorithms it is possible to classify each vulnerability as likely or unlikely to be exploited, which is a binary classification problem. Our main data sources are the National Vulnerability Database (NVD) and the Exploit DB (EDB).

We found:

› Cyber exploits can be anticipated with an accuracy of 83% using open vulnerability data.

› The text of the CVE summary provided the most important features. In fact, the Common Vulnerability Scoring System (CVSS) scores and parameters and the Common Weakness Enumeration (CWE) numbers added no further information to the model or its predictions.

› Hackers are likely to go after content management systems (CMSs). That may be because the fragmented world of CMS provides a target-rich environment of unpatched websites.

THREAT INTELLIGENCE REPORT

Anticipating Cyber Vulnerability Exploits Using Machine Learning

Machine Learning

Machine Learning is a field in computer science that focuses on teaching machines to see patterns in data. It is often used to build predictive models for classification, clustering, ranking, or recommender systems in e-commerce. In supervised machine learning there is generally a dataset, represented as a feature matrix X of observations (see Figure 1), with known truth labels y that an algorithm is trained to predict.

The simplest case of classification is binary classification. For example, based on some weather data for a location, will it rain tomorrow? Based on previously known data X, with a boolean label vector y (yi ∈ {0, 1}), a classification algorithm can be taught to see common patterns that lead to rain. It is also possible to use the same algorithms to predict the risk of rain as a probability between 0 and 100%.
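As a minimal sketch of the rain example, assume three made-up binary weather features (cloudy, humid, windy) and a hand-written Bernoulli Naive Bayes with Laplace smoothing. The data, the feature names, and the function names are illustrative assumptions, not taken from the report; the point is only to show how a classifier turns X and y into a probability.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary features X and labels y.

    Assumes both classes (0 and 1) occur at least once in y.
    """
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, yi in zip(X, y) if yi == label]
        prior = len(rows) / len(X)
        # Laplace-smoothed probability that each feature is 1 given the class.
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (prior, probs)
    return model

def predict_proba(model, x):
    """Return the estimated P(y = 1 | x)."""
    log_scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1.0 - p)
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])

# Hypothetical observations: columns are [cloudy, humid, windy].
X = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = it rained the next day

model = train_bernoulli_nb(X, y)
p_rain = predict_proba(model, [1, 1, 0])
print(round(p_rain, 3))  # 0.857
```

Thresholding the probability at 0.5 gives the boolean prediction; keeping the probability itself gives the risk estimate.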

Figure 1

The vulnerability data is represented mathematically as a feature matrix X and a label vector y.

Recorded Future

We use common ML classification algorithms, such as Support Vector Machines (SVM), Naive Bayes, and Random Forests. To measure the performance of our classifiers, we use the common evaluation metrics precision, recall, and F-score.
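For reference, the three metrics can be computed from true and predicted labels as in the sketch below. It assumes the positive class is labeled 1 and that the F-score meant here is the balanced F1 score; the example labels are made up.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical labels: 1 = exploited, 0 = not exploited.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Precision answers "of the vulnerabilities flagged as exploited, how many really were", while recall answers "of the exploited vulnerabilities, how many were flagged".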

Vulnerability Data

To classify vulnerabilities, the data must be converted into a mathematical model, a feature matrix, to be used with the ML algorithms. In the feature matrix, each row is a vulnerability and each column is a feature dimension. The NVD includes information for all Common Vulnerabilities and Exposures (CVEs). As of May 2015, there are close to 69,000 CVEs in the database. The NVD includes CVSS metrics and parameters for each CVE, see Figure 2. The CVSS score ranges from 0 to 10 and is an official severity measurement, with 10 denoting the most critical vulnerabilities. The CVSS parameters are categorical features such as Access Vector, Access Complexity, Authentication, Confidentiality, Integrity, and Availability. An example of the CVE information for the Shellshock vulnerability can be seen in Figure 2.

Linked to the CVEs in the NVD are 1.9M Common Platform Enumeration (CPE) strings, which define the affected software combinations and contain a software type, vendor, product, and version information. A single CVE can affect several products from several different vendors. To make more sense of this data, the 1.9M entries were queried for distinct combinations of vendor products per CVE, ignoring version numbers. This reduced the data to 30,000 different combinations of CVE-number, vendor, and product. About 20,000 of the vendor products were unique, appearing for only a single CVE. Some 1,000 vendor products had 10 or more CVEs.
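The reduction described above can be sketched as follows, assuming the CPE strings use the CPE 2.2 URI form `cpe:/{part}:{vendor}:{product}:{version}`. The input mapping and the placeholder CVE identifier and vendor names are hypothetical; how the report actually queried its data is not stated.

```python
from collections import Counter

def distinct_vendor_products(cve_to_cpes):
    """Collapse CPE strings to distinct (CVE, vendor, product) combinations."""
    combos = set()
    for cve_id, cpes in cve_to_cpes.items():
        for cpe in cpes:
            parts = cpe.split(":")
            # Expect ["cpe", "/a", vendor, product, version, ...]; skip anything else.
            if len(parts) < 4 or parts[0] != "cpe":
                continue
            vendor, product = parts[2], parts[3]
            combos.add((cve_id, vendor, product))  # version is deliberately ignored
    return combos

# Illustrative input; the second CVE identifier and its vendors are placeholders.
cve_to_cpes = {
    "CVE-2014-6271": ["cpe:/a:gnu:bash:4.2", "cpe:/a:gnu:bash:4.3"],
    "CVE-0000-0001": [
        "cpe:/a:example_vendor:example_cms:1.0",
        "cpe:/a:example_vendor:example_cms:1.1",
        "cpe:/a:other_vendor:other_app:2.0",
    ],
}

combos = distinct_vendor_products(cve_to_cpes)
# Number of CVEs per vendor product, useful for finding products with many CVEs.
cves_per_product = Counter((vendor, product) for _, vendor, product in combos)
print(len(combos))  # 3
```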

The NVD also contains 400,000 links to various external pages with more information. There are many links to exploit databases such as exploit-db.com (EDB), milw0rm.com, rapid7.com/db/modules (a.k.a. Metasploit), and 1337day.com. To avoid building the answer into the feature matrix, references to exploit databases were not used. Many links also go to bug trackers, forums, security companies and actors, and other databases. The external links were turned into features by extracting their domain names. To filter out noise, domains occurring fewer than 5 times were discarded, which resulted in 1,649 domain names.
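A rough sketch of this reference-domain step is shown below. It assumes that a domain's count is the number of CVEs linking to it (the report does not say whether it counted links or CVEs), strips a leading `www.`, and uses made-up CVE identifiers and URLs. The example lowers the threshold to 2 only because the toy data is tiny.

```python
from collections import Counter
from urllib.parse import urlparse

# Exploit databases named in the text; excluded so labels do not leak into features.
EXPLOIT_DB_DOMAINS = {"exploit-db.com", "milw0rm.com", "rapid7.com", "1337day.com"}

def domain_of(url):
    """Return the lowercase host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def reference_domain_features(cve_to_urls, min_count=5):
    """Map each CVE to its set of reference domains, dropping rare domains."""
    counts = Counter()
    per_cve = {}
    for cve_id, urls in cve_to_urls.items():
        domains = {domain_of(u) for u in urls} - EXPLOIT_DB_DOMAINS
        domains.discard("")
        per_cve[cve_id] = domains
        counts.update(domains)  # each domain counted once per CVE (assumption)
    kept = {d for d, c in counts.items() if c >= min_count}
    features = {cve: doms & kept for cve, doms in per_cve.items()}
    return features, sorted(kept)

# Hypothetical references for two placeholder CVEs.
refs = {
    "CVE-A": ["http://www.securityfocus.com/bid/1", "https://www.exploit-db.com/exploits/1"],
    "CVE-B": ["http://securityfocus.com/bid/2", "http://example.org/advisory"],
}
features, vocabulary = reference_domain_features(refs, min_count=2)
print(vocabulary)  # ['securityfocus.com']
```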

Figure 2

CVE information from the NVD for the Shellshock vulnerability (CVE-2014-6271), such as the free-text summary, CVSS parameters, and CWE-number, can be explored at cvedetails.com.
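Categorical CVSS parameters need a numeric encoding before they can enter a feature matrix. One common choice is one-hot encoding, sketched below with the CVSS version 2 value sets. That the report used exactly this encoding is an assumption, and the dictionary keys are illustrative names.

```python
# CVSS v2 base metrics and their allowed values.
CVSS_V2_VALUES = {
    "access_vector": ["LOCAL", "ADJACENT_NETWORK", "NETWORK"],
    "access_complexity": ["HIGH", "MEDIUM", "LOW"],
    "authentication": ["MULTIPLE_INSTANCES", "SINGLE_INSTANCE", "NONE"],
    "confidentiality": ["NONE", "PARTIAL", "COMPLETE"],
    "integrity": ["NONE", "PARTIAL", "COMPLETE"],
    "availability": ["NONE", "PARTIAL", "COMPLETE"],
}

def one_hot_cvss(params):
    """Encode categorical CVSS v2 parameters as a flat 0/1 feature row."""
    row = []
    for name, values in CVSS_V2_VALUES.items():
        # One column per allowed value; an unknown or missing value gives all zeros.
        row.extend(1 if params.get(name) == value else 0 for value in values)
    return row

# Shellshock (CVE-2014-6271): AV:N/AC:L/Au:N/C:C/I:C/A:C
shellshock = {
    "access_vector": "NETWORK",
    "access_complexity": "LOW",
    "authentication": "NONE",
    "confidentiality": "COMPLETE",
    "integrity": "COMPLETE",
    "availability": "COMPLETE",
}
row = one_hot_cvss(shellshock)
print(len(row), sum(row))  # 18 6
```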


In addition, the NVD includes CWE numbers, which group vulnerabilities into categories such as cross-site scripting or OS command injection. In total, there are around 1,000 different CWE-numbers, but only 29 distinct ones were found to be in use in the NVD.

Each CVE also has a free text description summary, which can be vectorized as a bag-of-words model, and converted into a set of occurrence count features.
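A bag-of-words vectorization can be sketched as follows. The tokenizer (lowercase runs of letters, digits, and underscores) and the choice to keep the `n_words` most frequent words are assumptions; the two summaries are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Split a summary into lowercase word tokens (underscores kept)."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def bag_of_words(summaries, n_words):
    """Return the vocabulary of the n_words most common words and a count matrix."""
    tokenized = [tokenize(s) for s in summaries]
    totals = Counter()
    for tokens in tokenized:
        totals.update(tokens)
    vocabulary = [word for word, _ in totals.most_common(n_words)]
    index = {word: j for j, word in enumerate(vocabulary)}
    X = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for token in tokens:
            if token in index:
                row[index[token]] += 1  # occurrence count feature
        X.append(row)
    return vocabulary, X

# Invented summaries in the style of CVE descriptions.
summaries = [
    "PHP remote file inclusion vulnerability in index.php via the root_path parameter.",
    "Buffer overflow in the parser allows remote attackers to execute arbitrary code.",
]
vocabulary, X = bag_of_words(summaries, n_words=50)
print(X[0][vocabulary.index("php")])  # 2
```

Each row of X is one CVE summary and each column is one word, so the result can be placed side by side with the other feature blocks.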

The NVD itself contains no field stating whether a vulnerability has been exploited. A vulnerability was therefore considered to have an exploit if it had some known exploit in the EDB. The EDB holds information and proof-of-concept code for 32,000 common exploits, including exploits that do not have official CVE-numbers.
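The labeling rule can be sketched in a few lines. The record layout (`edb_id`, `cve`) and the identifiers other than CVE-2014-6271 are hypothetical; EDB entries without a CVE-number simply contribute no label.

```python
def label_cves(cve_ids, edb_entries):
    """Label a CVE 1 if any EDB entry references it, else 0."""
    exploited = {entry["cve"] for entry in edb_entries if entry.get("cve")}
    return [1 if cve_id in exploited else 0 for cve_id in cve_ids]

# Hypothetical EDB records; the second one has no official CVE-number.
edb_entries = [
    {"edb_id": 1, "cve": "CVE-2014-6271"},
    {"edb_id": 2, "cve": None},
]
cve_ids = ["CVE-2014-6271", "CVE-0000-0001"]
y = label_cves(cve_ids, edb_entries)
print(y)  # [1, 0]
```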

Results

For our binary classification, the challenge is to find an algorithm and feature space that optimize the classification performance. A well-known result in machine learning is the No Free Lunch theorem, which says that there is no universal algorithm or method that will always work. Thus, to get the proper behavior and results, many different approaches might have to be tried. First, we show how different kinds of features contribute to the classification performance. Second, we benchmark multiple machine learning algorithms to compare their relative performance. The number of vendor products used will be denoted nv, the number of words nw, and the number of references nr.

Selecting a good feature space

We further investigate how different types of features contribute and scale. We run these comparisons on a subset of CVEs from 2010-2014 containing 7,528 samples using the algorithms SVM Liblinear and Naive Bayes. We generally see that by increasing the number of common words, vendor products, and references it is possible to get a better classification accuracy. This is seen in Figure 3, Figure 4, and Figure 5 respectively.

By adding common words as features it is possible to discover patterns that are not captured by the simple CVSS and CWE parameters from the NVD. As more information is added, the information from the CVSS and CWE features matters less. The classification with just the base information performs poorly. By using just a few common words from the summaries it is possible to boost the accuracy significantly. The results also show that the CVSS and CWE parameters are redundant when many words are used. Common words found with exploited vulnerabilities are often PHP-related, such as root_path, magic_quotes_gpc, and register_globals. CVEs with any of these words have more than 90% probability of being exploited.
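One simple way to arrive at per-word figures like these is to measure, for each word, the fraction of CVEs containing it that are labeled as exploited. The sketch below does that on invented data; whether the report computed its probabilities this way or from a fitted model is not stated, so treat it as an assumption.

```python
from collections import defaultdict

def exploit_rate_by_token(token_sets, y, min_support=2):
    """Return, per token, the share of CVEs containing it that are exploited."""
    seen = defaultdict(int)
    exploited = defaultdict(int)
    for tokens, label in zip(token_sets, y):
        for token in set(tokens):
            seen[token] += 1
            exploited[token] += label
    # Ignore tokens seen in fewer than min_support CVEs; their rates are unreliable.
    return {t: exploited[t] / seen[t] for t in seen if seen[t] >= min_support}

# Invented token sets (one per CVE) and labels (1 = exploited).
token_sets = [
    {"php", "register_globals", "remote"},
    {"php", "root_path"},
    {"buffer", "overflow", "remote"},
    {"register_globals", "inclusion"},
]
y = [1, 1, 0, 1]
rates = exploit_rate_by_token(token_sets, y)
print(rates["remote"])  # 0.5
```

The same function works unchanged on sets of vendor products or reference domains instead of words.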

Figure 3

When more common words are used, the classification gets better and the information from the CVSS and CWE parameters becomes redundant. There is a clear performance difference between the two algorithms SVM Liblinear and Naive Bayes.

Adding more vendors gives a better classification, but the best results are some 10 percentage points worse than when words are used. When a lot of vendor information is used, the other parameters matter less, and the classification gets better. However, the classifiers also risk overfitting, because many smaller vendor products have only a single vulnerability and the model can simply memorize them. We found that vulnerabilities for many CMSs have a high probability of being exploited. For example, vulnerabilities for the PHP frameworks Joomla, Bitweaver, and PHP-Fusion all have more than an 85% probability of getting exploited.

Using only references shows the same pattern as adding more vendor products or words: as more features are added, the classification accuracy improves quickly over the first few hundred features and then gradually saturates.

Final Binary Classification

A final run was done with a larger dataset, using 55,914 CVEs from 2005 through 2014. A feature matrix was set up with CWE and CVSS parameters, and nw = 10000, nv = 6000, nr = 1649. 80% of the data was used to train the algorithms, and the remaining 20% was used to test them.
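The 80/20 split can be sketched as below. The report does not say how the split was drawn, so the random shuffle with a fixed seed is an assumption, and the toy data stands in for the real feature matrix.

```python
import random

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out test_fraction of the rows for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test

# Toy stand-in: 10 rows with a single feature each.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, y_train, X_test, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 8 2
```

Only the training rows are shown to the algorithm; the held-out rows give the test-set figures that matter for judging how well the model generalizes.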

Figure 4

When more vendor products are added, the classification gets better, however not close to the same performance as when common words are used.

Figure 5

Using common references as features also gives a performance boost. As more features are added all curves start to converge.

About Recorded Future

We arm you with real-time threat intelligence so you can proactively defend your organization against cyber attacks. With billions of indexed facts, and more added every day, our patented Web Intelligence Engine continuously analyzes the open Web to give you unmatched insight into emerging threats. Recorded Future helps protect four of the top five companies in the world.

Recorded Future, 363 Highland Avenue, Somerville, MA 02144 USA | © Recorded Future, Inc. All rights reserved. All trademarks remain property of their respective owners.


The final result is shown in Table 1. Algorithm names are shown with each algorithm's optimal parameter settings (these have been cross-validated as well, but we leave that out for simplicity). Performance metrics are reported for both the training and test set, with precision, recall, and F-score for the positive class. The results are expected to be better on the training data, and some models, such as SVM with an RBF kernel and Random Forest, get excellent results when classifying data they have already seen. We are mainly interested in the performance on the unseen test data. The results show that it is possible to get a mean prediction accuracy of some 83% using LibSVM RBF. Liblinear gives an almost equally good result while running 820 times faster.

Table 1

Performance benchmark for the different ML algorithms for the final binary classification.

Conclusions

We have seen that it is possible to use machine learning to anticipate vulnerability exploits. Using several different ML algorithms, it is possible to get a prediction accuracy of 83% for the binary classification.

This research shows that the most important features are common words, references, and vendors. CVSS parameters, especially CVSS scores, and CWE-numbers are redundant when a large number of common words are used. The information in the CVSS parameters and CWE-number is often contained within the vulnerability description. We have also seen that hackers are likely to exploit vulnerabilities for popular CMSs.

There is still room for a lot of improvement. The accuracy of the vulnerability data is far from perfect, and the results could most likely be improved by using more accurate observations and labels. We conclude that it is possible to anticipate exploits for software vulnerabilities.
