Vulnerability Detection Using Machine Learning

Artificial intelligence has rapidly grown in influence across many products and services we take for granted today: self-driving vehicles, medical imaging, speech recognition, even the optimization of your playlist. AI is involved in all of those things, but it is largely absent from one area that matters to cybersecurity and IT teams: enterprise vulnerability management.

Technology in the vulnerability management community has, to this point, evolved little since the field’s initial days. Given the abundance of rich historical data, multi-dimensional risk elements, and a heretofore brute-force approach to remediation, the vulnerability management field appears ripe for the application of AI.

Defining AI

Before undertaking a discussion of AI’s potential in vulnerability management, let’s first take a moment to define “artificial intelligence.” The phrase is used ubiquitously, but not always accurately.

AI is an umbrella term that encompasses several areas of advanced computer science, everything from speech recognition to natural language processing, to robotics, to symbolic and deep learning. AI technologists are constantly striving to automate seemingly “intelligent” behavior or, put differently, to program computers to perform tasks historically done by humans.

One AI component used extensively in many applications is machine learning: algorithms that leverage historical data to make predictions or decisions. The more ample the historical data, the higher the probability the prediction will be useful or accurate. As more historical data is gathered, the machine learning engine’s predictions improve, or in the vernacular of pop culture, the application “gets smarter.” For example, a machine learning-based application identifying the probability of lung cancer from an X-ray can make a prediction from a historical data set of 10 X-rays, but that prediction’s accuracy will be negligible. As the historical data set expands from 10 to 10,000, the prediction becomes more reliable, and will improve again as the data set grows to 20,000, 30,000, and beyond.

Machine learning-based applications use historical data to make future predictions that improve as time progresses, without human intervention. Machine learning and other AI techniques can be powerful weapons in the battle to counter today’s cyber bad actors, and there are many opportunities within the field of vulnerability management to use AI technology for better outcomes.

To date, the use of AI in vulnerability management has been largely inconsequential (there are some exceptions, such as some pioneering work in scanning automation using expert systems), but that is unlikely to be the case for much longer. There are a number of elements of the vulnerability management process that can benefit greatly from the appropriate application of AI techniques.

Developing a Meaningful Vulnerability Risk Score

Even though context-based vulnerability risk scoring is a later-stage step in the overall vulnerability management process, it’s the most important. All data gathered and used in the other vulnerability operations stages discussed in this paper contributes to the ultimate objective of modern vulnerability analysis: the context-based prioritization of each vulnerability to optimize remediation efforts and maximize risk reduction. Thus, we lead our discussion of AI in vulnerability management with intelligent vulnerability risk scoring.

Today, the phrase vulnerability “risk score” is largely synonymous with the risk score attached to each vulnerability in the CVE (Common Vulnerabilities and Exposures) program. Although it is a useful starting point for assessing the criticality of an individual vulnerability, on its own it is a woefully inadequate measure of a vulnerability’s risk to an enterprise. The primary deficiency of the CVE risk score is its lack of context. Olympic swimmer Michael Phelps was famous for eating 12,000 calories a day when training, including a daily breakfast of pancakes, French toast, fried egg sandwiches loaded with cheese, fried onions, and mayonnaise. Every doctor on the planet would – without context – assert that such a diet would be a recipe for heart disease and diabetes. But in context for an Olympic swimmer in training, such a diet’s risk to long-term health is negligible.

Similarly, a given vulnerability may earn a high CVE risk score. But on a particular network, the affected asset may be completely isolated on a secure subnet, not connected to the Internet, or on a device running no other services, and therefore present very little risk to the business. Thus, for that organization, a vulnerability with a high CVE risk score may be a much lower remediation priority than vulnerabilities that score lower on the CVE scale but, given their context, expose the organization more significantly.

Modern vulnerability management systems can leverage all the AI techniques discussed in this paper to collectively develop an in-depth understanding of the context of each asset. Once a sophisticated appreciation of the asset’s context is acquired, it can be combined with in-depth knowledge of the specific vulnerability and the external threat environment to generate a “context-driven priority.” Establishing priorities and a plan to reduce risk while optimizing limited remediation resources is the ultimate goal of an intelligent vulnerability management program, and the only way to realistically accomplish that objective is with a context-sensitive assessment of risk.
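To make the idea of a “context-driven priority” concrete, here is a minimal sketch in Python. It is not the scoring method of any particular product: the context fields, the weights, and the multiplicative formula are all assumptions chosen only to show how asset context can reorder two vulnerabilities relative to their base scores.

```python
from dataclasses import dataclass


@dataclass
class AssetContext:
    """Hypothetical context features for one asset (names are illustrative)."""
    internet_facing: bool
    business_criticality: float  # 0.0 (low) to 1.0 (high)
    exploit_available: bool


def context_priority(base_score: float, ctx: AssetContext) -> float:
    """Scale a 0-10 base severity score by assumed context weights."""
    exposure = 1.0 if ctx.internet_facing else 0.4    # assumed weight
    threat = 1.0 if ctx.exploit_available else 0.6    # assumed weight
    criticality = 0.5 + 0.5 * ctx.business_criticality
    return base_score * exposure * threat * criticality


# An isolated, low-value asset with a severe vulnerability ...
isolated = AssetContext(internet_facing=False, business_criticality=0.2,
                        exploit_available=False)
# ... versus an exposed, business-critical asset with a moderate one.
exposed = AssetContext(internet_facing=True, business_criticality=0.9,
                       exploit_available=True)

high_base_isolated = context_priority(9.8, isolated)
medium_base_exposed = context_priority(6.5, exposed)
```

With these assumed weights, the moderate vulnerability on the exposed asset ends up with the higher priority, which is the reordering the paragraph above describes.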

Discerning Vulnerability Exploitation Trends

Brand marketing professionals are using AI-based sentiment analysis applications to evaluate posts on social media platforms that reference their products. Collecting this data and applying AI to it can provide insight into how a brand is perceived in the market, and how that changes – for better or worse – over time. Similarly, cybersecurity chat boards, media sites, and other online sources of cybersecurity conversations can be collected and analyzed to predict which vulnerabilities are the most likely to be exploited, which security experts are most concerned about, and how those sentiments change over time.

Collecting and evaluating millions of such posts over time is impractical with human resources, but can be accomplished continuously with Neural Networks and Natural Language Processing (NLP) techniques – an AI technology that can discern meaning, positivity/negativity and, even more importantly, extract precise technical information from text. AI can interpret a large number of posts and blend their meanings to add context to any given vulnerability’s practical risk, a risk that can change quickly as new exploits are created and distributed among the ever-growing community of bad actors.
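The full pipeline described here relies on neural networks and NLP models, which are beyond a short example. The sketch below covers only the simplest piece, extracting precise technical identifiers from free text, using a regular expression rather than a learned model. The sample posts and CVE identifiers are invented for illustration.

```python
import re
from collections import Counter

# CVE identifiers follow the form CVE-<year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def count_cve_mentions(posts):
    """Count how many posts mention each CVE identifier."""
    counts = Counter()
    for post in posts:
        # Count each identifier once per post so one long post cannot dominate.
        for cve_id in {match.upper() for match in CVE_PATTERN.findall(post)}:
            counts[cve_id] += 1
    return counts


# Made-up forum posts with made-up identifiers.
posts = [
    "Proof of concept for CVE-2099-0001 is circulating, patch now.",
    "Seeing scans for cve-2099-0001 on our honeypots. CVE-2099-0001 again!",
    "Is CVE-2099-0002 still relevant to anyone?",
]
mentions = count_cve_mentions(posts)
```

Running the same count over successive time windows would give a crude trend signal per vulnerability. A production system would add sentiment and entity models on top of this extraction step.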

Performing Important Asset Detection

Finding all assets is the foundation of an effective vulnerability management program, especially those assets that may appear atypical in a given context. Given the sheer number of assets in a typical network, it can be difficult to find assets that are contextually out of the ordinary using conventional detection mechanisms. Examples include a server that hosts many websites or services, a workstation in a subnetwork full of servers, or a Linux server in a network of Windows machines with database services running. These kinds of assets should be considered particularly crucial, and as such deserve more attention from security teams.

Manually comparing assets can be a useful technique when searching for unique assets. Yet the number of possible characteristics of an individual asset can be overwhelming, making it difficult to compare assets with a single-dimensional analysis; a multi-dimensional approach is therefore required. To accomplish this, techniques from the field of Pattern Recognition – namely Novelty, Anomaly or Outlier Detection – can be employed to help uncover exceptional assets, or those that stray from the contextual “norm.”

One algorithm, Isolation Forest, can be particularly effective. In this process, several asset characteristics are compared using a multi-dimensional representation, and the assets that differ most from contextual baselines are flagged (typically the top 10%). Using this filtering technique, the many “ordinary assets” are separated from the few “remarkable assets.”
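The following is a simplified, standard-library-only sketch of the Isolation Forest idea: points that random splits isolate after only a few cuts receive a high anomaly score. In practice one would more likely reach for an existing implementation such as scikit-learn’s `IsolationForest`. The two asset features used here (open ports, running services) and the sample data are assumptions for illustration.

```python
import math
import random


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    lo, hi = min(values), max(values)
    if lo == hi:  # simplification: stop instead of trying another feature
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [row for row in rows if row[feature] < split]
    right = [row for row in rows if row[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def path_length(row, node, depth=0):
    """Depth at which row lands in a leaf, adjusted for the leaf's size."""
    if "feature" not in node:
        return depth + c(node["size"])
    branch = "left" if row[node["feature"]] < node["split"] else "right"
    return path_length(row, node[branch], depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Return a score in (0, 1) per row; higher means easier to isolate."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c(sample_size)
    return [2 ** (-sum(path_length(row, tree) for tree in trees) / (n_trees * norm))
            for row in rows]


# Hypothetical assets described as (open_ports, running_services).
data_rng = random.Random(1)
assets = [(data_rng.randint(2, 5), data_rng.randint(1, 4)) for _ in range(40)]
assets.append((60, 45))  # one server-like asset among workstations

scores = anomaly_scores(assets)
top_k = max(1, len(assets) // 10)  # flag roughly the top 10%
flagged = sorted(range(len(assets)), key=lambda i: scores[i], reverse=True)[:top_k]
```

Here `flagged` holds the indices of the assets with the highest anomaly scores, and the server-like asset appended last ranks first. The 10% cut-off mirrors the figure mentioned above and would be tuned in a real deployment.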

Assessing the Reliability of Detections

An element of vulnerability management often unappreciated by those outside the field is the challenge of vulnerability detection. Determining whether an asset is configured such that it has an exploitable vulnerability can be more art than science, and the process is susceptible to a high frequency of false positives. AI can be employed in this part of the vulnerability management process to help reduce the number of false positives, essentially “detecting the misdetections.” Factors such as services running on the asset and the detection mechanism that flagged the vulnerability can be used to assess the probability that the identified vulnerability is, in fact, a legitimate one. And, as the experience of the AI system increases over time, its ability to accurately predict false positives versus legitimate vulnerabilities will improve.

To improve the reliability of vulnerability detection, Bayesian networks can be used when there is uncertainty in an assessment, in this case, whether an identified vulnerability is legitimate. The technique allows other observations to be included as evidence in the assessment, for example, how frequently the detection mechanism in use generates false positives. Effectively, employing Bayesian networks allows a more intelligent analysis that balances imperfect scanning techniques with expert human knowledge.
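A full Bayesian network would model many interdependent observations. The sketch below shows only the smallest possible case, a single application of Bayes’ rule with one piece of evidence (the detection fired) and one hidden fact (the asset is truly vulnerable). The prior and the scanner rates are assumed numbers, not measurements.

```python
def posterior_true_positive(prior, true_positive_rate, false_positive_rate):
    """P(asset is vulnerable | detection fired), by Bayes' rule."""
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / evidence


# Assumed: 5% of scanned assets are truly vulnerable, and both detection
# mechanisms catch 95% of real cases but differ in false-positive rate.
noisy_check = posterior_true_positive(prior=0.05, true_positive_rate=0.95,
                                      false_positive_rate=0.20)
precise_check = posterior_true_positive(prior=0.05, true_positive_rate=0.95,
                                        false_positive_rate=0.01)
```

Under these assumptions, a finding from the noisy check is legitimate about 20% of the time, while the same finding from the precise check is legitimate about 83% of the time. That gap is why the detection mechanism’s false-positive history is useful evidence.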

Leveraging Industry Vulnerability Remediation Priority Data

All modern vulnerability management products are either cloud-based or have a cloud-based component. Although there are myriad benefits to a cloud-based vulnerability management platform, one of the most valuable (yet typically underappreciated) is the user data that can be anonymized and culled from the application. Most organizations remediate vulnerabilities on multiple assets daily. Multiply several daily remediation activities across dozens, hundreds or thousands of customers, and a cloud-based vulnerability management product has a rich data source on which to apply an AI engine.

This ever-changing and growing data source can reinforce or contradict conventional vulnerability remediation prioritization. Which assets are enterprises patching the most frequently? Which vulnerabilities appear to be the most concerning to peer organizations? Which are lower priorities?

We all learned in high school that copying one classmate’s answer on a test question is not only unethical, but a risky proposition given there’s no assurance that you picked the right classmate to copy. However, if you could determine that 90% of the class chose a specific answer, you’d have significantly more confidence the answer was the right one. Applying AI to actual vulnerability remediation data across multiple organizations can yield insights based on the collective judgement of many hundreds or thousands of IT and security peers, and as discussed previously, the larger that peer group grows, the higher the probability the decisions are sound.

Using a machine-learning technique known as Gradient Boosted Tree Regression, user behaviors and preferences (for example, clickthrough rate) can be blended with users’ remediation history to predict what is important. Because it draws on an ever-expanding database of cloud-based users and their remediation activity, this contribution to the vulnerability risk score becomes a dynamic element that reflects the constantly changing nature of the threat.
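As a rough illustration of gradient boosted tree regression, the sketch below fits a sequence of one-split trees (stumps), each trained on the residual errors of the ensemble so far. It is a teaching-sized version. Real systems would typically use a library such as scikit-learn or XGBoost with deeper trees. The two features (clickthrough rate, share of peers who remediated) and all numbers are invented.

```python
def fit_stump(xs, residuals):
    """Find the one feature/threshold split that minimises squared error."""
    best = None
    for feature in range(len(xs[0])):
        for threshold in sorted({x[feature] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[feature] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[feature] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    if best is None:  # no usable split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]


def fit_gbt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals (the negative gradient of squared loss)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        feature, threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((feature, threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x[feature] <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and every stump's scaled contribution."""
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (lm if x[f] <= t else rm)
                      for f, t, lm, rm in stumps)


# Hypothetical rows: (clickthrough_rate, share_of_peers_who_remediated)
xs = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.2, 0.3), (0.1, 0.1), (0.3, 0.2)]
# Hypothetical target: observed remediation priority on a 0-1 scale.
ys = [0.95, 0.90, 0.70, 0.25, 0.10, 0.20]

model = fit_gbt(xs, ys)
hot = predict(model, (0.85, 0.9))
cold = predict(model, (0.10, 0.1))
```

On this toy data the model predicts a higher priority for the vulnerability that peers view and remediate often (`hot`) than for the one they largely ignore (`cold`).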

Developing Remediation Plan Recommendations

Once a context-driven priority list of vulnerabilities is established using the AI methodologies detailed here, optimizing remediation work is the final step in the vulnerability management process. Here, AI has a role to play as well.

Most medium to large enterprises can identify more vulnerabilities on their networks than could practically be remediated in any reasonable timeframe, so developing remediation plans that maximize risk reduction while minimizing remediation activity is essential to any modern vulnerability management program. Specifically, a Risk-Aware Recommender System, a hybrid between collaborative filtering and a content-based system, can be used to generate multiple remediation scenarios.

Similar to the algorithm a retailer uses to make recommendations to consumers, a vulnerability management Recommender System would also take into account risk reduction afforded by each remediation scenario using individual vulnerability risk scores generated using AI techniques.
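The sketch below is a deliberately simplified stand-in for such a recommender, not a full hybrid system. It ranks candidate remediation scenarios by blending a content-based term (risk reduction per hour of effort) with a peer-adoption term that stands in for a collaborative-filtering signal. The scenario names, the numbers, and the 70/30 weighting are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """A hypothetical remediation scenario."""
    name: str
    risk_reduction: float  # sum of context-driven risk scores removed
    effort_hours: float    # estimated remediation effort (must be > 0)
    peer_adoption: float   # share of similar organizations that did this, 0-1


def rank_scenarios(scenarios, peer_weight=0.3):
    """Order scenarios by a blend of risk-per-effort and peer adoption."""
    best_ratio = max(s.risk_reduction / s.effort_hours for s in scenarios)

    def score(s):
        # Content-based term, scaled to 0-1 against the best scenario.
        content = (s.risk_reduction / s.effort_hours) / best_ratio
        return (1 - peer_weight) * content + peer_weight * s.peer_adoption

    return sorted(scenarios, key=score, reverse=True)


candidates = [
    Scenario("Patch TLS library on web tier", 42.0, 6.0, 0.8),
    Scenario("Upgrade legacy database hosts", 55.0, 40.0, 0.3),
    Scenario("Disable legacy file sharing on workstations", 30.0, 3.0, 0.6),
]
plan = [s.name for s in rank_scenarios(candidates)]
```

With these made-up inputs, the cheap workstation change ranks first, the web-tier patch second, and the costly database upgrade last, even though the database upgrade removes the most raw risk.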

Conclusion

Advances in artificial intelligence afford IT and security teams a number of opportunities to cut the human effort required to reduce the vulnerability risk of their networks. As the complexity of networks increases along with the number and sophistication of threat actors, AI can help alleviate the exploding burden on enterprise vulnerability management teams by combining intelligent decision-making with automation.



ABOUT THE AUTHOR

PIERRE-DAVID ORIOL

SERGE-OLIVIER PAQUETTE
