Alex F

solve et coagula

UX/IA/Research/Strategy/InfoSec

Musician

Related Resources

The Background

Rapid7 has a number of products that analyze cloud resources in isolation.  Examples include a classic vulnerability management product (InsightVM), a cloud security product (InsightCloudSec), and an Identity and Access Management (IAM) feature.  InsightCloudSec examines misconfigurations across all of a customer’s assets in the cloud, and its IAM feature examines users, roles, and permissions, flagging misconfigurations that could result in privilege escalation and more.

The Problem

Taking something like a privilege escalation in isolation doesn’t tell the entire story of your attack surface.  For example, an attacker might escalate their privileges to where they have administrator-level access to everything on the system.  Okay, so they compromised this resource.  Is the compromised resource especially valuable?  Does the attack mean anything for the surrounding devices?  If neither, then we might deprioritize fixing the resource.  What if, however, the privilege escalation not only confers admin privileges but also prompts a more valuable adjacent or nearby machine to allow the attacker to log in with that admin account, giving them full privileges on this more valuable system?  Suddenly, a compromise of some unwatched machine can mean instant admin access to something more important, and this login might very well look like regular, normal, approved activity.

In a case like this, when investigating an isolated resource, you aren’t getting the full picture.  You aren’t seeing the full impact of the resource, should it get compromised.  To see that impact, you need to be able to see any at-risk resources and neighboring machines/systems, and understand their vulnerabilities as well as their potential value as a target.  The immediate “impact zone” is known as the blast radius.

The Solution

Seeing a list of resources will suffice if the user is just looking for the immediate impact. However, if the user wants to expand that out a level - to see the blast radius of the blast radius, for example - lists and tables aren’t very helpful.  Visualizations, particularly graph-style visualizations, suit this perfectly.

  • First, we took the single layer or abstraction of the “Related Resources” feature, and we represented it in a graph.  That’s pretty simple and predictable.

  • Then we wanted to give the user the ability to explore this and expand those graphs out over multiple abstractions, or layers.  We wanted to give them the ability to expand their view in order to see the related resources of the related resources.

  • We discovered that, in some instances, these graphs can get pretty large and unwieldy, so we introduced grouping of like resources.  If a resource has, for example, 12 NICs, we can just group those together.

  • For each resource, we had to give the user more than just the resource name and type, so we created “hovers” that show the cloud account and ID as well as the region, etc. Clicking on a resource node brings up a side panel that slides in from the right to tell you all the most important details about that particular resource.
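
The layered expansion and grouping described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's actual implementation: the `RESOURCES` inventory, the `expand` and `group_like` helpers, and the `min_group_size` threshold are all hypothetical names invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical inventory: resource id -> (resource type, related resource ids).
RESOURCES = {
    "vm-1": ("instance", ["nic-1", "nic-2", "nic-3", "sg-1"]),
    "nic-1": ("nic", ["vm-1"]),
    "nic-2": ("nic", ["vm-1"]),
    "nic-3": ("nic", ["vm-1"]),
    "sg-1": ("security_group", ["vm-1", "db-1"]),
    "db-1": ("database", ["sg-1"]),
}


def expand(resources, root, max_layers):
    """Breadth-first walk returning {resource_id: layer} up to max_layers."""
    layers = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if layers[current] == max_layers:
            continue  # do not look past the requested number of layers
        for neighbor in resources[current][1]:
            if neighbor not in layers:
                layers[neighbor] = layers[current] + 1
                queue.append(neighbor)
    return layers


def group_like(resources, node_ids, min_group_size=3):
    """Collapse same-type resources into one entry once there are enough of them."""
    by_type = defaultdict(list)
    for node_id in node_ids:
        by_type[resources[node_id][0]].append(node_id)
    display = []
    for resource_type, ids in by_type.items():
        if len(ids) >= min_group_size:
            display.append(f"{resource_type} x{len(ids)}")
        else:
            display.extend(ids)
    return display


# First layer of related resources around vm-1, with like resources grouped.
layers = expand(RESOURCES, "vm-1", max_layers=1)
first_layer = [rid for rid, depth in layers.items() if depth == 1]
print(group_like(RESOURCES, first_layer))
```

Raising `max_layers` corresponds to the user expanding the graph outward by another layer. The grouping step runs on whatever is about to be drawn, so the three NICs show up as a single node rather than three.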

At this point, the feature is incredibly useful.  What we have is a “blast radius” and then some (because we have resources beyond what you might find in a typical blast radius) that the user can expand outward to their heart’s content - and in any direction.  It’s all visually displayed, so the relationships can get very complex, yet the visualization remains very easy to understand, even at a quick glance.

The Related Resources screen, expanded out through several layers of related resources.

However, we’re a security company. As useful as this is, if we center on a compromised host, just seeing the related resources doesn’t tell us much.  Are those other resources vulnerable to the same attacks?  Are they misconfigured such that the attacker of the compromised host can simply log in to the neighboring machines and immediately gain admin access without even trying?  We can’t tell from the graph, so we added a “security mode.” When turned on, the security mode lights up the resources that have critical and high severity security issues, indicating their problems.  We chose to do this only for critical and high severities because if we included mediums, lows, and/or informational issues, the whole graph would light up like a Christmas tree.  It would be full of noise, and when everything is colored with a severity, the issues that we genuinely care about no longer stand out as prominently.  Seeing all those colors and having to carefully examine each one to understand what it means would lead to cognitive overload.  Even if it only takes milliseconds for each individual security issue, they add up.

The Related Resources screen with security information turned on.

Now a user can look at a single resource and expand out its first layer of related resources and turn on the security mode.  The user can then focus on expanding out only the layers where the source node of that expansion has security issues, and ignore the noise.
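
As a rough sketch of that workflow, the snippet below filters findings down to critical and high severities and then picks out which visible nodes are worth expanding next. It rests on assumptions made for the example: the `FINDINGS` data, the severity strings, and the `security_mode` and `worth_expanding` helpers are hypothetical and do not come from the product.

```python
# Hypothetical findings: resource id -> severities of its open security issues.
FINDINGS = {
    "vm-1": ["critical", "low"],
    "sg-1": ["medium"],
    "db-1": ["high"],
    "nic-1": [],
}


def security_mode(findings):
    """Return {resource_id: worst severity}, keeping only critical and high."""
    lit = {}
    for resource_id, severities in findings.items():
        if "critical" in severities:
            lit[resource_id] = "critical"
        elif "high" in severities:
            lit[resource_id] = "high"
        # Medium, low, and informational issues are left unlit on purpose.
    return lit


def worth_expanding(visible, findings):
    """Of the visible nodes, keep the ones that security mode lights up."""
    lit = security_mode(findings)
    return [resource_id for resource_id in visible if resource_id in lit]


print(security_mode(FINDINGS))
print(worth_expanding(["sg-1", "db-1", "nic-1"], FINDINGS))
```

Dropping everything below high severity before anything is drawn is what keeps the graph readable: a node with only medium findings, like `sg-1` here, stays unlit and is skipped when choosing where to expand.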