Alex F

solve et coagula

UX/IA/Research/Strategy/InfoSec

Musician

 

Rapid7 - Layered Context

The Background

Rapid7 has several products that analyze cloud resources in isolation. Examples include a classic vulnerability management product (InsightVM), a cloud security product (InsightCloudSec), and an Identity and Access Management (IAM) feature. Examining resources on their own can be useful, but without the full picture of all the resources together, it’s almost impossible to determine which resources are the top priority for fixing.

The Problem

How do you know whether one resource has more issues than another, and how do you know when that matters?

Imagine a scenario where ResourceA has 100 vulnerabilities and ResourceB has 20. 

Fixing based on sheer numbers isn’t the right way to prioritize. What if ResourceB, with only 20 vulnerabilities, has several severe issues that could easily lead to privilege escalation, whereas ResourceA has only medium and low severity issues to deal with? ResourceB is then more vulnerable to compromise and should be fixed before ResourceA. What if one is publicly accessible, and the other isn’t? There are several factors that need to be examined and prioritized holistically.
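To make the ResourceA/ResourceB scenario concrete, here is a minimal sketch in Python. It is not the scoring used in the product: the `Resource` class, the field names, the sample counts, and the ordering of factors (criticals, then highs, then public exposure, then raw total) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical shape; the real data model is not described here.
    name: str
    critical: int = 0
    high: int = 0
    medium: int = 0
    low: int = 0
    public: bool = False

    @property
    def total(self) -> int:
        return self.critical + self.high + self.medium + self.low


def priority_key(r: Resource):
    # Assumed ordering: compare critical counts first, then high counts,
    # then public exposure. The raw total is only a final tie-breaker.
    return (r.critical, r.high, r.public, r.total)


# ResourceA: 100 issues, all medium or low. ResourceB: 20 issues, some severe.
resource_a = Resource("ResourceA", medium=60, low=40)
resource_b = Resource("ResourceB", critical=3, high=5, medium=12, public=True)

ranked = sorted([resource_a, resource_b], key=priority_key, reverse=True)
print([r.name for r in ranked])
```

Sorting by total alone would put ResourceA first. Sorting on severity first puts ResourceB at the top even though it has a fifth as many issues.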

Taking on this project as a UXer was a challenge because initially there were no requirements. There was no direction, and no actual idea of how this would work or what it would look like. We only knew that we had to take all the security information available and put it together in a way that made sense, was quick and easy for the user, and gave users the ability to slice and dice the data based on their needs so that they would see only the data they needed to see. The users needed a place to see the most severe issues in a controlled manner that eliminates noise.

The idea of “layered context” was utilized to create a new, centralized, single pane of glass approach to solving these issues.

The Solution

We knew we could pull in the vulnerability data from our InsightVM Vulnerability Management product, and organize it within InsightCloudSec (ICS) just like we were already doing with cloud-based misconfigurations.  We already had a good amount of cloud IAM data as well. But we had to show it all in one place.  Obviously, you can’t show everything in the same place at the same time.  That would give us endless table scrolling and information overload, as well as potentially produce a lot of noise for the users.

We accomplished this, in part, through the paradigm of “progressive disclosure.”  Progressive disclosure sequences how and when data is displayed over several screens or interactions.   Jakob Nielsen defines progressive disclosure as a technique that “defers advanced or rarely used features to a secondary screen, making applications easier to learn and less error-prone.” Progressive disclosure follows the typical notion of moving from “abstract to specific,” including the sequencing of user behaviors or interactions.

The goal is to prevent both information overload and noise (not always the same thing), and to give the users the information they need, and only the information they need. We needed a model of progressive disclosure that could manage massive numbers of resources (potentially millions), with large numbers of security issues per resource (potentially hundreds, if not more). We couldn’t do this one by one, or piecemeal. There had to be a logical workflow.

We had to divide resources and reduce them, so that users could focus on a smaller set of data instead of all 5 million.

  • When looking at potentially millions of problematic resources, how do you know which are the worst and need addressing first?

  • How did we come to any prioritization conclusion?  We need to display the evidence that guided us to this conclusion.

  • We need a potential way for users to disagree with that, and force the system to bend to THEIR rules of importance.

  • Users need to be able to react to the issues, whether that means providing data to fix the issues themselves, automating the fixes, or assigning fixes to others.

There’s a lot of information available for each cloud resource, and showing as much as possible at once just isn’t conducive to efficiency. Therefore, the initial list of resources displayed for Layered Context would contain just enough data to let a user know ‘this resource has a combination of severe issues that need remediation, and these resources have fewer.’

Once a resource was selected, the user could dive a little deeper to see a list of these issues prioritized by severity, as well as further details about each resource, all without leaving the main screen.  Users could view the relevant information based on their current workflow, and also had the option to explore more detailed pages for individual vulnerabilities.

For each resource, we threw everything on the table and figured out what was not necessary: Critical, High, Medium, and Low counts for Vulnerabilities, and the same for Misconfigurations and IAM Misconfigurations. We tried heat-mapping the data (an effort to make prioritization criteria viewable at a glance, which got mixed reviews in user testing), and different methods of sorting and ordering the data. In the end, after multiple iterations and a few rounds of user testing, we made the following decisions:

  1. Include only enough resource information (ID, name, arn, cloud account, Cloud Service Provider icon) to identify the resource.

  2. Display whether the resource was publicly exposed or not.

  3. Dump all the Medium and lower severity items.  One thing I learned a long time ago from working with the military as we developed the world’s first IDS product: “Just show me the criticals and highs.  I don’t care about seeing anything else on the screen.  If there are no criticals and highs, THEN we will decide if we want to see Medium or lower severity items, but that will probably never or rarely happen.”

  4. The user can view additional severity items in the table by selecting them from the "display columns" drop-down.

  5. We presented the count of vulnerabilities, etc. in badges. Each badge was color-coded to its severity level and showed the first letter of the severity (Critical = C, High = H, etc.) along with the count.

  6. If a resource had 5 critical vulnerabilities, for example, and the user wanted to know more, they could click on the badge, which would slide in a multi-tabbed panel from the right with the vulnerabilities tab pre-selected. The tab showed, in a condensed table format sorted from most to least severe, all the vulnerabilities found for that resource. This gave basic information such as the vulnerability’s name, severity, CVSS score, etc. If users needed further details, they could click on the vulnerability name and be taken to a page dedicated to that vulnerability. The page included details on the vulnerability itself, fix information, and a list of all resources that have the same vulnerability. The same applied to IAM and misconfigurations.
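The badge and column decisions in the list above can be sketched roughly as follows. This is only an illustration: the `badges` function, the `DEFAULT_VISIBLE` set, the label format, and the sample counts are all hypothetical, and the real UI also carried color, which is left out here.

```python
SEVERITIES = ["Critical", "High", "Medium", "Low"]

# Assumption: only Critical and High are shown unless the user adds more
# columns from the "display columns" drop-down.
DEFAULT_VISIBLE = frozenset({"Critical", "High"})


def badges(counts, visible=DEFAULT_VISIBLE):
    """Build badge labels: first letter of the severity plus its count."""
    return [
        f"{severity[0]} {counts.get(severity, 0)}"
        for severity in SEVERITIES
        if severity in visible
    ]


# Hypothetical vulnerability counts for a single resource.
vulnerabilities = {"Critical": 5, "High": 2, "Medium": 14, "Low": 30}

print(badges(vulnerabilities))                           # default view
print(badges(vulnerabilities, visible=set(SEVERITIES)))  # user added columns
```

The default call yields only the Critical and High badges. Passing a wider `visible` set mimics the user opting in to Medium and Low.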

This solution lets the user see an ordered list of their most severely affected resources, with just enough information to determine which resources require further investigation. The slide-in panel provides a medium level of detail on both the resource itself and its security issues, without leaving the context of the Layered Context page. Users can add or remove other data from the table at will, and order the table by any column (e.g. by public accessibility).

Furthermore, an advanced filtering system was put in place in order to allow users to further slice and dice the data, narrowing down the list of resources even further.  For example, the user could display in the table:

“Show me only EC2 resources with critical IAM misconfigurations AND critical and high misconfigurations, belonging to the AWS_Production account, and the US-West region”
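As a rough sketch, that example filter could be expressed as a predicate over resources. Everything here is assumed for illustration: the `CloudResource` class, its field names, the category keys, and the sample data. The sketch also reads “critical and high misconfigurations” as “at least one misconfiguration at critical or high severity,” which is one possible interpretation.

```python
from dataclasses import dataclass, field


@dataclass
class CloudResource:
    # Hypothetical record; not the actual ICS data model.
    name: str
    resource_type: str
    account: str
    region: str
    # Finding counts keyed by (category, severity), e.g. ("iam", "critical").
    findings: dict = field(default_factory=dict)

    def count(self, category: str, severity: str) -> int:
        return self.findings.get((category, severity), 0)


def matches(r: CloudResource) -> bool:
    # Every clause is ANDed together, mirroring the quoted filter.
    return (
        r.resource_type == "EC2"
        and r.count("iam", "critical") > 0
        and (r.count("misconfig", "critical") + r.count("misconfig", "high")) > 0
        and r.account == "AWS_Production"
        and r.region == "US-West"
    )


inventory = [
    CloudResource("web-1", "EC2", "AWS_Production", "US-West",
                  {("iam", "critical"): 1, ("misconfig", "high"): 2}),
    CloudResource("web-2", "EC2", "AWS_Production", "US-West",
                  {("misconfig", "critical"): 4}),  # no critical IAM finding
    CloudResource("bucket-1", "S3", "AWS_Production", "US-West",
                  {("iam", "critical"): 3, ("misconfig", "critical"): 1}),
]

filtered = [r.name for r in inventory if matches(r)]
print(filtered)
```

Only `web-1` passes every clause. The other two fail on the IAM condition and the resource type respectively.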

This wasn’t too bad for a 1.0 release, and it is becoming the new centerpiece for the entire ICS platform.