When Security meets FinOps: SecFinOps

Marc-Henry GEAY
LINKBYNET
Nov 5, 2021


Cloud is elastic, but not your budget

Photo by Ian Schneider on Unsplash

Co-written with my teammates Charles Mure and Antoine Paris.

TL;DR

The implementation of all cloud security best practices mentioned in standard frameworks ends up taking 16% to 49% of the total application budget.

We believe this cost is irrational and can be significantly improved.

We found ways to save 18% to 54% of Cloud Security cost without impacting the risk coverage.

How did we do it?

We decided to move away from standard compliance frameworks. Instead, we rethought security measures from the ground up with realistic risk scenarios for Cloud Environments #BackToFundamentals.

We synthesized our approach into a new method that we called COSMIRC: Cloud Optimized Security Measures with Identical Risk Coverage.

A short story on FinOps

It all started with a discussion with our colleagues on the FinOps team about their work to help optimize security services; it turns out that cloud security services typically account for 10% to 15% of the cloud bill, with spikes of up to 25% that are hard to justify…

According to our colleagues, these unexpected costs can be explained by an information systems security policy that is not adapted to the specificities of the cloud, a lack of technology intelligence and of visibility into the real cost of security services, and security services that are activated but under-utilized or forgotten.

Following these observations, we wanted to confirm them with a controlled experiment, and, if they were true, understand the mechanics behind these costs while trying to find ways to optimize them. For that, we tried to answer the following questions:

  • Can we demonstrate with a controlled experiment the orders of magnitude of the cost of security on a cloud application observed by the FinOps team?
  • Can we optimize the cost of security for a cloud application while maintaining the same level of risk coverage?

Our Lab with a Demo app

Following our lines of thought, we designed and built a demonstration application based on the usual architecture patterns observed at our customers. This application offers a B2B service for industrial customers who require custom-made equipment.

It is composed of three main parts: an e-commerce website that handles customer orders along with their Computer-Aided Design (CAD) files; an analysis backend that checks the compliance and technical aspects of the CAD files and collects metrics from orders; and finally a synchronization step that pushes order details and CAD files to the on-premise Enterprise Resource Planning (ERP) system.

Our Demo App architecture

This application manages some customer data: personal information (PII) and internal corporate equipment details. To protect these sensitive assets, we applied Cloud security best practices and defense in depth to our hybrid architecture, following five common cloud pillars:

  1. Infrastructure / perimeter security: defense in depth of the network, hardened compute resources such as VMs and functions (managed-script runtimes), and vulnerability scanning.
  2. Data protection: all data is encrypted at rest and in transit with Cloud-managed services, and all application network flows are kept private (transit inside the Cloud network); a minimal configuration sketch follows this list.
  3. Identity and access management: use the Cloud IAM service, block lifetime credentials/tokens, and apply the least-privilege principle to all systems involved with Cloud services and the On-Prem ERP.
  4. Detection: identify the main cloud misconfigurations and detect suspicious behaviors on the cloud management plane.
  5. Resilience: ensure high availability of services (website) and integrity of customer data (CAD files) with cross-region replication.
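
To make the data protection pillar more concrete, here is a minimal sketch (in Python with boto3) of the kind of configuration it implies for an S3 bucket holding CAD files. The bucket name and KMS key alias are hypothetical placeholders, not the actual resources of our demo app.

    import boto3

    s3 = boto3.client("s3")

    BUCKET = "demo-app-cad-files"        # hypothetical bucket name
    KMS_KEY = "alias/demo-app-data-key"  # hypothetical customer-managed KMS key

    # Encrypt all objects at rest by default with a customer-managed KMS key.
    s3.put_bucket_encryption(
        Bucket=BUCKET,
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": KMS_KEY,
                },
                "BucketKeyEnabled": True,  # fewer KMS API calls, hence lower cost
            }]
        },
    )

    # Keep flows private: block every form of public access on the bucket.
    s3.put_public_access_block(
        Bucket=BUCKET,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )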

Building our financial analysis model

To confirm our colleagues' FinOps findings, we chose to rely on one of the leading Cloud providers, AWS. We started by taking a census of all the cloud resources used and splitting them into production and security services. We mapped our application components to more than 65 AWS price items to get a first picture of the monthly cost for one customer order.

We then improved our simulation to dynamically compute precise costs for any order volume. The following graph shows the evolution of production-related and security-related monthly costs depending on the number of orders, along with the much-sought production vs. security ratio.

Demo app cost with production vs security ratio

Costs of some security services explode on high-traffic applications

Our demo app cost insights

Our simulation shows that the security share of the application cost increases non-linearly with the number of customer orders, reaching up to 49% at 100,000 monthly orders. These results confirm those of the FinOps team.
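
To illustrate the mechanics behind this simulation, here is a simplified sketch of such a cost model. The price items and per-order amounts below are hypothetical round numbers chosen only to reproduce the shape of the curve; they are not the 65 real AWS price items of our study.

    # Hypothetical, simplified cost model: each line item has a fixed monthly
    # cost plus a variable cost driven by the number of customer orders.
    PRICE_ITEMS = {
        # name: (category, fixed $/month, variable $/order)
        "ec2_web_frontend":    ("production", 150.0, 0.000),
        "lambda_cad_analysis": ("production",   0.0, 0.012),
        "s3_cad_storage":      ("production",  50.0, 0.008),
        "waf_inspection":      ("security",    10.0, 0.006),
        "kms_encryption":      ("security",     3.0, 0.005),
        "guardduty_detection": ("security",    15.0, 0.003),
        "cross_region_dr":     ("security",    10.0, 0.005),
    }

    def monthly_cost(orders: int) -> dict:
        """Return production cost, security cost and the security share for one month."""
        totals = {"production": 0.0, "security": 0.0}
        for category, fixed, per_order in PRICE_ITEMS.values():
            totals[category] += fixed + per_order * orders
        totals["security_share"] = totals["security"] / (totals["production"] + totals["security"])
        return totals

    for orders in (1_000, 10_000, 100_000):
        costs = monthly_cost(orders)
        print(f"{orders:>7} orders -> security share of the bill: {costs['security_share']:.0%}")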

Moving to a SecFinOps approach

From these surprising findings, we formed the hypothesis that the problem was caused by the three following factors:

  1. Cloud-based security services are multiplying and overlapping, creating opportunities for optimization.
  2. There is a decoupling between the cost of security measures and the severity of the risk covered, which creates opportunities for reallocation based on risk prioritization.
  3. Traditional Risk Analysis methods rarely include discussions around the cost of security measures.

From this hypothesis, we proposed a new methodology for the risk analysis of Cloud applications that includes a discussion of the cost of security measures. The goal of this methodology is to help Security Practitioners find opportunities to optimize the cost of security measures without changing the risk coverage of the application. We called it COSMIRC: Cloud Optimized Security Measures with Identical Risk Coverage.

So how did we build this SecFinOps methodology?

We started from a standard risk analysis method and added a few user stories to it. We identified the following:

  • As a Security Practitioner, I want to be able to see the impact of a security measure on the Cloud bill for the application.
  • As a Security Practitioner, I want to be able to visualize the cost associated with the mitigation of a specific risk scenario.
  • As a Security Practitioner, I want to be able to select a security measure to cover a risk scenario in a pre-defined set of cloud security services.

Then, we asked ourselves which data sources would be mandatory to build our method. We identified the following: a list of realistic cloud risk scenarios, a catalog of the technical security measures available on each CSP, and knowledge of Cloud services costs.
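
In practice, these user stories and data sources boil down to a simple data model: each risk scenario is linked to the candidate security measures that can cover it, and each measure carries an estimated monthly cost. Below is a hedged sketch of such a model; the measure names and costs are illustrative placeholders, not our actual catalog.

    from dataclasses import dataclass

    @dataclass
    class SecurityMeasure:
        name: str
        monthly_cost: float    # estimated monthly cost of the measure (USD)
        covered_scenarios: set  # identifiers of the risk scenarios it mitigates

    # Illustrative catalog (names and costs are hypothetical).
    CATALOG = [
        SecurityMeasure("cross_region_replication", 2500.0, {"region_outage", "data_deletion"}),
        SecurityMeasure("same_region_isolated_account_backup", 600.0, {"data_deletion"}),
        SecurityMeasure("s3_object_lock", 150.0, {"data_deletion"}),
    ]

    def options_for(scenario: str) -> list:
        """User story 2: visualize the cost of the measures that mitigate one risk scenario."""
        return sorted(
            ((m.name, m.monthly_cost) for m in CATALOG if scenario in m.covered_scenarios),
            key=lambda item: item[1],
        )

    print(options_for("data_deletion"))
    # -> the three candidate measures and their hypothetical monthly costs, cheapest first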

Pragmatic SecFinOps methodology

We want to give you some practical tips to help you consolidate your data sources if you want to do it yourself. Here are our recommendations.

Firstly, to build our Cloud Risk database with realistic scenarios we relied on:

  • A well-known technical source of attack tactics on Cloud environments: the MITRE ATT&CK matrix for Cloud providers.
  • Historical data of public Cloud security incidents.
  • A Cloud Security Expert panel to refine the frequency of risk scenarios. Creating an internal Cloud Security Expert panel is a well-known technique in quantitative risk management to increase the accuracy of estimations when data is very sparse (which is the case with security incidents, which most of the time remain private). If you want to know more about this approach, I highly recommend articles on Medium from Ryan McGeehan, like this one.
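
For readers who want to reproduce the expert-panel step, here is a hedged sketch of the kind of quantitative estimate it feeds: each expert provides a plausible range for the frequency and the impact of a scenario, and a simple Monte Carlo simulation turns those ranges into an annualized loss estimate. The distributions and figures below are illustrative assumptions, not our panel's actual estimates.

    import numpy as np

    def annual_loss_estimate(freq_per_year, impact_low, impact_high, runs=100_000, seed=0):
        """Monte Carlo estimate of the expected annual loss for one risk scenario.

        freq_per_year: expert-estimated average number of occurrences per year.
        impact_low, impact_high: expert 90% confidence bounds for the loss of one event (USD).
        """
        rng = np.random.default_rng(seed)
        # Lognormal impact calibrated so that [impact_low, impact_high] is roughly a 90% interval.
        mu = (np.log(impact_low) + np.log(impact_high)) / 2
        sigma = (np.log(impact_high) - np.log(impact_low)) / 3.29
        losses = []
        for _ in range(runs):
            n_events = rng.poisson(freq_per_year)
            losses.append(rng.lognormal(mu, sigma, n_events).sum())
        return float(np.mean(losses))

    # Illustrative figures for a "critical data deleted" scenario (hypothetical).
    print(annual_loss_estimate(freq_per_year=0.2, impact_low=50_000, impact_high=500_000))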

Then, to build our list of possible technical security measures for each standard risk scenario, we relied on best practices from Cloud Service Providers and the associated documentation on security configuration. We also collected insights from our Cloud Security Expert panel.

Finally, to collect information about the cost of Cloud security services, we recommend using the "Cost calculator" of your Cloud Service Provider (CSP), analyzing your actual cloud bill report on the CSP, or using a commercial FinOps tool.
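
As a starting point for that last data source, here is a hedged sketch of how the actual bill can be queried programmatically with the AWS Cost Explorer API (assuming Cost Explorer is enabled on the account). The split between "security" and "production" services is our own convention, not an AWS default, and the service names may need to be adjusted to the ones appearing in your bill.

    import boto3

    ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer API

    # Our own (hypothetical) list of services counted as "security" in the demo app.
    SECURITY_SERVICES = {"AWS WAF", "AWS Key Management Service", "Amazon GuardDuty", "AWS Config"}

    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2021-10-01", "End": "2021-11-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    security, production = 0.0, 0.0
    for group in response["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        if service in SECURITY_SERVICES:
            security += amount
        else:
            production += amount

    print(f"Security share of the bill: {security / (security + production):.0%}")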

We also knew that if we wanted a new method to be used by practitioners, it had to be built with key adoption principles in mind. We think there are a few criteria (other than solving a real problem) that a method must meet to facilitate adoption:

  1. Ease of communication: your method must produce findings that are easy to understand for non-security people.
  2. KISS, Keep It Simple Stupid: avoid premature over-optimization and overly complex procedures so that the method is easier to scale at the organizational level.
  3. Low gap: ensure that the new method introduces only a small change to the existing operational flow (for risk analysis).

From method to tooling

We conducted a first implementation of our COSMIRC methodology. For now, we kept it simple by re-using an existing spreadsheet-based Risk Analysis tool that we augmented with our new features. We currently implemented it only for AWS environments, but it can easily be extended to other CSPs.

And we can tell you that the results are very promising!

In our experiment, we were able to save 18% to 54% of the security cost with COSMIRC. We implemented two cost optimizations without changing the coverage of our main risks: the risk of confidential data compromise and the risk of client data loss.

Security and production costs after optimizations

Another conclusion we drew from this experiment is that Pareto's Law also seems to apply to Cloud Security cost optimization: in short, 80% of the cost optimization is produced by 20% of the effort (e.g., updating security measures).

But how exactly did we manage to reduce the application's Cloud Security cost by up to 54% without changing our risk exposure? Let's see how we did it with a deep dive into one cost optimization.

Measure optimization on the data loss risk

If you want to mitigate the risk of data loss in your Cloud application, many frameworks recommend replicating your data to another Cloud region with a cross-region disaster recovery configuration. The problem is that this recommendation, made for resilience and availability, is very expensive. But is it really useful to cover the risk of data loss?

To change your perspective on this risk in Cloud environments, we would like you to ask yourself the following question, posed by the Cloud Economist Corey Quinn in his great article on S3 Data Durability.

What are the odds of someone in your organization (accidentally or otherwise) deleting critical data from the wrong bucket or misconfiguring a lifecycle policy to do it for them?

And indeed, the reality in cloud risk management is closer to this statement:

A bad push to the control plane that doesn't get caught and destroys all the data in your account is more likely than the complete disaster of a Cloud region, the collapse of a government, or the takeover of a cloud datacenter by a terrorist group.

Therefore, we identified two plausible scenarios for the data loss risk in Cloud environments:

  1. Unavailability of a complete Cloud Region (low probability)
  2. Critical data deleted by a legitimate or illegitimate user, leading to business disruption (high probability)

Side note: we know that the second scenario has already happened in the real world with the CodeSpaces attack in 2014. The company had to shut down its business because a hacker destroyed all its customers' data in its Cloud environments. You can read more about this attack in this article.

And here we have an optimization opportunity, because these two scenarios don't need the same security measures to be mitigated, and the measures for the second scenario are actually cheaper than the ones for the first. The optimization opportunity is presented as follows in our COSMIRC implementation:

Risk analysis with real costs

During our risk analysis discussion, we chose not to cover the first scenario (unavailability of a complete Cloud Region) because it is very unlikely to happen, but we chose to mitigate the second scenario with the "Same-region replication in an isolated Account" measure. We did not choose the cheapest option to cover the second scenario because we wanted a more versatile solution that does not depend on service-specific features like S3 Object Lock. As mentioned by Corey Quinn: "Backing your data up elsewhere is a great idea if your business is going to have a hard time existing without it. “Another region” is a good idea; “another AWS account that nobody from the first account has access to” is a better one."
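
As an illustration, here is a hedged sketch of what the "same-region replication in an isolated Account" measure could look like with boto3. The account IDs, bucket names and role ARN are hypothetical placeholders; versioning must already be enabled on both buckets, and the destination account must grant the replication role access to the backup bucket.

    import boto3

    s3 = boto3.client("s3")

    SOURCE_BUCKET = "demo-app-cad-files"                  # hypothetical production bucket
    DEST_BUCKET_ARN = "arn:aws:s3:::demo-app-cad-backup"  # bucket in the isolated account (hypothetical)
    DEST_ACCOUNT_ID = "222222222222"                      # isolated backup account (hypothetical)
    REPLICATION_ROLE = "arn:aws:iam::111111111111:role/s3-replication-role"  # hypothetical

    s3.put_bucket_replication(
        Bucket=SOURCE_BUCKET,
        ReplicationConfiguration={
            "Role": REPLICATION_ROLE,
            "Rules": [{
                "ID": "backup-to-isolated-account",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                # A delete in the production account is not propagated to the backup.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": DEST_BUCKET_ARN,
                    "Account": DEST_ACCOUNT_ID,
                    # Hand object ownership to the destination account so that a
                    # compromised source account cannot tamper with the copies.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }],
        },
    )

Because source and destination stay in the same region under this assumption, you avoid cross-region data transfer and duplicated cross-region storage, which is where most of the savings come from.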

Taking a helicopter view, optimizing data loss prevention looks like this in our model (and yes, we are French and proud 😉):

Optimization effect: risk optimization with lower cost

This example of optimization of the data loss risk in Cloud environments is only one of the many possibilities highlighted by our experiments and research. Here are some other cost optimization examples we have found:

Some cost-killer optimizations

Lessons Learned

In the end, our approach boils down to something very simple:

We moved away from standard compliance frameworks and rethought security measures from the ground up with realistic risk scenarios for Cloud Environments #BackToFundamentals.

However, one downside of our approach is that moving towards a personalization of security measures to optimize the recurring cost can be seen as incompatible with modern ways of building Cloud environments. Indeed, with an Infrastructure-as-Code approach, we want to standardize our stack (including our security measures) so that we can easily replicate infrastructure across applications and reduce development time, which translates directly into money savings.

Our opinion is that both approaches remain highly valuable. One way to benefit from both is to keep standardization with Infrastructure as Code and default security for your small-scale applications (and their small bill), but apply a fine-grained SecFinOps approach to your large applications, which represent most of your spending.

What’s Next?

At this point, our approach optimizes security measures based on the cost variable, because we know this variable is still very important for any company in the world. But we think the next big challenge will not only be about money. It will be about climate change.

We, the security experts, have a responsibility to ensure that the security tools we deploy are optimized not only to reduce their cost, but above all their carbon footprint.

Our SecFinOps methodology can provide opportunities and insights about the carbon footprint of security measures right from the risk analysis phase. As Accenture shows in its analysis, the use of the cloud addresses the challenges of Green IT. In the future, it would be great to find ways to integrate our approach into "Green Security" strategies, with tools such as Cloud Carbon Footprint, to help security teams measure, monitor and optimize their cloud carbon emissions without impacting their security posture.

Our team & contributors

Feel free to contact us on LinkedIn!
