Threat Stack vs. Red Hat Auditd Showdown

By Jen Andre

One of the things we like at Threat Stack is magic. But since magic isn’t real, we have to come up with the next best thing, so we’ve hired one of the libevent maintainers, Mark Ellzey Thomas (we like to call him our ‘mad kernel scientist’), to make our agent the best in its class.

Many of the more savvy operations and security people that use our service are blown away by the types of information we can collect, correlate, and analyze from Linux servers. They say something to the effect of, “I’ve tried to do this with (Red Hat) auditd, with little to no success… how do you guys do it?”  

The Linux audit subsystem is a very powerful way to collect information about system calls and other security-relevant activity.  The best part: no kernel module is required to enable this detailed level of auditing since it’s built right into Linux. You have the option to write a log any time a particular system call happens, whether that be unlink or getpid.  Since the auditing operates at such a low level, the granularity of information is incredibly useful.

Traditionally, people have used the userland daemon ‘auditd’ built by some good Red Hat folks to collect and consume this data. However, there are a couple of problems with traditional open source auditd and auditd libraries that we’ve had to deal with ourselves, especially when trying to run it on performance-sensitive systems and make sense of the sometimes obtuse data that traditional auditd spits out. To that effect, we’ve written a custom audit listener from the ground up for the Threat Stack agent (tsauditd).

We’ve asked our agent engineer, Mark, to make a video that highlights the performance, parsing fixes, and other changes we’ve put in the Threat Stack agent by demonstrating a ‘showdown’ with Red Hat’s audit. You can view it in all of its glory here (Warning: Mark tends to ramble):

Here are a couple of highlights from the video, along with some other facts that make our audit listener special:

1. Performance Enhancements

Many people have tried to use traditional Red Hat auditd in production to do very detailed auditing of user, process, and network syscall activity, but have failed due to the performance impact. We’ve ensured that our agent is responsible with resource utilization through our unique parsing model. In fact, while benchmarking a web server, we saw auditd consume 120% of the CPU. Threat Stack's agent CPU consumption was only 10%!

2. Output Enhancements

The Linux audit system is special, and by special, I mean it will output many separate lines for a single event into syslog, which you then have to correlate later via your ingestion engine or your log management system. The key-value format is also cumbersome to parse, and values are often hex-encoded with little apparent consistency. We’ve decided that all related records should be grouped together, and we parse and decode everything for you. We then transform the result into a JSON output format that is much simpler to read and parse.
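To make the grouping idea concrete, here is a minimal sketch in Python. It is not the Threat Stack agent's code: the sample records are made up, the function names are hypothetical, and it assumes that only unquoted EXECVE arguments need hex decoding. Records that share the same timestamp and serial number in msg=audit(...) are collected into one event and emitted as JSON.

```python
import json
import re

# Layout of a raw audit record: type=<TYPE> msg=audit(<epoch>:<serial>): key=value ...
RECORD_RE = re.compile(r"type=(\S+) msg=audit\((\d+\.\d+):(\d+)\):\s*(.*)")
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
HEX_RE = re.compile(r"(?:[0-9A-Fa-f]{2})+")

# Made-up sample records for illustration only.
SAMPLE_LOG = """\
type=SYSCALL msg=audit(1400000000.123:42): arch=c000003e syscall=59 success=yes exit=0 pid=1234 uid=0 comm="ls" exe="/bin/ls"
type=EXECVE msg=audit(1400000000.123:42): argc=2 a0="ls" a1=2D6C
type=SYSCALL msg=audit(1400000001.000:43): arch=c000003e syscall=87 success=yes exit=0 pid=1300 uid=0 comm="rm" exe="/bin/rm"
"""


def decode_value(record_type, key, value):
    """Strip quotes from quoted values and hex-decode unquoted EXECVE arguments."""
    if value.startswith('"'):
        return value.strip('"')
    # Assumption: only EXECVE arguments (a0, a1, ...) are treated as hex strings.
    is_execve_arg = record_type == "EXECVE" and re.fullmatch(r"a\d+", key)
    if is_execve_arg and HEX_RE.fullmatch(value):
        return bytes.fromhex(value).decode("utf-8", "replace")
    return value


def group_events(lines):
    """Group records that share a (timestamp, serial) pair into one event."""
    events = {}
    for line in lines:
        match = RECORD_RE.search(line)
        if not match:
            continue  # not an audit record
        record_type, timestamp, serial, body = match.groups()
        event = events.setdefault(
            (timestamp, serial),
            {"timestamp": float(timestamp), "serial": int(serial), "records": []},
        )
        fields = {k: decode_value(record_type, k, v) for k, v in FIELD_RE.findall(body)}
        event["records"].append({"type": record_type, **fields})
    return list(events.values())


print(json.dumps(group_events(SAMPLE_LOG.splitlines()), indent=2))
```

The first two sample lines end up in a single JSON object, with the hex-encoded argument 2D6C decoded to -l.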

3. Network Tooling ("src/dst port")

Tracking network connections across multiple hosts can be a manual and painful process when trying to connect across boxes. To make it easier, our agent adds metadata to network connection events to determine where the connection is originating from and where it is going. Our backend is then able to correlate these network events to determine the originating process and potential user activity that caused that network event, so long as the agent lives on both the source and destination server.
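As a rough illustration of that correlation step, the sketch below pairs an outbound connect seen on one host with the inbound accept seen on another by matching the source and destination address and port. The event shape, host names, and field names are all hypothetical; this is not our backend's actual schema or algorithm.

```python
def flow_key(event):
    """The 4-tuple that identifies one TCP connection from either side."""
    return (event["src_ip"], event["src_port"], event["dst_ip"], event["dst_port"])


def correlate(events):
    """Pair each connect event with the accept event that shares its 4-tuple."""
    accepts = {flow_key(e): e for e in events if e["kind"] == "accept"}
    pairs = []
    for event in events:
        if event["kind"] != "connect":
            continue
        match = accepts.get(flow_key(event))
        if match is not None:
            pairs.append((event, match))
    return pairs


# Hypothetical events reported by agents on two different hosts.
EVENTS = [
    {"kind": "connect", "host": "web-1", "user": "alice", "process": "ssh",
     "src_ip": "10.0.0.5", "src_port": 51234, "dst_ip": "10.0.0.9", "dst_port": 22},
    {"kind": "accept", "host": "db-1", "user": "root", "process": "sshd",
     "src_ip": "10.0.0.5", "src_port": 51234, "dst_ip": "10.0.0.9", "dst_port": 22},
    {"kind": "connect", "host": "web-1", "user": "www", "process": "curl",
     "src_ip": "10.0.0.5", "src_port": 51300, "dst_ip": "203.0.113.7", "dst_port": 443},
]

for outbound, inbound in correlate(EVENTS):
    print(f'{outbound["user"]}@{outbound["host"]} ({outbound["process"]}) -> '
          f'{inbound["host"]} ({inbound["process"]})')
```

The third event has no partner because no agent reported the other end, which mirrors the requirement that the agent run on both the source and the destination.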

This is especially useful for tracking SSH sessions across your environment and debugging what servers are speaking to one another and why.

4. User Activity Auditing

Digging around the server logs to see where a user on your system went is not an easy job. You’d need to manually find the agent and session that a user connected to, yet all the kernel gives us is a nasty hex-encoded string representing the connection address in the traditional auditd logs. On top of that, most of the information logged by auditd is not really relevant, and it is hard for the human eye to parse. To correct that, we’ve designed Threat Stack to store the events, activity, and commands associated with a logged-in user, and to automatically reconstruct this information into a clean, compact, and readable timeline.
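For a sense of what that hex string hides, here is a small sketch that decodes an saddr value, which is a hex dump of the kernel's sockaddr structure. It assumes a little-endian Linux host (such as x86) and only handles IPv4 and IPv6; the function name and the sample value are made up for illustration.

```python
import ipaddress
import struct

# Linux address family numbers as they appear in the sockaddr dump.
AF_INET = 2
AF_INET6 = 10


def decode_saddr(hex_value):
    """Decode a hex-encoded sockaddr into a family, address, and port."""
    raw = bytes.fromhex(hex_value)
    # Assumption: the family field is in little-endian host byte order.
    family = struct.unpack("<H", raw[:2])[0]
    if family == AF_INET:
        # sockaddr_in: family (2 bytes), port (2, network order), address (4)
        port = struct.unpack(">H", raw[2:4])[0]
        return {"family": "inet", "address": str(ipaddress.IPv4Address(raw[4:8])), "port": port}
    if family == AF_INET6:
        # sockaddr_in6: family (2), port (2), flowinfo (4), address (16), scope id (4)
        port = struct.unpack(">H", raw[2:4])[0]
        return {"family": "inet6", "address": str(ipaddress.IPv6Address(raw[8:24])), "port": port}
    return {"family": family, "raw": hex_value}


print(decode_saddr("020000160A0000050000000000000000"))
```

In the sample value, 0200 is the address family, 0016 is port 22, and 0A000005 is 10.0.0.5; the trailing zeros are padding.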

Stay tuned to read about some of the other engineering feats we are accomplishing at Threat Stack!

8 Patterns For Continuous Code Security

Guest post by Chris Wysopal, CTO at Veracode 

This is the fifth installment in our series of weekly blog posts that dives into the role of SecDevOps. This series looks into why we need it in our lives, how we may go about implementing this methodology, and real life stories of how SecDevOps can save the Cloud.


Best practices of secure agile teams

According to the 2014 Verizon Data Breach Investigations Report (DBIR), web applications are the #1 attack vector leading to data breaches.

So deploying insecure web applications into production can be risky -- resulting in potential loss of customer data, corporate intellectual property and/or brand value.

Yet many organizations still deploy public-facing applications without assessing them for common and easily-exploitable vulnerabilities such as SQL Injection and Cross-Site Scripting (XSS).

Why?  Because traditional approaches to application security are typically complex, manual and time-consuming – deterring agile teams from incorporating code analysis into their sprints.

But it doesn’t have to be that way.  By incorporating key SecDevOps concepts into the Software Development Lifecycle (SDLC) – including the use of cloud-based services with API-driven automation, centralized policies and tighter collaboration and visibility between security and DevOps teams – we can now embed continuous code-level security and assessment into our agile development processes.

In our own development environment, Veracode has adopted secure agile processes with the goal of rapidly delivering code without sacrificing security. Along the way we’ve uncovered eight patterns that lead to successful secure agile development.

These eight patterns work together to transform cumbersome waterfall methodologies into efficient and secure agile development.

1. Think like a developer.

In agile environments, developers write code based on work committed in the current sprint. At various points in the process, our developers upload code to Veracode’s cloud-based application security service, directly from their IDEs. The code is then analyzed and the results downloaded to their development environments, allowing them to address vulnerabilities before check-in. By finding vulnerabilities during coding instead of during a separate security hardening sprint, developers need not switch context to work on code written long ago. This saves time and preserves velocity.

2. Find it early. Fix it early.

Too often, security testing is only implemented at the end of a sprint, as a pre-deployment gateway or checkpoint. This approach requires a separate security hardening sprint after the team has delivered a functionally-complete release candidate. This results in a hybrid waterfall/agile process instead of enabling the best practice of agile — completing all the work during the sprint.

In comparison, frequent assessments allow the team to identify and remediate release blockers early in the cycle, when they’re easier and less expensive to fix. This also reduces the overall risk to successfully delivering the team’s payload. It only works because most Veracode assessments finish within hours or overnight (in fact, 80% of assessments for Java and .NET applications are completed in less than 4 hours). That means that continuous security assessments can fit into the one- to two-week sprints that are typical in many development organizations.

3. Use multiple analysis techniques for optimum coverage and accuracy.

The appropriate combination of multiple analysis techniques — ideally in a single centralized platform with consistent policies, metrics and reporting — gives organizations the broadest view of application security.

Binary static analysis – also known as “white box testing” or “inside out testing” – analyzes data and control paths without actually executing the application, looking for vulnerabilities such as SQLi and XSS.

In comparison, dynamic analysis (DAST) – also known as “black box” or “outside in” testing – identifies exploitable vulnerabilities at runtime, during pre-production QA/staging.

Manual penetration testing looks for vulnerabilities that can only be found by humans, such as Cross-Site Request Forgery (CSRF) or business logic issues.

Of course, most modern applications aren’t built from scratch but rather assembled from a mixture of custom code and open source components and frameworks. Software composition analysis (SCA) is a handy way of identifying which components are used where – and which are known to be vulnerable.

4. Automate to blend in.

Blending in with developers’ automated toolchains means leveraging tools they already use. Automation inside the IDE (Eclipse) is used to build, upload, scan and then download results, which are shown against the code inside the editor for easy remediation. Automation at the team or release candidate stage allows the build server (Jenkins) to automatically upload build artifacts for assessment, using Veracode APIs. API-driven automation in the bug tracking system (JIRA) downloads results and manages the vulnerability lifecycle. Tickets for vulnerabilities are then triaged through the same process used for all bugs. When security assessments are blended in, developers don’t switch context – and they work more efficiently.

5. Play in the sandbox.

The sandbox is a way for individual developers or teams to assess new code against the organization’s security policy, without affecting policy compliance for the current version. One way to think about an assessment sandbox is to consider it as a branch inside the application. Developers can scan the branch and understand whether it would pass the current policy. Each team can also have a sandbox for merging multiple branches to assess the integration.

6. Avoid replicating vulnerabilities.

If you think about how developers work, there’s always a bit of copy-and-paste going on. Developers look at code and say, “All right, I’m going to use that pattern.” When vulnerabilities get replicated across the code base, it magnifies risk across projects. Then it becomes a big development effort – “security debt” – to clean up those vulnerabilities.

7. Learn from constant feedback.

Direct interaction between developers and detailed vulnerability feedback enables self-reflection. People begin to see their own coding habits and gain insight into how to develop more secure ones. The “aha” moment comes when a developer says “Oh, I shouldn’t have coded it this way because as soon as I upload it I’m going to see the same results.” Continuous feedback means they’re more likely to reuse secure patterns and avoid insecure ones.

8. Be transparent about security risk via policies.

We use labels to identify any vulnerability that violates our standard corporate security policy (such as “No OWASP Top 10 Vulnerabilities”) as a release blocker. This raises our visibility into vulnerabilities and allows us to triage every application-layer threat before release. Triage involves answering questions such as: Do we need to remediate this vulnerability? Can we mitigate it instead, and if so, how? Is this a risk we’re willing to accept? Visibility enables a pragmatic discussion about risk within the normal agile sprint management process.
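The release-blocker idea can be sketched in a few lines. This is only an illustration under stated assumptions: the category names, the triage states, and the shape of a finding are hypothetical and do not represent Veracode's policy format or API.

```python
# Hypothetical stand-in for the categories a corporate policy forbids.
POLICY_BLOCKED_CATEGORIES = {"sql-injection", "cross-site-scripting"}

# Triage outcomes that mean the team has consciously dealt with a finding.
RESOLVED_TRIAGE_STATES = {"mitigated", "accepted"}


def release_blockers(findings):
    """Return policy-violating findings that have not been mitigated or accepted."""
    return [
        finding for finding in findings
        if finding["category"] in POLICY_BLOCKED_CATEGORIES
        and finding.get("triage") not in RESOLVED_TRIAGE_STATES
    ]


# Made-up findings for illustration.
FINDINGS = [
    {"id": 1, "category": "sql-injection", "triage": None},
    {"id": 2, "category": "cross-site-scripting", "triage": "mitigated"},
    {"id": 3, "category": "verbose-error-message", "triage": None},
]

blockers = release_blockers(FINDINGS)
print("release blocked" if blockers else "release allowed", [f["id"] for f in blockers])
```

Only finding 1 blocks the release here: finding 2 violates policy but has been mitigated, and finding 3 falls outside the policy.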

Adopting these eight patterns has helped Veracode become more efficient, secure and successful in delivering code with short delivery cycles – without sacrificing security.

Stay tuned next Wednesday for our sixth installment in this series as we continue to dive deep into SecDevOps implementation. 


About Chris Wysopal 

Chris Wysopal, co-founder and CTO of Veracode, is recognized as an expert and a well-known speaker in the information security field. He has given keynotes at computer security events and has testified on Capitol Hill on the subjects of government computer security and how vulnerabilities are discovered in software. At Veracode, Mr. Wysopal is responsible for the security analysis capabilities of Veracode technology. He can be found on Twitter as @WeldPond.


How DevOps and Models Enhance Behavioral Detection

By Aaron Botsis

This is the fourth installment in our new series of weekly blog posts that dives into the role of SecDevOps. This series looks into why we need it in our lives, how we may go about implementing this methodology, and real life stories of how SecDevOps can save the Cloud.


In an earlier article, Behavioral Threat Monitoring Without Models, I explained how you could use our Cloud Sight product to deploy a pre-trained behavior model on newly deployed systems. For the fourth installment of this SecDevOps series, I’m going to talk about how to further integrate security into DevOps processes and how these models work together in the bigger picture.

What are these models?

Ok, ok. I keep saying “models”, but what are they? How do they work? And most importantly, why do they matter?

We can use models to detect changes in system behavior with algorithms and math; Cloud Sight actually builds several different types of these models. The great thing is that the models don't even need to be that complicated. Why?

If you have any data scientist friends:

1. They’ll tell you that more data beats a better algorithm.

2. Wait, data scientists have friends?

So what can we do with more data? Let’s start with “processes with network activity”. For any group of servers, Cloud Sight builds a list of processes that are talking on the network. Once it’s finished learning, it starts to monitor for new processes. This is a simple but extremely effective technique to identify behavior variations. In fact, it’s so effective that in a 28M event sample set of accept(2) and connect(2) system calls, we saw just 321 unique executable names across our customers! We can apply similar techniques for other data such as process owner, parent process name, etc.
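Here is a minimal sketch of the "processes with network activity" idea: learn the set of executables that talk on the network, then flag anything new once learning stops. The class and method names are hypothetical and this is not how Cloud Sight is implemented; it only shows how simple such a model can be.

```python
class NetworkProcessBaseline:
    """Learns which executables use the network, then flags unseen ones."""

    def __init__(self):
        self.known = set()
        self.learning = True

    def finish_learning(self):
        """Freeze the baseline and switch to monitoring."""
        self.learning = False

    def observe(self, executable):
        """Return True if the executable deviates from the learned baseline."""
        if self.learning:
            self.known.add(executable)
            return False
        return executable not in self.known


baseline = NetworkProcessBaseline()

# Learning phase, for example while the integration tests are running.
for exe in ["nginx", "sshd", "chef-client", "nginx"]:
    baseline.observe(exe)
baseline.finish_learning()

# Monitoring phase on a production system.
for exe in ["nginx", "nc"]:
    if baseline.observe(exe):
        print(f"new process with network activity: {exe}")
```

The same shape works for other attributes, such as process owner or parent process name, by changing what is stored in the set.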

Why this is Good

Back in the dark ages, it was difficult to ensure a group of systems that were functionally similar actually behaved in a consistent and similar way. But then there was light. Thanks to DevOps and configuration management, system behavior is now a fairly consistent (and measurable) thing. Web servers that are all configured the same actually do the exact same thing, the exact same way. This is an epic win for security, my hipster brethren!

“I took this system to its maximum potential. I created the perfect system!”  


“Epic Win?”

Totally. Here’s why: Imagine these models can be created, destroyed and tested programmatically, alongside your existing development processes.

We can start by training these models during our continuous integration tests. We know the environment is pristine, and we’re already testing all of the things. Why not train our models which behavior is “good” while we’re at it? It’s like a self-generating, infrastructure-wide whitelist.

Now we can apply those behaviors to systems we deploy for production. Anything that deviates from what we tested is likely an intrusion. But even if it’s not, it could inform us of imperfections in the system. Maybe we forgot to test something. Maybe there’s a corner case that only affects production for some reason. Maybe something’s running away because of an unidentified failure elsewhere in the system, consuming precious elastic resources.

Finally, once we’ve iterated and ironed everything out, we can add automated chaos-monkey style remediation to the mix. When a system deviates from its expected behavior, quarantine and replace it automatically.

Bringing it all Together

It used to be that “deploy” meant “run make install”. The number of interaction points between applications was minimal and easy to grok. Today's infrastructures are more complex than ever, and DevOps is showing huge value in quick iteration. Thanks to configuration management, applications and the infrastructure supporting them are more consistent than ever. So it only makes sense to leverage behavioral monitoring to iterate quickly without forgetting lessons learned from the past while protecting the infrastructure at the same time.

Stay tuned for next week’s SecDevOps blog post featuring Chris Wysopal, Veracode’s CTO, on code analysis as part of CI.

Who Gets Access to Production?

By Sam Bisbee, CTO

This is the third installment in our new series of weekly blog posts that dives into the role of SecDevOps. This series looks into why we need it in our lives, how we may go about implementing this methodology, and real life stories of how SecDevOps can save the Cloud.

Remote access to production machines is a long contested battlefield that has only gotten uglier since the rise of Software as a Service, which has obliterated the line between building the system and running the system. This caused new methodologies to be enacted, the most popularly touted being DevOps, which is really just an awful way of communicating that everyone is responsible for running the system now. One critical implementation detail that smaller SaaS companies have always understood due to hiring constraints is that the entire technical staff is required to be on call. Yes, even the engineers, developers, or whatever else you call them.

The New Policy

“Lock out the developers” is not an acceptable policy anymore. Developers inherently build better systems when they experience running them. Who would allow a bug to linger if it continuously woke them up throughout the night? This pain was not felt widely enough in the previous “throw it over the wall to operations” world. I can sense desperation rising from the PMs over their kanban story velocity, “If an engineer is on call, then they won’t be able to write code!” While this statement is factually accurate, the sentiment is not.

First, operations has an equally important and lengthy work queue. Second, those paging alerts are likely the most important bugs regardless of whether they’re an uncaught exception (engineering issue) or RAID alarm (operational issue). This typically confounds those new to the SaaS world because they have not fully grasped the ramifications of the Service with a capital “S”. The Service is always on and is the product through which you deliver value. This is one of the best examples of how SaaS companies are so much different culturally and operationally than companies that “ship” product. You are not running an IT department.

Don’t Over Correct

This remote access policy may seem like an over correction, which is why proper controls are critical. One of the most cited fears for granting more people access is the lack of change control. When you apply this fear to developers, what people really mean is that they are afraid of hot patches. This is completely and utterly reasonable.

Hot patches decrease visibility into the system, slowing down or outright preventing the ability to debug. The worst-case scenario is a hot patch actually damaging the system or corrupting user data, which is exponentially more likely due to the lack of testing. The technical community should fully understand by now that “it worked on my laptop” or “it shouldn’t do that” are not reasonable statements when releasing. The only true prevention for hot patching, especially when implementing a populist remote access policy, is to create a frictionless release mechanism. Make it trivial for your teams to build, test, and initiate a staggered release into any of your environments. Ideally your build server is testing every push to your master git branch and anyone can promote a successful build from that server.

Trust but Verify

If frictionless releases are our trust, then accordingly we must verify. Enter monitoring. Techniques such as the Pink Sombrero are good (digital sombreros are better), but you must introduce continuous security monitoring into your environment. For ages there have been tools and techniques that do this, but most teams do not employ them because of their complexity, outdated implementation (taking hashes of your entire multi-TB filesystem in an IO bound cloud or virtual environment is asinine), and volume of false positives. It does not have to be so complicated though. For example, alerting when a user other than chef changes files in your production server’s application directory is an easy first step that a team of any size can easily grasp.
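That first step can be sketched as a simple filter over file-change events. Everything here is an assumption made for illustration: the application directory, the event fields, and the action names are hypothetical, and the events would come from whatever monitoring tool you already run.

```python
from pathlib import PurePosixPath

APP_DIR = PurePosixPath("/srv/app")        # hypothetical application directory
ALLOWED_USERS = {"chef"}                   # users expected to change files there
WRITE_ACTIONS = {"create", "modify", "delete"}


def is_under(path, root):
    """True if path is root itself or lives somewhere below it."""
    candidate = PurePosixPath(path)
    return candidate == root or root in candidate.parents


def unexpected_changes(events):
    """Return write events in the application directory made by unexpected users."""
    return [
        event for event in events
        if event["action"] in WRITE_ACTIONS
        and is_under(event["path"], APP_DIR)
        and event["user"] not in ALLOWED_USERS
    ]


# Made-up events for illustration.
EVENTS = [
    {"user": "chef", "action": "modify", "path": "/srv/app/config.yml"},
    {"user": "alice", "action": "modify", "path": "/srv/app/lib/handler.py"},
    {"user": "alice", "action": "modify", "path": "/srv/application/notes.txt"},
    {"user": "bob", "action": "read", "path": "/srv/app/config.yml"},
]

for event in unexpected_changes(EVENTS):
    print(f'ALERT: {event["user"]} performed {event["action"]} on {event["path"]}')
```

Comparing path components rather than string prefixes keeps /srv/application from being mistaken for /srv/app.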

For those who are concerned about access to customer data, whether it be PII or something less toxic, this remote access policy does not apply to that data, as it should live in a segregated environment. They are also likely concerned with passing audits, and the prospect of listing their entire technical team as having production access is not intriguing. In such scenarios, non-operators should be locked out of production unless they are on rotation. Adding and revoking their SSH public key from the gateway on-demand can make controlled access easier.
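One way to picture on-demand access is as two small operations on the gateway's authorized_keys content: add a key when someone goes on rotation, and remove it when they come off. The sketch below works on the file's text only and uses obviously fake keys; how the file is read, written, and audited on a real gateway is left out and would need care.

```python
def grant(authorized_keys, public_key):
    """Return the authorized_keys text with public_key present exactly once."""
    key = public_key.strip()
    lines = [line.strip() for line in authorized_keys.splitlines() if line.strip()]
    if key not in lines:
        lines.append(key)
    return "\n".join(lines) + "\n"


def revoke(authorized_keys, public_key):
    """Return the authorized_keys text with every copy of public_key removed."""
    key = public_key.strip()
    lines = [line.strip() for line in authorized_keys.splitlines() if line.strip()]
    remaining = [line for line in lines if line != key]
    return "\n".join(remaining) + ("\n" if remaining else "")


# Fake keys for illustration only.
OPERATOR_KEY = "ssh-ed25519 AAAAEXAMPLEKEYOPERATOR operator@gateway"
ON_CALL_KEY = "ssh-ed25519 AAAAEXAMPLEKEYALICE alice@laptop"

keys = OPERATOR_KEY + "\n"
keys = grant(keys, ON_CALL_KEY)    # alice starts her on-call rotation
print(keys, end="")
keys = revoke(keys, ON_CALL_KEY)   # rotation over, access removed
print(keys, end="")
```

Both functions are idempotent, so running the same grant or revoke twice leaves the file unchanged.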

You Get What You Need

All of this is to say that collectively we are still trying to figure out the security balance in the technical community. Too often people want security, but see it as prohibiting productivity so they punt. This is unfortunate for the obvious reasons, but also because properly operationalized security begins to enhance the developer’s and operator’s experience. Tools are leveraged that make the system easier to run and control. Different monitoring solutions are installed that make the system easier to debug and verify. And, everyone gets access to production.

Stay tuned next Wednesday for our fourth installment in this series as we continue to dive deeper. Until then, be sure to check out our first and second posts in the series.

Threat Stack Names Executives As Company Brings Innovative Cloud Security Service to Market

We're excited to announce today that we have added several key members to our management team.  Sam Bisbee has joined as CTO; Chris Gervais as VP, Engineering; and Pete Cheslock as Senior Director, Operations and Support.

“Threat Stack is at a really exciting point in time, as we come off a highly successful beta program and prepare to launch Cloud Sight into the market. Our management team has deep experience across enterprise, cloud, SaaS and security, and a track record for successfully bringing innovation to market. We’re thrilled to have attracted an all-star team.”

- Doug Cahill, CEO of Threat Stack

About the Executive Team

Sam Bisbee is a senior technologist that brings experience and expertise in delivering highly scalable distributed systems via SaaS.  Most recently Sam was CXO of Cloudant, a leader in Database as a Service (DBaaS) technology; before that he held key technology positions at Bocoup and Woopid.

Chris Gervais has led technology teams developing large, scalable, enterprise-grade solutions and bringing SaaS offerings to market.  Before Threat Stack, Chris was CTO and SVP, Engineering at LifeImage, a platform for securely sharing medical images; and VP, Engineering at Enservio, a SaaS application and analytics platform for insurance carriers.

Pete Cheslock has a record of supporting SaaS customers with highly reliable and scalable solutions.  Pete was previously Director of DevTools at Dyn, a provider of network traffic management and assurance solutions; and before that he was Director of Technical and Cloud Operations for Sonian, a cloud-based archiving platform.

“This team is a testament to Threat Stack’s unique technology and the big problem that it addresses. Elastic and dynamic infrastructures, and the services that run on them, are really difficult to monitor and protect. Threat Stack has cracked the code.”

- Chris Lynch, Chairman of the Board and a partner at Atlas Venture

Our flagship product, Cloud Sight™, is the first and only intrusion detection SaaS offering purpose-built to provide elastic cloud infrastructure with comprehensive protection, detection and response against malicious threats. Cloud Sight has been in a highly active beta program which resulted in multiple customer case studies, including Populi, a cloud-based college administration platform, and University of Hawaii at Manoa College. Beta participation ranged from SaaS vendors to MSPs and enterprises running in most major cloud service providers including Amazon Web Services and Rackspace, as well as in private and hybrid-cloud deployments. Cloud Sight will be commercially available this fall.



The Case for Continuous Security

By Pete Cheslock

This is the second post in our new series of weekly blog posts that dives into the role of SecDevOps. This series looks into why we need it in our lives, how we may go about implementing this methodology, and real life stories of how SecDevOps can save the Cloud.

DevOps is a term that has absolutely blown up in the last 5 years. As someone who’s been involved with that community from the earlier days, it’s been interesting to watch the conversations around DevOps evolve over time. Many people had an immediate adverse reaction towards Yet Another Buzzword -- especially when the core concepts that people described as being “DevOps” were things that many had already been doing for years. (I’m not going to bother getting into the specifics of “what is DevOps” since there is already a plethora of blog posts that you can easily find on it.)

One of the core tenets of what people consider to be “DevOps” is to shorten the feedback loop in your development cycles.  By reducing the amount of time for those feedback loops, your teams can iterate more quickly on changes and ship those features to your customers sooner. This tenet ties in directly with Agile methodologies utilized by software engineering teams. With the advent of easily accessible cloud infrastructure, and with the various operational tooling around those new infrastructure providers reaching a new level of maturity, we are now seeing a world where “DevOps” is mainstream.  For companies starting new product development initiatives, using some form of Configuration Management is now table stakes to iterate quickly. Additionally, we see more and more companies shed their physical data center presence in order to leverage the flexibility and accessibility of public compute resources provided by companies like Amazon, Microsoft and Google.  

The inherent nature of these IaaS providers is to make it as easy as possible to provision systems to meet your infrastructure needs -- and to do so very quickly.  Speed to market is a major competitive advantage that many companies are leveraging through the concept of Infrastructure as Code.  Provisioning hundreds or thousands of compute instances in mere minutes is now considered an everyday activity.  Everyone wants to move fast.  

Continuous Integration. Continuous Deployment.  But who (or what) is continually monitoring the state of your operational security?

We now have a world where your junior system administrator is able to make a small change to a Chef Recipe, Puppet Manifest, or maybe an Ansible Playbook, and deploy it to production within minutes.  But what is the scope of that change?  System Administrators don’t want to be slowed down by the security team.  They don’t want their configuration management changes to be passed through a Change Control Board.  They want to change a variable, open a pull request, and once merged, they want their operational tooling to do the rest.  They want their change to hit production servers as soon as possible.  

This is where SecDevOps, or SecOps, comes into play. (Let’s ignore the fact that it’s just as silly a buzzword as “DevOps”.) If DevOps seeks to value empathy between teams that traditionally had different incentives for their positions (Devs valuing constant change, Ops valuing stability), SecDevOps seeks to evoke the same outcome with your Security teams and the rest of the business.

When you are in a world where you are continually deploying change, you need to move towards a world where you are continually monitoring the security implications of those operational changes. Oftentimes, there is no single person at your company who is able to say with absolute certainty which changes to your infrastructure carry additional risks to your security posture. And if you have a traditional network security organization that is manually reviewing and approving changes to production, you’ve now introduced the newest bottleneck in your organization.

It’s this conversation that excites me the most about joining Threat Stack. As a technical operations veteran of the last 15 years, I see this as the most important (and exciting) problem to solve in many organizations. Having the opportunity to help build a product that will enable companies to continue to break down operational silos while improving the speed at which they are able to track and respond to security incidents is an absolute dream job for me.

I see SecDevOps as the qualifier for this discussion. How do you improve your security monitoring and response times, while maintaining your ability to continually deploy changes? These are hard problems to solve, and we are all excited to be in this unique position where we can actively help companies solve them.

Stay tuned next Wednesday for our third installment in this series as we dive deeper into the technical integrations that make SecDevOps happen. And in case you missed it, you can check out our inaugural post here.

About Pete

Pete Cheslock is the Senior Director of Operations and Support at Threat Stack. Previously, he was the head of automation and release engineering at Dyn, managing and deploying to mission-critical global DNS infrastructure. Prior to Dyn, Pete was the Director of Technical Operations for Amazon-backed cloud archiving company Sonian. You can follow Pete at @petecheslock on Twitter.