Cloud Security Best Practices: Finding, Securing, & Managing Secrets, Part 1 — truffleHog & git-secrets

by Tom McLaughlin Feb 21, 2017 2:23:48 PM

Secrets — passwords, API keys, secure tokens, private keys, and so on — protect access to sensitive resources in your environment. If not properly managed, they can end up in the wrong hands.

In Part 1 of this post, we will show you how to find secrets using truffleHog and git-secrets. In Part 2, we will explain how to manage them using appropriate software tools in order to quickly and cost-effectively achieve a higher level of security.

How Secrets Leak

Passwords, API keys, and secret tokens must not be left lying around your environment unprotected. Their purpose is to provide controlled access to sensitive resources such as a database that holds customer information, or your billing system, or the provider that you send usage data to for calculating customers’ bills each month. They could even provide controlled access to other systems in your own environment.

The list goes on, but the lesson is the same: You need to keep your secrets where they are easy to find but not easy to access.

Based on that, one of the worst places to store secrets is in application code unencrypted:

resp = requests.get(
    'https://app.threatstack.com/api/v1/alerts',
    headers={
        'Authorization': 'rWJTjTMuAcU3VyWohCAvmIKEPqwANv47LTQfv9Bys9WLMdL6KaLmj8qsisZffFWtb'
    }
)

Ironically, however, one of the most common places to find secrets is, you guessed it, in application code unencrypted.

We have been told for years that this is a terrible practice, but people continue to do it. The reasons range from inexperience among new developers, to developer indifference, to a failure to give developers a way to manage their secrets. Storing secrets in code is both an operational and a security nightmare.
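
For contrast with the hardcoded header above, here is a minimal sketch of reading the key from the process environment instead. The helper name build_auth_headers is hypothetical, and the variable name THREATSTACK_API_KEY is borrowed from the sample repository shown later in this post. This only keeps the value out of the repository; it is not a substitute for the secrets management discussed in Part 2.

```python
import os

# Assumed variable name, taken from the sample repository used later in this post.
API_KEY_ENV_VAR = "THREATSTACK_API_KEY"


def build_auth_headers(environ=os.environ):
    """Return request headers using an API key read from the environment."""
    api_key = environ.get(API_KEY_ENV_VAR)
    if not api_key:
        # Fail loudly rather than falling back to a hardcoded value.
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    return {"Authorization": api_key}
```

The result can be passed as the headers argument of the requests.get call shown earlier, so the key never appears in the committed source.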

Threat Model (Assumptions)

The underlying assumption in this post is that secrets protect against sensitive data leaking and against environment traversal to less secure parts of the infrastructure. We are therefore specifically discussing the leakage threat, and are not concerned with things like man-in-the-middle or brute-force attacks.

The following lists some of the ways that secrets might leak and who they might be leaked to:

  • Code repository exposure:
    • Secrets exposed to internal employees. (Depending on the secret, this might or might not be an issue.)
    • Secrets exposed to external people. (For example, a misconfigured public GitHub repo might expose your secrets.)
  • Laptop loss/equipment theft:
    • A developer's laptop might be lost and accessed by a third party.
  • Application reverse engineering/code reading:
    • Not all code is private. Front-end web application code is sent to the user to be rendered.
    • Some software stacks will reveal code and/or a stack trace back to the user on failure, potentially revealing secrets.
    • Some deploy methods use git pull, but don’t restrict access to the .git directory.

A Workplace Scenario

To look at this problem in a workplace context, consider the following scenario.

You have just started working at an exciting new SaaS company.

The configuration management code base has some secrets listed in it, but none of them relate to the product services. (For example, you have found a database username and password, and that user is being set up for database access through configuration management.)

However, you can’t figure out how a developer’s application might know how to talk to that database using those credentials. You check for S3 buckets that might contain credentials, but find none. Finally, you ask a developer how their application connects to the database, and they point you to a line of code on GitHub. There you see a username and password stored in plain text. As you look at their code and the code from other services, you realize the passwords and API tokens are regularly stored directly in code.

You correctly conclude that no secrets management solution (such as HashiCorp Vault or Square’s Keywhiz) is in place and that leakage of credentials is a large potential threat in this environment.

So how do you eliminate this problem?

Finding Secrets Using truffleHog and git-secrets

A great way to find secrets that are already embedded in your code is to use software utilities such as the following:

  • truffleHog
  • git-secrets

A best practice is to use them in combination (truffleHog first followed by git-secrets) because each utility has its own strengths, and together they complement each other.

truffleHog’s output is friendlier than git-secrets’ and it works by default, scanning the Git history. On the downside, its matching is not as configurable as git-secrets’, and it cannot ignore files or commits or check commit ranges. (If you are adventurous, you might want to fork it and take a look at the PRs people have been filing to add functionality.)

git-secrets uses pattern matching to find secrets. Its default setup is good for finding AWS access keys and secret keys, but without additional configuration it will not find a Threat Stack API key, which truffleHog would find. truffleHog, on the other hand, will miss AWS access keys but not secret keys. The additional investment in configuring git-secrets might help you find secrets that truffleHog would not.

With that overview out of the way, we will now focus on using truffleHog and git-secrets in turn.

truffleHog

truffleHog is a Python script, with a dependency on the GitPython module, that is designed to find potential secrets within a git repository by using entropy analysis. It won’t just find secrets that exist in the current code base, but will also check the repository's history so you can see secrets that may have been previously exposed. Because it works on the git repository history and not just the current code, it is better used as the first tool for finding secrets.
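
To make "entropy analysis" concrete, the following is a minimal sketch of scoring a string by its Shannon entropy and comparing it against a cutoff. It is not truffleHog's code: the function names are hypothetical and the 4.5 threshold is an assumption chosen for illustration, so check the tool's source for the values it really uses.

```python
import math
from collections import Counter

# Assumed cutoff, in bits per character, for illustration only.
ENTROPY_THRESHOLD = 4.5


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(text, threshold=ENTROPY_THRESHOLD):
    """Flag text whose characters are spread evenly enough to look random."""
    return shannon_entropy(text) > threshold
```

A string that repeats a few characters scores low, while a long string that rarely repeats a character scores high, which is why randomly generated keys tend to stand out from ordinary identifiers.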

Keep reading to learn how to install, set up, operate, and enhance truffleHog to make it even more effective.

Installation

Install truffleHog as a user module, or into a virtualenv if you prefer:

[tmclaughlin@tomcat-ts:aws-straycat other]$ git clone git@github.com:dxa4481/truffleHog.git
Cloning into 'truffleHog'...
remote: Counting objects: 115, done.
remote: Total 115 (delta 0), reused 0 (delta 0), pack-reused 115
Receiving objects: 100% (115/115), 21.29 KiB | 0 bytes/s, done.
Resolving deltas: 100% (58/58), done.
[tmclaughlin@tomcat-ts:aws-straycat other]$ cd truffleHog/
[tmclaughlin@tomcat-ts:aws-straycat truffleHog(master)]$ pip install --user -r requirements.txt
Collecting GitPython==2.1.1 (from -r requirements.txt (line 1))
  Downloading GitPython-2.1.1-py2.py3-none-any.whl (441kB)
    100% |████████████████████████████████| 450kB 1.8MB/s
Collecting gitdb2>=2.0.0 (from GitPython==2.1.1->-r requirements.txt (line 1))
  Downloading gitdb2-2.0.0-py2.py3-none-any.whl (63kB)
    100% |████████████████████████████████| 71kB 2.8MB/s
Collecting smmap2>=2.0.0 (from gitdb2>=2.0.0->GitPython==2.1.1->-r requirements.txt (line 1))
  Downloading smmap2-2.0.1-py2.py3-none-any.whl
Installing collected packages: smmap2, gitdb2, GitPython
Successfully installed GitPython-2.1.1 gitdb2-2.0.0 smmap2-2.0.1

Operation

Now that you have installed truffleHog, run it against a git repo. The command takes the path or URL of a git repository, which means you can scan either a local or a remote repository as follows:

[tmclaughlin@tomcat-ts:aws-straycat truffleHog(master)]$ python truffleHog.py ../../threatstack/threatstack-to-s3/
Date: 2017-01-24 12:05:41
Branch: trufflehog
Commit: Tired of forgetting to set this...

(This is for testing TruffleHog.)

@@ -5,7 +5,7 @@ import os
 import requests

 THREATSTACK_BASE_URL = os.environ.get('THREATSTACK_BASE_URL', 'https://app.threatstack.com/api/v1')
-THREATSTACK_API_KEY = os.environ.get('THREATSTACK_API_KEY')
+THREATSTACK_API_KEY = 'rWJTjTMuAcU3VyWohCAvmIKEPqwANv47LTQfv9Bys9WLMdL6KaLmj8qsisZffFWtb'

 def is_available():
     '''

Analysis

truffleHog has output a commit and its diff, which contains a potential secret. The truffleHog output is in reverse chronological order and is similar to the output of git log -p, but with only offending commits shown. The suspected key is highlighted in the output.

The tool is not perfect, however:

  • It is good for finding random strings, but not non-random strings (e.g., if someone used a passphrase for a secret).
  • If a string isn’t long enough, it probably does not have enough entropy. For example, truffleHog misses AWS access keys, which are 20-character strings of uppercase letters and numbers.
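
The length limitation follows from a general property of Shannon entropy: a string of n characters can score at most log2(n) bits per character, however random it is. The sketch below works that bound out; the helper name and the 4.5 cutoff are assumptions for illustration, not values confirmed from truffleHog's source.

```python
import math

# Assumed cutoff, in bits per character, for illustration only.
ASSUMED_THRESHOLD = 4.5


def max_entropy(length, alphabet_size):
    """Upper bound on the Shannon entropy (bits per character) of a string
    of `length` characters drawn from `alphabet_size` distinct symbols."""
    return math.log2(min(length, alphabet_size))


# A 20-character key over A-Z and 0-9 (36 symbols) can never exceed log2(20).
aws_access_key_bound = max_entropy(20, 36)

# A 32-character alphanumeric string (62 symbols) can reach log2(32) = 5.0,
# but only if almost every character in it is different.
alnum_32_bound = max_entropy(32, 62)
```

Under the assumed cutoff, no 20-character string can be flagged at all, while a 32-character string clears it only when it has very few repeated characters.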

Here are some observations on using truffleHog based on tests conducted at Threat Stack:

  • Using randomly generated alphanumeric strings of 32 characters, only 63% of strings were flagged as being entropic enough.
  • Using randomly generated alphanumeric strings of 36 characters, 90% of strings were flagged as being entropic enough.
  • Using randomly generated alphanumeric + punctuation strings of 36 characters, 0% of strings were flagged as being entropic enough.
  • Using randomly generated alphanumeric + punctuation strings of 64 characters, 0.2% of strings were flagged as being entropic enough.
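
One possible explanation for the near-zero hit rate on strings containing punctuation, offered as an assumption rather than a confirmed description of truffleHog's internals: if a scanner only scores long unbroken runs of characters from a fixed alphabet (base64 characters here), then most punctuation ends the run before it is long enough to be scored. The names, the alphabet, and the minimum length of 20 below are all hypothetical.

```python
import string

# Assumed candidate alphabet: base64 characters. Any other character ends a run.
BASE64_CHARS = string.ascii_letters + string.digits + "+/="
# Assumed minimum run length before a candidate is scored at all.
MIN_RUN_LENGTH = 20


def candidate_runs(text, charset=BASE64_CHARS, min_length=MIN_RUN_LENGTH):
    """Return the maximal runs of charset characters at least min_length long."""
    runs = []
    current = []
    for ch in text:
        if ch in charset:
            current.append(ch)
            continue
        # A character outside the alphabet closes the current run.
        if len(current) >= min_length:
            runs.append("".join(current))
        current = []
    if len(current) >= min_length:
        runs.append("".join(current))
    return runs
```

With this approach a 36-character alphanumeric string is one candidate, but the same length of text broken up by characters such as ! or # yields only short fragments and is never scored.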

Now you know how to install and run truffleHog, so let’s discuss how to install and run git-secrets.

git-secrets

git-secrets is a git command that is designed to help you find secrets in a repository and to prevent you from committing new ones. Instead of evaluating string entropy, the tool searches for patterns using grep regular expressions. While truffleHog regularly catches AWS secret keys but not access keys, git-secrets catches AWS access keys by default, but catches secret keys only when they are assigned to a recognizably named variable such as AWS_SECRET_ACCESS_KEY.

Keep reading to learn how to install, set up, operate, and enhance git-secrets to make it even more effective.

Installation

You can install git-secrets from source on a standard Unix / Linux OS or, on OS X, by using Homebrew as a package manager.

Unix / Linux

[tmclaughlin@tomcat-ts:aws-straycat other]$ git clone git@github.com:awslabs/git-secrets.git
Cloning into 'git-secrets'...
remote: Counting objects: 218, done.
remote: Total 218 (delta 0), reused 0 (delta 0), pack-reused 218
Receiving objects: 100% (218/218), 67.38 KiB | 0 bytes/s, done.
Resolving deltas: 100% (124/124), done.
[tmclaughlin@tomcat-ts:aws-straycat other]$ cd git-secrets/
[tmclaughlin@tomcat-ts:aws-straycat git-secrets]$ sudo make install 

OS X Homebrew

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ brew install git-secrets
==> Downloading https://homebrew.bintray.com/bottles/git-secrets-1.2.1.sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring git-secrets-1.2.1.sierra.bottle.tar.gz
🍺  /usr/local/Cellar/git-secrets/1.2.1: 7 files, 60.2K


Setup

Once you have installed git-secrets, set it up for pattern matching in order to find secrets. You will set this up globally for the user as shown by the --global flag. The following will add configuration to ~/.gitconfig:

$ git secrets --register-aws --global
$ git secrets --list
secrets.providers git secrets --aws-provider
secrets.patterns [A-Z0-9]{20}
secrets.patterns ("|')?(AWS|aws|Aws)?_?(SECRET|secret|Secret)?_?(ACCESS|access|Access)?_?(KEY|key|Key)("|')?\s*(:|=>|=)\s*("|')?[A-Za-z0-9/\+=]{40}("|')?
secrets.patterns ("|')?(AWS|aws|Aws)?_?(ACCOUNT|account|Account)_?(ID|id|Id)?("|')?\s*(:|=>|=)\s*("|')?[0-9]{4}\-?[0-9]{4}\-?[0-9]{4}("|')?
secrets.allowed AKIAIOSFODNN7EXAMPLE
secrets.allowed wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

The following are matched by default:

  • secrets.providers git secrets --aws-provider: Your AWS access and secret keys in ~/.aws/credentials.
  • secrets.patterns [A-Z0-9]{20}: AWS access keys.
  • secrets.patterns
    ("|')?(AWS|aws|Aws)?_?(SECRET|secret|Secret)?_?(ACCESS|access|Access)?_?(KEY|key|Key)("|')?\s*(:|=>|=)\s*("|')?[A-Za-z0-9/\+=]{40}("|')?: AWS secret keys. (But only when declared as the value for AWS_SECRET_ACCESS_KEY or some permutation of that. Review the entire regex.)
  • secrets.patterns
    ("|')?(AWS|aws|Aws)?_?(ACCOUNT|account|Account)_?(ID|id|Id)?("|')?\s*(:|=>|=)\s*("|')?[0-9]{4}\-?[0-9]{4}\-?[0-9]{4}("|')?: AWS account IDs. (But only when declared as the value for AWS_ACCOUNT_ID or some permutation of that. Review the entire regex.)

git-secrets also provides an example access key and secret key that will be allowed and can be used for things like testing and documentation.
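
To see which variable names count as "some permutation" of AWS_SECRET_ACCESS_KEY, the two default patterns listed above can be tried out in isolation. This sketch uses Python's re module purely for experimentation (git-secrets itself evaluates the patterns with grep), and the helper is_flagged is hypothetical. The sample values are the ones that appear in this post's scan output.

```python
import re

# The default access key and secret key patterns, copied from the listing above.
ACCESS_KEY_PATTERN = re.compile(r"[A-Z0-9]{20}")
SECRET_KEY_PATTERN = re.compile(
    r"""("|')?(AWS|aws|Aws)?_?(SECRET|secret|Secret)?_?(ACCESS|access|Access)?"""
    r"""_?(KEY|key|Key)("|')?\s*(:|=>|=)\s*("|')?[A-Za-z0-9/\+=]{40}("|')?"""
)

# Sample 40-character value taken from the scan output in this post.
SAMPLE_SECRET = "ct5HjyiPIxDvW2gho/vQ3A+NBIf8adXvp3FtmOFN"


def is_flagged(line):
    """Return True if either default pattern matches somewhere in the line."""
    return bool(ACCESS_KEY_PATTERN.search(line) or SECRET_KEY_PATTERN.search(line))


if __name__ == "__main__":
    for name in ("AWS_SECRET_ACCESS_KEY", "aws_secret_key", "AWS_SECRET"):
        print(name, is_flagged(f"{name} = '{SAMPLE_SECRET}'"))
```

The only mandatory part of the variable name in the secret key pattern is KEY (in one of three capitalizations), so a name that omits it is not matched no matter what value follows.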

Operation

To use git-secrets, you can scan the current code base or the history. Let’s start with the git history, similar to what we did with truffleHog:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --scan-history
7f5170ab6e87cd143f349911de3f3c70a3ef8297:app/models/s3.py:14:AWS_ACCESS_KEY = 'BBF3A5D0XJTM6V2O0MHQ'
7f5170ab6e87cd143f349911de3f3c70a3ef8297:app/models/s3.py:15:AWS_SECRET_ACCESS_KEY = 'ct5HjyiPIxDvW2gho/vQ3A+NBIf8adXvp3FtmOFN'

[ERROR] Matched one or more prohibited patterns

Possible mitigations:
- Mark false positives as allowed using: git config --add secrets.allowed ...
- Mark false positives as allowed by adding regular expressions to .gitallowed at repository's root directory
- List your configured patterns: git config --get-all secrets.patterns
- List your configured allowed patterns: git config --get-all secrets.allowed
- List your configured allowed patterns in .gitallowed at repository's root directory
- Use --no-verify if this is a one-time false positive

git-secrets shows a more condensed view of what truffleHog would show. Instead of a git log style display, it shows the git ref, file name, line number, and offending line.

You can scan the current code by passing a list of files as arguments or by omitting them and passing -r instead. The output format is mostly the same, except that there is no git ref:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --scan -r
app/models/s3.py:14:AWS_ACCESS_KEY = 'BBF3A5D0XJTM6V2O0MHQ'
app/models/s3.py:15:AWS_SECRET_ACCESS_KEY = 'ct5HjyiPIxDvW2gho/vQ3A+NBIf8adXvp3FtmOFN'

[ERROR] Matched one or more prohibited patterns

Possible mitigations:
- Mark false positives as allowed using: git config --add secrets.allowed ...
- Mark false positives as allowed by adding regular expressions to .gitallowed at repository's root directory
- List your configured patterns: git config --get-all secrets.patterns
- List your configured allowed patterns: git config --get-all secrets.allowed
- List your configured allowed patterns in .gitallowed at repository's root directory
- Use --no-verify if this is a one-time false positive

This is a great start, but it still needs tuning. Since git-secrets uses pattern matching, it is possible that the patterns it is looking for are too restrictive. For example, if you change line 15 from declaring AWS_SECRET_ACCESS_KEY to AWS_SECRET, the line no longer matches the tool's existing patterns, and the secret key disappears from the results:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --scan -r
app/models/s3.py:14:AWS_ACCESS_KEY = 'BBF3A5D0XJTM6V2O0MHQ'

[ERROR] Matched one or more prohibited patterns


Enhance git-secrets Performance

To enable git-secrets to catch more potential secrets without causing too many false positives, start by making a list of services you might use along with their respective API key formats:

  • PagerDuty
    • Authorization token: 20 character alphanumeric (upper & lower) + some symbols
    • Service Key: 32 character hex
  • GitHub
    • Personal access token: 40 char hex
  • Threat Stack
    • API key: 64 alphanumeric
    • Deploy key: 72 alphanumeric
  • Slack
    • Token: 74+ “xoxp-{11 numeric}-{12-13 numeric}-{32 hex}”

As you can see from the above, there is a variety of formats. Of note, it is not likely that either the PagerDuty token or key would be caught by truffleHog (32-character alphanumeric strings had a 63% chance of being caught, and 32-character hex strings were never caught).

The same goes for the GitHub and Slack tokens: None of them were picked up in tests we ran at Threat Stack.

Therefore, you need to enhance git-secrets by adding additional matching rules.
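
Before adding rules, it can help to write the formats from the list above as regular expressions and try them against sample values. The patterns below are a hypothetical sketch derived only from those descriptions, not from any vendor documentation, and the names are made up for illustration. The PagerDuty authorization token is left out because the list does not say which symbols it may contain.

```python
import re

# Hypothetical patterns written from the format descriptions in the list above.
SERVICE_PATTERNS = {
    "pagerduty_service_key": re.compile(r"[0-9a-f]{32}"),
    "github_personal_access_token": re.compile(r"[0-9a-f]{40}"),
    "threatstack_api_key": re.compile(r"[A-Za-z0-9]{64}"),
    "threatstack_deploy_key": re.compile(r"[A-Za-z0-9]{72}"),
    "slack_token": re.compile(r"xoxp-[0-9]{11}-[0-9]{12,13}-[0-9a-f]{32}"),
}


def classify(candidate):
    """Return the names of all patterns that match the whole candidate string."""
    return [
        name
        for name, pattern in SERVICE_PATTERNS.items()
        if pattern.fullmatch(candidate)
    ]
```

Note that the bare hex formats are weak signatures on their own: a 40-character hex string is also what a git commit SHA-1 looks like, so rules built from them are likely to need allowed patterns to keep false positives down.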

Now think about the passwords you have generated internally for services such as databases. What are their formats? How long are they? Are they just alphanumeric, or do they include some symbols? Depending on the formats you have used, you may have to add yet more matching rules to git-secrets to be able to find shorter, less entropic strings but without too many false positives.

Start by adding a regexp for finding quoted strings of 20 or more alphanumeric and punctuation characters:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --add --global \(\"\|\'\)[[:alnum:][:punct:]]\{20,\}\(\"\|\'\)

Now you get the following:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --scan -r
README.md:48:                "arn:aws:s3:::"
README.md:59:                "arn:aws:s3:::/*"
app/models/s3.py:14:AWS_ACCESS_KEY = 'BBF3A5D0XJTM6V2O0MHQ'
app/models/s3.py:15:AWS_SECRET = 'ct5HjyiPIxDvW2gho/vQ3A+NBIf8adXvp3FtmOFN'
app/models/s3.py:59:            client_continuation_token = response.get('NextContinuationToken')
app/models/threatstack.py:7:THREATSTACK_BASE_URL = os.environ.get('THREATSTACK_BASE_URL', 'https://app.threatstack.com/api/v1')
app/models/threatstack.py:9:THREATSTACK_APP_KEY = 'TvtEnyhuE4yKVzEj80JOWgXvJPAiJqd6PaVo2aMKtynvYT0pJ89lusrSF3PXfaEO'

[ERROR] Matched one or more prohibited patterns


Analysis

You are now picking up much more with git-secrets. In addition to the AWS access key, you are catching the secret key, which you had stopped finding after renaming the variable, and you have also found the Threat Stack API key.

This is great, but we still have to deal with the false positive issue. To reduce the number of false positives, you can add patterns with the -a flag, which marks a regular expression as allowed. Let’s start by allowing quoted snake-case strings:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --add -a --global \(\"\|\'\)[[:alpha:]_]\{20,\}\(\"\|\'\)

This clears up false positives that might be dictionary or hashtable values. Compare the output from the latest run (below) to the previous output:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --scan -r
README.md:48:                "arn:aws:s3:::"
README.md:59:                "arn:aws:s3:::/*"
app/models/s3.py:14:AWS_ACCESS_KEY = 'BBF3A5D0XJTM6V2O0MHQ'
app/models/s3.py:15:AWS_SECRET = 'ct5HjyiPIxDvW2gho/vQ3A+NBIf8adXvp3FtmOFN'
app/models/threatstack.py:9:THREATSTACK_APP_KEY = 'rWJTdTMuAcU3hje2WSie7W0M2kQ8k15dfj2q'

[ERROR] Matched one or more prohibited patterns  
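
The interplay between the prohibited pattern and the allowed pattern can be sketched as a two-step filter: a line is reported only if it matches a prohibited pattern and matches no allowed pattern. This is a simplified model for experimentation, not git-secrets' implementation, and the helper flagged_lines is hypothetical. The POSIX classes from the commands above are spelled out by hand because Python's re module does not support them.

```python
import re
import string

QUOTE = r"""("|')"""

# [[:alnum:][:punct:]] written out explicitly for Python.
ALNUM_PUNCT = re.escape(string.ascii_letters + string.digits + string.punctuation)

# Prohibited: a quoted run of 20 or more alphanumeric or punctuation characters.
PROHIBITED = [re.compile(QUOTE + "[" + ALNUM_PUNCT + "]{20,}" + QUOTE)]
# Allowed: a quoted run of 20 or more letters and underscores.
ALLOWED = [re.compile(QUOTE + "[A-Za-z_]{20,}" + QUOTE)]


def flagged_lines(lines):
    """Keep lines that match a prohibited pattern and no allowed pattern."""
    return [
        line
        for line in lines
        if any(p.search(line) for p in PROHIBITED)
        and not any(a.search(line) for a in ALLOWED)
    ]
```

Because an allowed match anywhere on the line suppresses the whole line, a broad allowed pattern can also hide a real secret that happens to share a line with a long identifier, so it is worth keeping allowed patterns as narrow as practical.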

As a result of the added allowable matches, the overall results have been reduced to three secrets and some AWS ARN string examples from the README.md. Now you can add another allow pattern, but since it is specific to this repo, you are going to drop the --global argument:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --add -a \(\"\|\'\)arn:aws:.*:::.*\(\"\|\'\)
[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --scan -r

app/models/s3.py:14:AWS_ACCESS_KEY = 'BBF3A5D0XJTM6V2O0MHQ'
app/models/s3.py:15:AWS_SECRET = 'ct5HjyiPIxDvW2gho/vQ3A+NBIf8adXvp3FtmOFN'
app/models/threatstack.py:9:THREATSTACK_APP_KEY = 'rWJTdTMuAcU3hje2WSie7W0M2kQ8k15dfj2q'

[ERROR] Matched one or more prohibited patterns

If you are happy with what’s being caught now, you can take an additional step and add commit hooks to prevent passwords from being added in the future:

[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git secrets --install
✓ Installed commit-msg hook to .git/hooks/commit-msg
✓ Installed pre-commit hook to .git/hooks/pre-commit
✓ Installed prepare-commit-msg hook to .git/hooks/prepare-commit-msg
[tmclaughlin@tomcat-ts:aws-straycat threatstack-to-s3]$ git commit -av
app/models/s3.py:14:AWS_ACCESS_KEY = 'BBF3A5D0XJTM6V2O0MHQ'
app/models/s3.py:15:AWS_SECRET = 'ct5HjyiPIxDvW2gho/vQ3A+NBIf8adXvp3FtmOFN'

[ERROR] Matched one or more prohibited patterns

Keep in mind that with git hooks installed, you will now be forced to fix all your errors and not just the changes you are about to make.

Conclusion

In this post, you have learned how to find secrets using truffleHog and git-secrets. In Part 2, you will learn how to manage them by removing them from the code and placing them where they will be easy to find but difficult to access.

And Finally . . .

If you are just starting out in cloud security, be sure to download a free copy of Jump Starting Cloud Security. This playbook is a hands-on guide that has everything you need to get on the fast track to securing your AWS cloud infrastructure.

Topics: Cloud Security Best Practices, Cloud Security Maturity, Managing Secrets
