This blog uses CSP level 2 script hash support

This blog has a few inline script blocks. I could externalize them, but WordPress injects some stuff, so I just used script-hash support to whitelist the blocks. Partly out of laziness, partly to give some attention to script-hash.

Script hash is awesome because it works very well on static sites. It’s also awesome because you’re essentially declaring a checksum for the code on the page. It’s even more awesome because it’s backwards compatible with browsers that don’t support script hash.

Want to compute the hashes for your page? Run this on the page (after sourcing the necessary files from https://code.google.com/p/crypto-js/):

<script src="js/cryptojs/rollups/sha256.js"></script>
<script src="js/cryptojs/components/enc-base64-min.js"></script>
<script src="js/jquery.min.js"></script>
<script>
  console.log("Add the following values to your script-src to whitelist them using hash sources:")
  $.each($('script'), function(index, x) {
    if (x.innerHTML !== "") {
      console.log("'sha256-" + CryptoJS.SHA256(x.innerHTML).toString(CryptoJS.enc.Base64) + "'");
    }
  });
</script>

Add those values to your script-src. Done.

Oh hey, I have a Rails PoC that might become a thing soon. Feedback welcome: https://github.com/twitter/secureheaders/pull/67.

The video demo is on youtube:

Twitter’s CSP report collector design

We recently scrapped our previous CSP reporting endpoint and built a custom, single-purpose app. It’s highly proprietary and will never be open sourced (do you run scribe, viz, logstash, etc???), but here are the building blocks of the design. It launched only a month or so ago, so I’m sure there is room to improve.

Normalization

The incoming data is very wacky. Various browsers at various levels of maturity, with an endless number of versions in the wild, create chaos. Here are a few things we do to normalize the data. (Apologies if some of this is not 100% accurate; I’ve forgotten the details of these quirks since they were abstracted away.)
  1. Firefox used to add ports to violated-directives and a few other fields. These are rarely useful and muddy the data, as they won’t match any other user agent. Strip them unless you run on non-standard ports.
  2. Inline content is indicated by a blank blocked-uri, yet many browsers send “self”. Change this to “” to be consistent.
  3. Strip www from document-uri host values. Unless you serve different content of course.
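In code, the normalization pass might look something like this — a rough Ruby sketch of the three rules above, using the standard CSP report field names (the helper itself is hypothetical, not our production code):

```ruby
require 'uri'

# Hypothetical normalizer for a raw CSP report (hash with string keys).
def normalize_report(report)
  r = report.dup

  # 1. Strip the ports Firefox used to append to violated-directive values.
  r['violated-directive'] = r['violated-directive'].to_s.gsub(/:\d+/, '')

  # 2. Inline violations: some browsers send "self" instead of a blank
  #    blocked-uri; normalize to the empty string.
  r['blocked-uri'] = '' if r['blocked-uri'] == 'self'

  # 3. Strip a leading www. from the document-uri host.
  if r['document-uri']
    uri = URI.parse(r['document-uri'])
    uri.host = uri.host.sub(/\Awww\./, '') if uri.host
    r['document-uri'] = uri.to_s
  end

  r
end
```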

Send extra fields

The violation report has some data, but not everything I want. You can “smuggle” special values by adding them to the report-uri query string. I suggest adding the following fields:
  1. Application. Where did this come from? Use an opaque ID, or don’t. Revealing this information should not really matter.
  2. Was the policy enforced?
  3. What was the status code of the page? (might not be too valuable to most, but our reverse proxy replaces content for error pages)
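Since these ride along on the query string, a tiny helper can build the report-uri. A sketch (the helper and field names are my own, not from any library):

```ruby
require 'cgi'

# Hypothetical helper: append app name, enforcement mode, and response
# status to the report-uri so they come back with every violation report.
def report_uri(base, app:, enforced:, status:)
  params = { 'app_name' => app, 'enforced' => enforced, 'status' => status }
  query  = params.map { |k, v| "#{k}=#{CGI.escape(v.to_s)}" }.join('&')
  "#{base}?#{query}"
end
```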

Extract extra fields at index time

A blocked-uri is nice, but a blocked-host is better. It lets you group reports much more effectively in logstash: entries for https://www.example.org/* can be grouped as example.org violations. Here are all of the fields we extract:
  1. blocked-host: the blocked-uri with the scheme, www, port, and path removed (tbh the blocked-uri is otherwise useless and a potential violation of privacy).
  2. report-host: the document-uri with the scheme, www, port, and path removed
  3. classification¹: is this mixed content? inline script? unauthorized_host? This is not an exact science, but it’s useful.
  4. path: the path of the document-uri
  5. app_name: from the “extra fields” above – helpful if multiple apps are hosted on one domain.
  6. report_only – useful for coordination and for boss-type people
  7. violation-type: the first token of the violated-directive – helpful if your policy varies within an app.
  8. browser, browser + major version: take the user-agent, but normalize the values into easily defined buckets. This is very useful for classifying plugin noise.
  9. operating system (may indicate malware)
¹ Pseudocode for the classification:
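Roughly, the classification looks at the blocked-uri and the document-uri. A hedged Ruby sketch — the real rules are fuzzier than this:

```ruby
# Illustrative classification rules, not the exact production logic.
def classify(report)
  blocked = report['blocked-uri'].to_s
  doc     = report['document-uri'].to_s

  if blocked.empty? || blocked == 'self'
    'inline'            # a blank blocked-uri indicates inline content
  elsif doc.start_with?('https://') && blocked.start_with?('http://')
    'mixed_content'     # an http resource loaded on an https page
  else
    'unauthorized_host' # a source that simply isn't in the policy
  end
end
```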

Filter Noise

This is probably the most important thing to do, and it builds off of the ideas mentioned above. This is even less of a science than classifying reports. These reports are not counted in most statistics and are not sent to logstash. We still log them, but to a different location and we _will_ use this data to help browser vendors. Are we overzealous with our filtering? Probably.

We’re filtering ~80% of our reports!


We consider anything we deem “unactionable” or “too old” to be noise. This noise comes mostly from plugins, but also from other strange sources such as proxy sites that replay our CSP. Strange. The quality of the reports improves over time, so we started filtering out reports from browser versions older than some arbitrary cutoff point.
I’ll just drop this bit of Scala code for ya. It’s ugly. Monads or something. This list grows by the week.
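The filter boils down to a growing list of predicates: a report matching any one of them is noise. Here’s the gist in Ruby rather than Scala — the rules shown are hypothetical examples, and the real list is much longer:

```ruby
# Each rule is a predicate; a report matching any rule is filtered as noise.
# These example rules are illustrative, not the production list.
NOISE_RULES = [
  ->(r) { r['blocked-uri'].to_s.start_with?('chrome-extension:') }, # extensions
  ->(r) { r['source-file'].to_s.start_with?('safari-extension:') },
  ->(r) { r['blocked-uri'] == 'about' },                            # old-browser artifact
]

def noise?(report)
  NOISE_RULES.any? { |rule| rule.call(report) }
end
```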


And here’s a graph of our filtered report data:

[Graph: reasons reports were filtered]
[Legend for the graph of filtered reports]

Now we can get to business

Now that we’ve normalized and filtered our data, we can get to work! We use logstash to dive into reports. The main feature we use is field extraction: all of the “extra fields” above go to logstash so we can slice the data quickly.
[Screenshot: logstash reports in LogLens]

OK, so how do I look at the mixed content violations for twitter.com, specifically the old rails code?

I search for:

classification:"mixed_content" app_name:monorail
blocked_host:twimg.com OR blocked_host:twitter.com
violated_directive:script-src app_name:translate.twitter.com

Now I can look at the various fields on the left and keep on digging! Logstash is awesome.

Now show your work

For each application in our stack, we provided two simple graphs that allow people to take a quick glance at the state of things.
[Screenshot: reports by classification and violated directive, per application]

Automatic XSS Protection with CSP: No changes required

In order to help ease approval for the Content Security Policy script-hash proposal, I created a PoC to demonstrate that this is just as easy as script-nonce. I believe script-hash is an idea that solves some of the shortcomings of script nonce. However, it is significantly more complex. I think that the complexity can be greatly reduced with proper tooling. My PoC branch aims to prove that this can be practical. I have a sample application with all of this in action.

Hash? Nonce? Huh.

CSP is great for restricting inline script. It has received some backlash because to truly leverage the XSS protection provided by CSP, you need to remove all inline javascript (among other tasks). A solution for whitelisting inline content would certainly increase adoption. Here are the differences between the two proposals:

Script nonce


Content-Security-Policy: script-src 'nonce-abc123'

<script nonce='abc123'>console.log("Hello world");</script>

IFF the nonce in the script tag matches the value in the header, the script executes.

Downside: protection can be entirely circumvented if you have dynamic javascript. Caching pages with dynamic nonce values causes problems, which is not great at massive scale. Using a static value reduces or eliminates the protection, and an easily guessable value is just as troublesome.

Upside: pretty easy to apply

Script hash


Content-Security-Policy: script-src 'sha1-<BASE64 ENCODED SHA1 HASH OF THE CONTENTS OF THE SCRIPT TAG>'

<script>console.log("Hello world");</script>

So in this case: script-src 'sha1-fU8Y3i83rje0823mI+3hgmqgysc='

Downside: moar harder for developers and browsers to implement.

Upside: if you don’t use dynamic javascript, your code is effectively certified as code that is allowed to execute. It doesn’t cause caching issues, and the strength of the protection is determined by the hash strength, not the implementation.

Script Hash Generation

  • Grab all templates (stuff that turns into html that kinda already looks like html)
  • Iterate over each file and:
    • Grep the code for /(<script([\s]*(?!src)([\w\-])+=(["'])[^"']+\4)*[\s]*>)(.*?)(<\/script>)/mx
    • Take each match (second to last capture group in this case, ruby 1.8 doesn’t support named capture groups).
    • Hash the value with SHA256 and base64 encode the output.
  • Store the filename and any hashes (e.g. in a YAML file, hash, associative array, whatever). Key: filename, value: array of hashes.
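The steps above can be sketched in Ruby — with a simplified version of the regex and hypothetical helper names:

```ruby
require 'digest'
require 'base64'

# Simplified version of the script-extraction regex described above.
SCRIPT_RE = %r{<script(?:\s*(?!src)[\w\-]+=(["'])[^"']+\1)*\s*>(.*?)</script>}m

# Hash every inline <script> body found in a template's source.
def inline_script_hashes(source)
  source.scan(SCRIPT_RE).map do |_quote, body|
    Base64.strict_encode64(Digest::SHA256.digest(body))
  end
end

# Build the filename => [hashes] manifest (e.g. for a YAML file).
def hash_manifest(files)
  files.each_with_object({}) do |(name, source), manifest|
    hashes = inline_script_hashes(source)
    manifest[name] = hashes unless hashes.empty?
  end
end
```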

Script hash application

  • Hook into the framework so that anytime a template is rendered, we take note.
  • Once rendering is done, add the hashes (if any) of all rendered templates to the content security policy.
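The framework hook itself varies, but the bookkeeping is simple. A framework-agnostic sketch (hypothetical class; the manifest is the filename-to-hashes mapping from the generation step):

```ruby
# Record which templates were rendered, then emit the matching hash
# sources for the script-src directive.
class HashTracker
  def initialize(manifest)
    @manifest = manifest
    @rendered = []
  end

  # Call this from the framework's template-render hook.
  def rendered(template_name)
    @rendered << template_name
  end

  # After rendering, collect the hash sources to add to the policy.
  def script_hash_sources
    @rendered.flat_map { |t| @manifest.fetch(t, []) }
             .map { |h| "'sha256-#{h}'" }
  end
end
```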

“Automatic inline script CSP protection”

To hopefully satisfy this claim, here’s some steps you’d have to take:

  • Have a task that watches the filesystem for changes to your templates.
  • Update the script hashes that are applied to the given template without having to restart any process.
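A minimal version of such a task, polling mtimes rather than using filesystem events (hypothetical helper; a real task would use an fs-notification library, and this simplified regex only matches attribute-free script tags):

```ruby
require 'digest'
require 'base64'

# Rehash any template whose mtime changed, updating the in-memory
# manifest so new hashes apply without restarting the process.
def refresh_changed!(manifest, mtimes, dir)
  Dir.glob(File.join(dir, '**', '*.erb')).each do |path|
    mtime = File.mtime(path)
    next if mtimes[path] == mtime
    mtimes[path] = mtime
    manifest[path] = File.read(path).scan(%r{<script\s*>(.*?)</script>}m).map do |(body)|
      Base64.strict_encode64(Digest::SHA256.digest(body))
    end
  end
end
```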

Here’s a (poor quality) screen cast of my PoC branch:

Surprises

  • Generating hashes “on deploy” is no good. Tests would break if CSP is enforced and the hashes are outdated.
  • I’m not that great with Regexen. In writing this post, I noticed at least one improvement I can make.

diff x-webkit-csp x-content-security-policy

There have been a few projects looking to port the secure_headers logic. However, this code got out of control fast and is difficult to follow. There is a github issue that will make it much cleaner by moving the logic out of conditionals and into methods for the two different headers, which should be represented by two different classes. Anyhow.

Inline script/css and eval

Allowing inline script is bad. Don’t do it. If you have to, do this:

x-webkit-csp


script-src 'unsafe-inline' 'unsafe-eval';
style-src 'unsafe-inline'

x-content-security-policy


options inline-script eval-script

Note:

  • They are in two different directives
  • They use different values to enable the functionality
  • The values are quoted in the webkit csp header
  • X-Content-Security-Policy doesn’t specify a value for allowing inline styles. It actually does not block inline styles at all, even without such a value.

Differing Directives

‘allow’ doesn’t exist in the webkit header. Also, the ‘allow’ directive in the Mozilla implementation is somewhat analogous to the ‘default-src’ directive in the webkit implementation: allow functions like default-src, except you can only allow inline script/css and eval via the options directive mentioned above. secure_headers abstracts this out so you only need to provide default-src; the translation to options+allow is transparent. Also, some values differ or simply do not exist in one of the two implementations. I think this code sums it up.

x-webkit-csp


DIRECTIVES = [:default_src, :script_src, :frame_src, :style_src, :img_src, :media_src, :font_src, :object_src, :connect_src]

x-content-security-policy


FIREFOX_DIRECTIVES = DIRECTIVES + [:xhr_src, :frame_ancestors] - [:connect_src]

In this case xhr-src (Firefox) == connect-src (webkit)

chrome-extension:

Obviously Firefox doesn’t need to support the chrome-extension: URI scheme!

Reporting

This is a biggie, and probably the biggest gain of using secure_headers. Firefox will not send reports to hosts that don’t match the original host name. This can be difficult if you are sending from a subdomain, different tld, etc. To get around this, the library implements an internal endpoint that is used to forward the requests anywhere. There is some debate as to whether this will become part of the spec.
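The forwarding endpoint itself is just a relay: receive the report on the same origin, then re-post it to the real collector. A sketch of the core (the collector URL is hypothetical):

```ruby
require 'net/http'
require 'uri'

# Build the relayed POST carrying the original report body.
def build_forward_request(collector, json_body)
  req = Net::HTTP::Post.new(collector.path, 'Content-Type' => 'application/json')
  req.body = json_body
  req
end

# Relay the report to the real collector (hypothetical destination).
def forward_report(collector, json_body)
  Net::HTTP.start(collector.host, collector.port, use_ssl: collector.scheme == 'https') do |http|
    http.request(build_forward_request(collector, json_body))
  end
end
```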

Bells and whistles

There are a few things working their way into the spec; this lib mimics their desired functionality:

  • Copy default-src/allow into all directives that don’t have a value. Seeing a CSP report with a default-src violation is pretty useless as you don’t know the actual cause.
  • Copy chrome-extension: into all directives. Too much noise.
  • Whitelist data: URIs for img-src
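The first item is easy to picture in code. A sketch of the default-src copying, with an abbreviated directive list (names are illustrative, not the library’s internals):

```ruby
# Every directive without an explicit value inherits default-src, so a
# violation report names the directive that actually fired.
FETCH_DIRECTIVES = %i[script_src style_src img_src connect_src font_src]

def expand_default_src(config)
  default = config[:default_src]
  FETCH_DIRECTIVES.each_with_object(config.dup) do |directive, out|
    out[directive] ||= default
  end
end
```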

Removing Inline Javascript (for CSP)

Here are the techniques I use to remove inline javascript from applications. This is the most important step in applying content security policy. CSP 1.1 has the concept of a script-nonce which allows inline script that matches the value, but I feel this is a bandaid. Before this becomes a part of the spec, I’m pushing for every application to remove inline javascript.

The thing I like most about these techniques is that they require only one form of escaping (HTML entity escaping), which is widely supported by templating languages, as opposed to context-specific escaping.

Loading values

Single values: create a hidden input

<input id="mything" type="hidden" value="<%= html_escape(@donkey) %>" />
var thingy = $('#mything').val();

Multiple related values: create a hidden span with data-* attributes

Using a naming convention and/or programmatic loading goes a long way here.
<span class="hide" data-attr1="<%= html_escape(@attr1)%>" data-attr2="<%= html_escape(@attr2)%>" id="mycontainer"></span>
var firstThingy = $('#mycontainer').data('attr1');
var secondThingy = $('#mycontainer').data('attr2');

Loading complex objects: place the object in the content of script tag as HTML-escaped JSON, read the innerHTML of the span, and parse it as JSON.

This technique is also outlined in the OWASP XSS Prevention Cheat Sheet. Thanks to @rx, who improved on this a bit by using a script tag rather than a span. The values must still be escaped! Use HTML entity encoding here too. If you must, JSON encoding will work as well. Otherwise, breaking out of the script tag is just a matter of placing a closing </script> tag in the data, which is parsed by the browser BEFORE the javascript is parsed/executed. Again, even invalid JSON is trumped by what the browser thinks you meant to do, and it will happily render an attacker’s closing script tag and whatever follows.

A completely valid alternative is to JSON encode the values. Why didn’t I use that approach? Because few templating languages support it directly. Take Mustache, which I absolutely love: your choices are HTML entity encode or output the raw values. This would require the JSON encoding to happen outside of the template, which is a recipe for disaster and trains people to think triple ’staches (raw data) are ok. This is bad.


<script type="application/json" id="init_data">
<%= html_escape(@donkey.to_json)%>
</script>

Note! You MUST set type="application/json" or CSP will consider it code and block the inline script.
In an external JS file, read the value as raw HTML and parse the encapsulated JSON to yield an associative array (a.k.a. map, dictionary, hash, etc.).


var dataElement = document.getElementById('init_data');
// the contents are still HTML-entity-escaped, so decode them before parsing
var jsonText = $('<textarea/>').html(dataElement.innerHTML).text();
var initData = JSON.parse(jsonText);

Google analytics

We often make use of per-page values for google analytics. This often includes dynamic values. The technique above will work, but since it is common, here’s a few helpers.

ApplicationHelper

def google_analytics_setting index, key, value
  content_tag 'span', '', :class => 'hide',
    :id => "google_analytics_#{index}",
    :'data-key' => key, :'data-value' => value
end

“myview.html.erb”

<%= google_analytics_setting 1, 'Key', 'Value' %>

external.js

function setCustomGAVar(index, key, value) {
  _gaq.push(['_setCustomVar', index, key, value]);
}

$(document).ready(function() {
  // set per-page values unobtrusively
  $('span[id^=google_analytics_]').each(function() {
    var self = $(this);
    var id = self.attr('id');
    var index = id.substring('google_analytics_'.length);
    var key = self.data('key');
    var value = self.data('value');
    setCustomGAVar(index, key, value);
  });
});

Static Analysis + Log Analysis = Secure code and metrics

Background

I really enjoy working with statistics and analytics tools, so I thought I’d apply that to another passion: application security. This post shows how you can use open source and free tools to build a dead simple Ruby on Rails application security program based on static analysis, with metrics to measure performance.

In this example I will use Loggly as my log aggregator; you can easily get by on their free tier.

Setup


gem install brakeman
gem install syslog-shipper (TLS support unreleased at the moment)

Use

brakeman -o ~/logs/brakeman_appname.tabs <path to app> 

(optional, run using jruby)

jruby --1.9 bin/brakeman -o ~/logs/brakeman_appname.tabs <path to app>

Collect

  1. Log in to loggly; you should be on the dashboard.
  2. Click the “+Add Input” button.
  3. For “Service Type”, select “Secure Syslog” and fill out the rest of the fields as you see fit.
  4. Click the “+Add Input” button to set up the input.

Click on your newly created input and you will be taken to a drill down view. Under the “Destination” column, you will see a host and port number. Take this value and paste it into the following command:

syslog-shipper --ca-cert ~/certs/sf_intermediate.crt \
-v -s logs.loggly.com:<port> ~/logs/brakeman_appname.tabs

* sf_intermediate.crt is the CA that signed the SSL cert the input is using. You can download the loggly certificates on the drill-down page.

Visualize

Run this Loggly command:

compare 'Dynamic Render Path','Cross Site Scripting','Mass Assignment','unprotected redirect'


More info:
https://github.com/jordansissel/syslog-shipper
https://github.com/presidentbeef/brakeman
https://app.loggly.com/pricing/

who am i kidding, this is a blog on content security policy