Other Helpful Stuff

MySQL/SQL Poisoning and SQL Insertion

Learn how web applications with access to your database (e.g. MySQL) may be used to poison, dump, or delete information in your database.

MySQL Foreign Keys

Learn all about MySQL foreign keys in this white paper.

MySQL Foreign Key Errors and Errno: 150

Learn how to avoid MySQL foreign key errors, including the notorious Errno:150 in this white paper.

MySQL/SQL Data Validation (with PHP)

Learn the importance of data validation in web applications when information is accepted from third parties, or even from internal users.

Data Validation in (MySQL/SQL) Web Applications

Have questions that weren't answered here? Feel we've left something out? Have any feedback? Let us know: Contact Us

Introduction

Data validation is an extremely broad topic. It's usefulness ranges from improving user experience to ensuring that data is in a form that can be used to security. In this white paper, we're going to explore the different types of data validation, and what they can be used for.

Where Does Data Validation Happen?

Server Side Validation

Server side validation is where all your important validation takes place. Typically, users can't do anything to manipulate your server side code (or in fact even see it), which allows you to really tighten up the security here. For this white paper, we'll be using PHP as an example since it's so widely used. It's not hard to extrapolate to other languages. In a nutshell, you use server side validation to do the following:
Use server side validation to secure your site against attacks
Use server side validation to make sure data is ready for insertion into databases
Use server side validation to make sure data is "normalized" for future use

What About Client Side Validation?

In web applications, when people say client side validation, they typically mean that the web browser somehow validates data that is being entered into it. If your application is being written in HTML/CSS/JavaScript (alternatives are Flash or Silverlight), then client side validation is typically done using Javascript. There are some great things about this kind of validation, but you need to take care how much you depend on it.

What is Client Side Validation Good For?

Essentially, client side validation is great for things like improving the user experience. For instance, if the user is creating an account, and they need to pick a username and a password, you probably have some requirements for both the username and password. You probably don't want special characters in the username, and you might require some in the password. You can let them send their request to the server, realize on the server that the requirements aren't met, and then when the page refreshes, tell them so. But that's a waste of time (anything sent over the network takes time), and server resources. You could just use javascript to check the fields as they enter them, and then tell them as they enter them that there is a problem. This saves them time, and improves their experience.

What is Client Side Validation NOT Good For?

The answer is pretty much anything else. The problem with client side validation is you can't count on it. It doesn't take too smart of a person to turn it off, and only a slightly smarter person to figure out how to send requests to your server without even using your forms/built in posts. While this may be more difficult to do using things like Flash or Silverlight, it can still be done.
Never depend on client validation for security!
In fact, never depend on client validation for anything that is important!
Since client side validation is limited in its utility, we'll concentrate for the rest of this paper on server side validation.

How Do You Do Server Side Validation?

Where Does Data Come From?

Data in web applications come from user input. Either they fill out a form, or they click on a link that's supposed to take them somewhere. In order to get them to the right place/store their information, your server has to handle their data. This data typically comes through one of two places, either parameters passed in the URL (called GET variables), or parameters passed in the body of the request (called POST variables). POST variables are encrypted over HTTPS where GET variables are not. Also, your POST can be arbitrarily long, while GET's have a limit to their length because URLs can be limited to as low as 2000 characters (the standards say to not go above 255 characters).

In any case, you are going to get some parameters, in the form of GET or POST variables. How you get access to those depends a little on your programming language. In PHP, they are automatically loaded into two arrays: $_GET and $_POST.

Ok, I've Got My Data, What Do I Do With It? Guidelines for Validation

  1. When do you validate something?

    The short answer is ALWAYS. The long answer is whenever data is transferred from a less trusted source to a more trusted location. Whenever you take data from a web client (browser) to your web server, that's exactly what's happening, so you have to do it all the time for data coming from the web browser.

  2. Allowed Value Validation

    This is the strongest type of validation, where you know that a posted variable should come from a list stored on a server. This often comes from a drop-down list that you populated from the server in the first place. The wrong thing to do is trust that it hasn't changed. The right thing to do is to check it against the list when it comes back to the server. If it isn't in the list, either something went wrong in the transfer, or you're under attack. The point is you have revalidate everything that comes from the client, even if you sent it there in the first place.

  3. Check Data Attributes

    You should always check the attributes of data being received by your server. Important attributes are:
    1. Data Type: Is it a string, integer, decimal number? If it's a number is it signed or unsigned? Etc.
    2. Data Length: If it's a string, check against a minimum/maximum character length. If it's a number check that it's in an acceptable range.
    3. Patterns: If you know that it supposed to be an email address, make sure that it looks like an email address. You can most typically do this using regular expressions. If you're unfamiliar with them, here's the Wikipedia Page on Regular Expressions.

  4. Accept Known Good vs. Reject Known Bad

    Accept Known Good Wins. For instance, let's say that you want to validate for email addresses. You start to make a list of all the characters that are not allowed in an email address. Then you "validate" email addresses against that. The problem? That's a lot of stuff, and you're probably going to forget something! Now let's look at the other way of doing it. You know that email addresses look something like this [numbers and letters]@[numbers and letters and periods].[letters]. You can validate against that. If you happen to have left something out that is possible, the worst case is that someone's email will not get accepted. The worst case scenario if you used the "Reject Known Bad" method is that someone could perform an attack on your site using the characters you forgot to block.

  5. Sanitize Your Data

    Perhaps you want to let your users enter a phone number in a variety of different ways: (555)555-5555 or 555-555-5555, etc. You can let them do that, and then replace the parentheses and the dashes with nothing. However, after you've done this, you should validate that the number looks like 5555555555 (it is only numbers and it is 10 digits long).

    Again you can do this by allowing known good data or rejecting known bad data. For instance, in PHP, I could use the preg_replace function in two ways:
    1. Accept Known Good Data:
      $input = preg_replace("/[^0-9]/","",$input);
      This replaces anything that is not a number with blank.
    2. Reject Known Bad Data:
      $input = preg_replace("/\(\)\-/","",$input);
      This replaces parentheses and dashes with blank.

    It's pretty clear that the first replace is much stronger than the second, and is preferable from a security standpoint.

  6. How Not To Sanitize Your Data

    In early versions of PHP, there was something called "magic quotes", which would automatically escape all of your POST and GET variables' quotation marks. This is supposed to help against SQL poisoning/insertion where adding a quotation mark into a parameter in a query could trick the query into thinking that that value was done, and then it could do other bad things (see White Paper on SQL Poisoning). What it really did was give people a false sense of security that their data had been "cleaned". You shouldn't ever try to do some global cleaning of data without knowing what the data is and what it's supposed to look like. It turns into a security nightmare, and a coding nightmare for that matter.

Conclusions

Data validation takes a fair amount of thought, and often a fair amount of coding infrastructure to do correctly. It's important when creating forms or user input environments to evaluate first what kind of data is going to be submitted, and second, where it is going. If it is going into an SQL query, you need to make sure that it is escaped correctly (or that it is going into a safe environment like a prepared query statement). This will prevent application errors that arise from bad data being used (either on input or on reuse later), or possibly security breaches.

Eliacom's MySQL GUI Tool provides you with simple, straight-forward data validation. We hope you found this white paper useful. Please let us know if you have any questions you felt were not addressed in the white paper or if you have any feedback: Contact Us