
Sanitize Database Inputs

1) Function for stripping out malicious bits

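<?php
function cleanInput($input) {

    // Order matters: preg_replace() applies the patterns one after another, so the
    // script, style and comment blocks must go before the generic tag pattern.
    // (No U flag on the style pattern: U would make .*? greedy.)
    $search = array(
        '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
        '@<style[^>]*?>.*?</style>@si',     // Strip style tags properly
        '@<![\s\S]*?--[ \t\n\r]*>@',        // Strip multi-line comments
        '@<[\/\!]*?[^<>]*?>@si'             // Strip out HTML tags
    );

    $output = preg_replace($search, '', $input);
    return $output;
}
?>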
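PHP's built-in strip_tags() already removes tags and HTML comments, but it keeps the text that was inside a <script> or <style> element. A minimal sketch that drops those blocks first and then hands the rest to strip_tags(), assuming PHP 7+ for the type declarations; stripMarkup() is a hypothetical name:

```php
<?php
// Sketch: remove <script>/<style> blocks (tags AND contents), then let the
// built-in strip_tags() remove every remaining tag and HTML comment.
function stripMarkup(string $input): string
{
    // \1 makes the closing tag match the opening one (script or style)
    $withoutBlocks = preg_replace('@<(script|style)\b[^>]*>.*?</\1\s*>@si', '', $input);

    // strip_tags() on its own would leave "alert(1)" behind
    return strip_tags($withoutBlocks);
}

echo stripMarkup("Hi <b>there</b><script>alert(1)</script><style>p{}</style><!-- note -->!");
// prints "Hi there!"
```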

2) Sanitization function

Uses cleanInput() from above, and also escapes the result with mysql_real_escape_string() so that quotes and other special characters can't break the database query.

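<?php
// Note: mysql_real_escape_string() needs an open mysql_connect() link and the
// old mysql extension, which was removed in PHP 7.0.
function sanitize($input) {
    if (is_array($input)) {
        $output = array();  // so an empty array returns an empty array, not null
        foreach ($input as $var => $val) {
            $output[$var] = sanitize($val);
        }
    }
    else {
        // get_magic_quotes_gpc() was removed in PHP 8.0
        if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
            $input = stripslashes($input);
        }
        $input  = cleanInput($input);
        $output = mysql_real_escape_string($input);
    }
    return $output;
}
?>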
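mysql_real_escape_string() belongs to the old mysql extension, which was removed in PHP 7.0. A minimal sketch of the usual replacement, a prepared statement with PDO; it assumes the pdo_sqlite extension and an in-memory database so it runs without a server, and the comments table is hypothetical:

```php
<?php
// Sketch: store untrusted input with a prepared statement instead of escaping it by hand.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)'); // hypothetical table

$input = "It's a good day!'); DROP TABLE comments; --";

// The value is sent separately from the SQL text, so its quotes cannot end the string literal
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $input]);

$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
echo $stored, "\n"; // the input comes back unchanged, apostrophes and all
```

With MySQL, only the DSN passed to new PDO() changes; the prepare() and execute() calls stay the same.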

Usage

<?php
  $bad_string = "Hi! <script src='http://www.evilsite.com/bad_script.js'></script> It's a good day!";
  $good_string = sanitize($bad_string);
  // $good_string now contains "Hi!  It\'s a good day!"

  // Also use for getting POST/GET variables
  $_POST = sanitize($_POST);
  $_GET  = sanitize($_GET);
?>
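Several commenters below argue that HTML should be escaped when it is output rather than stripped on input. A minimal sketch of that approach with htmlspecialchars(), assuming UTF-8 output; the helper name e() is hypothetical:

```php
<?php
// Sketch: escape text at the moment it is written into HTML, instead of altering it on input.
function e(string $text): string
{
    // ENT_QUOTES escapes both ' and ", so the result is also safe inside quoted attribute values
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = "Hi! <script src='http://www.evilsite.com/bad_script.js'></script> It's a good day!";
echo '<p title="' . e($comment) . '">' . e($comment) . "</p>\n";
```

Because the stored text is left intact, the same value can later be escaped differently for other output contexts.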


  1. I use my own functions as follows:

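    function text_global($poster) {
      $poster = stripslashes($poster);
      // Drop newlines, turn straight/curly single quotes and primes into a typographic
      // apostrophe, and turn every kind of double quote into the &quot; entity
      $poster = str_replace(
        Array("\n", "'", "‘", "’", "′", "“", "”", "„", "″", '"'),
        Array("", "’", "’", "’", "’", "&quot;", "&quot;", "&quot;", "&quot;", "&quot;"),
        $poster
      );
      return $poster;
    }

    // foreach instead of while/each(): each() was removed in PHP 8.0
    foreach ($_POST as $Key => $Val) {
      // Skip WYSIWYG editor fields, which are prefixed "fsk_"
      if (substr($Key, 0, 4) != "fsk_") {
        if (is_array($Val)) {
          foreach ($Val as $sKey => $sVal) {
            $Val[$sKey] = text_global($sVal);
          }
          $_POST[$Key] = $Val;
        } else {
          $_POST[$Key] = text_global($Val);
        }
      }
    }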

    The “fsk_” prefix marks WYSIWYG editor variables, which are skipped. Works perfectly.

    • @iMaxEst: I think you may have missed the point here. Preparing data is just a side issue. Sanitizing data prevents code injection attacks.

      stripslashes() != sanitize()

  2. Really nice functions, Chris! Neat use of regular expressions; these snippets will definitely find their way into my library script.

    Thanks a bunch!

  3. Henk

    Why clean the input of HTML/script tags?
    You only have to worry about XSS when you prepare the output!

    Protect your database with prepared statements, and let htmlspecialchars() take care of the output.

  4. jeff

    It seems like a good idea to clean input. Why do I want to store potentially malignant code in my database?

  5. Phil

    What about ASP? Anyone?

  6. These code snippets don’t come through very nicely via RSS. All the line breaks seem to disappear.

  7. amm257

    These are nigh useless and overly complicated; e.g. the html one simply matches anything with “<", so why not make that explicit? Currently that's all that expression does, all this extra stuff merely serves to obfuscate the issue. E.g. the javascript one doesn't work, all I have to do is add a space: "scripthere”, the browser will figure out what I meant, and the script will execute.

    • amm257

      Apologies: whoever wrote this comment filter got it both right and wrong (wrong because it simply removes the tag instead of escaping it, right because it catches it). I’ve reposted it with the characters escaped by hand, so this should work:

      These are nigh useless and overly complicated; e.g. the HTML one simply matches anything with “<”, so why not make that explicit? Currently that’s all that expression does; all the extra stuff merely serves to obfuscate the issue. E.g. the JavaScript one doesn’t work: all I have to do is add a space, as in “< script>scripthere</script>”, and the browser will figure out what I meant and execute the script.

  8. Mine is pretty small and handy for getting rid of nasty injection attacks:

    function clean($text)
    {
        $text = strip_tags($text);
        $text = htmlspecialchars($text, ENT_QUOTES);

        return $text; // output clean text
    }
    • <SCRIPT SRC=http://hackers.com/xss.js></SCRIPT>

    • joel

      This is just fantastic! Truly smashing. Thank you.

  9. Dyllon

    Mine simply replaces the angle brackets with HTML entities.

    function clean($code)
    {
        $strip = array(
            '<' => '&lt;',
            '>' => '&gt;'
        );
    
        return strtr($code, $strip);
    }
  10. Mark

    Hi,

    We are looking for a consultant who could evaluate our website and check how vulnerable we are to these kinds of malicious scripts.

    Any recommendations?

    Thanks,

    Mark

    • I don’t know a consultant, but I am reading “Pro PHP Security: From Application Security Principles to the Implementation of XSS Defenses”, which explains this stuff quite well.

    • Mark, dunno if you’re still interested, but I’ve had several years of this line of work. There are several other areas of attacks that I can investigate for you as well. Simply contact me at http://www.matatechconsulting.com/contact/ for more details.

  11. Chris, why use your regexes instead of PHP’s strip_tags(), as suggested by gibigbig? I don’t understand what functionality is added by going that route.

  12. Glad I found this. I was just thinking about this yesterday and needed a better function.

    Is the filter_var() function any good? I.e. filter_var($value, FILTER_SANITIZE_STRING)

  13. Thanks for the tutorial, this is very useful

  14. fred

    The safest way is to parameterise inputs by using classes such as PDO, since PHP is a loosely typed language.

    Or simply cast the inputs into the type that you would expect, e.g. Expect an integer? Just put (int) before the input. Type casting is the fastest operation to sanitize numbers.

  15. Andriah

    Hi! It’s a good day!

    I’m just testing how this works?

  16. I think the sanitize function needs a few more lines of code, since these two values won’t get filtered:

    <script src="js/jquery-1.6.4.min.js" />

    <span><script src="js/jquery-1.6.4.min.js"</span>

    Thanks for listening

  17. Lol “evilsite.com”

    I really do like the efficiency of this function though.

  18. Nice functions, Chris!
    But when I tried them, I got these errors.

    Warning: mysql_real_escape_string() [function.mysql-real-escape-string]: Access denied for user ''@'localhost' (using password: NO)

    Warning: mysql_real_escape_string() [function.mysql-real-escape-string]: A link to the server could not be established in

    and this is the line of code that is causing the error
    $output = mysql_real_escape_string($input);

    from this Function for stripping out malicious bits

    • You need to open a connection to the database before using mysql_real_escape_string(); otherwise it will error out.

      I use this function for escaping data going into the database:

      function escape($string = null)
      {
          if (empty($string))
          {
              return FALSE;
          }

          if (function_exists('mysql_real_escape_string'))
          {
              return mysql_real_escape_string($string);
          }
          else
          {
              return str_replace("'", "\'", $string);
          }
      }

      As for tags and XSS, I use something a little more hardcore… and by hardcore I mean it basically strips out everything I don’t want, only allowing what is listed in an array I define in the script:

      public function input($string = null)
      {
          if (empty($string))
          {
              return '';
          }

          // Strip out all bbcode
          $string = preg_replace('/\[(.*?)\](.*?)\[\/?(.*?)\]/iu', '\\2', $string);

          // Convert the ok markup to bbcode
          $string = preg_replace(array_keys($this->markup), $this->markup, $string);

          // Strip out all html tags
          $string = preg_replace('/\(.*?)\/iu', '\\2', $string);

          // Run strip tags to make sure we got everything
          $string = strip_tags($string);

          // Replace double quotes with single quotes
          $string = preg_replace('/(")+/u', "'", $string);

          // Match one or more spaces and replace them with a single space
          $string = preg_replace('/( )+/u', ' ', trim($string));

          return trim($string);
      }

      Hope these two functions help others.

      Matt :)

  19. Mark

    $_POST = sanitize($_POST);
    $_GET = sanitize($_GET);

    That is the worst method of sanitizing that I have seen. You should NEVER store the sanitized inputs back into the same data stream. The main reason is that the script can execute and a second submit following almost instantly can change the values of $_POST to something other than what the inputs were sanitized for.

    This means any subsequent use of $_POST after the double-post hack happens is tainted.

    Storing the sanitized inputs in a separate, safe variable is the best option.
    Operating a whitelist is also a good idea: by only accepting inputs from specific POST fields, you also limit the ability to mount a POST attack through a variable that you never use.

    Also… strip_tags() exists for the same reason as

    $search = array(
      '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
      '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
      '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
      '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments
    );

    exists. The function removes the elements from a string and covers any HTML that can be used to break a server or script.

    My main point here is that even if you sanitize $_POST, the variable should not be trusted: things can change, and what existed nanoseconds ago may not be the case when your program comes to use the variables. You’re opening yourselves up to trouble if you do not use a safe method of sanitizing and using data streams.
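Mark’s whitelist advice, fred’s suggestion to cast numbers, and the earlier filter_var() question can be combined with filter_var_array(). A minimal sketch under stated assumptions: the field names are hypothetical, and a plain array stands in for $_POST so it runs from the command line. FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so free text is passed through FILTER_UNSAFE_RAW here and left to be escaped on output.

```php
<?php
// Sketch: copy only whitelisted fields into a separate array, validating each by type.
// $_POST itself is never overwritten.
$definition = [
    // free text: control characters below ASCII 32 are stripped, nothing else changes
    'name'  => ['filter' => FILTER_UNSAFE_RAW, 'flags' => FILTER_FLAG_STRIP_LOW],
    // must be an integer between 0 and 150, returned as an int
    'age'   => ['filter' => FILTER_VALIDATE_INT, 'options' => ['min_range' => 0, 'max_range' => 150]],
    'email' => FILTER_VALIDATE_EMAIL,
];

// Stand-in for $_POST (hypothetical field names)
$request = [
    'name'     => "Chris\x07",
    'age'      => '42',
    'email'    => 'not-an-email',
    'is_admin' => '1', // not in the whitelist
];

$clean = filter_var_array($request, $definition);
var_dump($clean);
```

filter_var_array() returns only the keys named in the definition, so the unexpected is_admin field never reaches the application; a field that fails validation comes back as false, and a missing one as null.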
