N U T S H E L L | |
---|---|
Product | PageValidator |
Description | client and server input validation engine |
Language | Perl, JavaScript |
Interface | GUI/browser, command-line |
Platform | any Perl-supported (Windows, Linux, ...) |
Reference | Perl, JavaScript, sandbox, example |
Any program or website that queries the user for information requires input validation. Or, to put that another way, every non-trivial program requires input validation. So what is input validation? It is the process of ensuring that you have acquired what you expected. If you asked for a credit card number and you received a name, the transaction cannot proceed. If you need a new user to enter a user id (since that is your database key field), leaving that field blank is invalid. If you are accepting input to make an airline reservation, including a start date and an end date, it is quite unlikely(!) that the end date could be earlier than the start date (except, of course, on a time-travel website!).
Why is it important to validate your program inputs?
The CleanCode validation engine PageValidator is unique -- it allows you to validate input on the front-end AND on the back-end with the SAME engine and can therefore cover all of these cases. Before we describe some of the features and benefits of PageValidator, however, let's examine the process of validation a bit further. Towards the bottom of the page you'll find discussion of the sandbox and example that let you try out PageValidator live.
Validation may be as simple as asking the user to enter a name and then checking that something was actually entered, i.e. that the input is not empty. More generally, one accepts one or more inputs, validates those inputs, and then processes the inputs if they are acceptable. If they are not, one typically asks the user to correct them. This process is illustrated in the figure at right.
    #1# $name = "";
    #2# while ($name eq "") {
    #3#     print "Name ? ";
    #4#     $name = <STDIN>;
    #5#     chomp($name);
    #6# }
    #7# print "Thank you, $name\n";
To go from the abstract to the concrete, the code fragment shown is one of the simplest one could create in Perl. (Perl obfuscators know how to make this verbose program shorter, mind you.) The line numbers are for discussion; they are not part of the code. The steps in this program are in a slightly different order than the flowchart: input arrives on line 4, is validated on line 2, and is processed on line 7. It still conforms to the model in the flowchart.
That seems straightforward and clean enough. Now consider if we want to accept as part of an address a state abbreviation ("WA", "OR", "AL", etc.). The next code sample shows a JavaScript function to check the value of this field.
    function check(state) {
        return (state == "WA" || state == "OR" || state == "AL" ||
                state == "AK" || state == "CA" || state == "CO" ||
                . . .);
    }
The list of US states does not change very often (lately once every 50 years or so), so this would seem to be a safe piece of code. But what if this is some other long list (e.g. part numbers) that could change next quarter or even next week? That would require editing the check function above to accommodate the updated list of values. Having routine data changes force code changes in this way is a very messy business and should be avoided at all costs.
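One way to keep routine data changes out of the code is to hold the permissible values in a data structure and let a single generic function consult it. The following is a minimal JavaScript sketch of that idea only; the names `permittedValues` and `isMember` are illustrative assumptions, not part of PageValidator, and the state list is deliberately abbreviated.

```javascript
// Hypothetical illustration: the permissible values live in data, not in code.
// The state list is a short sample, not the full set.
const permittedValues = new Map([
  ["state", new Set(["WA", "OR", "AL", "AK", "CA", "CO"])],
]);

// Generic check: true only if `value` is in the set registered for `field`.
// An unknown field is treated as having no permitted values.
function isMember(field, value) {
  const allowed = permittedValues.get(field);
  return allowed !== undefined && allowed.has(value);
}
```

With this arrangement, accepting a new value means changing the data (for example `permittedValues.get("state").add("NY")`), while `isMember` itself stays untouched.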
Now one could argue that it does not make sense for a user to type a value for a US state, or even for a part number, but that this field should use a drop-down box to force a user to select an entry from a pre-defined list. Then there is no need for the check function at all, right? Not so! Recall the list of reasons to validate provided above. The two examples presented so far both fall under category (1), alerting the user to inadvertent errors. But consider where the validation occurs. In a web transaction, validation may occur on the client or on the server. If the validation is on the server, both examples can also fall under categories (2) and (3).
It is simple indeed to alter the data fed from a web page back to a web server. So even if a drop-down box is used, one could force the state field to have the value E8 instead of AK, for example. And one could just as simply disable validation on the client. Also, though rare nowadays, it is still possible that a packet on the network could be corrupted for a variety of reasons. So if the value in the state field had a single bit changed, the string could go from "AK" to "AJ", a value that is not an abbreviation for a US state.
So it would seem that validation on the server has the advantage as it could check user errors, malicious errors, and some network errors. Validation on the client side, however, has a fairly significant advantage -- response time. Upon filling in a form, a user could get instantaneous feedback with client-side validation. Even the best network connection would take a second or two for server-side validation, and easily up to 30 seconds in many cases. PageValidator provides the ability to do both server-side and client-side validation so you get all the advantages.
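The single-bit corruption mentioned above can be verified with a little arithmetic: "K" has character code 0x4B and "J" has 0x4A, so they differ only in the lowest bit. A small JavaScript sketch (the helper name `flipLowestBit` is made up for this illustration):

```javascript
// "K" is character code 0x4B (binary 01001011). Flipping the lowest bit
// gives 0x4A (binary 01001010), which is "J".
function flipLowestBit(ch) {
  return String.fromCharCode(ch.charCodeAt(0) ^ 1);
}

// Simulate the corrupted state field: "AK" arrives as "AJ".
const corrupted = "A" + flipLowestBit("K");
```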
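    # Returns true when the street address does NOT look like a post office box.
    sub check {
        my $street = shift;
        # case-insensitive, so "PO Box 12" is caught as well as "po box 12"
        return ($street !~ /^(po|p o|post office|box)\s/i);
    }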
The accompanying example illustrates a simple business rule validation--ensuring that the street address is not a post office box. The power of Perl regular expressions is brought to bear here, checking for the four most likely letter combinations that would suggest a post office box. Regular expressions provide a tremendous ability to match patterns in your input, a heck of a lot easier than doing multiple if-then comparisons. And they are also available in JavaScript and, more recently, Java, as well.
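Since regular expressions are available in JavaScript too, the same business rule can be sketched for the browser. This is only an illustration, not PageValidator code; the function name `isStreetAddress` is invented here, and case-insensitive matching (the `i` flag) is an assumption made so that input such as "PO Box 12" is caught.

```javascript
// Returns true when the street line does NOT look like a post office box.
// Uses the same four prefixes as the Perl pattern; the "i" flag is assumed.
function isStreetAddress(street) {
  return !/^(po|p o|post office|box)\s/i.test(street);
}
```

Note that the pattern requires whitespace after the prefix, so a street such as "Boxwood Lane" or "Portland Ave" is still accepted.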
Consider then a business rule regarding phone numbers. There are many ways to represent valid US phone numbers: (999) 123-4567 or 999-123-4567 or 9991234567 or +1 (999) 123-4567, etc. But for purposes of sorting and consistency, any given application will typically want to use a single format for all phone numbers. One may either accept any input and then convert it to a canonical form, or one may simply disallow any inputs that do not already meet the constraints of the canonical form. The first method is generally better for the user (letting the program do the work instead of the user) while the second method is easier for the programmer (forcing the user to do more work by re-entering the number until it is correct!). Interestingly, either of these tasks is easier to accomplish with PageValidator. (So you could view it as a package that does not pass moral judgments:-) This regular expression will accept any of the phone number formats specified above:
(\+1\s+)?\(\d{3}\)\s+\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{10}
That regular expression is simply added to the PageValidator dictionary under the phone number entry to enable validation on it. (By massaging that regular expression just a bit further, it may also be used to convert input in any of the formats into our canonical form automatically.)
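To make both approaches concrete, here is a hedged JavaScript sketch that uses the regular expression above. Two things are assumptions of this sketch rather than anything stated about PageValidator: the pattern is wrapped in `^(?:...)$` so that the entire input must match, and the canonical form is taken to be 999-123-4567. The function names are likewise illustrative.

```javascript
// The pattern from the text, anchored so the whole input must match.
const phonePattern =
  /^(?:(\+1\s+)?\(\d{3}\)\s+\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{10})$/;

// Second approach: simply accept or reject the input.
function isValidPhone(input) {
  return phonePattern.test(input);
}

// First approach: accept any listed format and convert it.
// Assumed canonical form: 999-123-4567. Returns null for unaccepted input.
function toCanonicalPhone(input) {
  if (!isValidPhone(input)) return null;
  const digits = input.replace(/\D/g, ""); // keep digits only
  const local = digits.slice(-10);         // drops a leading country code "1"
  return `${local.slice(0, 3)}-${local.slice(3, 6)}-${local.slice(6)}`;
}
```

Under these assumptions, `toCanonicalPhone("+1 (999) 123-4567")` and `toCanonicalPhone("9991234567")` both yield `"999-123-4567"`, while an unlisted format such as `"999 123 4567"` yields `null`.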
Now let us assume that we not only have a phone number, but also have a fax number, a mobile number, an office number, and an emergency contact number. All of these should follow the same canonical form.
Conventional code would either have a series of conditional statements checking each phone number, or pass each phone number to a generic checkCanonicalPhone function in a series of function calls. Either way, adding a new number requires changes to the code.
PageValidator makes a clear distinction between your code and your data, specifically using a data dictionary to describe what is acceptable input for each field. And entries in the data dictionary may cascade, building upon one another. For the simple case of phone numbers that we have here, it is sufficient to define a base phone number entry; the entries for all the other numbers simply reference the base entry. Therefore, if you later decide to also accept 1-999-123-4567, you simply update the single piece of data in the data dictionary.
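The cascading idea can be sketched in a few lines of JavaScript. To be clear, the dictionary layout below (the `base`, `pattern`, and `required` keys) and the function names are hypothetical, invented for this illustration; PageValidator's actual dictionary format may look quite different.

```javascript
// Hypothetical dictionary layout; PageValidator's real format may differ.
const dictionary = {
  phone:     { pattern: /^\d{3}-\d{3}-\d{4}$/ },
  fax:       { base: "phone" },
  mobile:    { base: "phone" },
  office:    { base: "phone" },
  emergency: { base: "phone", required: true },
};

// Follow "base" references; an entry's own settings override inherited ones.
function resolveEntry(dict, name, seen = new Set()) {
  if (seen.has(name)) throw new Error(`circular reference at "${name}"`);
  seen.add(name);
  if (!Object.prototype.hasOwnProperty.call(dict, name)) {
    throw new Error(`no dictionary entry for "${name}"`);
  }
  const { base, ...own } = dict[name];
  if (base === undefined) return own;
  return { ...resolveEntry(dict, base, seen), ...own };
}

// Check a value against the (possibly inherited) pattern of a field.
function matchesEntry(dict, name, value) {
  return resolveEntry(dict, name).pattern.test(value);
}
```

In this sketch, accepting 1-999-123-4567 everywhere is a single data change, for example `dictionary.phone.pattern = /^(?:1-)?\d{3}-\d{3}-\d{4}$/;`, after which the fax, mobile, office, and emergency entries pick up the new rule without further edits.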
Just to finish off the categories of validation introduced at the top of the page, the use of regular expressions also provides an easy way to cover category (5)--filtering the input so that the system does not become corrupted. This could again be inadvertent or willful on the part of the user. Recently I used a "Contact us" page on a website to ask a question. I happened to use double-quotes in my question. Once I pressed submit, the subsequent page was quite askew because the programmer did not allow for double-quotes to be in the input, a character that sometimes must be handled specially on web pages. A common willful attempt to breach security is to enter a statement terminator character followed by, say, a shell command to mail the password file. If the value of that input field happens to be fed to the shell for processing, that could compromise the system security. (This process of removing "bad" characters is called untainting.)
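Both situations can be illustrated with short regular-expression helpers in JavaScript. These are generic sketches, not PageValidator functions, and the names are made up here. The first escapes (rather than removes) the characters that are special in HTML, which would have kept the double-quotes from upsetting the page; the second is an allowlist check of the kind one might apply before a value goes anywhere near a shell, with the permitted character set being an assumption of this example.

```javascript
// Replace characters that have special meaning in HTML with entities,
// so text such as: say "hi" can be echoed back on a page safely.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
  };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Allowlist check: accept only letters, digits, underscore, dot,
// at-sign, and hyphen. Anything else (";", "|", spaces, ...) is refused.
function isPlainToken(text) {
  return /^[\w.@-]+$/.test(text);
}
```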
The table below shows the types of constraints you may specify in the data dictionary.
facet | description |
---|---|
string length | check the input against both a minimum and a maximum length (e.g. 7 to 10 characters for a phone number) |
numerical value | check numerical values against upper and lower bounds |
member of a set | check a string against a specific set of permissible values (e.g. a US state abbreviation may only be two-letter combinations defined by the government) |
existence | check whether a required value is present or not (i.e. a field that may not be left blank) |
pattern matching | check the input against regular expressions, as described earlier |
custom | define your own plug-in to cross-check multiple fields or to perform any custom operation you need |
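
To show how such facets might be applied to a single field, here is a compact JavaScript sketch. The constraint names (`required`, `minLength`, `maxLength`, `min`, `max`, `oneOf`, `pattern`, `custom`) and the function `validateField` are assumptions made for this illustration and are not PageValidator's actual dictionary keys or API.

```javascript
// Hypothetical constraint names; PageValidator's real keys may differ.
// Returns an array of error messages (empty when the value is acceptable).
function validateField(value, spec) {
  const errors = [];
  const text = value === undefined || value === null ? "" : String(value);

  // existence
  if (text === "") {
    if (spec.required) errors.push("value is required");
    return errors; // nothing further to check on an empty value
  }
  // string length
  if (spec.minLength !== undefined && text.length < spec.minLength) errors.push("too short");
  if (spec.maxLength !== undefined && text.length > spec.maxLength) errors.push("too long");
  // numerical value
  if (spec.min !== undefined || spec.max !== undefined) {
    const n = Number(text);
    if (Number.isNaN(n)) {
      errors.push("not a number");
    } else {
      if (spec.min !== undefined && n < spec.min) errors.push("below minimum");
      if (spec.max !== undefined && n > spec.max) errors.push("above maximum");
    }
  }
  // member of a set
  if (spec.oneOf !== undefined && !spec.oneOf.includes(text)) errors.push("not a permitted value");
  // pattern matching
  if (spec.pattern !== undefined && !spec.pattern.test(text)) errors.push("does not match pattern");
  // custom plug-in: returns a message, or a falsy value when satisfied
  if (typeof spec.custom === "function") {
    const message = spec.custom(text);
    if (message) errors.push(message);
  }
  return errors;
}
```

A field spec is then just data, e.g. `{ required: true, oneOf: ["AK", "WA"] }` for a state field or `{ minLength: 7, maxLength: 10 }` for a phone number.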
There are probably other validation engines available that provide similar data dictionary capabilities--for a single field. But PageValidator also lets you validate multiple related fields via the data dictionary, using custom plug-ins that you create to indicate the constraints that the group of fields must satisfy. This allows you to generically check things like:
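One such multi-field check is the reservation rule from the top of the page: an end date should not be earlier than a start date. A plug-in for it might look roughly like the JavaScript sketch below. The plug-in shape (a function that receives an object of field values and returns an error message or `null`) and the field names `startDate` and `endDate` are assumptions for illustration; the source does not describe PageValidator's actual plug-in interface.

```javascript
// Hypothetical plug-in shape: receives the group of field values and
// returns an error message, or null when the group is acceptable.
// Dates are assumed to arrive as ISO strings such as "2024-03-10".
function endNotBeforeStart(fields) {
  const start = Date.parse(fields.startDate);
  const end = Date.parse(fields.endDate);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    return "start and end dates must both be valid dates";
  }
  return end < start ? "end date is earlier than start date" : null;
}
```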
As mentioned earlier, the PageValidator engine works on both the front-end (client) and the back-end (server) of your system. Validating in the browser gives good response time, but it can easily be circumvented and so is a security risk. Validating on the server provides security, but it is less responsive. Do both for the best of both worlds. PageValidator code is in JavaScript for the browser and in Perl for the server. But they both share a single data dictionary! When you need to update the data you do it exactly once--the system takes care of propagating that change.
If we have whetted your appetite, here's your opportunity to see what the PageValidator engine can do. Click on the thumbnail at right to open a live sandbox providing virtually all possible field inputs for an HTML form. When you press the Validate button, any problems will be reported in the scrolling text box. For example, the radio button fields are clearly labeled to show that the first two will yield violations while either of the last two is an acceptable input.
This sandbox provides a few other capabilities via the checkboxes in the red toolbar next to the Validate button. Specifying a diagnostic mask, for example, will allow you to see how PageValidator works, providing a trace of the process. The server option allows you the flexibility of choosing to run the client-side engine (unchecked) or the server-side engine (checked).
The sandbox was designed to exercise the validator, not your browser, so its output is quite primitive: it simply fills a generic text box with a list of all errors when running the client-side engine, or lists the errors on a new page when running the server-side engine. The example shown at left takes you out of the sandbox into the real world. This one uses DHTML to provide a snazzier user interface, where customized errors (as specified in the data dictionary) pop up in the shaded column on the right side of the screen. If you squint, you'll see two such errors in the thumbnail here. Click on it to open the live page where you may try it for yourself.