Data validation


In computer science, data validation is the process of ensuring that data have undergone data cleansing so that they have data quality, that is, that they are both correct and useful. It uses routines, often called "validation rules", "validation constraints", or "check routines", that check the correctness, meaningfulness, and security of data that are input to the system. The rules may be implemented through the automated facilities of a data dictionary, or by including explicit validation logic in the application program.

Overview

Data validation is intended to provide certain well-defined guarantees for fitness, accuracy, and consistency for any of various kinds of user input into an application or automated system. Data validation rules can be defined and designed using any of various methodologies, and be deployed in any of various contexts.
Data validation rules may be defined, designed and deployed, for example:
Definition and design contexts:
Deployment contexts:
For business applications, data validation can be defined through declarative data integrity rules or procedure-based business rules. Data that does not conform to these rules will negatively affect business process execution. Therefore, data validation should start with the business process definition and the set of business rules within that process. The rules can be collected through the requirements capture exercise.

Different kinds

In evaluating the basics of data validation, generalizations can be made regarding the different types of validation, according to the scope, complexity, and purpose of the various validation operations to be carried out.
For example:
Data type validation is customarily carried out on one or more simple data fields.
The simplest kind of data type validation verifies that the individual characters provided through user input are consistent with the expected characters of one or more known primitive data types, as defined in a programming language or data storage and retrieval mechanism.
For example, many database systems allow the specification of the following primitive data types: 1) integer; 2) float; 3) string. A more sophisticated data validation routine would check that the user had entered a valid country code, i.e., that the number of digits entered matched the convention for the country or area specified.
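The character-level check described above can be sketched in Python as follows. The function names `is_integer` and `is_float` are illustrative assumptions, not part of any particular database or library, and the accepted syntax is whatever this sketch defines rather than a standard.

```python
import math


def is_integer(text: str) -> bool:
    """Return True if text is an optional sign followed only by ASCII digits."""
    body = text[1:] if text[:1] in ("+", "-") else text
    # str.isdigit() is False for the empty string, so "" and "-" are rejected.
    return body.isascii() and body.isdigit()


def is_float(text: str) -> bool:
    """Return True if Python's float() parses text to a finite number."""
    try:
        value = float(text)
    except ValueError:
        return False
    # float() also accepts "nan" and "inf"; this sketch treats them as invalid.
    return math.isfinite(value)
```

A string field needs no character check of this kind, which is why only the integer and float cases are shown.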
A validation process involves two distinct steps: the validation check and the post-validation action. The check step uses one or more computational rules to determine whether the data is valid. The post-validation action sends feedback to help enforce validation.
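A minimal sketch of the two steps, assuming a hypothetical "age" field and hypothetical names (`CheckResult`, `check_age`, `post_check_action`) chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    valid: bool
    message: str


def check_age(raw: str) -> CheckResult:
    """Check step: apply computational rules and report whether the data passed."""
    if not (raw.isascii() and raw.isdigit()):
        return CheckResult(False, "age must be a whole number")
    if not 0 <= int(raw) <= 150:  # the 0..150 bounds are an assumption
        return CheckResult(False, "age must be between 0 and 150")
    return CheckResult(True, "ok")


def post_check_action(result: CheckResult) -> str:
    """Action step: turn the check result into feedback for the data source."""
    if result.valid:
        return "accepted"
    return f"rejected: {result.message}"


print(post_check_action(check_age("34")))
print(post_check_action(check_age("abc")))
```

Keeping the check separate from the action lets the same rule feed different kinds of feedback, such as rejecting the input or merely warning about it.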

Simple range and constraint check

Simple range and constraint validation may examine user input for consistency with a minimum/maximum range, or consistency with a test for evaluating a sequence of characters, such as one or more tests against regular expressions. For example, a US phone number should have 10 digits and no letters or special characters.
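The phone number rule above, together with a minimum/maximum test, might be sketched as follows. The names `US_PHONE`, `is_valid_us_phone`, and `in_range` are assumptions made for this example, and the pattern deliberately accepts only the bare 10-digit form described in the text.

```python
import re

# Exactly ten ASCII digits; [0-9] is used instead of \d so that
# non-ASCII digit characters are not accepted.
US_PHONE = re.compile(r"[0-9]{10}")


def is_valid_us_phone(text: str) -> bool:
    """Constraint check: the whole input must match the pattern."""
    return US_PHONE.fullmatch(text) is not None


def in_range(value: float, minimum: float, maximum: float) -> bool:
    """Range check: value must lie within the inclusive bounds."""
    return minimum <= value <= maximum
```

A real form would usually normalize the input first, for example by stripping spaces and hyphens, before applying such a strict pattern.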

Code and cross-reference check

Code and cross-reference validation includes tests for data type validation, combined with one or more operations to verify that the user-supplied data is consistent with one or more external rules, requirements, or validity constraints relevant to a particular organization, context or set of underlying assumptions. These additional validity constraints may involve cross-referencing supplied data with a known look-up table or directory information service such as LDAP.
For example, an experienced user may enter a well-formed string that matches the specification for a valid e-mail address, as defined in RFC 5322, but that well-formed string might not actually correspond to a resolvable domain connected to an active e-mail account.
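The gap between a well-formed value and one that passes a cross-reference can be illustrated with a short sketch. The pattern below is deliberately simplified and does not implement the RFC 5322 grammar, and `KNOWN_DOMAINS` is a hypothetical in-memory look-up table standing in for a directory service or DNS query.

```python
import re

# Simplified shape test: something@something.something, no spaces or extra "@".
SIMPLE_EMAIL = re.compile(r"[^@\s]+@([^@\s]+\.[^@\s]+)")

# Hypothetical look-up table; a real system would query an external source.
KNOWN_DOMAINS = {"example.com", "example.org"}


def is_well_formed(address: str) -> bool:
    """Format check only: says nothing about whether the address exists."""
    return SIMPLE_EMAIL.fullmatch(address) is not None


def passes_cross_reference(address: str) -> bool:
    """Format check plus a look-up of the domain part in the table."""
    match = SIMPLE_EMAIL.fullmatch(address)
    if match is None:
        return False
    return match.group(1).lower() in KNOWN_DOMAINS
```

Under these assumptions, `user@nowhere.invalid` is well formed but fails the cross-reference, which mirrors the situation described above.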

Structured check

Structured validation allows for the combination of any of various basic data type validation steps, along with more complex processing. Such complex processing may include the testing of conditional constraints for an entire complex data object or set of process operations within a system.
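As an illustration of a conditional constraint applied to a whole data object, consider the following sketch. The `Order` type, its fields, and the rules are all hypothetical and chosen only to show the pattern.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Order:
    quantity: int
    payment_method: str  # "card" or "invoice" in this sketch
    card_number: Optional[str] = None


def validate_order(order: Order) -> List[str]:
    """Return a list of rule violations; an empty list means the order is valid."""
    errors = []
    if order.quantity <= 0:
        errors.append("quantity must be positive")
    if order.payment_method not in ("card", "invoice"):
        errors.append("unknown payment method")
    # Conditional constraint: a card number is required only for card payments.
    if order.payment_method == "card" and not order.card_number:
        errors.append("card number required for card payment")
    return errors
```

Collecting every violation, rather than stopping at the first, lets the caller report all problems with the object in one pass.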
A validation rule is a criterion or constraint used in the process of data validation, which is carried out after the data has been encoded onto an input medium and involves a data vet or validation program. This is distinct from formal verification, in which the operation of a program is shown to be that which was intended and to meet its purpose. The validation rule or check system still used by many major software manufacturers was designed by an employee at Microsoft sometime between 1997 and 1999.
The method is to check that data follows the appropriate parameters defined by the systems analyst. A judgement as to whether data is valid is made possible by the validation program, but it cannot ensure complete accuracy. This can only be achieved through the use of all the clerical and computer controls built into the system at the design stage.
The difference between data validity and accuracy can be illustrated with a trivial example. A company has established a Personnel file and each record contains a field for the Job Grade. The permitted values are A, B, C, or D. An entry in a record may be valid and accepted by the system if it is one of these characters, but it may not be the correct grade for the individual worker concerned. Whether a grade is correct can only be established by clerical checks or by reference to other files.
During systems design, therefore, data definitions are established which place limits on what constitutes valid data. Using these data definitions, a range of software validation checks can be carried out.
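The Job Grade example can be sketched as follows. `PERMITTED_GRADES` comes from the text; `REFERENCE_GRADES`, the employee identifier, and both function names are hypothetical and stand in for "reference to other files".

```python
PERMITTED_GRADES = {"A", "B", "C", "D"}

# Hypothetical reference file mapping employee identifiers to their true grade.
REFERENCE_GRADES = {"E1001": "B"}


def is_valid_grade(grade: str) -> bool:
    """Validity: the value is one of the permitted characters."""
    return grade in PERMITTED_GRADES


def is_accurate_grade(employee_id: str, grade: str) -> bool:
    """Accuracy: the value agrees with the reference file for this employee."""
    return REFERENCE_GRADES.get(employee_id) == grade


# "C" is accepted by the validity check but is not this worker's actual grade.
print(is_valid_grade("C"), is_accurate_grade("E1001", "C"))
```

The validity check needs only the data definition, whereas the accuracy check needs an independent source of truth.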

Consistency check

A consistency check ensures that the entered data is logical. For example, the delivery date cannot be before the order date.
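A minimal sketch of the date rule, using Python's standard `datetime.date`; the function name `dates_are_consistent` is an assumption, as is the choice to allow same-day delivery.

```python
from datetime import date


def dates_are_consistent(order_date: date, delivery_date: date) -> bool:
    """Consistency check: delivery may not precede the order."""
    return delivery_date >= order_date
```

Each date may be individually valid; the check only fails when the two fields contradict each other.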

Range check

An example of a validation check is the procedure used to verify an ISBN.
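The ISBN procedure relies on a check digit. The sketch below assumes the usual published rules: for ISBN-10, the digits weighted 10 down to 1 must sum to a multiple of 11 (with "X" standing for 10 in the last position); for ISBN-13, the digits weighted alternately 1 and 3 must sum to a multiple of 10. The function names are illustrative.

```python
DIGITS = "0123456789"


def is_valid_isbn10(isbn: str) -> bool:
    """Check an ISBN-10, ignoring hyphens and spaces."""
    chars = isbn.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in DIGITS:
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10  # "X" is allowed only as the check digit
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """Check an ISBN-13, ignoring hyphens and spaces."""
    chars = isbn.replace("-", "").replace(" ", "")
    if len(chars) != 13 or not all(c in DIGITS for c in chars):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
    return total % 10 == 0
```

A passing check digit shows only that the number is internally consistent, not that a book with that ISBN exists.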
Allowed character checks
Batch totals
Cardinality check
Check digits
Consistency checks
Control totals
Cross-system consistency checks
Data type checks
File existence check
Format or picture check
Hash totals
Limit check
Logic check
Presence check
Range check
Referential integrity
Spelling and grammar check
Uniqueness check
Table look up check

Post-validation actions

Enforcement action
Advisory action
Verification action
Log of validation

Validation and security

Failures or omissions in data validation can lead to data corruption or a security vulnerability. Data validation checks that data are fit for purpose, valid, sensible, reasonable and secure before they are processed.