PDF417


PDF417 is a stacked linear barcode format used in a variety of applications such as transport, identification cards, and inventory management. "PDF" stands for Portable Data File. The "417" signifies that each pattern in the code consists of 4 bars and spaces in a pattern that is 17 units long. The PDF417 symbology was invented by Ynjiun P. Wang at Symbol Technologies in 1991. It is defined in ISO standard 15438.

Applications

PDF417 is used in many applications by both commercial and government organizations. PDF417 is one of the formats that can be used to print postage accepted by the United States Postal Service. PDF417 is also used by the airline industry's Bar Coded Boarding Pass standard as the 2D bar code symbolism for paper boarding passes. PDF417 is the standard selected by the Department of Homeland Security as the machine readable zone technology for RealID compliant driver licenses and state issued identification cards. PDF417 barcodes are also included on visas and border crossing cards issued by the State of Israel.

Features

In addition to features typical of two dimensional bar codes, PDF417's capabilities include:
The introduction of the ISO/IEC document states:

Format

The PDF417 bar code consists of 3 to 90 rows, each of which is like a small linear bar code. Each row has:
All rows are the same width; each row has the same number of codewords.

Codewords

PDF417 uses a base 929 encoding. Each codeword represents a number from 0 to 928.
The codewords are represented by patterns of dark and light regions. Each of these patterns contains four bars and four spaces. The total width is 17 times the width of the narrowest allowed vertical bar ; this is where the 17 in the name comes from. Each pattern starts with a bar and ends with a space.
The row height must be at least 3 times the minimum width: Y ≥ 3 X.
There are three distinct bar-space patterns used to represent each codeword. These patterns are organized into three groups known as clusters. The clusters are labeled 0, 3, and 6. No bar-space pattern is used in more than one cluster. The rows of the symbol cycle through the three clusters, so row 1 uses patterns from cluster 0, row 2 uses cluster 3, row 3 uses cluster 6, and row 4 again uses cluster 0.
Which cluster can be determined by an equation:
Where K is the cluster number and the bi refer to the width of the i-th black bar in the symbol character.
Alternatively,
Where Ei is the i-th edge-to-next-same-edge distance. Odd indices are the leading edge of a bar to the leading edge of the next bar; even indices are for the trailing edges.
One purpose of the three clusters is to determine which row the codeword is in. The clusters allow portions of the symbol to be read using a single scan line that may be skewed from the horizontal. For instance, the scan might start on row 6 at the start of the row but end on row 10. At the beginning of the scan, the scanner sees the constant start pattern, and then it sees symbols in cluster 6. When the skewed scan straddles rows 6 and 7, then the scanner sees noise. When the scan is on row 7, the scanner sees symbols in cluster 0. Consequently, the scanner knows the direction of the skew. By the time the scanner reaches the right, it is on row 10, so it sees cluster 0 patterns. The scanner will also see a constant stop pattern.

Encoding

Of the 929 available code words, 900 are used for data, and 29 for special functions, such as shifting between major modes. The three major modes encode different types of data in different ways, and can be mixed as necessary within a single bar code:
When the PDF417 symbol is created, from 2 to 512 error detection and correction codewords are added. PDF417 uses Reed-Solomon error correction. When the symbol is scanned, the maximum number of corrections that can be made is equal to the number of codewords added, but the standard recommends that two codewords be held back to ensure reliability of the corrected information.

Comparison with other symbologies

PDF417 is a stacked barcode that can be read with a simple linear scan being swept over the symbol. Those linear scans need the left and right columns with the start and stop code words. Additionally, the scan needs to know what row it is scanning, so each row of the symbol must also encode its row number. Furthermore, the reader's line scan won't scan just a row; it will typically start scanning one row, but then cross over to a neighbor and possibly continuing on to cross successive rows. In order to minimize the effect of these crossings, the PDF417 modules are tall and narrow — the height is typically three times the width. Also, each code word must indicate which row it belongs to so crossovers, when they occur, can be detected. The code words are also designed to be delta-decodable, so some code words are redundant. Each PDF data code word represents about 10 bits of information, but the printed code word is 17 modules wide. Including a height of 3 modules, a PDF417 code word takes 51 square modules to represent 10 bits. That area does not count other overhead such as the start, stop, row, format, and ECC information.
Other 2D codes, such as DataMatrix and QR, are decoded with image sensors instead of uncoordinated linear scans. Those codes still need recognition and alignment patterns, but they do not need to be as prominent. An 8 bit code word will take 8 square modules.
In practice, a PDF417 symbol takes about four times the area of a DataMatrix or QR Code.