PDF417 (Portable Data File, or PDF-417) was introduced in 1990 by Symbol Technologies (since acquired by Motorola) as a high-capacity, highly secure symbology. Strictly speaking, PDF417 is classified as a stacked linear barcode (similar in structure to the stacked variants of GS1 DataBar), though it is often referred to as a two-dimensional barcode because, like Data Matrix, it represents data using multiple rows and columns. Unlike most other two-dimensional barcodes, however, PDF417 can be scanned with an appropriately configured laser scanner. Indeed, this was one of its primary design goals.
PDF417 barcodes vary in size and layout and are most commonly seen in rectangular configurations. Here are two examples. Both symbols encode exactly the same data but use different layout options:
A PDF417 symbol consists of 3 to 90 stacked rows of bar/space patterns. Each row consists of the following:
A start pattern;
Left row indicator;
1 to 30 data characters;
Right row indicator;
A stop pattern.
The number of data characters in each row is typically referred to as the number of data columns in the symbol. In the examples above, the first symbol has 16 rows and 1 data column, while the second symbol has just 4 rows but 5 data columns. Here are the examples again, diagramed for clarity:
Each symbol character consists of 4 bars and 4 spaces with a total width of 17 modules (hence the 417 in the symbology’s name). The only exception is the stop pattern, which is 18 modules wide. Each bar or space can be from 1 to 6 modules wide. Symbol characters in PDF417 are typically referred to as codewords, especially those used to represent the data portion of the symbol.
The specification for PDF417 requires that a symbol character’s height be at least 3 times the module width, though there are some instances where 2X and even 1X have been used. Laser scanners typically cannot read symbols that use less than 3X for the module height.
PDF417 uses three distinct symbol sets called clusters; within each cluster, each character is assigned a value between 0 and 928. The clusters are numbered 0, 3, and 6. As an example, here is codeword 0 as it would be represented in each of the clusters (greatly enlarged):
The cluster number can actually be derived from the character pattern, using the following formula:
cluster = (b1 – b2 + b3 – b4 + 9) mod 9
b1 = width of the first bar
b2 = width of the second bar
b3 = width of the third bar
b4 = width of the fourth bar
mod 9 = modulo division (the integer remainder of the division)
As an example, here is the equation for the cluster 3 character:
cluster number = (5 – 1 + 1 – 2 + 9) mod 9
cluster number = 12 mod 9 (12 / 9 = 1 remainder 3)
cluster number = 3
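The cluster formula is small enough to sketch in Python. The function name and the tuple-of-widths input are illustrative choices, not part of any particular library; the widths used in the example are the ones from the worked equation above.

```python
def cluster_number(bar_widths):
    """Return the cluster (0, 3, or 6 for a valid codeword) from the four bar widths.

    bar_widths: the widths, in modules, of the first to fourth bars
    (b1, b2, b3, b4). Space widths are not needed for this check.
    """
    b1, b2, b3, b4 = bar_widths
    # Adding 9 keeps the intermediate value non-negative, as in the formula.
    return (b1 - b2 + b3 - b4 + 9) % 9


# Worked example from the text: bars of width 5, 1, 1, 2 belong to cluster 3.
print(cluster_number((5, 1, 1, 2)))
```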
Each row of a PDF417 symbol uses symbol characters from a single cluster. The first row uses cluster 0, the second row cluster 3, and the third row cluster 6; the fourth row begins again at cluster 0. Because all symbol characters in a given row come from the same cluster and adjacent rows always use different clusters, scanning equipment can detect when a scan crosses from one row to another. This is especially useful for linear scanners such as laser scanners.
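The row-to-cluster rotation can be sketched as follows. Rows are numbered from 1, as in the text. The `row_step` helper is a hypothetical illustration of how a decoder might interpret a cluster change between two consecutive codewords; real decoders are more involved.

```python
def row_cluster(row):
    """Cluster used by a 1-based row number: 0, 3, 6, 0, 3, 6, ..."""
    return ((row - 1) % 3) * 3


def row_step(previous_cluster, current_cluster):
    """Illustrative helper: 0 if the scan stayed in the same row,
    +1 if it drifted to the next row, -1 if it drifted to the previous row.
    Assumes the scan moves by at most one row between codewords."""
    difference = ((current_cluster - previous_cluster) // 3) % 3
    return {0: 0, 1: 1, 2: -1}[difference]


print([row_cluster(r) for r in range(1, 7)])
print(row_step(0, 3), row_step(0, 6), row_step(6, 0))
```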
The first and last characters in a row are the start and stop pattern while the second character and second to last character are the left row indicator and the right row indicator respectively. These values are computed based on the cluster, row number, the total number of rows, the number of data columns, and the error correction level being used (see Error Correction below) using the formulas below:
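Cluster | Left Row Indicator | Right Row Indicator
0 | 30x + y | 30x + v
3 | 30x + z | 30x + y
6 | 30x + v | 30x + z
In these formulas, row numbers start at 1 and each division is an integer division (the remainder is discarded):
x = (row number - 1) / 3
y = (number of rows - 1) / 3
z = (error correction level) * 3 + (number of rows - 1) mod 3
v = number of data columns - 1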
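As a minimal sketch, the row indicator values can be computed like this in Python. It assumes the usual assignment of formulas to clusters (cluster 0: left 30x + y, right 30x + v; cluster 3: left 30x + z, right 30x + y; cluster 6: left 30x + v, right 30x + z), 1-based row numbers, and integer division. The function and parameter names are illustrative.

```python
def row_indicators(row, rows, columns, ec_level):
    """Return (left, right) row indicator codeword values for a 1-based row."""
    if not 3 <= rows <= 90:
        raise ValueError("a PDF417 symbol has 3 to 90 rows")
    if not 1 <= columns <= 30:
        raise ValueError("a PDF417 symbol has 1 to 30 data columns")
    if not 0 <= ec_level <= 8:
        raise ValueError("error correction level must be 0 to 8")
    if not 1 <= row <= rows:
        raise ValueError("row is out of range")

    x = (row - 1) // 3                     # integer division
    y = (rows - 1) // 3
    z = ec_level * 3 + (rows - 1) % 3
    v = columns - 1

    cluster = ((row - 1) % 3) * 3          # rows cycle through clusters 0, 3, 6
    if cluster == 0:
        left, right = y, v
    elif cluster == 3:
        left, right = z, y
    else:
        left, right = v, z
    return 30 * x + left, 30 * x + right


# A 16-row, 1-column symbol at error correction level 2 (level chosen arbitrarily).
for r in range(1, 5):
    print(r, row_indicators(r, rows=16, columns=1, ec_level=2))
```

Taken together, any three consecutive rows carry the number of rows, the number of columns, and the error correction level, so a reader can in principle recover the symbol's geometry without decoding every row.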
Like many high capacity barcode symbologies, PDF417 uses sophisticated data compaction techniques; there is no one-to-one correspondence between the data characters stored in the barcode and the bar/space patterns of the symbol characters used.
Generally, data can be encoded in one of three modes:
Text Compaction Mode: allows two alphanumeric characters to be stored in each codeword. Text Compaction Mode has four sub-modes: alpha, lower, mixed, and punctuation (essentially 4 different character sets).
Byte Compaction Mode: allows groups of 6 bytes to be stored in 5 codewords using a base 256 to base 900 conversion. Byte Compaction Mode is typically used for binary data. This is the least efficient data compaction mode of PDF417.
Numeric Compaction Mode: using a base 10 to base 900 conversion, this mode can pack almost 3 numeric digits into a single codeword. This mode is most effective for strings of 13 or more numeric digits. Numeric strings of fewer than 13 digits are typically encoded using Text Compaction Mode.
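The three conversions can be sketched in Python as below. Several details are not stated in this article and are assumptions based on the usual description of the PDF417 specification: numeric groups hold at most 44 digits and are prefixed with a leading 1 before conversion; a text codeword is 30 times the first character value plus the second; the alpha sub-mode maps A to Z onto 0 to 25 and space onto 26; and an odd-length text run is padded with the value 29. Mode latch codewords and sub-mode switching are omitted, and all function names are illustrative.

```python
def byte_compact_group(six_bytes):
    """Pack exactly 6 bytes into 5 codewords (base 256 to base 900)."""
    if len(six_bytes) != 6:
        raise ValueError("expected exactly 6 bytes")
    value = int.from_bytes(six_bytes, "big")
    codewords = []
    for _ in range(5):
        value, digit = divmod(value, 900)
        codewords.append(digit)
    return codewords[::-1]                 # most significant codeword first


def numeric_compact_group(digits):
    """Pack up to 44 decimal digits into codewords (base 10 to base 900)."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 44:
        raise ValueError("expected 1 to 44 ASCII digits")
    value = int("1" + digits)              # assumed prefix; preserves leading zeros
    codewords = []
    while value:
        value, digit = divmod(value, 900)
        codewords.append(digit)
    return codewords[::-1]


ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "      # assumed alpha sub-mode values 0..26


def text_compact_alpha(text):
    """Pack upper-case letters and spaces two per codeword (alpha sub-mode only).

    Raises ValueError for any character outside the alpha sub-mode."""
    values = [ALPHA.index(ch) for ch in text]
    if len(values) % 2:
        values.append(29)                  # assumed pad value for an odd count
    return [30 * high + low for high, low in zip(values[::2], values[1::2])]


print(byte_compact_group(bytes([231, 101, 11, 97, 205, 2])))
print(numeric_compact_group("000213298174000"))
print(text_compact_alpha("ABC"))
```

The numeric example shows why this mode is the densest: 15 digits fit into 6 codewords, where Text Compaction Mode would need 8.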
When a PDF417 symbol is created, the encoding software analyzes the data and selects the most appropriate compaction mode, often switching between modes within a single symbol. Other barcode symbologies such as Data Matrix, Aztec, and QR Code (and even Code 128, in a much simpler fashion) use the same type of approach.
A single PDF417 symbol, using the lowest error correction level, can store up to 1850 text characters, 2710 numeric digits, or 1108 bytes of binary data.
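These capacity figures can be reproduced with some arithmetic. The sketch below rests on assumptions that are not stated in this article: a symbol holds at most 928 codewords in total, one codeword is used as a symbol length descriptor, level 0 adds 2 error correction codewords, byte and numeric data each need one mode latch codeword (text is the default mode and needs none), leftover bytes are stored one per codeword, and a partial numeric group of c codewords holds 3c - 1 digits.

```python
MAX_CODEWORDS = 928        # assumed upper limit for one symbol
EC_CODEWORDS_LEVEL_0 = 2   # lowest error correction level


def max_capacity():
    """Return (text characters, numeric digits, bytes) at error correction level 0."""
    # Data codewords left after error correction and the length descriptor.
    data = MAX_CODEWORDS - EC_CODEWORDS_LEVEL_0 - 1

    text_chars = data * 2                  # two characters per codeword

    payload = data - 1                     # one codeword spent on a mode latch

    groups, rest = divmod(payload, 5)      # 6 bytes per 5 codewords
    byte_count = groups * 6 + rest         # leftover bytes: one per codeword

    groups, rest = divmod(payload, 15)     # 44 digits per 15 codewords
    digits = groups * 44 + (3 * rest - 1 if rest else 0)

    return text_chars, digits, byte_count


print(max_capacity())
```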
Error Correction Codewords
Like its two-dimensional barcode contemporaries, PDF417 uses Reed-Solomon error correction. Error correction codewords (ECC) are added to the barcode to correct erasures (where a codeword’s position is known but its value cannot be decoded) and errors (where neither the position nor the value of a bad codeword is known); the more ECC added to a symbol, the more erasures and errors can be corrected.
PDF417 uses 9 levels of ECC, with the lowest level (0) adding two ECC codewords. The table below lists the number of error correction codewords for each level:
Error Correction Level | Number of Error Correction Codewords
0 | 2
1 | 4
2 | 8
3 | 16
4 | 32
5 | 64
6 | 128
7 | 256
8 | 512
Truncated PDF417 is a variant of PDF417 that eliminates the right row indicator and reduces the stop pattern to a single module bar. Truncated PDF417 symbols are thus smaller but also more susceptible to misreads. Truncated PDF417 should only be used in environments where the symbol is not likely to be damaged.
Here are our examples again, formatted as Truncated PDF417 symbols:
Macro PDF417 is a data extension to PDF417 that allows multiple symbols to be concatenated together by a reader to form a single, larger message. A special control block is added to the data of a series of linked PDF417 symbols; the control block contains a segment index, a file ID, and some optional information like file name, time stamp, file size, sender, and addressee. Up to 99,999 PDF417 symbols can be concatenated to form a single message (file).
Why use PDF417?
PDF417 best lends itself to applications where high data capacity and density are required without sacrificing data integrity. PDF417 is one of the few two-dimensional barcodes that can be read by laser scanners.
Who uses PDF417?
PDF417 has found its way into a number of industries and adopted standards. Here are just a few:
FedEx uses PDF417 on its shipping labels for package delivery information;
PDF417 can be used to print postage for the US Postal Service;
In the US, most Departments of Motor Vehicles have adopted the AAMVA (American Association of Motor Vehicle Administrators) standard for driver’s licenses and identification cards, which includes a standardized PDF417 barcode that can be scanned and processed by any compliant state;
The Bar-Coded Boarding Pass (BCBP) is a standard used by more than 200 airlines which standardizes the use of PDF417 on printed airline boarding passes.
How do we help developers using PDF417?
The Cognex Mobile Barcode Scanner SDK supports PDF417 detection, enabling developers to gather relevant workflow and application data. You can download the SDK for free by registering on the Cognex Mobile Barcode Developer Network. In addition, the Barcode Scanner SDK supports a broad range of symbologies to meet your growing development needs.