Augmented Backus–Naur form
In computer science, augmented Backus–Naur form (ABNF) is a metalanguage based on Backus–Naur form (BNF), but consisting of its own syntax and derivation rules. The motivating principle for ABNF is to describe a formal system of a language to be used as a bidirectional communications protocol. It is defined by Internet Standard 68 (STD 68), which is RFC 5234, and it often serves as the definition language for IETF communication protocols.
RFC 5234 supersedes RFC 4234. RFC 7405 updates it, adding a syntax for specifying case-sensitive string literals.
Overview
An ABNF specification is a set of derivation rules, written as
rule = definition ; comment CR LF
where rule is a case-insensitive nonterminal, definition consists of sequences of symbols that define the rule, comment is optional documentation, and the line ends with a carriage return and line feed.
Rule names are case-insensitive: <rulename>, <Rulename>, <RULENAME>, and <rUlENamE> all refer to the same rule. Rule names consist of a letter followed by letters, numbers, and hyphens.
Angle brackets (<, >) are not required around rule names. However, they may be used to delimit a rule name when it appears in prose.
Terminal values
Terminals are specified by one or more numeric characters.
Numeric characters may be specified as the percent sign %, followed by the base (b for binary, d for decimal, x for hexadecimal), followed by the value, or a concatenation of values separated by periods. For example, a carriage return is specified by %d13 in decimal or %x0D in hexadecimal. A carriage return followed by a line feed may be specified with concatenation as %d13.10.
Literal text is specified through the use of a string enclosed in quotation marks ("). These strings are case-insensitive, and the character set used is ASCII. Therefore, the string "abc" will match “abc”, “Abc”, “aBc”, “abC”, “ABc”, “AbC”, “aBC”, and “ABC”. RFC 7405 added a syntax for case-sensitive strings: %s"aBc" will only match “aBc”. Prior to that, a case-sensitive string could only be specified by listing the individual characters: to match “aBc”, the definition would be %d97.66.99. A string can also be explicitly specified as case-insensitive with a %i prefix.
Operators
White space
White space is used to separate elements of a definition; for space to be recognized as a delimiter, it must be explicitly included. The explicit reference for a single whitespace character is WSP, and LWSP is for zero or more whitespace characters with newlines permitted. The LWSP definition in RFC 5234 is controversial because at least one whitespace character is needed to form a delimiter between two fields.
Definitions are left-aligned. When multiple lines are required, continuation lines are indented by whitespace.
Comment
; comment
A semicolon starts a comment that continues to the end of the line.
Concatenation
Rule1 Rule2
A rule may be defined by listing a sequence of rule names.
To match the string “aba”, the following rules could be used:
fu = %x61 ; a
bar = %x62 ; b
mumble = fu bar fu
Alternative
Rule1 / Rule2
A rule may be defined by a list of alternative rules separated by a solidus.
To accept the rule fu or the rule bar, the following rule could be constructed:
fubar = fu / bar
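Concatenation and alternation can be illustrated with a few matcher functions. The following is a minimal sketch, not a complete ABNF engine: the helper names lit, seq, alt, and fullmatch are hypothetical, the rule names are illustrative, and alt simply takes the first choice that matches without backtracking, which is enough for these small rules but not for arbitrary grammars.

```python
from typing import Callable, Optional

# A matcher takes (text, position) and returns the position after the match,
# or None if the rule does not match at that position.
Matcher = Callable[[str, int], Optional[int]]


def lit(ch: str) -> Matcher:
    """Terminal: match one exact character (stands in for a value such as %x61)."""
    def match(text: str, pos: int) -> Optional[int]:
        return pos + 1 if text.startswith(ch, pos) else None
    return match


def seq(*parts: Matcher) -> Matcher:
    """Concatenation (Rule1 Rule2): every part must match, one after another."""
    def match(text: str, pos: int) -> Optional[int]:
        for part in parts:
            result = part(text, pos)
            if result is None:
                return None
            pos = result
        return pos
    return match


def alt(*choices: Matcher) -> Matcher:
    """Alternative (Rule1 / Rule2): simplified to 'first matching choice wins'."""
    def match(text: str, pos: int) -> Optional[int]:
        for choice in choices:
            result = choice(text, pos)
            if result is not None:
                return result
        return None
    return match


def fullmatch(rule: Matcher, text: str) -> bool:
    """True if the rule consumes the whole text."""
    return rule(text, 0) == len(text)


# Illustrative rules (names are assumptions for this sketch):
fu = lit("a")              # fu     = %x61
bar = lit("b")             # bar    = %x62
mumble = seq(fu, bar, fu)  # mumble = fu bar fu
fubar = alt(fu, bar)       # fubar  = fu / bar
```

With these definitions, fullmatch(mumble, "aba") succeeds, while fullmatch(fubar, "c") fails because neither alternative applies.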
Incremental alternatives
Rule1 =/ Rule2
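Additional alternatives may be added to a rule through the use of =/ between the rule name and the definition.
The rules
ruleset = alt1 / alt2
ruleset =/ alt3
ruleset =/ alt4 / alt5
are therefore equivalent to
ruleset = alt1 / alt2 / alt3 / alt4 / alt5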
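The way incremental alternatives fold into a single rule can be sketched in a few lines. This is an illustrative simplification under stated assumptions: the function name merge_rules is hypothetical, each rule is assumed to fit on one line, and comment stripping does not account for semicolons inside quoted strings.

```python
def merge_rules(lines):
    """Fold `name =/ alternatives` lines into the earlier `name = ...` rule.

    Returns a dict mapping lower-cased rule names (rule names are
    case-insensitive) to their combined definitions.
    """
    rules = {}
    for line in lines:
        # Drop a trailing comment (simplification: ignores ';' inside strings).
        line = line.split(";", 1)[0].strip()
        if not line:
            continue
        name, _, rest = line.partition("=")
        key = name.strip().lower()
        if rest.startswith("/"):
            # Incremental alternative: append to the existing definition.
            if key not in rules:
                raise ValueError(f"'=/' used before rule {key!r} was defined")
            rules[key] += " / " + rest[1:].strip()
        else:
            rules[key] = rest.strip()
    return rules


merged = merge_rules([
    "ruleset = alt1 / alt2",
    "ruleset =/ alt3",
    "RuleSet =/ alt4 / alt5  ; same rule, different case",
])
```

Here merged["ruleset"] holds the single combined list of five alternatives.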
Value range
%c##-##
A range of numeric values may be specified through the use of a hyphen.
The rule
OCTAL = %x30-37
is equivalent to
OCTAL = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
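The numeric notation, both the period-separated concatenation described under Terminal values and the hyphenated range, can be decoded mechanically. The sketch below is only an illustration: the function name decode_num_val and its return shape are assumptions made for this example, not part of any standard library.

```python
# Base letters used after the percent sign.
BASES = {"b": 2, "d": 10, "x": 16}


def decode_num_val(token: str):
    """Decode a %-prefixed numeric value.

    Returns ("range", (low, high)) for a hyphenated range such as %x30-37,
    or ("concat", [values]) for a single value or a dotted concatenation.
    """
    if len(token) < 3 or not token.startswith("%"):
        raise ValueError(f"not a numeric value: {token!r}")
    letter = token[1].lower()
    if letter not in BASES:
        raise ValueError(f"unknown base letter: {token[1]!r}")
    base = BASES[letter]
    body = token[2:]
    if "-" in body:
        low, high = body.split("-", 1)
        return ("range", (int(low, base), int(high, base)))
    return ("concat", [int(part, base) for part in body.split(".")])


# %d13.10 is a carriage return followed by a line feed.
crlf = "".join(chr(v) for v in decode_num_val("%d13.10")[1])
```

A range decodes to its two endpoints, so a matcher would accept any single character whose code lies between them, inclusive.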
Sequence group
Elements may be placed in parentheses to group rules in a definition.
To match “elem fubar snafu” or “elem tarfu snafu”, the following rule could be constructed:
group = elem (fubar / tarfu) snafu
Variable repetition
n*nRule
To indicate repetition of an element, the form <a>*<b>element is used. The optional <a> gives the minimal number of elements to be included (default 0). The optional <b> gives the maximal number of elements to be included (default infinity).
Use *element for zero or more elements, *1element for zero or one element, 1*element for one or more elements, and 2*3element for two or three elements, cf. regular expressions e*, e?, e+ and e{2,3}.
Specific repetition
nRule
To indicate an explicit number of elements, the form <a>element is used and is equivalent to <a>*<a>element.
Use 2DIGIT to get two numeric digits, and 3DIGIT to get three numeric digits.
Optional sequence
To indicate an optional element, the following constructions are equivalent:
[fubar snafu]
*1(fubar snafu)
0*1(fubar snafu)
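The repetition forms, including the optional-sequence shorthand (which behaves like a *1 prefix), correspond closely to regular-expression quantifiers. The following sketch translates a repetition prefix into a Python regex quantifier; the function name repeat_to_quantifier is hypothetical and only the prefix is handled, not the element that follows it.

```python
import re


def repeat_to_quantifier(prefix: str) -> str:
    """Translate an ABNF repetition prefix into a regex quantifier.

    Accepts '' (exactly one), 'n' (specific repetition), or '<a>*<b>'
    where either bound may be omitted.
    """
    if prefix == "":
        return ""                        # no prefix: exactly one element
    if "*" not in prefix:
        return "{%d}" % int(prefix)      # nRule, same as n*nRule
    low, high = prefix.split("*", 1)
    minimum = int(low) if low else 0     # omitted <a> defaults to 0
    if high == "":
        return "{%d,}" % minimum         # omitted <b> means no upper bound
    return "{%d,%d}" % (minimum, int(high))


# 2*3DIGIT expressed as a regex over ASCII digits.
two_or_three_digits = re.compile("[0-9]" + repeat_to_quantifier("2*3"))

# An optional group, written with the *1 prefix.
optional_ab = re.compile("(?:ab)" + repeat_to_quantifier("*1"))
```

The generated quantifiers {0,}, {0,1}, and {1,} are the long forms of the regex operators *, ?, and +.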
Operator precedence
The following operators have the given precedence, from tightest binding to loosest binding:
- Strings, names formation
- Comment
- Value range
- Repetition
- Grouping, optional
- Concatenation
- Alternative
Core rules
The core rules are defined in the ABNF standard.
Rule | Formal definition | Meaning |
ALPHA | %x41-5A / %x61-7A | Upper- and lower-case ASCII letters |
DIGIT | %x30-39 | Decimal digits |
HEXDIG | DIGIT / "A" / "B" / "C" / "D" / "E" / "F" | Hexadecimal digits |
DQUOTE | %x22 | Double quote |
SP | %x20 | Space |
HTAB | %x09 | Horizontal tab |
WSP | SP / HTAB | Space and horizontal tab |
LWSP | *(WSP / CRLF WSP) | Linear white space (past newline) |
VCHAR | %x21-7E | Visible characters |
CHAR | %x01-7F | Any ASCII character, excluding NUL |
OCTET | %x00-FF | 8 bits of data |
CTL | %x00-1F / %x7F | Controls |
CR | %x0D | Carriage return |
LF | %x0A | Linefeed |
CRLF | CR LF | Internet-standard newline |
BIT | "0" / "1" | Binary digit |
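Most of the core rules describe a single character or a fixed pair, so they can be approximated with regular expressions. The mapping below is a sketch for experimentation, not an official translation: the names CORE_RULES and matches are hypothetical, OCTET and LWSP are omitted (one concerns raw bytes, the other is a repetition), and HEXDIG accepts lowercase letters on the assumption that its quoted letters follow the usual case-insensitive string rule.

```python
import re

# Regex approximations of the single-unit core rules (text strings, not bytes).
CORE_RULES = {
    "ALPHA": r"[A-Za-z]",          # %x41-5A / %x61-7A
    "DIGIT": r"[0-9]",             # %x30-39
    "HEXDIG": r"[0-9A-Fa-f]",      # quoted "A".."F" taken as case-insensitive
    "DQUOTE": r'"',                # %x22
    "SP": r" ",                    # %x20
    "HTAB": r"\t",                 # %x09
    "WSP": r"[ \t]",               # SP / HTAB
    "VCHAR": r"[\x21-\x7E]",       # visible characters
    "CHAR": r"[\x01-\x7F]",        # any ASCII character except NUL
    "CTL": r"[\x00-\x1F\x7F]",     # controls
    "CR": r"\r",                   # %x0D
    "LF": r"\n",                   # %x0A
    "CRLF": r"\r\n",               # CR LF
    "BIT": r"[01]",                # "0" / "1"
}


def matches(rule: str, text: str) -> bool:
    """True if the whole text matches the named core rule."""
    return re.fullmatch(CORE_RULES[rule], text) is not None
```

For example, matches("CRLF", "\r\n") holds, while matches("VCHAR", " ") does not, because a space is not a visible character.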
Example
The postal address example given in the Backus–Naur form (BNF) article may be specified as follows:
postal-address = name-part street zip-part
name-part = *(personal-part SP) last-name [SP suffix] CRLF
name-part =/ personal-part CRLF
personal-part = first-name / (initial ".")
first-name = *ALPHA
initial = ALPHA
last-name = *ALPHA
suffix = ("Jr." / "Sr." / 1*("I" / "V" / "X"))
street = [apt SP] house-num SP street-name CRLF
apt = 1*4DIGIT
house-num = 1*8(DIGIT / ALPHA)
street-name = 1*VCHAR
zip-part = town-name "," SP state 1*2SP zip-code CRLF
town-name = 1*(ALPHA / SP)
state = 2ALPHA
zip-code = 5DIGIT ["-" 4DIGIT]