Base64

In computer science, Base64 is a group of binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding. Each Base64 digit represents exactly 6 bits of data. Three 8-bit bytes can therefore be represented by four 6-bit Base64 digits.
As with all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web, where its uses include embedding image files or other binary assets inside textual assets such as HTML and CSS files.

Design

The particular set of 64 characters chosen to represent the 64 digit-values for the base varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean. For example, MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.
The earliest instances of this type of encoding were created for dial-up communication between systems running the same OS (e.g., uuencode for UNIX, BinHex for the TRS-80) and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.

Base64 table

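The Base64 index table, as used by MIME, maps the values 0–25 to the letters A–Z, 26–51 to a–z, 52–61 to the digits 0–9, 62 to + and 63 to /. The = character is used for padding.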

Examples

The example below uses ASCII text for simplicity, but this is not a typical use case, as ASCII text can already be safely transferred across all systems that can handle Base64. The more typical use is to encode binary data; the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.
Here is a quote from Thomas Hobbes's Leviathan:
Man is distinguished, not only by his reason, but by this singular passion from other animals,
which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable
generation of knowledge, exceeds the short vehemence of any carnal pleasure.
When that quote, stored as ASCII characters padded to 8 bits each, is encoded in MIME's Base64 scheme, it is represented as follows:
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits are converted into individual numbers from left to right, which are then converted into their corresponding Base64 character values.
As this example illustrates, Base64 encoding converts three octets into four encoded characters.
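The regrouping of three octets into four sextets can be sketched in a few lines of Python. This is a minimal illustration, not a full encoder: the name encode_triplet is hypothetical, and the alphabet string is the MIME alphabet quoted elsewhere in this article.

```python
# MIME Base64 alphabet: index 0 is 'A', index 63 is '/'.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_triplet(three_bytes: bytes) -> str:
    """Encode exactly three octets as four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three bytes")
    # Join the three 8-bit values into one 24-bit string.
    bits = "".join(format(b, "08b") for b in three_bytes)
    # Cut the 24 bits into four 6-bit groups, left to right.
    sextets = [bits[i:i + 6] for i in range(0, 24, 6)]
    # Look each 6-bit value up in the alphabet.
    return "".join(B64_ALPHABET[int(s, 2)] for s in sextets)

print(encode_triplet(b"Man"))  # TWFu
```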
Padding characters (=) may be added so that the last encoded block contains four Base64 characters.
Hexadecimal-to-octal conversion is a convenient way to convert between binary and Base64, and it is available in both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E. Converted to octal this is 23260556, which is divided into four groups, 23 26 05 56. In decimal these are 19 22 05 46, which the table converts to the Base64 characters TWFu.
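The hexadecimal-to-octal route can be reproduced as follows. This is only a sketch for a single 24-bit group; the function name hex_group_to_base64 is an assumption made for illustration.

```python
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def hex_group_to_base64(hex_group: str) -> str:
    """Convert six hex digits (24 bits) to four Base64 characters via octal."""
    value = int(hex_group, 16)
    # 24 bits are exactly eight octal digits; keep leading zeros.
    octal_digits = format(value, "08o")
    # Each pair of octal digits is one 6-bit Base64 index.
    indices = [int(octal_digits[i:i + 2], 8) for i in range(0, 8, 2)]
    return "".join(B64_ALPHABET[i] for i in indices)

print(format(0x4D616E, "08o"))        # 23260556
print(hex_group_to_base64("4D616E"))  # TWFu
```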
If there are only two significant input octets, or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits; the two least significant bits of the last content-bearing 6-bit block will be zero and are discarded on decoding. For example, the two octets Ma encode to TWE=.
If there is only one significant input octet, or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits; the four least significant bits of the last content-bearing 6-bit block will be zero and are discarded on decoding. For example, the single octet M encodes to TQ==.
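The zero filler bits in the last content-bearing digit can be inspected with Python's standard base64 module. The helper name final_sextet_bits is hypothetical and serves only to expose the 6-bit value of that digit.

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def final_sextet_bits(data: bytes) -> str:
    """Return the 6-bit pattern of the last content-bearing Base64 digit."""
    encoded = base64.b64encode(data).decode("ascii").rstrip("=")
    return format(B64_ALPHABET.index(encoded[-1]), "06b")

# Two input octets: three digits carry data, the last two bits are filler.
print(base64.b64encode(b"Ma").decode(), final_sextet_bits(b"Ma"))  # TWE= 000100
# One input octet: two digits carry data, the last four bits are filler.
print(base64.b64encode(b"M").decode(), final_sextet_bits(b"M"))    # TQ== 010000
```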

Output padding

Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets, every four characters of Base64-encoded text (4 sextets = 24 bits) represent three octets of unencoded text or data (3 octets = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input. Truncating the input of the above quote changes the output padding: the final word pleasure. (9 bytes) encodes to cGxlYXN1cmUu with no padding, pleasure (8 bytes) to cGxlYXN1cmU= and pleasur (7 bytes) to cGxlYXN1cg==.
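The effect of input length on padding can be observed by truncating the last word of the quote one byte at a time. The sketch below uses Python's standard base64 module; padding_count is a hypothetical helper name.

```python
import base64

def padding_count(data: bytes) -> int:
    """Number of '=' characters at the end of the standard encoding."""
    encoded = base64.b64encode(data)
    return len(encoded) - len(encoded.rstrip(b"="))

word = b"pleasure."
# Drop one trailing byte at a time: 9, 8, 7, 6 and 5 input bytes.
for length in range(len(word), len(word) - 5, -1):
    chunk = word[:length]
    print(chunk.decode(), base64.b64encode(chunk).decode(), padding_count(chunk))
```

The padding count cycles through 0, 1 and 2 as the input length modulo 3 goes through 0, 2 and 1.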
The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations, the padding character is mandatory, while for others it is not used. An exception where padding characters are required is when multiple Base64 encoded files have been concatenated.
Another consequence of the sextet encoding of octets is that the same octet will be encoded differently depending on its position within a three-octet group of the input, and depending on which particular octet precedes it within the group. For example:

Input	Output
plea	cGxlYQ==
lea	bGVh
ea	ZWE=
a	YQ==

As the eight bits of an octet are spread across multiple sextets within the output, this is an obvious consequence, since no octet can be stuffed into a single sextet; instead they must share.
However, since the sextets or characters of the output must be saved and manipulated on the same computer system, which only understands octets, they must be represented as octets, with the upper two bits set to zero. Indeed, these supposedly wasted bits are exactly the reason for the Base64 encoding. The ratio of output bytes to input bytes is 4:3. Specifically, given an input of n bytes, the output will be 4⌈n/3⌉ bytes long, including padding characters.
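The padded output length, four characters for every started group of three input bytes, can be checked against Python's standard encoder. The function name encoded_length is an assumption for illustration; integer arithmetic is used in place of a ceiling function.

```python
import base64

def encoded_length(n: int) -> int:
    """Padded Base64 length for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)

for n in (0, 1, 2, 3, 4, 20):
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)
```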

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:

Encoded	Padding	Length	Decoded
YW55IGNhcm5hbCBwbGVhcw==	`==`	1	any carnal pleas
YW55IGNhcm5hbCBwbGVhc3U=	`=`	2	any carnal pleasu
YW55IGNhcm5hbCBwbGVhc3Vy	none	3	any carnal pleasur
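A decoder can read the size of the final group directly from the padding. The sketch below uses Python's standard base64 module; the helper name decode_with_padding and the sample strings (truncations of the quote above) are chosen here for illustration.

```python
import base64

def decode_with_padding(encoded: str):
    """Decode padded Base64 and report how many bytes the last group holds."""
    padding = len(encoded) - len(encoded.rstrip("="))
    decoded = base64.b64decode(encoded)
    # No '=' means a full 3-byte group, '=' means 2 bytes, '==' means 1 byte.
    return decoded, 3 - padding

for sample in ("YW55IGNhcm5hbCBwbGVhcw==",
               "YW55IGNhcm5hbCBwbGVhc3U=",
               "YW55IGNhcm5hbCBwbGVhc3Vy"):
    print(sample, decode_with_padding(sample))
```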

Decoding Base64 without padding

Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character contains only 6 bits and 8 bits are required to create a byte. For example:

Length	Encoded	Length	Decoded
2	YW55IGNhcm5hbCBwbGVhcw	1	any carnal pleas
3	YW55IGNhcm5hbCBwbGVhc3U	2	any carnal pleasu
4	YW55IGNhcm5hbCBwbGVhc3Vy	3	any carnal pleasur
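Python's standard decoder expects padded input, so one way to handle unpadded text is to restore the padding from the length before decoding. This is a sketch under that assumption; decode_unpadded is a hypothetical helper name.

```python
import base64

def decode_unpadded(encoded: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted."""
    if len(encoded) % 4 == 1:
        # One leftover character carries only 6 bits, less than one byte.
        raise ValueError("invalid length for unpadded Base64")
    # Restore 0, 1 or 2 padding characters so the length is a multiple of 4.
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.b64decode(padded)

print(decode_unpadded("YW55IGNhcm5hbCBwbGVhcw"))    # 2 leftover characters
print(decode_unpadded("YW55IGNhcm5hbCBwbGVhc3U"))   # 3 leftover characters
print(decode_unpadded("YW55IGNhcm5hbCBwbGVhc3Vy"))  # complete final group
```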

Implementations and history

Variants summary table

Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the index table, for index 62 and 63, and the character used for padding. The known variants are described in the subsections below.

Privacy-enhanced mail

The first known standardized use of the encoding now called MIME Base64 was in the Privacy-enhanced Electronic Mail protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.
The current version of PEM uses a 64-character alphabet consisting of upper- and lower-case Roman letters, the numerals, and the + and / symbols. The = symbol is also used as a padding suffix. The original specification, RFC 989, additionally used the * symbol to delimit encoded but unencrypted data within the output stream.
To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode, the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.
The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets remain to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.
After encoding the non-padded data, if two octets of the 24-bit buffer are padded-zeros, two = characters are appended to the output; if one octet of the 24-bit buffer is filled with padded-zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.
PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local conventions.
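The buffer-based procedure described above can be sketched as follows. This is an illustrative reading of the description, not a reference implementation: the function name pem_printable_encode is hypothetical, and "\n" is assumed as the local line delimiter.

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def pem_printable_encode(data: bytes, line_length: int = 64) -> str:
    """Encode bytes with a 24-bit buffer and wrap the output in 64-character lines."""
    groups = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)
        # First byte goes into the most significant 8 bits; absent bytes are zero.
        buffer = int.from_bytes(group + b"\x00" * missing, "big")
        # Read the buffer six bits at a time, most significant first.
        chars = [B64_ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Replace the digits that hold only padded zeros with '='.
            chars[4 - missing:] = "=" * missing
        groups.append("".join(chars))
    encoded = "".join(groups)
    lines = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    return "\n".join(lines)

print(pem_printable_encode(b"Man"))
# Cross-check against the standard library on a longer input.
sample = bytes(range(100))
print(pem_printable_encode(sample).replace("\n", "") == base64.b64encode(sample).decode())
```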

MIME

The MIME specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable). MIME's Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM, and uses the = symbol for output padding in the same way, as described in RFC 2045.
MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally it specifies that any extra-alphabetic characters must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.
Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length, though for very short messages the overhead can be much higher because of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size plus 814 bytes. The size of the decoded data can therefore be approximated with this formula:
bytes = (encoded length in bytes - 814) / 1.37
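The 137% figure follows from the 4:3 expansion combined with a CR/LF pair after every 76 characters. The sketch below computes the body size only; the header overhead mentioned above is not modelled, and mime_base64_size is a hypothetical helper name.

```python
def mime_base64_size(n: int, line_length: int = 76) -> int:
    """Body size in bytes for n input bytes, with CRLF after each encoded line."""
    chars = 4 * ((n + 2) // 3)                           # padded Base64 characters
    lines = (chars + line_length - 1) // line_length     # number of output lines
    return chars + 2 * lines                             # two bytes per CRLF

n = 57_000  # 57 input bytes fill exactly one 76-character line
print(mime_base64_size(n) / n)
```

Each 57 input bytes become 76 characters plus two line-ending bytes, a ratio of 78/57, or roughly 1.37.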

UTF-7

UTF-7, described first in RFC 1642, which was later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.
The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers, and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits leaving up to three unused bits in the last Base64 digit.
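The unpadded form can be illustrated by encoding the UTF-16 code units of a character and dropping the padding. This is a sketch of the encoding step only: modified_base64 is a hypothetical helper name, and the "+" and "-" shift markers that UTF-7 places around such a run are shown only through Python's built-in utf-7 codec.

```python
import base64

def modified_base64(text: str) -> str:
    """Base64 of the big-endian UTF-16 code units, without '=' padding."""
    utf16 = text.encode("utf-16-be")
    return base64.b64encode(utf16).decode("ascii").rstrip("=")

# U+263A is the two bytes 0x26 0x3A in UTF-16; padded Base64 would be 'Jjo='.
print(modified_base64("\u263a"))          # Jjo
# In UTF-7 the run is wrapped in '+' and '-'.
print(b"+Jjo-".decode("utf-7") == "\u263a")
```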

OpenPGP

OpenPGP, described in RFC 4880, uses Radix-64 encoding, also known as "ASCII armor". Radix-64 is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "=" symbol as a separator, appended to the encoded output data.
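The checksum step can be sketched as below. The CRC-24 initial value 0xB704CE and generator 0x1864CFB are the constants given in RFC 4880; the function names are hypothetical, and the armor header, footer and line wrapping are deliberately left out.

```python
import base64

CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB

def crc24(data: bytes) -> int:
    """CRC-24 over the raw (unencoded) data."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def radix64_with_checksum(data: bytes) -> str:
    """Base64 body followed by a line holding '=' and the encoded CRC."""
    body = base64.b64encode(data).decode("ascii")
    checksum = base64.b64encode(crc24(data).to_bytes(3, "big")).decode("ascii")
    return body + "\n=" + checksum

print(radix64_with_checksum(b"Man"))
```

Because the checksum is always three bytes, its encoded form is always four characters with no padding of its own.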

RFC 3548

RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational memo that attempts to unify the RFC 1421 and RFC 2045 specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 and Base16 encodings.
Unless implementations are written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.
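A decoder that rejects characters outside the alphabet, in the spirit of the requirement above, can be approximated with the validate flag of Python's standard base64.b64decode. This sketch does not claim full RFC conformance; strict_decode is a hypothetical wrapper name.

```python
import base64
import binascii

def strict_decode(encoded: str) -> bytes:
    """Decode Base64, rejecting non-alphabet characters and bad padding."""
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not strictly valid Base64: {encoded!r}") from exc

print(strict_decode("TWFu"))        # accepted
print(base64.b64decode("TW Fu"))    # lenient default silently skips the space
try:
    strict_decode("TW Fu")
except ValueError as err:
    print("rejected:", err)
```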

RFC 4648

RFC 4648 obsoletes RFC 3548 and focuses on the Base64, Base32 and Base16 encodings.

Filenames

Another variant called modified Base64 for filename uses '-' instead of '/', because Unix and Windows filenames cannot contain '/'.
It may be preferable to use the modified Base64 for URL instead, since the filenames could then also be used in URLs.

URL applications

Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.
Using standard Base64 in a URL requires encoding of the '+', '/' and '=' characters into special percent-encoded hexadecimal sequences ('+' becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes the string unnecessarily longer.
For this reason, modified Base64 for URL variants exist, where the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary and has no impact on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode '=' to '.', potentially exposing applications to relative path attacks when a folder name is encoded from user data.
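The difference between the standard and URL-safe alphabets shows up on input bytes that map to indices 62 and 63. The sketch below uses Python's standard base64 module; to_url_token and from_url_token are hypothetical helpers for one variant that omits padding.

```python
import base64

raw = bytes([0xFB, 0xFF])
print(base64.b64encode(raw).decode())          # +/8=
print(base64.urlsafe_b64encode(raw).decode())  # -_8=

def to_url_token(data: bytes) -> str:
    """URL-safe Base64 with the '=' padding omitted."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def from_url_token(token: str) -> bytes:
    """Inverse of to_url_token: restore the padding, then decode."""
    return base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))

print(to_url_token(raw))  # -_8
```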

HTML

The atob and btoa JavaScript methods, defined in the HTML5 draft specification, provide Base64 encoding and decoding functionality to web pages. The btoa method outputs padding characters, but these are optional in the input of the atob method.

Other applications

Base64 can be used in a variety of contexts:

* Base64 can be used to transmit and store text that might otherwise cause delimiter collision.
* Spammers use Base64 to evade basic anti-spamming tools, which often do not decode Base64 and therefore cannot detect keywords in encoded messages.
* Base64 is used to encode character strings in LDIF files.
* Base64 is often used to embed binary data in an XML file, using a syntax similar to … e.g. favicons in Firefox's exported bookmarks.html.
* Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
* The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
* The FreeSWAN IPSec implementation precedes Base64 strings with 0s, so they can be distinguished from text or hexadecimal strings.
* Although not part of the official specification for SVG, some viewers can interpret Base64 when used for embedded elements, such as images inside SVG.

Radix-64 applications not compatible with Base64

* Uuencoding, traditionally used on UNIX, uses ASCII 32 (space) through 95 (underscore), consecutively, making its 64-character set " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_". Avoiding all lower-case letters was helpful because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power because it was only necessary to add 32, not do a lookup. Its use of most punctuation characters and the space character limits its usefulness.
* BinHex 4, which was used within the classic Mac OS, uses a different set of 64 characters. It uses upper and lower case letters, digits, and punctuation characters, but does not use some visually confusable characters like '7', 'O', 'g' and 'o'. Its 64-character set is "!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr".
* Unix stores password hashes computed with crypt in the /etc/passwd file using a radix-64 encoding called B64. It uses a mostly-alphanumeric set of characters, plus . and /. Its 64-character set is "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz". Padding is not used.
* The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format using radix-64. Its 64-character set is also "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
* bcrypt hashes are designed to be used in the same way as traditional crypt hashes, and the algorithm uses a similar but permuted alphabet. Its 64-character set is "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".
* Xxencoding uses a mostly-alphanumeric character set similar to crypt and GEDCOM, but using + and - rather than . and /. Its 64-character set is "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
* 6PACK, used with some terminal node controllers, uses a different set of 64 characters from 0x00 to 0x3f.
* Bash supports numeric literals in base 2-64, stretching to a character set of 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_.


   
    
