Byte order mark

The byte order mark (BOM) is a particular usage of the special Unicode character U+FEFF, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text:

The byte order, or endianness, of the text stream in the cases of 16-bit and 32-bit encodings;
The fact that the text stream's encoding is Unicode, to a high level of confidence;
Which Unicode character encoding is used.

BOM use is optional. Its presence interferes with the use of UTF-8 by software that does not expect non-ASCII bytes at the start of a file but that could otherwise handle the text stream.
Unicode can be encoded in units of 8-bit, 16-bit, or 32-bit integers. For the 16- and 32-bit representations, a computer receiving text from arbitrary sources needs to know which byte order the integers are encoded in. The BOM is encoded in the same scheme as the rest of the document and becomes a noncharacter Unicode code point if its bytes are swapped. Hence, the process accessing the text can examine these first few bytes to determine the endianness, without requiring some contract or metadata outside of the text stream itself. Generally, the receiving computer swaps the bytes to its own endianness, if necessary, and no longer needs the BOM for processing.
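The byte-swap behaviour described above can be illustrated with a minimal Python sketch. The helper names `read_code_unit` and `detect_byte_order` are hypothetical and exist only for this illustration; the sketch handles the 16-bit case only.

```python
import struct

BOM = 0xFEFF  # the code point used as the byte order mark

# Serialize the BOM as a single 16-bit code unit in each byte order.
big_endian_bytes = struct.pack(">H", BOM)     # bytes FE FF
little_endian_bytes = struct.pack("<H", BOM)  # bytes FF FE


def read_code_unit(data: bytes, byte_order: str) -> int:
    """Read the first 16-bit code unit; byte_order is '>' (big) or '<' (little)."""
    return struct.unpack(byte_order + "H", data[:2])[0]


def detect_byte_order(data: bytes) -> str:
    """Return '>' or '<' if the data starts with a BOM in that order, else ''."""
    if len(data) < 2:
        return ""
    for order in (">", "<"):
        if read_code_unit(data, order) == BOM:
            return order
    return ""


# Reading big-endian BOM bytes with the wrong byte order yields U+FFFE,
# which is a noncharacter, so the mismatch is detectable.
swapped = read_code_unit(big_endian_bytes, "<")
print(hex(swapped))
```

A reader that sees 0xFFFE as its first code unit can conclude that it assumed the wrong byte order and switch to the other one.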
The byte sequence of the BOM differs per Unicode encoding, and none of the sequences is likely to appear at the start of text streams stored in other encodings. Therefore, placing an encoded BOM at the start of a text stream can indicate that the text is Unicode and identify the encoding scheme used. This use of the BOM character is called a "Unicode signature".

Usage

If the BOM character appears in the middle of a data stream, Unicode says it should be interpreted as a "zero-width non-breaking space". In Unicode 3.2, this usage is deprecated in favor of the "Word Joiner" character, U+2060. This allows U+FEFF to be used only as a BOM.

UTF-8

The UTF-8 representation of the BOM is the byte sequence 0xEF,0xBB,0xBF.
The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM. The standard also does not recommend removing a BOM when it is there, so that round-tripping between encodings does not lose information, and so that code that relies on it continues to work. The IETF recommends that if a protocol either always uses UTF-8, or has some other way to indicate what encoding is being used, then it "SHOULD forbid use of U+FEFF as a signature."
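As an illustration of how a reader can accept UTF-8 input with or without the signature, the following sketch uses Python's standard `codecs.BOM_UTF8` constant and its `utf-8-sig` codec. The sample strings are invented for the example.

```python
import codecs

# The UTF-8 signature is the three bytes EF BB BF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"

with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")
without_bom = "héllo".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as a leading U+FEFF character.
kept = with_bom.decode("utf-8")

# The "utf-8-sig" codec drops one leading BOM if present and otherwise
# behaves like "utf-8", so it accepts both forms of input.
stripped = with_bom.decode("utf-8-sig")
unchanged = without_bom.decode("utf-8-sig")

print(repr(kept), repr(stripped), repr(unchanged))
```

Whether to strip the signature is a policy decision for the application. A tool that must round-trip files unchanged would record whether the BOM was present and write it back on output.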
Not using a BOM allows text to be backwards-compatible with some software that is not Unicode-aware. Examples include programming languages that permit non-ASCII bytes in string literals but not at the start of the file.
UTF-8 is a sparse encoding in the sense that a large fraction of possible byte combinations do not result in valid UTF-8 text. Binary data and text in any other encoding are likely to contain byte sequences that are invalid as UTF-8. Practically the only exceptions to that are when the text consists purely of ASCII-range bytes. Because all modern encodings use ASCII-range bytes to represent ASCII characters, ASCII-only text can be safely interpreted as UTF-8 regardless of what encoding was intended by the system that emitted the bytes. Because of these considerations, heuristic analysis can detect with high confidence whether UTF-8 is in use, without requiring a BOM.
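One very simple form of such a heuristic is to test whether the whole byte string decodes as strict UTF-8. The sketch below assumes that approach; the function name `looks_like_utf8` is hypothetical, and real detectors typically combine this check with other evidence.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (ASCII-only input also passes)."""
    try:
        data.decode("utf-8")  # strict decoding rejects any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


# Text in UTF-8 passes; the same text in a legacy single-byte encoding
# usually fails because its non-ASCII bytes are not valid UTF-8 sequences.
print(looks_like_utf8("naïve".encode("utf-8")))
print(looks_like_utf8("naïve".encode("cp1252")))
```

Pure ASCII input passes as well, which matches the observation above that ASCII-only text can safely be treated as UTF-8.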
Microsoft compilers and interpreters, and many pieces of software on Microsoft Windows such as Notepad, treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII. Windows PowerShell adds a BOM when it saves UTF-8 XML documents. However, PowerShell Core 6 added a utf8NoBOM value for the -Encoding switch on some cmdlets so that documents can be saved without a BOM. Google Docs also adds a BOM when converting a document to a plain text file for download.

UTF-16

In UTF-16, a BOM may be placed as the first character of a file or character stream to indicate the endianness of all the 16-bit code units of the file or stream. If an attempt is made to read this stream with the wrong endianness, the bytes will be swapped, thus delivering the character U+FFFE, which is defined by Unicode as a "noncharacter" that should never appear in the text.

If the 16-bit units are represented in big-endian byte order, the BOM will appear in the sequence of bytes as 0xFE 0xFF.
If the 16-bit units use little-endian order, the BOM will appear in the sequence of bytes as 0xFF 0xFE.

Neither of these sequences is valid UTF-8, so their presence indicates that the file is not encoded in UTF-8.
For the IANA registered charsets UTF-16BE and UTF-16LE, a byte order mark should not be used because the names of these character sets already determine the byte order. If encountered anywhere in such a text stream, U+FEFF is to be interpreted as a "zero width no-break space".
If there is no BOM, it is possible to guess whether the text is UTF-16, and its byte order, by searching for ASCII characters, each of which appears in UTF-16 as a zero byte adjacent to an ASCII-range byte. A large number of such pairs in the same order is a very good indication of UTF-16, and whether the zero byte falls in the even or odd positions indicates the byte order. However, this can result in both false positives and false negatives.
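The guess described above can be sketched as follows. The function name `guess_utf16_byte_order`, the set of bytes counted as ASCII text, and the 0.5 threshold are all assumptions chosen for illustration, not values taken from any standard.

```python
from typing import Optional


def _is_ascii_text_byte(b: int) -> bool:
    """Printable ASCII plus tab, LF, and CR (an assumed definition)."""
    return 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D)


def guess_utf16_byte_order(data: bytes, threshold: float = 0.5) -> Optional[str]:
    """Guess 'utf-16-be' or 'utf-16-le' from zero-byte positions, else None."""
    pairs = len(data) // 2
    if pairs == 0:
        return None
    big = little = 0
    for i in range(0, pairs * 2, 2):
        first, second = data[i], data[i + 1]
        if first == 0 and _is_ascii_text_byte(second):
            big += 1      # zero byte first: big-endian ASCII character
        elif second == 0 and _is_ascii_text_byte(first):
            little += 1   # zero byte second: little-endian ASCII character
    if big > little and big / pairs >= threshold:
        return "utf-16-be"
    if little > big and little / pairs >= threshold:
        return "utf-16-le"
    return None


print(guess_utf16_byte_order("Hello, world".encode("utf-16-le")))
print(guess_utf16_byte_order("Hello, world".encode("utf-16-be")))
```

Text with few ASCII characters falls below the threshold and is reported as unknown, which is one source of the false negatives mentioned above.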
Conformance clause D98 of the Unicode Standard states, "The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian." Whether or not a higher-level protocol is in force is open to interpretation. Files local to a computer for which the native byte ordering is little-endian, for example, might be argued to be encoded as UTF-16LE implicitly. Therefore, the presumption of big-endian is widely ignored. The W3C/WHATWG encoding standard used in HTML5 specifies that content labelled either "utf-16" or "utf-16le" is to be interpreted as little-endian "to deal with deployed content". However, if a byte-order mark is present, then that BOM is to be treated as "more authoritative than anything else".
Programs that interpret UTF-16 as a byte-based encoding may display a garbled mess of characters, but ASCII characters would be recognizable because the low byte of the UTF-16 representation is the same as the ASCII code and therefore would be displayed the same. The upper byte of 0 may be displayed as nothing, white space, a period, or some other unvarying glyph.
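A small sketch of this effect, assuming the byte-based program reads the data as Latin-1 and renders the zero byte as a period (both choices are illustrative):

```python
text = "Hi!"
raw = text.encode("utf-16-le")       # each ASCII character gains a zero high byte

# A byte-based program maps every byte to one character.
misread = raw.decode("latin-1")

# Render the NUL characters as periods, as some viewers do.
visible = misread.replace("\x00", ".")
print(visible)
```

The ASCII characters remain readable, separated by whatever glyph the program uses for the zero byte.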

UTF-32

Although a BOM could be used with UTF-32, this encoding is rarely used for transmission. Otherwise the same rules as for UTF-16 are applicable.
The BOM for little-endian UTF-32 is the same pattern as a little-endian UTF-16 BOM followed by a NUL character, an unusual example of the BOM being the same pattern in two different encodings. Programmers using the BOM to identify the encoding will have to decide whether UTF-32 or a NUL first character is more likely.
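The ambiguity can be made concrete with a short sketch. The function name `classify_ff_fe_prefix` and its `prefer_utf32` flag are hypothetical; the flag stands in for the judgement call the programmer has to make.

```python
import codecs
from typing import Optional


def classify_ff_fe_prefix(data: bytes, prefer_utf32: bool = True) -> Optional[str]:
    """Classify data starting with FF FE as UTF-32LE or UTF-16LE, else None."""
    if not data.startswith(codecs.BOM_UTF16_LE):      # FF FE
        return None
    # FF FE 00 00 is both a UTF-32LE BOM and a UTF-16LE BOM followed by NUL.
    if prefer_utf32 and data.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    return "utf-16-le"


# UTF-16LE text whose first character is NUL is indistinguishable, by its
# first four bytes, from text that starts with a UTF-32LE BOM.
ambiguous = codecs.BOM_UTF16_LE + "\x00A".encode("utf-16-le")
print(classify_ff_fe_prefix(ambiguous))
print(classify_ff_fe_prefix(ambiguous, prefer_utf32=False))
```

Preferring UTF-32 treats a leading NUL in UTF-16 text as the less likely case; a stricter detector could also check that the remaining data length is a multiple of four.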

Byte order marks by encoding

This table illustrates how the BOM character is represented as a byte sequence in various encodings and how those sequences might appear in a text editor that is interpreting each byte as a legacy encoding (here CP1252, with caret notation for control characters):

Encoding	Representation (hexadecimal)	Representation (decimal)	Bytes as CP1252 characters
UTF-8	`EF BB BF`	`239 187 191`	`ï»¿`
UTF-16 (BE)	`FE FF`	`254 255`	`þÿ`
UTF-16 (LE)	`FF FE`	`255 254`	`ÿþ`
UTF-32 (BE)	`00 00 FE FF`	`0 0 254 255`	`^@^@þÿ`
UTF-32 (LE)	`FF FE 00 00`	`255 254 0 0`	`ÿþ^@^@`
UTF-7	`2B 2F 76 38` `2B 2F 76 39` `2B 2F 76 2B` `2B 2F 76 2F` `2B 2F 76 38 2D`	`43 47 118 56` `43 47 118 57` `43 47 118 43` `43 47 118 47` `43 47 118 56 45`	`+/v8` `+/v9` `+/v+` `+/v/` `+/v8-`
UTF-1	`F7 64 4C`	`247 100 76`	`÷dL`
UTF-EBCDIC	`DD 73 66 73`	`221 115 102 115`	`Ýsfs`
SCSU	`0E FE FF`	`14 254 255`	`^Nþÿ`
BOCU-1	`FB EE 28`	`251 238 40`	`ûî(`
GB-18030	`84 31 95 33`	`132 49 149 51`	`„1•3`
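A signature check based on a subset of the table might look like the following sketch. The name `sniff_bom` is hypothetical, only some rows are covered, and the UTF-32 signatures are tested before the UTF-16 ones so that `FF FE 00 00` is reported as UTF-32; that ordering is a design choice of this example rather than a rule.

```python
import codecs
from typing import Optional, Tuple

# (signature bytes, Python codec name); longer signatures come first so that
# a UTF-32LE BOM is not mistaken for a UTF-16LE BOM.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),        # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),        # FF FE 00 00
    (bytes.fromhex("84319533"), "gb18030"),    # 84 31 95 33
    (codecs.BOM_UTF8, "utf-8"),                # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),        # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),        # FF FE
]


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (codec name, signature length) for a leading BOM, or None."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name, len(signature)
    return None


sample = bytes.fromhex("EFBBBF") + "text".encode("utf-8")
result = sniff_bom(sample)
if result is not None:
    name, skip = result
    print(name, sample[skip:].decode(name))
```

Returning the signature length lets the caller skip the BOM before decoding the rest of the stream with the detected codec.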
