DjVu

DjVu is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, indexed color images, and photographs. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal images. This allows high-quality, readable images to be stored in a minimum of space, so that they can be made available on the web.
DjVu has been promoted as providing smaller files than PDF for most scanned documents. The DjVu developers report that color magazine pages compress to 40–70 kB, black-and-white technical papers compress to 15–40 kB, and ancient manuscripts compress to around 100 kB; a satisfactory JPEG image typically requires 500 kB. Like PDF, DjVu can contain an OCR text layer, making it easy to perform copy and paste and text search operations.
Free creators, manipulators, converters, Web browser plug-ins, and desktop viewers are available. DjVu is supported by a number of multi-format document viewers and e-book reader software on Linux, Windows, and Android.

History

The DjVu technology was originally developed by Yann LeCun, Léon Bottou, Patrick Haffner, and Paul G. Howard at AT&T Labs from 1996 to 2001.
Prior to the standardization of PDF in 2008, DjVu had been considered superior because it was an open file format, in contrast to the proprietary nature of PDF at the time. The declared higher compression ratio and the claimed ease of converting large volumes of text into DjVu format were other arguments for DjVu's superiority over PDF in the technology landscape of 2004. Independent technologist Brewster Kahle, in a 2004 talk on IT Conversations, discussed the benefits of allowing easier access to DjVu files.
The DjVu library distributed as part of the open-source package DjVuLibre has become the reference implementation for the DjVu format. DjVuLibre has been maintained and updated by the original developers of DjVu since 2002.
The DjVu file format specification has gone through a number of revisions, the most recent being from 2005.

Support status	Version	Release date	Notes
Unsupported	1–19	1996–1999	Developmental versions by AT&T Labs preceding the sale of the format to LizardTech.
Unsupported	20	April 1999	DjVu version 3. DjVu changed from a single-page format to a multipage format.
Older, still supported	21	September 1999	Indirect storage format replaced. The searchable text layer was added.
Older, still supported	22	April 2001	Page orientation, color JB2.
Unsupported	23	July 2002	CID chunk.
Unsupported	24	February 2003	LTAnno chunk.
Older, still supported	25	May 2003	NAVM chunk. Support for DjVu bookmarks was added. Changes made by versions 23 and 24 were made obsolete.
Current	26	April 2005	Text/line annotations.

Role in the software ecosystem

The primary use of the DjVu format has been the electronic distribution of documents with a quality comparable to that of printed documents. As that niche is also the primary use for PDF, it was inevitable that the two formats would become competitors. The two formats, however, approach the problem of delivering high-resolution documents in very different ways: PDF primarily encodes graphics and text as vectorised data, whereas DjVu primarily encodes them as pixmap images. This means PDF places the burden of rendering the document on the reader, whereas DjVu places that burden on the creator.
For a number of years, significantly overlapping with the period when DjVu was being developed, there were no PDF viewers for free operating systems. A particular stumbling block was the rendering of vectorised fonts, which are essential for combining small file size with high resolution in PDF. Since displaying DjVu was a simpler problem for which free software was available, there were suggestions that the free software movement should employ DjVu instead of PDF for distributing documentation; rendering for creating DjVu is in principle not much different from rendering for a device-specific printer driver, and DjVu can as a last resort be generated from scans of paper media. However, when FreeType 2.0 began in 2000 to provide rendering of all major vectorised font formats, that specific advantage of DjVu began to erode.
In the 2000s, with the growth of the World Wide Web and before the widespread adoption of broadband, DjVu was often adopted by digital libraries as their format of choice. Reasons included its integration with software such as Greenstone and the Internet Archive, browser plugins that allowed advanced online browsing, smaller file sizes for comparable quality of book scans and other image-heavy documents, and support for embedding and searching full text from OCR.
Some features, such as thumbnail previews, were later integrated into the Internet Archive's BookReader. DjVu browsing was deprecated in its favour around 2015, when some major browsers stopped supporting Java applets and, along with them, DjVu plugins.

Technical overview

File structure

The DjVu file format is based on the Interchange File Format and is composed of hierarchically organized chunks. The IFF structure is preceded by a 4-byte AT&T magic number. Following is a single FORM chunk with a secondary identifier of either DJVU or DJVM for a single-page or a multi-page document, respectively.
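The layout described above can be checked with a few lines of Python. This is a minimal sketch, not the DjVuLibre implementation: the function name `read_djvu_header` and the returned dictionary keys are hypothetical, and the 32-bit big-endian length field after `FORM` is assumed from general IFF conventions rather than stated in this article.

```python
import struct


def read_djvu_header(data: bytes) -> dict:
    """Parse the first 16 bytes of a DjVu file (illustrative helper)."""
    if len(data) < 16:
        raise ValueError("too short to contain a DjVu header")

    # 4-byte magic, 4-byte chunk ID, 4-byte big-endian length (IFF
    # convention, assumed here), 4-byte secondary identifier.
    magic, chunk_id, length, secondary = struct.unpack(">4s4sI4s", data[:16])

    if magic != b"AT&T":
        raise ValueError("missing AT&T magic number")
    if chunk_id != b"FORM":
        raise ValueError("expected a FORM chunk after the magic number")

    kinds = {b"DJVU": "single-page", b"DJVM": "multi-page"}
    if secondary not in kinds:
        raise ValueError(f"unexpected secondary identifier: {secondary!r}")

    return {"kind": kinds[secondary], "form_length": length}


# Usage on a synthetic header (not a real document):
sample = b"AT&T" + b"FORM" + struct.pack(">I", 4) + b"DJVM"
print(read_djvu_header(sample))
```

Only the header is examined; walking the nested chunks inside the FORM would require the chunk definitions from the format specification.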

Chunk types

Compression

DjVu divides a single image into several component images, then compresses them separately. To create a DjVu file, the initial image is first separated into three images: a background image, a foreground image, and a mask image. The background and foreground images are typically lower-resolution color images; the mask image is a high-resolution bilevel image and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44. The mask image is compressed using a method called JB2. The JB2 encoding method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately, and then encodes the locations where each shape appears on the page. Thus, instead of compressing a letter "e" in a given font multiple times, it compresses the letter "e" once and then records every place on the page it occurs.
Optionally, these shapes may be mapped to UTF-8 codes and stored in the DjVu file. If this mapping exists, it is possible to select and copy text.
Since JBIG2 was based on JB2, both compression methods have the same problems when performing lossy compression. Numbers may be substituted with similar-looking numbers if the text was scanned at a low resolution prior to lossy compression.
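The shape-sharing idea behind JB2 can be illustrated with a toy encoder. This is only a sketch under simplifying assumptions: it matches shapes by exact equality, whereas JB2 matches nearly identical shapes, and it omits the arithmetic coding of the bitmaps and positions. The names `encode_shapes` and `decode_shapes` and the string-based bitmap representation are hypothetical.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]          # rows of '0'/'1' characters
Placement = Tuple[int, int, int]  # (shape index, x, y)


def encode_shapes(
    glyphs: List[Tuple[Bitmap, int, int]]
) -> Tuple[List[Bitmap], List[Placement]]:
    """Store each distinct bitmap once and record where it occurs."""
    shapes: List[Bitmap] = []
    index_of: Dict[Bitmap, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        if bitmap not in index_of:       # first time this shape is seen
            index_of[bitmap] = len(shapes)
            shapes.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return shapes, placements


def decode_shapes(
    shapes: List[Bitmap], placements: List[Placement], width: int, height: int
) -> List[str]:
    """Rebuild the bilevel page by stamping each shape at its positions."""
    page = [["0"] * width for _ in range(height)]
    for idx, x, y in placements:
        for dy, row in enumerate(shapes[idx]):
            for dx, bit in enumerate(row):
                if bit == "1":
                    page[y + dy][x + dx] = "1"
    return ["".join(row) for row in page]


# Usage: the "E"-like shape appears twice but is stored only once.
E = ("111", "110", "111")
I = ("1", "1", "1")
shapes, placements = encode_shapes([(E, 0, 0), (I, 4, 0), (E, 6, 0)])
for line in decode_shapes(shapes, placements, width=9, height=3):
    print(line)
```

With near-match substitution in place of exact equality, two different but similar glyphs could share one dictionary entry, which is the source of the digit-substitution problem described above.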

Format licensing

DjVu is an open file format with patents. The file format specification is published, as well as source code for the reference library. The original authors distribute an open-source implementation named "DjVuLibre" under the GNU General Public License. The rights to the commercial development of the encoding software have been transferred to different companies over the years, including AT&T Corporation, LizardTech, Celartem and Cuminas.
Celartem acquired LizardTech and Extensis.

Support

DjVu is not widely supported by scanning and viewing software. While viewers can be downloaded, opening DjVu files is not implemented in most operating systems by default.
In 2002, the DjVu file format was chosen by the Internet Archive as a format in which its Million Book Project provides scanned public-domain books online. In February 2016, the IA announced that DjVu would no longer be used for new uploads.
Wikimedia Commons, a media repository used by Wikipedia among others, conditionally permits PDF and DjVu media files.
