HTTP compression

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.
HTTP data is compressed before it is sent from the server: compliant browsers will announce what methods are supported to the server before downloading the correct format; browsers that do not support compliant compression method will download uncompressed data. The most common compression schemes include gzip and Deflate; however, a full list of available schemes is maintained by the IANA. Additionally, third parties develop new methods and include them in their products, such as the Google Shared Dictionary Compression for HTTP scheme implemented in the Google Chrome browser and used on Google servers.
There are two different ways compression can be done in HTTP. At a lower level, a Transfer-Encoding header field may indicate the payload of a HTTP message is compressed. At a higher level, a Content-Encoding header field may indicate that a resource being transferred, cached, or otherwise referenced is compressed. Compression using Content-Encoding is more widely supported than Transfer-Encoding, and some browsers do not advertise support for Transfer-Encoding compression to avoid triggering bugs in servers.

Compression scheme negotiation

In most cases, excluding the SDCH, the negotiation is done in two steps, described in RFC 2616:
1. The web client advertises which compression schemes it supports by including a list of tokens in the HTTP request. For Content-Encoding, the list in a field called Accept-Encoding; for Transfer-Encoding, the field is called TE.

GET /encrypted-area HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip, deflate

2. If the server supports one or more compression schemes, the outgoing data may be compressed by one or more methods supported by both parties. If this is the case, the server will add a Content-Encoding or Transfer-Encoding field in the HTTP response with the used schemes, separated by commas.

HTTP/1.1 200 OK
Date: mon, 26 June 2016 22:38:34 GMT
Server: Apache/1.3.3.7
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip

The web server is by no means obligated to use any compression method – this depends on the internal settings of the web server and also may depend on the internal architecture of the website in question.
In case of SDCH a dictionary negotiation is also required, which may involve additional steps, like downloading a proper dictionary from the external server.

Content-Encoding tokens

The official list of tokens available to servers and client is maintained by IANA, and it includes:

br – Brotli, a compression algorithm specifically designed for HTTP content encoding, defined in RFC 7932 and implemented in Mozilla Firefox release 44 and Chromium release 50
compress – UNIX "compress" program method
deflate – compression based on the deflate algorithm, a combination of the LZ77 algorithm and Huffman coding, wrapped inside the zlib data format ;
exi – W3C Efficient XML Interchange
gzip – GNU zip format. Uses the deflate algorithm for compression, but the data format and the checksum algorithm differ from the "deflate" content-encoding. This method is the most broadly supported as of March 2011.
identity – No transformation is used. This is the default value for content coding.
pack200-gzip – Network Transfer Format for Java Archives
zstd – Zstandard compression, defined in RFC 8478

In addition to these, a number of unofficial or non-standardized tokens are used in the wild by either servers or clients:

bzip2 – compression based on the free bzip2 format, supported by lighttpd
lzma – compression based on LZMA is available in Opera 20, and in elinks via a compile-time option
peerdist – Microsoft Peer Content Caching and Retrieval
rsync - delta encoding in HTTP, implemented by a pair of rproxy proxies.
sdch – Google Shared Dictionary Compression for HTTP, based on VCDIFF
xpress - Microsoft compression protocol used by Windows 8 and later for Windows Store application updates. LZ77-based compression optionally using a Huffman encoding.
xz - LZMA2-based content compression, supported by a non-official Firefox patch; and fully implemented in mget since 2013-12-31.
Servers that support HTTP compression
SAP NetWeaver
Microsoft IIS: built-in or using third-party module
Apache HTTP Server, via
Hiawatha HTTP server: serves pre-compressed files
Cherokee HTTP server, On the fly gzip and deflate compressions
Oracle iPlanet Web Server
Zeus Web Server
lighttpd, via mod_compress and the newer mod_deflate
nginx – built-in
Applications based on Tornado, if "compress_response" is set to True in the application settings
Jetty Server – built-into default static content serving and available via servlet filter configurations
GeoServer
Apache Tomcat
IBM Websphere
AOLserver
Ruby Rack, via the Rack::Deflater middleware
HAProxy
Varnish – built-in. Works also with ESI
– Serving pre-compressed files

The compression in HTTP can also be achieved by using the functionality of server-side scripting languages like PHP, or programming languages like Java.

Problems preventing the use of HTTP compression

A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted daily due to increase in page load time when users do not receive compressed content. This occurs when anti-virus software interferes with connections to force them to be uncompressed, where proxies are used, where servers are misconfigured, and where browser bugs stop compression being used. Internet Explorer 6, which drops to HTTP 1.0 when behind a proxy – a common configuration in corporate environments – was the mainstream browser most prone to failing back to uncompressed HTTP.
Another problem found while deploying HTTP compression on large scale is due to the deflate encoding definition: while HTTP 1.1 defines the deflate encoding as data compressed with deflate inside a zlib formatted stream, Microsoft server and client products historically implemented it as a "raw" deflated stream, making its deployment unreliable. For this reason, some software, including the Apache HTTP Server, only implement gzip encoding.

Security implications

Compression allows a form of chosen plaintext attack to be performed: if an attacker can inject any chosen content into the page, they can know whether the page contains their given content by observing the size increase of the encrypted stream. If the increase is smaller than expected for random injections, it means that the compressor has found a repeat in the text, i.e. the injected content overlaps the secret information. This is the idea behind CRIME.
In 2012, a general attack against the use of data compression, called CRIME, was announced. While the CRIME attack could work effectively against a large number of protocols, including but not limited to TLS, and application-layer protocols such as SPDY or HTTP, only exploits against TLS and SPDY were demonstrated and largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than SPDY and TLS compression combined.
In 2013, a new instance of the CRIME attack against HTTP compression, dubbed BREACH, was published. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS encrypted web traffic in as little as 30 seconds, provided the attacker tricks the victim into visiting a malicious web link. All versions of TLS and SSL are at risk from BREACH regardless of the encryption algorithm or cipher used. Unlike previous instances of CRIME, which can be successfully defended against by turning off TLS compression or SPDY header compression, BREACH exploits HTTP compression which cannot realistically be turned off, as virtually all web servers rely upon it to improve data transmission speeds for users.
As of 2016, the TIME attack and the HEIST attack are now public knowledge.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...

HTTP compression

Compression scheme negotiation

Content-Encoding tokens

Servers that support HTTP compression

Problems preventing the use of HTTP compression

Security implications