CRAM (file format)

CRAM is a compressed columnar file format for storing biological sequences aligned to a reference sequence, initially devised by Markus Hsi-Yang Fritz et al.
CRAM was designed to be an efficient reference-based alternative to the Sequence Alignment Map (SAM) and Binary Alignment Map (BAM) file formats. It optionally uses a genomic reference to describe differences between the aligned sequence fragments and the reference sequence, reducing storage costs. Additionally, each column in the SAM format is separated into its own blocks, improving the compression ratio. CRAM files are typically 30 to 60% smaller than the equivalent BAM files, depending on the data held within them.
Implementations of CRAM exist in htsjdk, htslib, JBrowse, and Scramble.
The file format specification is maintained by the Global Alliance for Genomics and Health with the specification document available from the EBI cram toolkit page.
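The reference-based idea can be illustrated with a small sketch. This is not the CRAM encoding itself: it handles only gap-free reads with substitutions, and the function names and the (offset, base) feature layout are assumptions made for illustration. It shows how a read can be stored as a position, a length, and a list of differences instead of as its full base sequence.

```python
from typing import List, Tuple

# Hypothetical feature layout: (offset within the read, substituted base).
Feature = Tuple[int, str]


def encode_read(reference: str, pos: int, read: str) -> Tuple[int, int, List[Feature]]:
    """Encode a gap-free read as (position, length, substitutions vs. the reference)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Keep only the bases that differ from the reference.
    features = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return pos, len(read), features


def decode_read(reference: str, pos: int, length: int, features: List[Feature]) -> str:
    """Rebuild the read by copying the reference window and applying the substitutions."""
    bases = list(reference[pos:pos + length])
    for offset, base in features:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGGC"
read = "ACGTAAGC"  # aligned at 0-based position 4, with one mismatch
encoded = encode_read(reference, 4, read)
decoded = decode_read(reference, *encoded)
```

A read that matches the reference exactly produces an empty feature list, which is where most of the storage saving comes from.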

File format

The basic structure of a CRAM file is a series of containers, the first of which holds a compressed copy of the SAM header. Subsequent containers consist of a container Compression Header followed by a series of slices which in turn hold the alignment records themselves, formatted as a series of blocks.
[Figures: layout of a CRAM file, of a container, and of a slice.]
CRAM constructs records from a set of data series, each describing one component of an alignment. The container Compression Header specifies which data series is encoded in which block, which codec is used, and any codec-specific meta-data. While data series can be mixed together within the same block, keeping them separate usually improves compression and allows efficient selective decoding when only some data types are required.
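The relationship between data series, blocks, and the Compression Header can be sketched as follows. This is a toy model under stated assumptions: the series names ("pos", "mapq", "bases"), the newline-separated text serialisation, and the use of zlib are all illustrative stand-ins, not the names, layouts, or codecs defined by the specification. The point is only that a header mapping each series to its own block lets a reader decompress one series without touching the others.

```python
import zlib
from typing import Dict, List

# Hypothetical toy records: (position, mapping quality, bases).
records = [(100 + 3 * i, 60, "ACGT" * 5) for i in range(50)]


def pack_series(values: List[str]) -> bytes:
    """Serialise one data series as newline-separated text, then compress it."""
    return zlib.compress("\n".join(values).encode("ascii"))


def unpack_series(block: bytes) -> List[str]:
    """Inverse of pack_series."""
    return zlib.decompress(block).decode("ascii").split("\n")


# Stand-in for the Compression Header: data series name -> block ID.
compression_header: Dict[str, int] = {"pos": 0, "mapq": 1, "bases": 2}

# One compressed block per data series (the columnar layout).
blocks: Dict[int, bytes] = {
    compression_header["pos"]: pack_series([str(r[0]) for r in records]),
    compression_header["mapq"]: pack_series([str(r[1]) for r in records]),
    compression_header["bases"]: pack_series([r[2] for r in records]),
}


def decode_series(name: str) -> List[str]:
    """Decompress only the block that holds the requested data series."""
    return unpack_series(blocks[compression_header[name]])


# Selective decoding: positions are recovered without decompressing the bases.
positions = [int(p) for p in decode_series("pos")]
```

Grouping like values together also tends to help a general-purpose compressor, since each block then contains data with similar statistics.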
Selective access to a CRAM file is provided via the index. For data sorted by chromosome and position, the index indicates which region is covered by each slice. For unsorted data, the index may simply be used to fetch the Nth container. Selective decoding may also be achieved by using the Compression Header to skip specified data series when only partial records are required.
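A region query against such an index might look like the sketch below. The `IndexEntry` fields and the example offsets are hypothetical and chosen for illustration; they are not a description of the on-disk index format. The sketch uses a linear scan for clarity, whereas a real implementation would likely use a faster search over sorted entries.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical index entry describing the region covered by one slice."""
    ref_id: int   # reference sequence (chromosome) number
    start: int    # first reference position covered by the slice
    span: int     # number of reference positions covered
    offset: int   # byte offset of the enclosing container in the file


def slices_overlapping(index: List[IndexEntry], ref_id: int,
                       start: int, end: int) -> List[IndexEntry]:
    """Return entries on ref_id whose range overlaps the closed interval [start, end]."""
    return [
        e for e in index
        if e.ref_id == ref_id and e.start <= end and e.start + e.span > start
    ]


index = [
    IndexEntry(0, 1, 10000, 1200),
    IndexEntry(0, 9500, 10000, 54000),
    IndexEntry(0, 19000, 8000, 101500),
    IndexEntry(1, 1, 12000, 150300),
]

# Sorted data: find which containers must be read for reference 0, positions 9800-12000.
hits = slices_overlapping(index, 0, 9800, 12000)

# Unsorted data: the index can still be used to jump to the Nth container.
third_container_offset = index[2].offset
```

Only the containers at the returned offsets need to be read and decoded; the rest of the file can be skipped.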

History

Year	Version	Notes
2010-11	pre-CRAM	Initial paper describing the reference-based format. It did not use the name CRAM, but called the format mzip. The software was implemented in Python as a prototype and demonstration of the basic concepts.
2011-12	0.3 - 0.86	Vadim Zalunin of the European Bioinformatics Institute produced the first implementation named CRAM as a package called CRAMtools, written in the Java programming language.
2012	1.0	Implemented in Java CRAMtools.
2013		C implementation added to the Scramble tool, by James Bonfield of the Wellcome Sanger Institute.
2013	2.0	Changes included support for more than one reference per slice, better encoding of SAM auxiliary tags, splitting soft-clip and inserted bases into their own data-series, meta-data to track the number of records and bases per slice, and corrections to the BF data-series.
2013		Added to htslib.
2014	2.1	Added EOF blocks, to help identify truncated files.
2014		Added to htsjdk.
2014	3.0	Inclusion of lzma and rANS codecs for block compression, along with multiple checksums for ensuring data integrity.
2018		JavaScript implementation as part of JBrowse, by Rob Buels.

CRAM version 4.0 exists as a prototype in Scramble, initially demonstrated in 2015, but has yet to be adopted as a standard.
