Pileup format


Pileup format is a text-based format for summarizing the base calls of aligned reads to a reference sequence. This format facilitates visual display of SNP/indel calling and alignment. It was first used by Tony Cox and Zemin Ning at the Wellcome Trust Sanger Institute, but became widely known through its implementation within the SAMtools software suite.

Format

Example

The columns

Each line consists of 5 or 6 tab-separated columns (the sixth is optional):
  1. Sequence identifier
  2. Position in sequence
  3. Reference nucleotide at that position
  4. Number of aligned reads covering that position
  5. Bases at that position from aligned reads
  6. Phred quality of those bases, encoded as ASCII characters with an offset of 33
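The column layout above can be illustrated with a minimal parsing sketch. It only splits a line into its fields and does not interpret the bases string. The names PileupRecord and parse_pileup_line are hypothetical, and the sample line is made up for illustration rather than taken from real data.

```python
from typing import NamedTuple, Optional


class PileupRecord(NamedTuple):
    """One pileup line, split into its columns (hypothetical helper type)."""
    seq_id: str               # column 1: sequence identifier
    position: int             # column 2: position in sequence
    ref_base: str             # column 3: reference nucleotide
    depth: int                # column 4: number of reads covering the position
    bases: str                # column 5: bases string, left uninterpreted here
    qualities: Optional[str]  # column 6: quality string, None when absent


def parse_pileup_line(line: str) -> PileupRecord:
    """Split one tab-separated pileup line into a PileupRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (5, 6):
        raise ValueError(
            f"expected 5 or 6 tab-separated columns, got {len(fields)}"
        )
    seq_id, position, ref_base, depth, bases = fields[:5]
    qualities = fields[5] if len(fields) == 6 else None
    return PileupRecord(seq_id, int(position), ref_base, int(depth), bases, qualities)


# Made-up example line, for illustration only.
record = parse_pileup_line("seq1\t272\tT\t3\t.,.\t<<;\n")
print(record.seq_id, record.position, record.depth)
```

Returning None for a missing sixth column lets the caller tell apart a file written without qualities from one with an empty quality string.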

    Column 5: The bases string

    Column 6: The base quality string

This is an optional column. If present, the ASCII value of each character minus 33 gives the Phred base quality of each of the bases in column 5. This is similar to the quality encoding in the FASTQ format.
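The offset-33 decoding of the quality column can be sketched as follows. The function names decode_phred33 and error_probability are hypothetical; the conversion to an error probability relies on the standard Phred definition Q = -10 log10(p), which this article does not state itself.

```python
from typing import List


def decode_phred33(quality_string: str) -> List[int]:
    """Convert each ASCII character to a Phred score by subtracting 33."""
    return [ord(ch) - 33 for ch in quality_string]


def error_probability(phred: int) -> float:
    """Estimated probability that a base call is wrong, from Q = -10 * log10(p)."""
    return 10 ** (-phred / 10)


# '!' is ASCII 33 and 'I' is ASCII 73, so they decode to 0 and 40.
scores = decode_phred33("!I")
print(scores)  # [0, 40]
```

A score of 20, for example, corresponds to an estimated error probability of 1 in 100.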

File extension

There is no standard file extension for a Pileup file, but .msf, .pup and .pileup are used.