Histone code


The histone code is a hypothesis that the transcription of genetic information encoded in DNA is in part regulated by chemical modifications to histone proteins, primarily on their unstructured ends. Together with similar modifications such as DNA methylation, it is part of the epigenetic code. Histones associate with DNA to form nucleosomes, which themselves bundle to form chromatin fibers, which in turn make up the more familiar chromosome. Histones are globular proteins with a flexible N-terminus (the histone tail) that protrudes from the nucleosome. Many histone tail modifications correlate well with chromatin structure, and both histone modification state and chromatin structure correlate well with gene expression levels. The critical concept of the histone code hypothesis is that histone modifications serve to recruit other proteins, which specifically recognize the modified histone through protein domains specialized for that purpose, rather than simply stabilizing or destabilizing the interaction between the histone and the underlying DNA. These recruited proteins then act to alter chromatin structure actively or to promote transcription.
For details of gene expression regulation by histone modifications see table below.

The hypothesis

The hypothesis is that chromatin-DNA interactions are guided by combinations of histone modifications. While it is accepted that modifications to histone tails alter chromatin structure, the precise mechanisms by which these alterations influence DNA-histone interactions are not yet fully understood. However, some specific examples have been worked out in detail. For example, phosphorylation of serine residues 10 and 28 on histone H3 is a marker for chromosomal condensation. Similarly, the combination of phosphorylation of serine residue 10 and acetylation of lysine residue 14 on histone H3 is a tell-tale sign of active transcription.

Modifications

Well-characterized modifications to histones include methylation, acetylation and phosphorylation.
However, there are many more histone modifications, and sensitive mass spectrometry approaches have greatly expanded the catalog.
A very basic summary of the histone code for gene expression status is given below:


Unlike this simplified model, any real histone code has the potential to be massively complex; each of the four standard histones can be simultaneously modified at multiple different sites with multiple different modifications. To give an idea of this complexity, histone H3 contains nineteen lysines known to be methylated, and each can be un-, mono-, di- or tri-methylated. If modifications are independent, this allows a potential 4^19, or roughly 275 billion, different lysine methylation patterns, far more than the maximum number of histones in a human genome. And this does not include lysine acetylation, arginine methylation or threonine/serine/tyrosine phosphorylation, not to mention modifications of other histones.
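The arithmetic behind this estimate can be checked with a short Python sketch. It assumes, as the text does, that the nineteen lysines are modified independently and that each takes one of four methylation states; the names `METHYLATION_STATES` and `methylation_patterns` are illustrative only and do not come from any library.

```python
# Count the possible lysine methylation patterns on a single histone H3,
# under the assumption that every site is modified independently.

# The four states named in the text: un-, mono-, di- and tri-methylated.
METHYLATION_STATES = ("un", "mono", "di", "tri")

# Number of methylatable lysines on histone H3, as quoted in the text.
H3_METHYLATABLE_LYSINES = 19


def methylation_patterns(sites: int, states: int = len(METHYLATION_STATES)) -> int:
    """Return the number of distinct patterns for `sites` independent sites."""
    return states ** sites


total = methylation_patterns(H3_METHYLATABLE_LYSINES)
print(f"{total:,}")  # 274,877,906,944
```

Because the count grows as a power of the number of sites, adding further modification types (acetylation, phosphorylation) or further histones multiplies this figure rather than adding to it.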
Every nucleosome in a cell can therefore have a different set of modifications, raising the question of whether common patterns of histone modifications exist. A study of about 40 histone modifications across human gene promoters found over 4000 different combinations in use, over 3000 of which occurred at only a single promoter. However, patterns were discovered, including a set of 17 histone modifications that are present together at over 3000 genes. Therefore, patterns of histone modifications do occur, but they are very intricate, and a detailed biochemical understanding currently exists for only a relatively small number of modifications.
Structural determinants of histone recognition by readers, writers and erasers of the histone code are revealed by a growing body of experimental data.