In genetics, a super-enhancer is a region of the mammalian genome comprising multiple enhancers that is collectively bound by an array of transcription factor proteins to drive transcription of genes involved in cell identity. Because super-enhancers are frequently found near genes important for controlling and defining cell identity, they can be used to quickly identify key nodes regulating cell identity. Enhancers have several quantifiable traits that span a range of values, and these traits are generally elevated at super-enhancers. Super-enhancers are bound by higher levels of transcription-regulating proteins and are associated with genes that are more highly expressed. Expression of genes associated with super-enhancers is particularly sensitive to perturbations, which may facilitate cell state transitions or explain the sensitivity of super-enhancer-associated genes to small molecules that target transcription.
History
The regulation of transcription by enhancers has been studied since the 1980s. Large or multi-component transcription regulators with a range of mechanistic properties, including locus control regions, clustered open regulatory elements, and transcription initiation platforms, were observed shortly thereafter. More recent research has suggested that these different categories of regulatory elements may represent subtypes of super-enhancer. In 2013, two labs identified large enhancers near several genes especially important for establishing cell identities. Richard A. Young and colleagues identified super-enhancers, while Francis Collins and colleagues identified stretch enhancers. Both super-enhancers and stretch enhancers are clusters of enhancers that control cell-specific genes, and the two terms may be largely synonymous. As currently defined, the term “super-enhancer” was introduced by Young’s lab to describe regions identified in mouse embryonic stem cells. These particularly large, potent enhancer regions were found to control the genes that establish embryonic stem cell identity, including Oct-4, Sox2, Nanog, Klf4, and Esrrb. Perturbation of the super-enhancers associated with these genes showed a range of effects on their target genes’ expression. Super-enhancers have since been identified near cell identity regulators in a range of mouse and human tissues.
Function
The enhancers comprising super-enhancers share the functions of enhancers, including binding transcription factor proteins, looping to target genes, and activating transcription. Three notable traits of enhancers comprising super-enhancers are their clustering in genomic proximity, their exceptional signal of transcription-regulating proteins, and their high frequency of physical interaction with each other. Perturbing the DNA of enhancers comprising super-enhancers showed a range of effects on the expression of cell identity genes, suggesting a complex relationship between the constituent enhancers. The constituent enhancers of super-enhancers physically interact with each other and with their target genes across long stretches of DNA sequence. Super-enhancers separated by tens of megabases cluster in three dimensions inside the nucleus of mouse embryonic stem cells.
High levels of many transcription factors and co-factors are seen at super-enhancers. This high concentration of transcription-regulating proteins may explain why their target genes tend to be more highly expressed than other classes of genes. However, housekeeping genes tend to be more highly expressed than super-enhancer-associated genes.
Super-enhancers may have evolved at key cell identity genes to render the transcription of these genes responsive to an array of external cues. The enhancers comprising a super-enhancer can each be responsive to different signals, which allows the transcription of a single gene to be regulated by multiple signaling pathways. Pathways seen to regulate their target genes using super-enhancers include Wnt, TGF-β, LIF, BDNF, and NOTCH.
Super-enhancers that control the expression of major cell surface receptors with a crucial role in the function of a given cell lineage have also been defined. This is notably the case for B-lymphocytes, whose survival, activation, and differentiation rely on the expression of membrane-form immunoglobulins. The Ig heavy chain locus super-enhancer is a very large cis-regulatory region that includes multiple enhancers and controls several major modifications of the locus.
Relevance to Disease
Mutations in super-enhancers have been noted in various diseases, including cancers, type 1 diabetes, Alzheimer’s disease, lupus, rheumatoid arthritis, multiple sclerosis, systemic scleroderma, primary biliary cirrhosis, Crohn’s disease, Graves’ disease, vitiligo, and atrial fibrillation. A similar enrichment in disease-associated sequence variation has also been observed for stretch enhancers. Super-enhancers may play important roles in the misregulation of gene expression in cancer. During tumor development, tumor cells acquire super-enhancers at key oncogenes, which drive higher levels of transcription of these genes than in healthy cells. Altered super-enhancer function is also induced by mutations of chromatin regulators. Acquired super-enhancers may thus be biomarkers that could be useful for diagnosis and therapeutic intervention. Proteins enriched at super-enhancers include the targets of small molecules that act on transcription-regulating proteins and have been deployed against cancers. For instance, super-enhancers rely on exceptional amounts of CDK7, and multiple studies in cancer report the loss of expression of super-enhancer target genes when cells are treated with the CDK7 inhibitor THZ1. Similarly, super-enhancers are enriched in BRD4, the target of the small molecule JQ1, so treatment with JQ1 causes exceptional losses in expression for super-enhancer-associated genes.
Identification
Super-enhancers have most commonly been identified by locating genomic regions that are highly enriched in ChIP-Seq signal. ChIP-Seq experiments targeting master transcription factors and co-factors such as Mediator or BRD4 have been used, but the most frequently used target is H3K27ac-marked nucleosomes. The program “ROSE” is commonly used to identify super-enhancers from ChIP-Seq data. This program stitches together previously identified enhancer regions that lie within a chosen distance of one another and ranks the stitched enhancers by their ChIP-Seq signal. The stitching distance selected to combine multiple individual enhancers into larger domains can vary. Because some markers of enhancer activity are also enriched at promoters, regions within the promoters of genes can be disregarded. ROSE separates super-enhancers from typical enhancers by their exceptional enrichment in a mark of enhancer activity. HOMER is another tool that can identify super-enhancers.
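The stitching and ranking steps described above can be illustrated with a minimal Python sketch. It is not ROSE's code: the function names, the tuple layout, and the example coordinates are hypothetical, all regions are assumed to lie on a single chromosome, and promoter exclusion is omitted. The cutoff rule used here (split at the first step where the rank-ordered signal curve, rescaled to the unit square, rises with a slope greater than 1) is an assumption chosen for illustration.

```python
from typing import List, Tuple

# (start, end, ChIP-Seq signal) for one enhancer on a single chromosome.
Region = Tuple[int, int, float]


def stitch(regions: List[Region], max_gap: int) -> List[Region]:
    """Merge enhancers lying within max_gap bases of the previous stitched region."""
    stitched: List[Region] = []
    for start, end, signal in sorted(regions):
        if stitched and start - stitched[-1][1] <= max_gap:
            prev_start, prev_end, prev_signal = stitched[-1]
            # Extend the stitched region and add up the signal of its constituents.
            stitched[-1] = (prev_start, max(prev_end, end), prev_signal + signal)
        else:
            stitched.append((start, end, signal))
    return stitched


def split_by_signal(stitched: List[Region]) -> Tuple[List[Region], List[Region]]:
    """Rank stitched regions by signal and return (typical, super) lists.

    Illustrative cutoff (an assumption): the first rank step at which the
    curve, rescaled so that rank and signal both span 0..1, has slope > 1.
    """
    ranked = sorted(stitched, key=lambda r: r[2])
    n = len(ranked)
    top = ranked[-1][2] if ranked else 0.0
    if n < 2 or top <= 0:
        return ranked, []
    for i in range(1, n):
        # One rank step is 1/(n-1) on the x axis; signal is divided by the maximum.
        slope = (ranked[i][2] - ranked[i - 1][2]) / top * (n - 1)
        if slope > 1:
            return ranked[:i], ranked[i:]
    return ranked, []


# Hypothetical enhancer calls; the first two are close enough to be stitched.
enhancers = [
    (100, 200, 5.0), (250, 300, 5.0), (10000, 10100, 2.0),
    (20000, 20100, 3.0), (30000, 30100, 1.0), (40000, 40100, 2.0),
]
typical, supers = split_by_signal(stitch(enhancers, max_gap=100))
```

In this toy input, the two neighbouring enhancers are merged into one stitched region whose summed signal sets it apart from the rest. Real analyses work per chromosome on background-corrected signal and typically involve thousands of regions.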