Long interspersed nuclear element


Long interspersed nuclear elements (LINEs) are a group of non-LTR retrotransposons that are widespread in the genomes of many eukaryotes. They make up around 21.1% of the human genome. LINEs form a family of transposons in which each LINE is about 7,000 base pairs long. LINEs are transcribed into mRNA and translated into a protein that acts as a reverse transcriptase. The reverse transcriptase makes a DNA copy of the LINE RNA, which can be integrated into the genome at a new site.
The only abundant LINE in humans is LINE-1. The human genome contains an estimated 100,000 truncated and 4,000 full-length LINE-1 elements. Due to the accumulation of random mutations, the sequences of many LINEs have degenerated to the extent that they are no longer transcribed or translated. Comparisons of LINE DNA sequences can be used to date transposon insertions in the genome.
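The idea of dating an insertion from sequence comparison can be illustrated with a minimal sketch. It assumes that a LINE copy is compared with an aligned consensus sequence of its family (taken here as an approximation of the element at the time of insertion), that substitutions accumulate neutrally, and that the Jukes-Cantor model is an adequate correction for multiple hits. The function names and the substitution rate passed in are hypothetical and chosen for illustration only; they are not taken from any particular study or software package.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_age_years(copy: str, consensus: str, rate_per_site_per_year: float) -> float:
    """Rough age of an insertion: corrected divergence divided by the rate.

    The rate is an assumption supplied by the caller. Divergence is counted on
    one lineage only, because the consensus stands in for the ancestral state.
    """
    return jukes_cantor(p_distance(copy, consensus)) / rate_per_site_per_year


# Illustrative call with a made-up alignment and a hypothetical rate.
age = insertion_age_years("ACGTACGTAA", "ACGTACGTAC", rate_per_site_per_year=2.0e-9)
```

In practice the alignment, the choice of consensus and the substitution rate all carry considerable uncertainty, so an estimate of this kind is at best approximate.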

History of discovery

The first description of an approximately 6.4 kb long LINE-derived sequence was published by J. Adams et al. in 1980.

Types

Based on structural features and the phylogeny of their key enzyme, the reverse transcriptase, LINEs are divided into five main groups, called L1, RTE, R2, I and Jockey, which can be subdivided into at least 28 clades.
In plant genomes, so far only LINEs of the L1 and RTE clades have been reported. Whereas L1 elements diversify into several subclades, RTE-type LINEs are highly conserved, often constituting a single family.
In fungi, Tad, L1, CRE, Deceiver and Inkcap-like elements have been identified, with Tad-like elements appearing exclusively in fungal genomes.
All LINEs encode at least one protein, ORF2, which contains an RT and an endonuclease domain, either an N-terminal APE or a C-terminal RLE or, rarely, both. A ribonuclease H domain is occasionally present. Except for the evolutionarily ancient R2 and RTE superfamilies, LINEs usually encode another protein named ORF1, which may contain a Gag-knuckle, an L1-like RRM, and/or an esterase. LINE elements are relatively rare compared to LTR-retrotransposons in plants, fungi or insects, but are dominant in vertebrates and especially in mammals, where they represent around 20% of the genome.

L1 element

The LINE-1/L1-element is one of the elements that are still active in the human genome today. It is found in all mammals except megabats.

Other elements

Remnants of L2 and L3 elements are found in the human genome. It is estimated that L2 and L3 elements were active about 200 to 300 million years ago. Unlike L1 elements, L2 elements lack flanking target site duplications. L2 elements belong to the same group, Jockey, as the CR1 clade.

Incidence

In human

In the first human genome draft, the fraction of the human genome made up of LINE elements was given as 21% and their copy number as 850,000. Of these, L1, L2 and L3 elements made up 516,000, 315,000 and 37,000 copies, respectively. The non-autonomous SINE elements, which depend on L1 elements for their proliferation, make up 13% of the human genome and have a copy number of around 1.5 million. They probably originated from the RTE family of LINEs. Recent estimates show that the typical human genome contains on average 100 L1 elements with potential for mobilization. There is, however, a fair amount of variation, and some individuals may carry a larger number of active L1 elements, making them more prone to L1-induced mutagenesis.
Increased L1 copy numbers have also been found in the brains of people with schizophrenia, indicating that LINE elements may play a role in some neuronal diseases.
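The copy numbers quoted above for the first human genome draft can be put into proportion with a few lines of arithmetic. The sketch below uses only the figures given in this section; the variable names are arbitrary, and combining the draft copy number for L1 with the later estimate of about 100 potentially mobile elements is an assumption made for illustration, since the two figures come from different estimates.

```python
# Copy numbers for the three LINE families as quoted from the first draft.
copies = {"L1": 516_000, "L2": 315_000, "L3": 37_000}

# Sum of the per-family counts (the text rounds the total to about 850,000).
total = sum(copies.values())

# Share of all LINE copies contributed by each family.
shares = {name: count / total for name, count in copies.items()}

# Assumption: about 100 L1 copies per genome retain potential for mobilization.
potentially_mobile_l1 = 100
mobile_fraction = potentially_mobile_l1 / copies["L1"]

for name, share in shares.items():
    print(f"{name}: {share:.1%} of LINE copies")
print(f"Potentially mobile fraction of L1 copies: {mobile_fraction:.4%}")
```

Under these figures L1 accounts for well over half of all LINE copies, while only a very small fraction of L1 copies would remain capable of mobilization.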

Propagation

LINE elements propagate by a mechanism called target-primed reverse transcription, which was first described for the R2 element from the silkworm Bombyx mori.
ORF2 proteins primarily associate in cis with their encoding mRNA, forming a ribonucleoprotein complex, likely composed of two ORF2s and an unknown number of ORF1 trimers. The complex is transported back into the nucleus, where the ORF2 endonuclease domain opens the DNA. Thus, a 3'OH group is freed for the reverse transcriptase to prime reverse transcription of the LINE RNA transcript. Following reverse transcription, the target strand is cleaved and the newly created cDNA is integrated.
New insertions create short target site duplications (TSDs), and the majority of new inserts are severely 5'-truncated and often inverted. Because they lack their 5'UTR, most new inserts are non-functional.
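The outcome of an insertion described above, a copy flanked by a duplicated target site and usually missing part of its 5' end, can be illustrated with a toy string model. This is a deliberately simplified sketch and not a simulation of the biochemistry: strands, the poly(A) tail and inversions are ignored, and the function name, its parameters and the example sequences are hypothetical.

```python
def insert_with_tsd(genome: str, element: str, site: int,
                    tsd_len: int, truncate_5prime: int = 0) -> str:
    """Return the genome after a toy LINE insertion.

    The `tsd_len` bases starting at `site` are treated as the target site;
    they end up on both sides of the insert, mimicking a target site
    duplication. `truncate_5prime` bases are removed from the start of the
    element to mimic 5' truncation.
    """
    if tsd_len < 0 or not 0 <= site <= len(genome) - tsd_len:
        raise ValueError("target site must lie within the genome")
    if not 0 <= truncate_5prime <= len(element):
        raise ValueError("truncation cannot exceed the element length")

    upstream = genome[:site]
    target_site = genome[site:site + tsd_len]
    downstream = genome[site + tsd_len:]
    insert = element[truncate_5prime:]  # the 5' end is lost, the 3' end is kept

    # Layout: upstream | TSD | insert | TSD | downstream
    return upstream + target_site + insert + target_site + downstream


# Lower case marks the inserted element; upper case is the host sequence.
result = insert_with_tsd("AAAACCGGTTTT", "gattaca", site=4, tsd_len=4,
                         truncate_5prime=3)
print(result)  # AAAACCGGtacaCCGGTTTT
```

In the example the target site CCGG appears on both sides of the insert, and only the 3' portion of the element (taca) is retained, which in a real element would mean loss of the 5'UTR.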

Regulation of LINE activity

It has been shown that host cells regulate L1 retrotransposition activity, for example through epigenetic silencing.
Small interfering RNAs derived from L1 sequences can also suppress L1 retrotransposition through the RNA interference mechanism.
In plant genomes, epigenetic modification of LINEs can lead to expression changes of nearby genes and even to phenotypic changes: in the oil palm genome, methylation of a Karma-type LINE underlies the somaclonal 'mantled' variant of this plant, which is responsible for drastic yield loss.
Restriction of LINE-1 elements mediated by human APOBEC3C (A3C) has been reported; it is due to an interaction between A3C and ORF1p that affects reverse transcriptase activity.

Association with disease

A historic example of L1-conferred disease is haemophilia A, which is caused by insertional mutagenesis. There are nearly 100 examples of known diseases caused by retroelement insertions, including some types of cancer and neurological disorders. A correlation between L1 mobilization and oncogenesis has been reported for epithelial cell cancer. Hypomethylation of LINEs is associated with chromosomal instability and altered gene expression and is found in various cancer cell types in various tissue types. Hypomethylation of a specific L1 located in the MET oncogene is associated with bladder cancer tumorigenesis. Shift work sleep disorder is associated with increased cancer risk because light exposure at night reduces melatonin, a hormone that has been shown to reduce L1-induced genome instability.