Perturb-seq


Perturb-seq refers to a high-throughput method of performing single-cell RNA sequencing (scRNA-seq) on pooled genetic perturbation screens. Perturb-seq combines multiplexed CRISPR-mediated gene inactivation with scRNA-seq to assess comprehensive gene expression phenotypes for each perturbation. Inferring a gene’s function by applying genetic perturbations to knock down or knock out the gene and studying the resulting phenotype is known as reverse genetics. Perturb-seq is a reverse genetics approach that investigates phenotypes at the level of the transcriptome, elucidating gene functions in many cells in a massively parallel fashion.
The Perturb-seq protocol uses CRISPR technology to inactivate specific genes, and DNA barcoding of each guide RNA allows all perturbations to be pooled together and later deconvoluted, with each phenotype assigned to a specific guide RNA. Droplet-based microfluidics platforms are used to isolate individual cells, and scRNA-seq is then performed to generate a gene expression profile for each cell. Finally, bioinformatics analyses associate each cell and its perturbation with a transcriptomic profile that characterizes the consequences of inactivating each gene.
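The deconvolution step, in which each cell is assigned to a guide RNA, can be illustrated with a minimal Python sketch. It assumes that guide-barcode reads have already been collapsed into per-cell UMI counts, and it calls a guide for a cell only when that guide has enough UMIs and clearly dominates the cell's guide reads. The function name `assign_guides`, the thresholds `min_umis` and `min_fraction`, and the barcodes and guide names in the example are hypothetical choices for illustration, not values taken from the original papers.

```python
def assign_guides(guide_umis, min_umis=3, min_fraction=0.8):
    """Assign each cell to a single guide RNA from its guide-barcode UMI counts.

    guide_umis maps a cell barcode to {guide name: UMI count}.
    Returns a dict mapping each cell barcode to a guide name, or to None
    when the evidence is too weak or split between guides.
    (Hypothetical helper; thresholds are illustrative assumptions.)
    """
    calls = {}
    for cell, counts in guide_umis.items():
        total = sum(counts.values())
        if total == 0:
            calls[cell] = None  # no guide reads captured for this cell
            continue
        guide, top = max(counts.items(), key=lambda kv: kv[1])
        # Require enough UMIs and a clearly dominant guide before calling it
        if top >= min_umis and top / total >= min_fraction:
            calls[cell] = guide
        else:
            calls[cell] = None
    return calls


# Hypothetical example: one confident call, one mixed cell, one low-coverage cell
example = {
    "AAACCTGA": {"sgGeneA_1": 12, "sgGeneB_1": 1},
    "AAACGGGT": {"sgGeneB_1": 4, "sgGeneC_1": 5},
    "AAAGATGC": {"sgControl_1": 2},
}
print(assign_guides(example))
```

Cells left unassigned (`None`) would usually be excluded from single-perturbation analyses; in a real screen the thresholds would need to be tuned to the observed guide capture efficiency.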
In the December 2016 issue of the journal Cell, two companion papers were published that each introduced and described this technique. A third paper describing a conceptually similar approach, termed CRISP-seq, was also published in the same issue. In October 2016, the CROP-seq method for single-cell CRISPR screening was presented in a preprint on bioRxiv and later published in the journal Nature Methods. While these papers shared the core principle of combining CRISPR-mediated perturbation with scRNA-seq, their experimental, technological and analytical approaches differed in several respects, and they explored distinct biological questions, demonstrating the broad utility of this methodology. For example, the CRISP-seq paper demonstrated the feasibility of in vivo studies using this technology, and the CROP-seq protocol facilitates large screens by using a vector in which the guide RNA sequence itself is detectable in the scRNA-seq data, so that no separate guide barcode is needed and guide RNAs can be cloned in a single step.

Experimental workflow

CRISPR single guide RNA library design and selection

Pooled CRISPR libraries for gene inactivation can be designed for either knockout or interference. Knockout libraries perturb genes through double-stranded breaks that prompt the error-prone non-homologous end joining repair pathway to introduce disruptive insertions or deletions. CRISPR interference (CRISPRi), on the other hand, uses a catalytically inactive nuclease (dCas9) to physically block RNA polymerase, preventing or halting transcription. Perturb-seq has been used with the knockout approach in the Dixit et al. paper and with the CRISPRi approach in the Adamson et al. paper.
Pooling all guide RNAs into a single screen relies on DNA barcodes that act as identifiers for each unique guide RNA. Several pooled CRISPR libraries are commercially available, including the guide barcode library used in the study by Adamson et al. CRISPR libraries can also be custom made using tools for sgRNA design, many of which are listed on the CRISPR/Cas9 tools Wikipedia page.
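Because guide barcodes are read by sequencing, a pipeline has to map each observed barcode back to the library, tolerating occasional sequencing errors. The following Python sketch assumes a hypothetical library stored as a dictionary from barcode sequence to guide name and accepts at most one substitution (Hamming distance), rejecting reads that are ambiguous between library entries. The function names and barcode sequences are illustrative, not part of any published pipeline.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(read_bc, library, max_mismatch=1):
    """Return the guide whose barcode matches read_bc within max_mismatch
    substitutions, or None if zero or several library barcodes qualify.
    (Hypothetical helper for illustration.)
    """
    if read_bc in library:
        return library[read_bc]  # exact match
    hits = [
        guide
        for bc, guide in library.items()
        if len(bc) == len(read_bc) and hamming(bc, read_bc) <= max_mismatch
    ]
    # Only accept an error-corrected match if it is unambiguous
    return hits[0] if len(hits) == 1 else None


# Hypothetical two-guide library
library = {"ACGTACGT": "sgGeneA_1", "TTGCAAGC": "sgGeneB_1"}
print(match_barcode("ACGTACGA", library))  # one mismatch from sgGeneA_1's barcode
```

Barcode sets designed with a large minimum pairwise distance make this kind of error correction safer, since a single sequencing error is then unlikely to land within one mismatch of a different barcode.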

Lentiviral vectors

The sgRNA expression vector design will depend largely on the experiment performed but requires the following central components:
  1. Promoter
  2. Restriction sites
  3. Primer binding sites
  4. sgRNA
  5. Guide barcode
  6. Reporter gene:
     * Fluorescent gene: vectors are often constructed to include a gene encoding a fluorescent protein, so that successfully transduced cells can be visually and quantitatively identified by its expression.
     * Antibiotic resistance gene: as with fluorescent markers, antibiotic resistance genes are often incorporated into vectors to allow selection of successfully transduced cells.
  7. CRISPR-associated endonuclease: Cas9 or another CRISPR-associated endonuclease such as Cpf1 must be introduced into cells that do not already express it. Because these genes are large, a two-vector system can be used to express the endonuclease separately from the sgRNA expression vector.

Transduction and selection

Cells are typically transduced at a multiplicity of infection (MOI) of 0.4 to 0.6 lentiviral particles per cell, so that most transduced cells receive a single guide RNA. If the effects of simultaneous perturbations are of interest, a higher MOI may be used to increase the number of transduced cells carrying more than one guide RNA. Successfully transduced cells are then selected by fluorescence-based sorting or antibiotic selection, depending on the reporter gene used in the expression vector.
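The reasoning behind a low MOI can be made concrete with the common assumption that the number of lentiviral integrations per cell follows a Poisson distribution with mean equal to the MOI, so that P(k integrations) = e^(-MOI) MOI^k / k!. The Python sketch below (the function name `integration_fractions` is hypothetical) computes the share of cells with zero, one or several integrations, and the share of transduced cells that carry exactly one guide.

```python
import math


def integration_fractions(moi):
    """Poisson model (an assumption) of lentiviral integrations per cell."""
    p0 = math.exp(-moi)          # P(k = 0): untransduced
    p1 = moi * math.exp(-moi)    # P(k = 1): exactly one guide
    p_multi = 1.0 - p0 - p1      # P(k >= 2): two or more guides
    transduced = 1.0 - p0
    return {
        "untransduced": p0,
        "single": p1,
        "multiple": p_multi,
        # Among cells that survive selection, the share carrying exactly one guide
        "single_among_transduced": p1 / transduced,
    }


for moi in (0.4, 0.6, 1.0, 2.0):
    f = integration_fractions(moi)
    print(f"MOI {moi}: single-guide share of transduced cells = "
          f"{f['single_among_transduced']:.2f}")
```

Under this model, about 81% of transduced cells carry a single guide at an MOI of 0.4 and about 73% at 0.6, compared with about 58% at an MOI of 1. The trade-off is that a lower MOI leaves more cells untransduced, so more cells must be infected and selected to reach the same coverage.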

Single-cell library preparation

After transduced cells have been selected, single cells must be isolated to perform scRNA-seq. Perturb-seq and CROP-seq have been performed using droplet-based technology for single-cell isolation, while the closely related CRISP-seq was performed with a microwell-based approach. Once cells have been isolated, reverse transcription, amplification and sequencing take place to produce a gene expression profile for each cell. Many scRNA-seq approaches incorporate unique molecular identifiers (UMIs) and cell barcodes during the reverse transcription step to index individual RNA molecules and cells, respectively. These additional barcodes help quantify RNA transcripts and associate each sequence with its cell of origin.
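How UMIs and cell barcodes are used for quantification can be shown with a short Python sketch. It assumes each aligned read has already been reduced to a hypothetical `(cell_barcode, umi, gene)` tuple. PCR duplicates of the same original molecule share all three fields, so counting distinct triples gives a per-cell, per-gene molecule count. Real pipelines also correct sequencing errors in UMIs and barcodes, which this sketch omits.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse aligned reads into a cell x gene UMI count table.

    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Each distinct (cell, umi, gene) triple is counted once, treating reads
    that share all three fields as PCR duplicates of one RNA molecule.
    """
    molecules = set(reads)  # drop PCR duplicates
    counts = defaultdict(lambda: defaultdict(int))
    for cell, umi, gene in molecules:
        counts[cell][gene] += 1
    return {cell: dict(genes) for cell, genes in counts.items()}


# Hypothetical reads
reads = [
    ("CELL1", "AACGT", "GeneA"),
    ("CELL1", "AACGT", "GeneA"),  # PCR duplicate of the read above
    ("CELL1", "GGTCA", "GeneA"),  # different UMI: a second GeneA molecule
    ("CELL1", "TTAGC", "GeneB"),
    ("CELL2", "AACGT", "GeneA"),  # same UMI, different cell: separate molecule
]
print(count_umis(reads))
```

The cell barcode is what keeps the last read separate from the first: the same UMI sequence in a different cell is treated as a different molecule.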

Bioinformatics analysis

Read alignment and processing are performed to map high-quality reads to a reference genome. Deconvolution of cell barcodes, guide barcodes and UMIs links each guide RNA to the cells that contain it, so that the gene expression profile of each cell can be associated with a particular perturbation. Further downstream analyses of the transcriptional profiles depend on the biological question of interest. t-distributed stochastic neighbor embedding (t-SNE) is a commonly used machine learning algorithm for visualizing the high-dimensional data produced by scRNA-seq in a two-dimensional scatterplot. The authors who first performed Perturb-seq developed an in-house computational framework called MIMOSCA, which predicts the effects of each perturbation using a linear model and is available in a public software repository.
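The idea of estimating perturbation effects with a linear model can be illustrated with a minimal Python sketch using ordinary least squares in NumPy. This is not MIMOSCA: it omits the covariates, regularization and noise handling that a real analysis would likely need. It assumes a binary design matrix `X` (cells by perturbations, 1 if the cell carries that guide) and a normalized expression matrix `Y` (cells by genes), and it uses simulated data with hypothetical effect sizes.

```python
import numpy as np


def perturbation_effects(X, Y):
    """Estimate per-perturbation effects on each gene by ordinary least squares.

    X: (cells x perturbations) 0/1 matrix of guide assignments.
    Y: (cells x genes) normalized expression matrix.
    Returns (intercept, B), where B[p, g] is the estimated shift in gene g
    associated with perturbation p, relative to unperturbed cells.
    """
    design = np.hstack([np.ones((X.shape[0], 1)), X])  # add intercept column
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[0], coef[1:]


# Simulated data with hypothetical effects of 2 perturbations on 3 genes
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 2))
true_B = np.array([[-2.0, 0.0, 0.5],
                   [0.0, 1.5, 0.0]])
baseline = np.array([5.0, 3.0, 1.0])
Y = baseline + X @ true_B + rng.normal(0.0, 0.1, size=(500, 3))

intercept, B_hat = perturbation_effects(X, Y)
print(np.round(B_hat, 2))
```

Because the model is additive, cells carrying two guides contribute to both coefficients; deviations from additivity in such cells are the starting point for the epistasis analyses mentioned in the literature.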

Advantages and limitations

Perturb-seq makes use of current technologies in molecular biology to integrate a multi-step workflow that couples high-throughput screening with complex phenotypic outputs. Compared with alternative methods for gene knockdown or knockout, such as RNA interference (RNAi), zinc finger nucleases or transcription activator-like effector nucleases (TALENs), CRISPR-based perturbations offer greater specificity, efficiency and ease of use. Another advantage of this protocol is that, while most screening approaches can only assay simple phenotypes such as cellular viability, scRNA-seq provides a much richer phenotypic readout, with quantitative measurements of gene expression in many cells simultaneously.
However, while a large and comprehensive amount of data can be a benefit, it can also present a major challenge. Single-cell RNA expression readouts are known to produce noisy data, with a significant number of false positives. Both the large size and the noise associated with scRNA-seq data will likely require new and powerful computational methods and bioinformatics pipelines to make sense of the results. Another challenge associated with this protocol is the creation of large-scale CRISPR libraries. Screening such extensive libraries requires a correspondingly larger investment of resources to culture the very large numbers of cells needed to screen many perturbations successfully.
In parallel with these single-cell methods, other approaches have been developed to reconstruct genetic pathways using whole-organism RNA sequencing. These methods use a single aggregate statistic, called the transcriptome-wide epistasis coefficient, to guide pathway reconstruction. In contrast with the statistical frameworks of the methods described above, this coefficient may be more robust to noise and is intuitively interpretable in terms of Batesonian epistasis. This approach was used to identify a new state in the life cycle of the nematode Caenorhabditis elegans.

Applications

Perturb-seq and other conceptually similar protocols can be used to address a broad range of biological questions, and the applications of this technology will likely grow over time. Three papers on this topic, published in the December 2016 issue of the journal Cell, demonstrated the utility of this method by applying it to several distinct biological functions. In the paper “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens”, the authors used Perturb-seq to knock out transcription factors related to the immune response in hundreds of thousands of cells and investigate the cellular consequences of their inactivation. They also explored the effects of transcription factors on cell states in the context of the cell cycle. In the study led by researchers at UCSF, “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response”, the researchers suppressed multiple genes in each cell to study the unfolded protein response pathway. With a similar methodology, but using the term CRISP-seq instead of Perturb-seq, the paper “Dissecting Immune Circuits by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq” performed a proof-of-concept experiment using the technique to probe regulatory pathways related to innate immunity in mice. The lethality of each perturbation and epistasis in cells with multiple perturbations were also investigated in these papers. Perturb-seq has so far been used with relatively few perturbations per experiment, but it can theoretically be scaled up to address the whole genome. Finally, the October 2016 preprint and subsequent paper demonstrated the bioinformatic reconstruction of the T cell receptor signaling pathway in Jurkat cells based on CROP-seq data.
While these publications used these protocols to answer complex biological questions, the technology can also serve as a validation assay to check the specificity of any CRISPR-based knockdown or knockout: the expression of the target genes, as well as of other genes, can be measured in parallel at single-cell resolution to determine whether the perturbation was successful and to look for off-target effects. Furthermore, these protocols make it possible to perform perturbation screens in heterogeneous tissues while obtaining cell-type-specific gene expression responses.
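As a simple illustration of the validation use, the knockdown efficiency of a guide can be estimated by comparing the mean expression of its target gene in cells assigned to that guide with the mean in control cells. The Python sketch below assumes a hypothetical control label `"non-targeting"` and per-cell guide calls such as those produced by a guide-assignment step; the function name and example values are illustrative only, and a real analysis would also test significance and examine genes other than the target.

```python
import numpy as np


def knockdown_fraction(target_expr, guide_calls, guide, control="non-targeting"):
    """Fractional reduction of the target gene in cells carrying `guide`
    relative to control cells (1.0 = complete loss, 0.0 = no change).
    (Hypothetical helper; the control label is an assumption.)
    """
    target_expr = np.asarray(target_expr, dtype=float)
    guide_calls = np.asarray(guide_calls)
    perturbed_mask = guide_calls == guide
    control_mask = guide_calls == control
    if not perturbed_mask.any() or not control_mask.any():
        raise ValueError("need at least one perturbed and one control cell")
    perturbed = target_expr[perturbed_mask].mean()
    baseline = target_expr[control_mask].mean()
    return 1.0 - perturbed / baseline


# Hypothetical normalized expression of the targeted gene in six cells
expr = [10, 12, 8, 2, 3, 1]
calls = ["non-targeting"] * 3 + ["sgGeneA_1"] * 3
print(knockdown_fraction(expr, calls, "sgGeneA_1"))  # about 0.8
```

Applying the same comparison to every measured gene, rather than only the target, is one way to screen for unexpected transcriptional changes that could indicate off-target effects.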