Single-cell RNA sequencing

2. Single-cell RNA sequencing#

This chapter provides a short introduction to the most widely used single-cell ribonucleic acid (RNA) sequencing assays and the basic molecular biology concepts associated with them. Multimodal and spatial assays are not covered here, but are introduced in the respective advanced chapters. Every sequencing assay has its own strengths and limitations, and data analysts must know them to be aware of possible biases in the data.

2.1. The building block of life#

Life, as we know it, is the characteristic that distinguishes living from dead or inanimate entities. Most definitions of the term life share a common entity: cells. Cells form open systems which maintain homeostasis, have a metabolism, grow, adapt to their environment, reproduce, respond to stimuli, and organize themselves. Cells are therefore the fundamental building block of life. They were first discovered in 1665 by the British scientist Robert Hooke. Hooke investigated a thin slice of cork with a very rudimentary microscope and, to his surprise, noticed that the slice resembled a honeycomb. He named these tiny units ‘cells’.

Fig. 2.1 Robert Hooke’s drawing of cork cells. Image obtained from Micrographia.#

In 1839, Matthias Jakob Schleiden and Theodor Schwann first formulated cell theory. It states that all living organisms are made up of cells. Cells act as functional units that themselves originate from other cells, making them the basic units of reproduction.

Since the early definition of cell theory, researchers have discovered that there is an energy flow within cells, that hereditary information is passed from one cell to another in the form of DNA, and that all cells have almost the same chemical composition. Two general types of cells exist: eukaryotic and prokaryotic. Eukaryotic cells contain a nucleus, in which the nuclear membrane encapsulates the chromosomes, while prokaryotic cells only have a nucleoid region, but no nucleus. The nucleus hosts the cell’s genomic deoxyribonucleic acid (DNA) and is the reason for the eukaryotes’ name: nucleus is Latin for kernel or seed, and eukaryote derives from the Greek for “true kernel”. Eukaryotes are organisms composed of a single cell (unicellular) or multiple cells (multicellular), whereas prokaryotes are single-celled organisms. Eukaryotic cells are further distinguished from prokaryotic cells by their high degree of compartmentalization, i.e. membrane-bound organelles carry out highly specialized functions and provide crucial support for the cell.

Compared to prokaryotic cells, eukaryotic cells have on average about 10,000 times the volume, with a rich mix of organelles and a cytoskeleton made up of microtubules, microfilaments, and intermediate filaments. The DNA replication machinery reads the hereditary information stored in the DNA in the nucleus so that cells can replicate and keep the life cycle going. The eukaryotic DNA is divided into several linear bundles called chromosomes, which are separated by the microtubular spindle during nuclear division. Understanding the hereditary information hidden in DNA is key to understanding many evolutionary and disease-related processes. Sequencing is the process of deciphering the order of DNA nucleotides and is primarily used to unveil the genetic information carried by a specific DNA segment, a complete genome, or even a complex microbiome. DNA sequencing allows researchers to identify the location and function of genes and regulatory elements in the DNA molecule and the genome, and uncovers genetic features such as open reading frames (ORFs) or CpG islands, which indicate promoter regions. Another very common application area is evolutionary analysis, where homologous DNA sequences from different organisms are compared. DNA sequencing can additionally be used to uncover associations between mutations and diseases, or sometimes even disease resistance, which makes this one of its most useful applications.

A well-known example is sickle cell disease, a group of blood disorders that results from an abnormality in the oxygen-carrying protein hemoglobin in red blood cells. This leads to serious health issues including pain, anemia, swelling in the hands and feet, bacterial infections and strokes. The cause of sickle cell disease is the inheritance of two abnormal copies of the β-globin gene (HBB) that makes hemoglobin, one from each parent. The gene defect is caused by a single nucleotide mutation in which a GAG codon of the β-globin gene changes to a GTG codon. As a result, the amino acid glutamate is substituted by valine at position 6 (E6V substitution), which causes the disease described above. Unfortunately, it is not always possible to find such “simple” associations between single nucleotide mutations and diseases, because most diseases are caused by, for example, complex regulatory processes.
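The effect of the GAG to GTG change can be illustrated with a few lines of Python. This is only a minimal sketch: the codon table is deliberately restricted to the codons used here, and the short sequence is an illustrative stand-in for the first codons of the β-globin coding sequence (an assumption made for this example, not data from the chapter). Position 6 is counted without the initiator methionine.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L",
    "ACT": "T", "CCT": "P", "GAG": "E",
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon (5' to 3')."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


# Illustrative start of a coding sequence (assumed for this sketch).
normal = "ATGGTGCATCTGACTCCTGAGGAG"

# Replace the middle base of the seventh codon: GAG -> GTG.
sickle = normal[:19] + "T" + normal[20:]

print(translate(normal))  # glutamate (E) at position 6 after the initial M
print(translate(sickle))  # valine (V) at the same position
```

A single changed base is enough to alter the protein sequence, which is exactly the kind of variant that sequencing can reveal.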

2.2. A brief history of sequencing#

2.2.1. First generation sequencing#

Although DNA was first isolated as early as 1869 by Friedrich Miescher, it took the scientific community more than 100 years to develop high-throughput sequencing technologies. In 1953, Watson, Crick and Franklin discovered the structure of DNA, and in 1965 Robert Holley sequenced the first tRNA. Seven years later, in 1972, Walter Fiers was the first to sequence a complete gene (the coat protein of bacteriophage MS2), using RNases to digest the virus RNA, isolating oligonucleotides and finally separating them with electrophoresis and chromatography[JOU et al., 1972]. In parallel, Frederick Sanger developed a DNA sequencing method using radiolabeled, partially digested fragments termed the “chain termination method”, which is more commonly known as “Sanger Sequencing”. Although Sanger Sequencing is still used today, it originally suffered from several shortcomings, including a lack of automation and being time-consuming. In 1987, Leroy Hood and Michael Hunkapiller developed the ABI 370, an instrument that automates the Sanger Sequencing process. Its most important innovation was the automatic labeling of DNA fragments with fluorescent dyes instead of radioactive molecules. This change not only made the method safer to perform, but also allowed computers to analyze the acquired data[Hood et al., 1987].

Strengths:

Sanger sequencing is simple and affordable.
If done correctly, the error rate is very low (<0.001%).

Limitations:

Sanger methods can only sequence short pieces of DNA of about 300 to 1000 base pairs (bp).
The quality of a Sanger sequence is often not very good in the first 15 to 40 bases, because this is where the primers bind.
Sequencing quality degrades after 700 to 900 bases.
If the sequenced DNA fragment has been cloned, some of the cloning vector sequence may find its way into the final sequence.
Sanger sequencing is more expensive than second or third generation sequencing per sequenced base.

2.2.2. Second generation sequencing#

Nine years later, in 1996, Mostafa Ronaghi, Mathias Uhlén, and Pål Nyrén introduced a new DNA sequencing technique called pyrosequencing, ushering in the age of second generation sequencing. Second generation sequencing, also known as next-generation sequencing (NGS), was primarily made possible by further automation in the lab, the use of computers, and the miniaturization of reactions. Pyrosequencing measures the luminescence that is generated when pyrophosphate is released during DNA synthesis. This process is also commonly known as “sequencing-by-synthesis”. Two years later, Shankar Balasubramanian and David Klenerman developed and adapted the sequencing-by-synthesis process into a new method that utilizes fluorescent dyes at the company Solexa. Solexa’s technology also forms the basis of Illumina’s sequencers, which dominate the market today. The Roche 454 sequencer, developed in 2005, was the first sequencer to fully automate the pyrosequencing process in a single machine. Many other platforms were introduced, such as SOLiD systems’ “sequencing-by-ligation” (2007) and Life Technologies’ Ion Torrent (2011), which uses “sequencing-by-synthesis” and detects the hydrogen ions released when new DNA is synthesized.

Strengths:

Second generation sequencing is often the cheapest option with respect to required chemicals.
Sparse material can still be used as input.
High sensitivity to detect low-frequency variants and comprehensive genome coverage.
High capacity with sample multiplexing.
Ability to sequence thousands of genes simultaneously.

Limitations:

The sequencing machines are expensive and often need to be shared with colleagues.
Second generation sequencers are big, stationary machines and not designed for field work.
Generally, second generation sequencing results in many short sequencing fragments (reads) which are hard to use for novel genomes.
The quality of the sequencing results depends on the reference genome.

2.2.3. Third generation sequencing#

The third generation of sequencing, which is nowadays sometimes also grouped under the term next-generation sequencing, brought two innovations to the market. The first is long-read sequencing, which describes the ability to obtain nucleotide fragments that are much longer than those generated by the usual Illumina short-read sequencers (on the order of 75 to 300 base pairs, depending on the sequencer). This is especially important for the assembly of novel genomes without an available reference genome. The second major advancement is the ability to sequence in real time. Combined with portable sequencers, which are small and do not require further complex machines for the chemistry, sequencing is now “field-ready” and can be used to process samples even far away from laboratory facilities.

Pacific Biosciences (PacBio) introduced zero-mode waveguide (ZMW) sequencing in 2010, which uses so-called nanoholes that each contain a single DNA polymerase. This allows the incorporation of every single nucleotide to be directly observed by detectors attached below the nanoholes. Each type of nucleotide is labeled with a specific fluorescent dye that emits a fluorescent signal during the incorporation process, and these signals are subsequently measured as the sequence readout. Reads obtained from PacBio sequencers are usually 8 to 15 kilobases (kb) long, with lengths of up to 70 kb being possible.

Oxford Nanopore Technologies introduced the GridION in 2012. The GridION and its successors MinION and Flongle are portable sequencers for DNA and RNA sequencing which produce reads of more than 2 Mb. Notably, such a sequencing device even fits into a single human hand. Oxford Nanopore sequencers observe changes in the electrical current that occur when nucleic acids pass through protein nanopores, to identify the nucleotide sequence[Jain et al., 2016].

Strengths:

Long reads will allow for the assembly of large novel genomes.
Sequencers are portable, allowing for field work.
Possibility to directly detect epigenetic modifications of DNA and RNA sequences.
Speed. Third generation sequencers are fast.

Limitations:

Some third generation sequencers exhibit higher error rates than second generation sequencers.
The reagents are generally more expensive than second generation sequencing.

2.3. Overview of the NGS process#

Even though a variety of NGS technologies exist, the general steps to sequence DNA (and therefore reverse transcribed RNA) are largely the same. The differences lie primarily in the chemistry of the respective sequencing technologies.

Sample and library preparation: As a first step, a so-called library is prepared by fragmenting the DNA samples and ligating the fragments to adapter molecules. The adapters enable the hybridisation of the library fragments to the matrix and provide a priming site.
Amplification and sequencing: In the second step, the library is converted into single-stranded molecules. During an amplification step (such as a polymerase chain reaction), clusters of DNA molecules are created. Each cluster undergoes its own reactions during a single sequencing run.
Data output and analysis: The output of a sequencing experiment depends on the sequencing technology and chemistry. Some sequencers generate fluorescence signals which are stored in specific output files, and others may generate electric signals which are stored in corresponding file formats. Generally, the amount of generated data, the raw data, is very large. Such data requires complex and computationally heavy processing. This is further discussed in the raw data processing chapter.

2.4. RNA sequencing#

So far, we have introduced sequencing under the implicit assumption that DNA is being sequenced. However, knowing the DNA sequence of an organism and the positions of its regulatory elements tells us very little about the dynamic, real-time operations of a cell. For example, by combining different mRNA splicing sites and exons from the same mRNA precursor, one gene can code for multiple proteins. Such alternative splicing events occur naturally and are common in eukaryotes; however, a variant could potentially result in a non-functional enzyme and an induced disease state. This is where RNA sequencing (RNA-Seq) comes into play.

RNA-Seq largely follows the DNA sequencing protocols, but includes a reverse transcription step where complementary DNA (cDNA) is synthesized from the RNA template.
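To make the reverse transcription step more concrete, the following minimal Python sketch derives the first-strand cDNA that is complementary to an RNA template. The function name and the toy sequence are assumptions made for illustration; real protocols additionally involve primers, template switching and second-strand synthesis, none of which are modelled here.

```python
# Base pairing between an RNA template and the synthesized DNA strand.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna):
    """Return the complementary first-strand cDNA, written 5' to 3'.

    The cDNA is antiparallel to the RNA template, so the template is
    read in reverse while each base is replaced by its DNA partner.
    """
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```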

Sequencing RNA allows scientists to obtain snapshots of cells, tissues or organisms at the time of sequencing in the form of expression profiles of genes. This information can be used to detect changes in disease states in response to therapeutics, under different environmental conditions, when comparing genotypes and other experimental designs.

Modern RNA sequencing allows for an unbiased sampling of transcripts in contrast to, for example, microarray based assays or RT-qPCR, which require probe design to specifically target the regions of interest. The obtained gene expression profiles further enable the detection of gene isoforms, gene fusions, single nucleotide variants, and many other interesting properties.

Modern RNA sequencing is not limited by prior knowledge and allows for the capture of both known and novel features, resulting in rich data sets that can be used for exploratory data analysis.

2.5. Single-cell RNA sequencing#

2.5.1. Overview#

RNA sequencing can mainly be conducted in two ways: either by sequencing the mixed RNA from the source of interest across cells (bulk sequencing) or by sequencing the transcriptomes of the cells individually (single-cell sequencing). Mixing the RNA of all cells is in most cases cheaper and easier than experimentally complex single-cell sequencing. Bulk RNA-Seq results in cell-averaged expression profiles, which are generally easier to analyze, but which also hide some of the complexity, such as the heterogeneity of expression profiles between cells, that may be needed to answer the question of interest. Some drugs or perturbations may affect only specific cell types or interactions between cell types. For example, in oncology, rare drug-resistant tumor cells may cause relapse, and such cells are difficult to identify with simple bulk RNA-seq, even on cultured cells.
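The averaging effect of bulk sequencing can be illustrated with a small Python sketch. All numbers are invented for this example: a hypothetical resistance gene is assumed to be weakly expressed in 95 sensitive cells and strongly expressed in 5 resistant cells.

```python
from statistics import mean

# Hypothetical expression of one gene per cell (made-up values).
sensitive_cells = [1] * 95    # low expression in most cells
resistant_cells = [100] * 5   # high expression in a rare subpopulation
cells = sensitive_cells + resistant_cells

# A bulk experiment only observes a signal proportional to the average.
bulk_signal = mean(cells)

# A single-cell experiment can recover the rare subpopulation.
high_fraction = sum(value > 50 for value in cells) / len(cells)

print(f"bulk average: {bulk_signal:.2f}")                # 5.95
print(f"fraction of high cells: {high_fraction:.2f}")    # 0.05
```

The bulk value of 5.95 does not match the expression level of either population, whereas the per-cell view shows that 5% of the cells behave very differently from the rest.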

To uncover such relationships, it is vital to examine gene expression on a single-cell level. Single-cell RNA-Seq (scRNA-Seq) does, however, come with several caveats. First, single-cell experiments are generally more expensive and more difficult to properly conduct. Second, the downstream analysis becomes more complex due to the increased resolution, and it is easier to draw false conclusions.

A single-cell experiment generally follows similar steps as a bulk RNA-Seq experiment (see above), but requires several adaptations. Just like bulk sequencing, single-cell sequencing requires lysis, reverse transcription, amplification, and the eventual sequencing. In addition, single-cell sequencing requires cell isolation and a physical separation into smaller reaction chambers or another form of cell labeling to be able to map the obtained transcriptomes back to the cells of origin later on. Hence, these are also the steps where most single-cell assays differ: single-cell isolation, transcript amplification, and, depending on the sequencing machine, sequencing. Before explaining how the different approaches to sequencing work, we will now discuss transcript quantification more closely.

2.5.2. Transcript quantification#

Transcript quantification is the process of converting the raw data into a table of estimated transcript counts per gene per sample (for bulk sequencing) or per cell (for single-cell sequencing). More details on this computational process are given in the next chapter.

There are two major approaches to transcript quantification: full-length and tag-based. Full-length protocols try to cover the whole transcript uniformly with sequencing reads, whereas tag-based protocols only capture the 5’ or 3’ ends. The transcript quantification method has strong implications for which genes are captured, and analysts must therefore be aware of the quantification process that was used. Full-length sequencing is restricted to plate-based protocols (see below), and the library preparation is comparable to bulk RNA-seq approaches. An even coverage of transcripts is not always achieved with full-length protocols, so coverage of specific regions across the gene body may still be biased. A major advantage of full-length protocols is that they allow for the detection of splice variants.

Tag-based protocols only sequence the 3’ or 5’ ends of the transcripts. This comes at the cost of not (necessarily) covering the full gene length, making it difficult to unambiguously align reads to a transcript and to distinguish between different isoforms[Archer et al., 2016]. However, it allows for the use of unique molecular identifiers (UMIs), which are useful for resolving biases in the transcript amplification process.

The transcript amplification process is a critical step in any RNA-seq run, as it ensures that the transcripts are abundant enough for quality control and sequencing. During this process, which is typically conducted with polymerase chain reaction (PCR), identical copies are made of fragments of the original molecules. Since the copies and the original molecules are indistinguishable, determining the original number of molecules in a sample becomes challenging. Using UMIs is a common solution for quantifying the original, non-duplicated molecules.

UMIs serve as molecular barcodes and are also sometimes referred to as random barcodes. These ‘barcodes’ consist of short random nucleotide sequences that are added to every molecule in the sample as a unique tag. UMIs must be added during library generation, before the amplification step. The ability to accurately identify PCR duplicates is important for downstream analysis in order to rule out, or at least be aware of, amplification biases[Aird et al., 2011].

Amplification bias describes the situation in which some RNA/cDNA sequences are preferentially amplified and are therefore sequenced more often, resulting in higher counts. It can have a detrimental effect on any gene expression analysis, because weakly expressed genes may suddenly appear to be highly expressed. This is especially true for sequences which are amplified at a later stage of the PCR step, where the error rate may already be higher than in earlier PCR stages. Although it is computationally possible to detect and remove such sequences by removing reads with identical alignment coordinates, it is generally advised to design the experiment with UMIs whenever possible. The use of UMIs further allows for normalization of gene counts without a loss of accuracy[Kivioja et al., 2012].
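The idea behind UMI-based counting can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the reads are already assigned to a cell barcode and a gene, the barcodes, UMIs and gene names are invented, and sequencing errors within UMIs are ignored (dedicated tools additionally merge UMIs that differ by small errors).

```python
from collections import defaultdict


def umi_counts(reads):
    """Count unique UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples.
    Reads sharing all three values are treated as PCR duplicates
    of one original molecule and are counted only once.
    """
    molecules = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        molecules[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


# Toy input: the first molecule was amplified three times.
reads = [
    ("AAAC", "ACGT", "GeneA"),
    ("AAAC", "ACGT", "GeneA"),
    ("AAAC", "ACGT", "GeneA"),
    ("AAAC", "TTGA", "GeneA"),
    ("AAAC", "ACGT", "GeneB"),
    ("CCGT", "ACGT", "GeneA"),
]

counts = umi_counts(reads)
print(counts[("AAAC", "GeneA")])  # 2 molecules, although 4 reads were observed
```

Counting raw reads would report four copies of GeneA in the first cell, while the UMI-aware count recovers the two original molecules.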

2.5.3. Single-cell sequencing protocols#

Currently, three types of single-cell sequencing protocols exist, which are grouped primarily by their cell isolation protocols:

microfluidic device-based strategies where cells are encapsulated into hydrogel droplets,
well plate based protocols where cells are physically separated into wells, and
the commercial Fluidigm C1 microfluidic chip based solution which loads and separates cells into small reaction chambers.

These three approaches differ in their ability to recover transcripts, the number of sequenced cells, and many other aspects. In the following subsections, we will briefly discuss how they work, their strengths and weaknesses, and possible biases that data analysts should be aware of regarding the respective protocols.

2.5.3.1. Microfluidic device based protocols#

Microfluidic device based single-cell strategies trap cells inside hydrogel droplets, allowing for compartmentalisation into single-cell reaction chambers. The most widely used protocols, inDrop[Klein et al., 2015], Drop-seq[Macosko et al., 2015] and the commercially available 10x Genomics Chromium[Zheng et al., 2017], are able to generate such droplets several thousand times per second. This massively parallel process generates very high numbers of droplets at a relatively low cost.

Although the three protocols differ in their details, the nanoliter-sized droplets are always designed to capture a bead and a cell simultaneously. The encapsulation process is conducted with specialized microbeads carrying on-bead primers that contain a PCR handle, a cell barcode, a 4-8 bp-long unique molecular identifier (UMI, see above) and a poly-T tail (or, in the case of a 5’ kit, a poly-T primer). Upon lysis, the cell’s mRNA is instantaneously released and captured by the barcoded oligonucleotides that are attached to the beads. Next, the droplets are collected and broken to release single-cell transcriptomes attached to microparticles (STAMPs). This is followed by reverse transcription and PCR to capture and amplify the transcripts. Finally, tagmentation takes place, in which the transcripts are randomly cut and sequencing adaptors are attached. This process results in libraries that are ready for sequencing as described above. In microfluidic based protocols, only about 10% of the transcripts of a cell are recovered[Islam et al., 2014]. Notably, this low capture rate is still sufficient for robust identification of cell types.
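Because the cell barcode and the UMI sit at fixed positions in the on-bead primer, they can be recovered from the start of the corresponding sequencing read by simple slicing. The sketch below illustrates this. The function name and the default lengths of 16 bp for the barcode and 12 bp for the UMI are assumptions chosen for this example; the actual layout depends on the protocol and kit version and must be taken from its documentation.

```python
def split_barcode_read(read, barcode_len=16, umi_len=12):
    """Split a barcode read into (cell_barcode, umi).

    Assumes the read starts with the cell barcode, directly followed
    by the UMI; anything after that (e.g. poly-T) is ignored.
    The default lengths are illustrative assumptions only.
    """
    if len(read) < barcode_len + umi_len:
        raise ValueError("read is shorter than barcode plus UMI")
    cell_barcode = read[:barcode_len]
    umi = read[barcode_len:barcode_len + umi_len]
    return cell_barcode, umi


# Toy read: 16 bp barcode, 12 bp UMI, then the start of the poly-T stretch.
read = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "TTTTTTTT"
cell_barcode, umi = split_barcode_read(read)
print(cell_barcode, umi)
```

All reads that share a cell barcode are later assigned to the same droplet, and the UMI is used to collapse PCR duplicates within that droplet.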

All three microfluidic device-based methods result in characteristic biases. The material of the beads differs between the protocols. Drop-seq uses brittle resin beads, so bead encapsulation follows a Poisson distribution, whereas the inDrop and 10x Genomics beads are deformable, resulting in bead occupancies of over 80%[Zhang et al., 2019].
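The consequence of Poisson-distributed loading can be worked out directly from the Poisson probability mass function, P(k) = exp(-λ) λ^k / k!, where λ is the mean number of beads per droplet. The following sketch uses a hypothetical loading rate of λ = 0.1, chosen only for illustration and not taken from any of the protocols.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing exactly k beads in a droplet."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 0.1  # hypothetical mean number of beads per droplet

p_empty = poisson_pmf(0, lam)        # droplets without a bead
p_single = poisson_pmf(1, lam)       # droplets with exactly one bead
p_multiple = 1 - p_empty - p_single  # droplets with two or more beads

print(f"empty: {p_empty:.3f}, single: {p_single:.3f}, multiple: {p_multiple:.4f}")

# Even at the most favourable loading rate (lam = 1), at most
# exp(-1), i.e. roughly 37%, of droplets contain exactly one bead.
print(f"best case single occupancy: {poisson_pmf(1, 1.0):.3f}")
```

Under purely random loading, keeping multi-bead droplets rare therefore means that most droplets stay empty, which is why the occupancies of over 80% reached with deformable beads are a substantial advantage.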

Moreover, capture efficiency is likely influenced by the use of surface-tethered primers in Drop-seq. inDrop uses primers which are released by photocleavage, and 10x Genomics dissolves the beads. This disparity also affects where reverse transcription takes place. In Drop-seq, reverse transcription occurs after the beads are released from the droplets, while it takes place inside the droplets in the inDrop and 10x Genomics protocols[Zhang et al., 2019].

A comparison by Zhang et al. in 2019 found that inDrop and Drop-seq are outperformed by 10x Genomics with respect to bead quality, as the cell barcodes in the former two systems contained obvious mismatches. Moreover, the proportion of reads originating from valid barcodes was 75% for 10x Genomics, compared to only 25% for inDrop and 30% for Drop-seq.

Similar advantages were demonstrated for 10x Genomics regarding sensitivity. In the same comparison, 10x Genomics captured on average about 17000 transcripts from 3000 genes, compared to 8000 transcripts from 2500 genes for Drop-seq and 2700 transcripts from 1250 genes for inDrop. Technical noise was lowest for 10x Genomics, followed by Drop-seq and inDrop[Zhang et al., 2019].

The generated data also showed substantial protocol-specific biases. 10x Genomics favored the capture and amplification of shorter genes and genes with higher GC content, while Drop-seq in comparison preferred genes with lower GC content. Although 10x Genomics was shown to outperform the other protocols in various aspects, it is also about twice as expensive per cell. Moreover, except for the beads, Drop-seq is open-source, and the protocol can more easily be adapted if required. inDrop is completely open-source, and even the beads can be manufactured and modified in labs. Hence, inDrop is the most flexible of the three protocols.

Strengths:

Allows for the cost-efficient sequencing of cells in large quantities, to identify the overall composition of a tissue and characterize rare cell types.
UMIs can be incorporated.

Limitations:

Low detection rates of transcripts compared to other methods.
Captures only 3’ ends (or 5’ ends, depending on kit) and not full transcripts.

2.5.3.2. Plate based#

Plate based protocols typically separate the cells physically into microwell plates. The first step entails cell sorting, either by fluorescence-activated cell sorting (FACS), where cells are sorted according to specific cell surface markers, or by micropipetting. The selected cells are then placed into individual wells containing cell lysis buffers, where reverse transcription is subsequently carried out. This allows for several hundred cells to be analyzed in a single experiment, with 5000 to 10000 captured genes each.

Plate based sequencing protocols include, but are not limited to, SMART-seq2, MARS-seq, QUARTZ-seq and SCRB-seq. Generally speaking, the protocols differ in their multiplexing ability. For example, MARS-seq allows for three barcode levels, namely molecular, cellular and plate-level tags, for robust multiplexing capabilities. SMART-seq2, in contrast, does not allow for early multiplexing, which limits cell numbers. A systematic comparison of protocols by Mereu et al. in 2020 revealed that QUARTZ-seq2 is able to capture more genes per cell than SMART-seq2, MARS-seq or SCRB-seq[Mereu et al., 2020], which means that QUARTZ-seq2 captures cell-type specific marker genes well, allowing for confident cell type annotation.

Strengths:

Recovers many genes per cell, allowing for a deep characterization.
Possible to gather information before the library preparation e.g. through FACS sorting to associate information such as cell size and the intensity of any used labels with well coordinates.
Allows for full-length transcript recovery.

Limitations:

The scale of plate-based experiments is limited by the lower throughput of their individual processing units.
Fragmentation step eliminates strand-specific information [Hrdlickova et al., 2017].
Depending on the protocol, plate based protocols might be labor-intensive with many required pipetting steps, leading to potential technical noise and batch effects.

2.5.3.3. Fluidigm C1#

The commercial Fluidigm C1 system is a microfluidic chip which loads and separates cells into small reaction chambers in an automated manner. The CEL-seq2 and SMART-seq (version 1) protocols use the Fluidigm C1 chips in their workflow, allowing the RNA extraction and library preparation steps to be conducted together, thereby decreasing the required manual labor. However, the Fluidigm C1 requires rather homogeneous cell mixtures, since the cells will reach different locations on the microfluidic chip based on their size, which could introduce a location bias. Since the amplification step is carried out in individual wells, full-length sequencing is possible, effectively reducing the 3’ bias of many other single-cell RNA-seq protocols. The protocol is generally also more expensive and is therefore primarily useful for an extensive examination of a specific cell population.

Strengths:

Allows for full-length transcript coverage.
Splicing variants and T/B cell receptor repertoire diversity can be recovered.

Limitations:

Only allows for the sequencing of up to 800 cells[Fluidigm, 2022].
More expensive per cell than other protocols.
Only about 10% of the extracted cells are captured, which makes this protocol unsuitable for rare cell types or low input.
The used arrays only capture specific cell sizes, which may bias the captured transcripts.

2.5.3.4. Nanopore single-cell transcriptome sequencing#

Long-read single-cell sequencing approaches rarely use UMIs [Singh et al., 2019] or do not perform UMI correction [Gupta et al., 2018], and therefore misassign some reads to novel UMIs. Due to the higher sequencing error rate of long-read sequencers, this causes serious issues [Lebrigand et al., 2020]. Lebrigand et al. introduced ScNaUmi-seq (Single-cell Nanopore sequencing with UMIs), which combines Nanopore sequencing with cell barcode and UMI assignment. The barcode assignment is guided by Illumina data: the cell barcode sequences found in the Nanopore reads are compared with those recovered from the Illumina reads for the same region or gene [Lebrigand et al., 2020]. However, this effectively requires two single-cell libraries. scCOLOR-seq computationally identifies barcodes without errors using nucleotide pair complementarity across the full length of the barcode. These barcodes are then used as guides to correct the remaining erroneous barcodes [Philpott et al., 2021]. A modified version of the UMI-tools directional network-based method then corrects for UMI sequence duplication.
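The general idea of guiding barcode assignment with a set of trusted barcodes can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of ScNaUmi-seq or scCOLOR-seq: it only handles substitutions via the Hamming distance, whereas Nanopore reads also contain insertions and deletions, and all names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, reference_barcodes, max_mismatches=1):
    """Assign an observed barcode to a trusted reference barcode.

    Returns the reference barcode if exactly one candidate lies within
    `max_mismatches` substitutions, and None if there is no candidate
    or if the assignment would be ambiguous.
    """
    candidates = [
        ref for ref in reference_barcodes
        if len(ref) == len(observed) and hamming(observed, ref) <= max_mismatches
    ]
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical trusted barcodes, e.g. recovered from a short-read library.
reference = ["AAAACCCC", "GGGGTTTT"]

print(assign_barcode("AAAACCCC", reference))  # exact match
print(assign_barcode("AAAACCCA", reference))  # one error, corrected
print(assign_barcode("AAGGCCTT", reference))  # too many errors, None
```

Returning None for ambiguous cases is a conservative design choice: discarding a read is usually preferable to assigning it to the wrong cell.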

Strengths:

Recovers splicing and sequence heterogeneity information.

Limitations:

Nanopore reagents are expensive.
High cell barcode recovery error rates.
Depending on the protocol, barcode assignment is guided with Illumina data requiring two sequencing assays.

2.5.3.5. Summary#

In summary, we strongly recommend that wet lab and dry lab scientists select the sequencing protocol based on the aim of the study. Is a deep characterization of a specific cell type population desired? In this case, one of the plate-based methods may be more suitable. In contrast, droplet based assays capture heterogeneous mixtures better, allowing for a broader characterization of the sequenced cells. Moreover, if the budget is a limiting factor, the protocol of choice should be cost-effective and robust. When analyzing the data, be aware of the biases specific to the sequencing assay. For an extensive comparison of single-cell sequencing protocols, we recommend the “Benchmarking single-cell RNA-sequencing protocols for cell atlas projects” paper by Mereu et al.[Mereu et al., 2020].

2.5.4. Single-cell vs single-nuclei#

So far, we have only discussed single-cell assays, but it is also possible to sequence only the nuclei of the cells. Single-cell profiling does not always provide an unbiased view of the cell types in specific tissues or organs, such as the brain. During the tissue dissociation process, some cell types are more vulnerable and therefore difficult to capture. For example, fast-spiking parvalbumin-positive interneurons and subcortically projecting glutamatergic neurons were observed in lower proportions than expected in mouse neocortex[Tasic et al., 2018]. In contrast, non-neuronal cells survive dissociation better than neurons and are overrepresented in single-cell suspensions from the adult human neocortex[Darmanis et al., 2015]. Moreover, single-cell sequencing relies heavily on fresh tissue, making it difficult to make use of tissue biobanks.

Nuclei, on the other hand, are more resistant to mechanical force and can be safely isolated from frozen tissue without the use of tissue dissociation enzymes[Krishnaswami et al., 2016]. Both options have varying applicability across tissues and sample types, and the resulting biases and uncertainties are still not fully understood. It has already been shown that nuclei accurately reflect all transcriptional patterns of cells[Ding et al., 2020]. The choice between single-cell and single-nuclei sequencing in the experimental design is mostly driven by the type of tissue sample. Data analysts, however, should be aware that dissociation ability has a strong effect on the cell types that can potentially be observed. Therefore, we strongly encourage discussions between wet lab and dry lab scientists concerning the experimental design.

2.6. Recommended reading#

For a more detailed understanding of the experimental assays, we recommend the following papers:

Comparative Analysis of Single-Cell RNA Sequencing Methods[Ziegenhain et al., 2017]
Power analysis of single-cell RNA-sequencing experiments[Svensson et al., 2017]
Single-nucleus and single-cell transcriptomes compared in matched cortical cell types[Bakken et al., 2018]
Guidelines for the experimental design of single-cell RNA sequencing studies[Lafzi et al., 2018]
Benchmarking single-cell RNA-sequencing protocols for cell atlas projects[Mereu et al., 2020]
Direct Comparative Analyses of 10X Genomics Chromium and Smart-seq2[Wang et al., 2021]

2.7. References#

[expARC+11]

Daniel Aird, Michael G. Ross, Wei-Sheng Chen, Maxwell Danielsson, Timothy Fennell, Carsten Russ, David B. Jaffe, Chad Nusbaum, and Andreas Gnirke. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biology, 12(2):R18, Feb 2011. URL: https://doi.org/10.1186/gb-2011-12-2-r18, doi:10.1186/gb-2011-12-2-r18.

[expAWSH16]

Nathan Archer, Mark D. Walsh, Vahid Shahrezaei, and Daniel Hebenstreit. Modeling enzyme processivity reveals that RNA-seq libraries are biased in characteristic and correctable ways. Cell Systems, 3(5):467–479.e12, 2016. URL: https://www.sciencedirect.com/science/article/pii/S2405471216303313, doi:10.1016/j.cels.2016.10.012.

[expBHM+18]

Trygve E. Bakken, Rebecca D. Hodge, Jeremy A. Miller, Zizhen Yao, Thuc Nghi Nguyen, Brian Aevermann, Eliza Barkan, Darren Bertagnolli, Tamara Casper, Nick Dee, Emma Garren, Jeff Goldy, Lucas T. Graybuck, Matthew Kroll, Roger S. Lasken, Kanan Lathia, Sheana Parry, Christine Rimorin, Richard H. Scheuermann, Nicholas J. Schork, Soraya I. Shehata, Michael Tieu, John W. Phillips, Amy Bernard, Kimberly A. Smith, Hongkui Zeng, Ed S. Lein, and Bosiljka Tasic. Single-nucleus and single-cell transcriptomes compared in matched cortical cell types. PLOS ONE, 13(12):1–24, 12 2018. URL: https://doi.org/10.1371/journal.pone.0209648, doi:10.1371/journal.pone.0209648.

[expDSZ+15]

Spyros Darmanis, Steven A. Sloan, Ye Zhang, Martin Enge, Christine Caneda, Lawrence M. Shuer, Melanie G. Hayden Gephart, Ben A. Barres, and Stephen R. Quake. A survey of human brain transcriptome diversity at the single cell level. Proceedings of the National Academy of Sciences, 112(23):7285–7290, 2015. URL: https://www.pnas.org/doi/abs/10.1073/pnas.1507125112, doi:10.1073/pnas.1507125112.

[expDAS+20]

Jiarui Ding, Xian Adiconis, Sean K. Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K. Hughes, Marc H. Wadsworth, Tyler Burks, Lan T. Nguyen, John Y. H. Kwon, Boaz Barak, William Ge, Amanda J. Kedaigle, Shaina Carroll, Shuqiang Li, Nir Hacohen, Orit Rozenblatt-Rosen, Alex K. Shalek, Alexandra-Chloé Villani, Aviv Regev, and Joshua Z. Levin. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nature Biotechnology, 38(6):737–746, Jun 2020. URL: https://doi.org/10.1038/s41587-020-0465-8, doi:10.1038/s41587-020-0465-8.

[expFlu22]

Fluidigm. Single-cell analysis with microfluidics. https://www.fluidigm.com/area-of-interest/single-cell-analysis/single-cell-analysis-with-microfluidics, 2022. Accessed: 2022-05-07.

[expGCH+18]

Ishaan Gupta, Paul G. Collier, Bettina Haase, Ahmed Mahfouz, Anoushka Joglekar, Taylor Floyd, Frank Koopmans, Ben Barres, August B. Smit, Steven A. Sloan, Wenjie Luo, Olivier Fedrigo, M. Elizabeth Ross, and Hagen U. Tilgner. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nature Biotechnology, 36(12):1197–1202, Dec 2018. URL: https://doi.org/10.1038/nbt.4259, doi:10.1038/nbt.4259.

[expHHS87]

L E Hood, M W Hunkapiller, and L M Smith. Automated DNA sequencing and analysis of the human genome. Genomics, 1(3):201–212, November 1987.

[expHTT17]

Radmila Hrdlickova, Masoud Toloue, and Bin Tian. RNA-seq methods for transcriptome analysis. WIREs RNA, 8(1):e1364, 2017. URL: https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wrna.1364, doi:10.1002/wrna.1364.

[expIZJ+14]

Saiful Islam, Amit Zeisel, Simon Joost, Gioele La Manno, Pawel Zajac, Maria Kasper, Peter Lönnerberg, and Sten Linnarsson. Quantitative single-cell RNA-seq with unique molecular identifiers. Nature Methods, 11(2):163–166, Feb 2014. URL: https://doi.org/10.1038/nmeth.2772, doi:10.1038/nmeth.2772.

[expJOPA16]

Miten Jain, Hugh E. Olsen, Benedict Paten, and Mark Akeson. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biology, 17(1):239, Nov 2016. URL: https://doi.org/10.1186/s13059-016-1103-0, doi:10.1186/s13059-016-1103-0.

[expJHYF72]

W. Min Jou, G. Haegeman, M. Ysebaert, and W. Fiers. Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature, 237(5350):82–88, May 1972. URL: https://doi.org/10.1038/237082a0, doi:10.1038/237082a0.

[expKVaharautioK+12]

Teemu Kivioja, Anna Vähärautio, Kasper Karlsson, Martin Bonke, Martin Enge, Sten Linnarsson, and Jussi Taipale. Counting absolute numbers of molecules using unique molecular identifiers. Nature Methods, 9(1):72–74, Jan 2012. URL: https://doi.org/10.1038/nmeth.1778, doi:10.1038/nmeth.1778.

[expKMA+15]

Allon M. Klein, Linas Mazutis, Ilke Akartuna, Naren Tallapragada, Adrian Veres, Victor Li, Leonid Peshkin, David A. Weitz, and Marc W. Kirschner. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell, 161(5):1187–1201, May 2015. PMCID: PMC4441768. URL: https://doi.org/10.1016/j.cell.2015.04.044, doi:10.1016/j.cell.2015.04.044.

[expKGN+16]

Suguna Rani Krishnaswami, Rashel V. Grindberg, Mark Novotny, Pratap Venepally, Benjamin Lacar, Kunal Bhutani, Sara B. Linker, Son Pham, Jennifer A. Erwin, Jeremy A. Miller, Rebecca Hodge, James K. McCarthy, Martijn Kelder, Jamison McCorrison, Brian D. Aevermann, Francisco Diez Fuertes, Richard H. Scheuermann, Jun Lee, Ed S. Lein, Nicholas Schork, Michael J. McConnell, Fred H. Gage, and Roger S. Lasken. Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons. Nature Protocols, 11(3):499–524, Mar 2016. URL: https://doi.org/10.1038/nprot.2016.015, doi:10.1038/nprot.2016.015.

[expLMPH18]

Atefeh Lafzi, Catia Moutinho, Simone Picelli, and Holger Heyn. Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies. Nature Protocols, 13(12):2742–2757, Dec 2018. URL: https://doi.org/10.1038/s41596-018-0073-y, doi:10.1038/s41596-018-0073-y.

[expLMBW20]

Kevin Lebrigand, Virginie Magnone, Pascal Barbry, and Rainer Waldmann. High throughput error corrected nanopore single cell transcriptome sequencing. Nature Communications, 11(1):4025, Aug 2020. URL: https://doi.org/10.1038/s41467-020-17800-6, doi:10.1038/s41467-020-17800-6.

[expMBS+15]

Evan Z. Macosko, Anindita Basu, Rahul Satija, James Nemesh, Karthik Shekhar, Melissa Goldman, Itay Tirosh, Allison R. Bialas, Nolan Kamitaki, Emily M. Martersteck, John J. Trombetta, David A. Weitz, Joshua R. Sanes, Alex K. Shalek, Aviv Regev, and Steven A. McCarroll. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 161(5):1202–1214, May 2015. URL: https://doi.org/10.1016/j.cell.2015.05.002, doi:10.1016/j.cell.2015.05.002.

[expMLM+20]

Elisabetta Mereu, Atefeh Lafzi, Catia Moutinho, Christoph Ziegenhain, Davis J. McCarthy, Adrián Álvarez-Varela, Eduard Batlle, Sagar, Dominic Grün, Julia K. Lau, Stéphane C. Boutet, Chad Sanada, Aik Ooi, Robert C. Jones, Kelly Kaihara, Chris Brampton, Yasha Talaga, Yohei Sasagawa, Kaori Tanaka, Tetsutaro Hayashi, Caroline Braeuning, Cornelius Fischer, Sascha Sauer, Timo Trefzer, Christian Conrad, Xian Adiconis, Lan T. Nguyen, Aviv Regev, Joshua Z. Levin, Swati Parekh, Aleksandar Janjic, Lucas E. Wange, Johannes W. Bagnoli, Wolfgang Enard, Marta Gut, Rickard Sandberg, Itoshi Nikaido, Ivo Gut, Oliver Stegle, and Holger Heyn. Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nature Biotechnology, 38(6):747–755, Jun 2020. URL: https://doi.org/10.1038/s41587-020-0469-4, doi:10.1038/s41587-020-0469-4.

[expPWT+21]

Martin Philpott, Jonathan Watson, Anjan Thakurta, Tom Brown, Udo Oppermann, and Adam P. Cribbs. Nanopore sequencing of single-cell transcriptomes with scCOLOR-seq. Nature Biotechnology, 39(12):1517–1520, Dec 2021. URL: https://doi.org/10.1038/s41587-021-00965-w, doi:10.1038/s41587-021-00965-w.

[expSAEC+19]

Mandeep Singh, Ghamdan Al-Eryani, Shaun Carswell, James M. Ferguson, James Blackburn, Kirston Barton, Daniel Roden, Fabio Luciani, Tri Giang Phan, Simon Junankar, Katherine Jackson, Christopher C. Goodnow, Martin A. Smith, and Alexander Swarbrick. High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nature Communications, 10(1):3120, Jul 2019. URL: https://doi.org/10.1038/s41467-019-11049-4, doi:10.1038/s41467-019-11049-4.

[expSNL+17]

Valentine Svensson, Kedar Nath Natarajan, Lam-Ha Ly, Ricardo J. Miragaia, Charlotte Labalette, Iain C. Macaulay, Ana Cvejic, and Sarah A. Teichmann. Power analysis of single-cell RNA-sequencing experiments. Nature Methods, 14(4):381–387, Apr 2017. URL: https://doi.org/10.1038/nmeth.4220, doi:10.1038/nmeth.4220.

[expTYG+18]

Bosiljka Tasic, Zizhen Yao, Lucas T. Graybuck, Kimberly A. Smith, Thuc Nghi Nguyen, Darren Bertagnolli, Jeff Goldy, Emma Garren, Michael N. Economo, Sarada Viswanathan, Osnat Penn, Trygve Bakken, Vilas Menon, Jeremy Miller, Olivia Fong, Karla E. Hirokawa, Kanan Lathia, Christine Rimorin, Michael Tieu, Rachael Larsen, Tamara Casper, Eliza Barkan, Matthew Kroll, Sheana Parry, Nadiya V. Shapovalova, Daniel Hirschstein, Julie Pendergraft, Heather A. Sullivan, Tae Kyung Kim, Aaron Szafer, Nick Dee, Peter Groblewski, Ian Wickersham, Ali Cetin, Julie A. Harris, Boaz P. Levi, Susan M. Sunkin, Linda Madisen, Tanya L. Daigle, Loren Looger, Amy Bernard, John Phillips, Ed Lein, Michael Hawrylycz, Karel Svoboda, Allan R. Jones, Christof Koch, and Hongkui Zeng. Shared and distinct transcriptomic cell types across neocortical areas. Nature, 563(7729):72–78, Nov 2018. URL: https://doi.org/10.1038/s41586-018-0654-5, doi:10.1038/s41586-018-0654-5.

[expWHZ+21]

Xiliang Wang, Yao He, Qiming Zhang, Xianwen Ren, and Zemin Zhang. Direct comparative analyses of 10X Genomics Chromium and Smart-seq2. Genomics, Proteomics & Bioinformatics, 19(2):253–266, 2021. Single-cell Omics Analysis. URL: https://www.sciencedirect.com/science/article/pii/S1672022921000486, doi:10.1016/j.gpb.2020.02.005.

[expZLL+19]

Xiannian Zhang, Tianqi Li, Feng Liu, Yaqi Chen, Jiacheng Yao, Zeyao Li, Yanyi Huang, and Jianbin Wang. Comparative analysis of droplet-based ultra-high-throughput single-cell RNA-seq systems. Molecular Cell, 73(1):130–142.e5, Jan 2019. URL: https://doi.org/10.1016/j.molcel.2018.10.020, doi:10.1016/j.molcel.2018.10.020.

[expZTB+17]

Grace X. Y. Zheng, Jessica M. Terry, Phillip Belgrader, Paul Ryvkin, Zachary W. Bent, Ryan Wilson, Solongo B. Ziraldo, Tobias D. Wheeler, Geoff P. McDermott, Junjie Zhu, Mark T. Gregory, Joe Shuga, Luz Montesclaros, Jason G. Underwood, Donald A. Masquelier, Stefanie Y. Nishimura, Michael Schnall-Levin, Paul W. Wyatt, Christopher M. Hindson, Rajiv Bharadwaj, Alexander Wong, Kevin D. Ness, Lan W. Beppu, H. Joachim Deeg, Christopher McFarland, Keith R. Loeb, William J. Valente, Nolan G. Ericson, Emily A. Stevens, Jerald P. Radich, Tarjei S. Mikkelsen, Benjamin J. Hindson, and Jason H. Bielas. Massively parallel digital transcriptional profiling of single cells. Nature Communications, 8(1):14049, Jan 2017. URL: https://doi.org/10.1038/ncomms14049, doi:10.1038/ncomms14049.

[expZVP+17]

Christoph Ziegenhain, Beate Vieth, Swati Parekh, Björn Reinius, Amy Guillaumet-Adkins, Martha Smets, Heinrich Leonhardt, Holger Heyn, Ines Hellmann, and Wolfgang Enard. Comparative analysis of single-cell RNA sequencing methods. Molecular Cell, 65(4):631–643.e4, February 2017.

2.8. Contributors#

We gratefully acknowledge the contributions of:

2.8.1. Authors#

Lukas Heumos

2.8.2. Reviewers#

Yuexin Chen
