37. Annotation

37.1. Motivation

Similar to RNA data, ADT data can be annotated based on surface protein markers. This is particularly useful for cells that are difficult to annotate in RNA space, such as immune cells. For example, although sequenced immune cells usually include CD45+ cells, the gene encoding CD45 (PTPRC) is not always highly expressed, so additional marker genes are used for the annotation. Even with the information these other markers provide, the RNA-based approach may still lack resolution and power. Annotating (additionally) at the ADT level can mitigate this.

The general annotation workflow uses the same functions as for RNA; no ADT-specific functions are required.

37.2. Manual annotation

import scanpy as sc

# setting visualization parameters
sc.settings.verbosity = 0
sc.settings.set_figure_params(
    dpi=80,
    facecolor="white",
    frameon=False,
)
adata = sc.read(
    "cite_preprocessed.h5ad",
    backup_url="https://figshare.com/ndownloader/files/42569866",
)

First, we check the expression of CD45. CD45 is one of the most abundant proteins in the T-cell plasma membrane and is required for TCR signaling. It activates Lck, which in turn is required to phosphorylate the TCR complex [Courtney et al., 2019].

sc.pl.umap(adata, frameon=False, color="CD45")
[Figure: UMAP colored by CD45]

The measured ADTs use a slightly different nomenclature because some protein names clash with RNA gene names. The var_names_make_unique function was used to separate gene names from protein names, so the proteins might have -1 suffixes. As an example, we look up CD38 to find its exact name in our variable names:

adata.var[adata.var.gene_ids.str.contains("CD38")]
        gene_ids     feature_types  n_cells_by_counts  mean_counts  log1p_mean_counts  pct_dropout_by_counts  total_counts  log1p_total_counts
CD38-1      CD38  Antibody Capture             116434    52.892693           3.986995                4.57481     6453755.0           15.680173
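
Note that str.contains matches substrings, so a query such as "CD3" would also return CD38 and similar names. The lookup can be made stricter with an exact comparison. The following is a minimal sketch on a toy table standing in for adata.var; the table contents and the helper name adt_var_names are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy stand-in for adata.var (hypothetical values): the index holds the
# unique variable names, gene_ids holds the original, possibly clashing names.
var = pd.DataFrame(
    {
        "gene_ids": ["CD38", "CD38", "CD19", "CD3"],
        "feature_types": [
            "Gene Expression",
            "Antibody Capture",
            "Antibody Capture",
            "Antibody Capture",
        ],
    },
    index=["CD38", "CD38-1", "CD19-1", "CD3"],
)


def adt_var_names(var, marker):
    """Return variable names of ADT features whose gene_ids equal `marker` exactly."""
    is_adt = var["feature_types"] == "Antibody Capture"
    is_marker = var["gene_ids"] == marker
    return var.index[is_adt & is_marker].tolist()


print(adt_var_names(var, "CD38"))
print(adt_var_names(var, "CD3"))
```

With the real object, the same helper could be called as adt_var_names(adata.var, "CD38"), assuming the gene_ids and feature_types columns shown above are present.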

We start by clustering the cells at a relatively low resolution. As with RNA annotation, the resolution can be increased later for more fine-grained annotations.

sc.tl.leiden(adata, resolution=0.3)
sc.tl.rank_genes_groups(adata, groupby="leiden")

To check which surface markers are present in which cell type, we use scanpy's rank_genes_groups function. We can already identify clusters 1, 2 and 11 as T cell populations by CD3 expression, and cluster 4 as B cells by CD19 expression.

sc.pl.rank_genes_groups_dotplot(adata, n_genes=3, values_to_plot="logfoldchanges")
[Figure: dot plot of the top 3 ranked markers per Leiden cluster, colored by log fold change]
sc.pl.umap(adata, color="leiden")
[Figure: UMAP colored by Leiden cluster]
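
A simple complement to the ranked dot plot is a table of mean marker levels per cluster. The sketch below uses a small, made-up table of cells (values and cluster labels are hypothetical) and plain pandas; with the real data, a comparable table could be obtained from sc.get.obs_df(adata, keys=["leiden", "CD3", "CD19-1"]).

```python
import pandas as pd

# Toy stand-in: one row per cell with a cluster label and two ADT values.
# All numbers are made up for illustration.
cells = pd.DataFrame(
    {
        "leiden": ["1", "1", "4", "4", "2", "2"],
        "CD3": [2.0, 3.0, 0.0, 0.2, 2.5, 3.5],
        "CD19-1": [0.1, 0.3, 2.0, 3.0, 0.0, 0.2],
    }
)

# Mean level of each marker within each cluster
cluster_means = cells.groupby("leiden")[["CD3", "CD19-1"]].mean()

# For every cluster, the marker with the highest mean level
top_marker = cluster_means.idxmax(axis=1)

print(cluster_means)
print(top_marker)
```

Such a table only summarizes averages; it does not replace the statistical ranking performed by rank_genes_groups.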

We’ll check a few known markers for major immune cell types.

# B cells
sc.pl.umap(adata, frameon=False, color=["CD19-1"])
[Figure: UMAP colored by CD19-1]

As seen in the dot plot, cluster 4 expresses the B cell marker CD19, while clusters 1 and 2 express the T cell marker CD3.
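
Once clusters have been interpreted, the labels can be stored per cell by mapping cluster IDs to names. The following is a minimal sketch on a toy Series standing in for adata.obs["leiden"]; the cluster assignments follow the text above, while the "unassigned" fallback and the cell_type name are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for adata.obs["leiden"] (cluster IDs are stored as strings)
leiden = pd.Series(["1", "2", "4", "11", "0", "4"], dtype="category", name="leiden")

# Cluster-to-label mapping based on the CD3 and CD19 observations above
cluster_to_label = {
    "1": "T cells",
    "2": "T cells",
    "11": "T cells",
    "4": "B cells",
}

# Clusters without an entry in the mapping get a placeholder label
cell_type = (
    leiden.astype(str)
    .map(cluster_to_label)
    .fillna("unassigned")
    .astype("category")
)

print(cell_type.tolist())
```

On the real object, the result could be written to a new column, for example adata.obs["cell_type"], and plotted with sc.pl.umap like any other annotation.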

Let’s look at the T cells in more detail and separate them into CD4+ and CD8+ cells.

# T cells
sc.pl.umap(adata, color=["CD3", "CD4-1", "CD8"])
[Figure: UMAPs colored by CD3, CD4-1 and CD8]
# NKT cells are CD3+ and CD56+
# NK cells are CD3- and CD56+
sc.pl.umap(adata, color=["CD56"], frameon=False)
[Figure: UMAP colored by CD56]
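
The rule in the comments above (NKT cells are CD3+ CD56+, NK cells are CD3- CD56+) can be written as a simple two-marker gate. This is only a sketch: the function name gate_nk_nkt, the example values and the thresholds are hypothetical, and suitable cutoffs depend on how the ADT data were normalized.

```python
import numpy as np


def gate_nk_nkt(cd3, cd56, cd3_threshold, cd56_threshold):
    """Label cells as 'NK', 'NKT' or 'other' from CD3 and CD56 levels.

    The thresholds are user-supplied assumptions, not values from the dataset.
    """
    cd3 = np.asarray(cd3, dtype=float)
    cd56 = np.asarray(cd56, dtype=float)
    cd3_pos = cd3 > cd3_threshold
    cd56_pos = cd56 > cd56_threshold

    labels = np.full(cd3.shape, "other", dtype=object)
    labels[cd56_pos & ~cd3_pos] = "NK"  # CD3- CD56+
    labels[cd56_pos & cd3_pos] = "NKT"  # CD3+ CD56+
    return labels


# Four made-up cells: (CD3, CD56) levels on an arbitrary normalized scale
labels = gate_nk_nkt(
    cd3=[2.1, 0.1, 1.8, 0.2],
    cd56=[1.9, 2.3, 0.1, 0.0],
    cd3_threshold=1.0,
    cd56_threshold=1.0,
)
print(labels.tolist())
```

In practice, such gates are usually checked against the UMAP plots and the cluster structure rather than applied blindly.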
# Myeloid cells

sc.pl.umap(adata, color=["CD11b"], frameon=False)
sc.pl.umap(adata, color=["CD14-1"], frameon=False)
[Figures: UMAPs colored by CD11b and by CD14-1]
# Dendritic cells
sc.pl.umap(adata, color=["CD11c"], frameon=False)
[Figure: UMAP colored by CD11c]
# Neutrophils
sc.pl.umap(adata, color=["CD16", "CD32"], frameon=False)
[Figure: UMAPs colored by CD16 and CD32]
adata
AnnData object with n_obs × n_vars = 120502 × 136
    obs: 'donor', 'batch', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'n_counts', 'outliers', 'leiden'
    var: 'gene_ids', 'feature_types', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
    uns: 'batch_colors', 'donor_colors', 'neighbors', 'pca', 'umap', 'leiden', 'rank_genes_groups', 'dendrogram_leiden', 'leiden_colors'
    obsm: 'X_isotypes', 'X_pca', 'X_pcahm', 'X_umap'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'connectivities', 'distances'

37.3. Automated annotation

It is technically possible to use cell type classifiers trained on ADT data and to map against ADT reference datasets. However, ADT-specific methods are scarce, if not nonexistent, so we refer to the RNA annotation chapter for methodological details.

37.4. References

[CSL+19]

Adam H Courtney, Alexey A Shvets, Wen Lu, Gloria Griffante, Marianne Mollenauer, Veronika Horkova, Wan-Lin Lo, Steven Yu, Ondrej Stepanek, Arup K Chakraborty, and Arthur Weiss. CD45 functions as a signaling gatekeeper in T cells. Sci. Signal., 12(604):eaaw8151, October 2019.

37.5. Contributors

We gratefully acknowledge the contributions of:

37.5.1. Authors

  • Daniel Strobl

  • Ciro Ramírez-Suástegui

37.5.2. Reviewers

  • Lukas Heumos

  • Anna Schaar