GESTALT: Whole-organism Lineage tracing with CRISPR/Cas9 barcodes

Aaron McKenna, Shendure Lab

Graduate student at University of Washington

Editor’s Note:

Understanding how a single cell gives rise to a complex multicellular organism is a question that has plagued developmental biologists since the late 19th century. Learning how cells change over time not only has implications for developmental biology, but can also be harnessed to understand how diseases such as cancer evolve. Aaron McKenna and his colleagues from Dr. Jay Shendure’s group and Dr. Alex Schier’s group developed a method called GESTALT to trace cell lineage and recently published their work in Science. Aaron shared with Benchling how their method works and how you can get started using it in your own lab.

Benchling is proud to foster collaboration and promote new methodologies in science. If we could help share your research, let us know.

Multicellular life starts as a single cell. Through a series of coordinated divisions, this cell spawns a set of complex organs and structures in vertebrates. A central question in biology asks how a single cell organizes this impressive feat. Lineage tracing is an attempt to track these cell divisions, tracing the ancestry of individual cells throughout development.

Many technologies have been devised in the last 100 years to trace cell lineage, such as vital dyes, fluorescent proteins, and DNA tags. However, these approaches are limited in that they statically label cells at a distinct point in development, with all progeny cells carrying identical marks. An ideal system would continue recording as development progresses, marking structure within cell populations. This was our goal when designing our approach, Genome Editing of Synthetic Target Arrays for Lineage Tracing (GESTALT)[1].

What is GESTALT?

GESTALT harnesses the power of CRISPR/Cas9 genome editing to capture lineage-specific information. We designed a compact locus of nine to twelve Cas9 target sequences (a segment of approximately 300 base pairs, Figure 1A). This collection of sites, called a cell barcode, is then integrated into the host genome.

As Cas9-mediated insertions and deletions accumulate at the targets comprising the barcode, the unique combination of barcode marks (insertions and deletions) identifies the cell. When that cell then divides, each daughter cell receives one copy of the edited barcode. These daughter cells can then add their own marks at unedited target sites, providing further information about the relationships between cell populations.

**Figure 1. A GESTALT barcode.** (A) A barcode with ten Cas9 target sites (gray bars), as well as flanking primer sequences (green), is introduced into the genome of interest. (B) A GESTALT barcode from a single cardiomyocyte of the heart. The barcode of this cell has already acquired deletions (red) and insertions (blue) in an ancestor cell, edits which are shared with other related cells. During this cell’s lifetime Cas9 introduces an additional insertion (target 3), a mark that will be passed on to all progeny cells. The pattern of shared edits between many thousands of cells can be used to infer lineage.
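To make the accumulation of marks concrete, here is a toy simulation. It is only a sketch under stated assumptions: the barcode has ten sites, every unedited site is edited with the same hypothetical probability (`edit_prob`) in each generation, and each edit is given an arbitrary label rather than a real indel. None of the names or numbers come from our experiments.

```python
import random

N_TARGETS = 10  # assumed barcode size, for illustration only


def edit_barcode(barcode, edit_prob, rng):
    """Return a copy of the barcode with new marks at some unedited sites.

    Edited sites are never changed again, mirroring irreversible editing.
    """
    edited = list(barcode)
    for site, mark in enumerate(edited):
        if mark is None and rng.random() < edit_prob:
            # An arbitrary label stands in for a specific insertion or deletion.
            edited[site] = f"indel{rng.randrange(10**6)}"
    return tuple(edited)


def grow(barcode, generations, edit_prob, rng):
    """Divide a cell for a number of generations and return the leaf barcodes."""
    if generations == 0:
        return [barcode]
    leaves = []
    for _ in range(2):  # each division yields two daughter cells
        daughter = edit_barcode(barcode, edit_prob, rng)
        leaves.extend(grow(daughter, generations - 1, edit_prob, rng))
    return leaves


rng = random.Random(0)
founder = (None,) * N_TARGETS  # unedited (wild-type) barcode
cells = grow(founder, generations=4, edit_prob=0.15, rng=rng)
for cell in cells[:4]:
    print(cell)
```

Cells that share a recent ancestor tend to print barcodes with marks in common, which is the signal used to infer lineage.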

How did you choose the sequence content of these target sites?

In our cell culture experiments, we began with a single guide sequence characterized by Tsai et al.[2] that has a set of known off-target sequences. We used the guide’s matching DNA sequence as the first target in the barcode, and tiled the remaining eight to eleven targets using off-target sequences discovered in that paper. This allowed a single guide, complexed with Cas9, to potentially edit all sites in the barcode.

For our zebrafish work, we wanted to ensure the guide sequences did not interfere with normal development. Therefore, we engineered a series of ten target sequences not present in the zebrafish genome; to maximize barcode editing, we targeted each with a guide that perfectly matched its sequence. Given the challenges of integrating ten guides into the genome, we complexed Cas9 with the guides ahead of time and injected this construct into the zebrafish zygote.
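One way to picture the "not present in the genome" requirement is a scan of both strands for near matches to a candidate target. The snippet below is a minimal sketch, not the screening procedure used for the zebrafish barcode: the function names are hypothetical, it ignores PAM requirements, and a brute-force scan like this is only practical on short sequences.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def min_mismatches(target, genome):
    """Smallest Hamming distance between the target and any genome window.

    Both strands are checked. A result of 0 means the target occurs exactly.
    """
    target, genome = target.upper(), genome.upper()
    k = len(target)
    best = k  # worst case: every position mismatches
    for query in (target, reverse_complement(target)):
        for start in range(len(genome) - k + 1):
            window = genome[start:start + k]
            distance = sum(a != b for a, b in zip(query, window))
            best = min(best, distance)
    return best


toy_genome = "TTTTACGTACGGATTT"  # made-up sequence for illustration
print(min_mismatches("ACGTACGG", toy_genome))
print(min_mismatches("ACGTACGA", toy_genome))
```

A candidate whose closest genomic match still has many mismatches is less likely to direct Cas9 to an endogenous site.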

How do you recover GESTALT barcodes?

Once development has progressed to a point you’re interested in surveying, the edited barcode can be recovered from a whole organism by extracting DNA or RNA from tissue. Our goal was to assay the barcode status of single cells, so we attached a unique molecular identifier (UMI) to the 5’ end of the fragment in the first round of PCR amplification. The UMI ensures that after amplicon sequencing, individual reads can be assigned back to their cell of origin.
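The idea of grouping reads by UMI can be sketched in a few lines. This is an illustration under simplifying assumptions, not the pipeline on our GitHub page: reads are assumed to arrive as `(umi, allele)` pairs that have already been aligned and called, the consensus is simply the most common allele per UMI, and `min_reads` is a hypothetical cutoff for discarding poorly supported UMIs.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads, min_reads=2):
    """Group reads by UMI and keep the most common allele for each UMI."""
    groups = defaultdict(list)
    for umi, allele in reads:
        groups[umi].append(allele)

    consensus = {}
    for umi, alleles in groups.items():
        if len(alleles) < min_reads:
            continue  # too few reads to trust this UMI
        consensus[umi] = Counter(alleles).most_common(1)[0][0]
    return consensus


# Made-up reads: (UMI, called allele)
reads = [
    ("AAAC", "wt"), ("AAAC", "wt"), ("AAAC", "del3"),
    ("GGTA", "del3"), ("GGTA", "del3"),
    ("TTTT", "ins1"),
]
consensus = collapse_by_umi(reads)
allele_counts = Counter(consensus.values())
print(consensus)
print(allele_counts)
```

Counting the consensus alleles, as in the last lines, gives the kind of allele frequency table that summarizes a tissue sample.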

After alignment and processing of sequencing data, a picture emerges of the editing pattern. For example, in Figure 2 we show the diversity captured in the circulating blood of an adult zebrafish. Much (>98%) of the blood is made up of five common alleles, whereas the rest is composed of rarer, mostly derivative alleles. The pipeline for processing sequencing reads into single-cell results is available on our GitHub page.

**Figure 2. Alleles recovered from single cells in the circulating blood of an adult zebrafish.** Adapted from [1].

How do you reconstruct lineage from GESTALT barcodes?

Our main objective is to use these barcodes to reconstruct lineage, and there are a couple of important points to consider with GESTALT. First, editing is generally irreversible. Once a site is cut and repaired with an insertion or deletion, it can no longer be targeted by the same guide. Second, you need to sample many cells to be able to compare marks and infer the relationship between cells. In our experience this has been straightforward: in adult zebrafish we sampled hundreds of thousands of cells on single sequencing runs, resulting in thousands of unique barcode alleles.

Given successful barcode capture, we then reconstruct cell lineage. To do this, we define the lineage tree as the simplest arrangement of successive edits from the wild-type (unedited) barcode to our recovered alleles. Using maximum parsimony, an optimality criterion that favors the tree requiring the fewest edit events, we can reconstruct a lineage tree for each sample, as seen in Figure 3. Future improvements to both the barcode design and computational methods will allow reconstruction at increasingly finer scales.

**Figure 3. Reconstruction of the alleles from a single zebrafish embryo using the V6 GESTALT barcode.** Adapted from [1].
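The parsimony idea can be illustrated with a toy example. The sketch below is not the reconstruction method from our paper: it scores a tree by counting edit events, assuming edits are irreversible and that an ancestor carries exactly the edits shared by all of its descendants, and it builds a tree with a simple greedy heuristic (repeatedly joining the two groups that share the most edits) rather than a full maximum parsimony search. The allele labels such as `"t1:del"` are made up.

```python
from itertools import combinations


def ancestor(node):
    """Edits inferred for a node: a leaf's own edits, or those shared by its children."""
    if isinstance(node, frozenset):
        return node
    left, right = node
    return ancestor(left) & ancestor(right)


def cost(node, inherited=frozenset()):
    """Number of edit events needed to explain the tree below this node."""
    state = ancestor(node)
    new_edits = len(state - inherited)  # edits that first appear at this node
    if isinstance(node, frozenset):
        return new_edits
    return new_edits + sum(cost(child, state) for child in node)


def greedy_tree(alleles):
    """Greedy heuristic: repeatedly join the two groups sharing the most edits."""
    nodes = list(alleles)
    while len(nodes) > 1:
        a, b = max(
            combinations(nodes, 2),
            key=lambda pair: len(ancestor(pair[0]) & ancestor(pair[1])),
        )
        nodes.remove(a)
        nodes.remove(b)
        nodes.append((a, b))
    return nodes[0]


a = frozenset({"t1:del", "t3:ins"})
b = frozenset({"t1:del", "t5:del"})
c = frozenset({"t2:del"})
d = frozenset({"t2:del", "t7:ins"})

tree = greedy_tree([a, b, c, d])
print(cost(tree))                # edit events under the greedy tree
print(cost(((a, c), (b, d))))    # a worse grouping needs more events
```

In this toy case the greedy tree explains every distinct edit with a single event, whereas the alternative grouping requires some edits to arise more than once, which is why parsimony prefers the former.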

What alternative systems are being developed?

A number of recent papers have proposed alternative lineage tracing strategies using genome editing. The first involves similar approaches to GESTALT, such as Junker et al.[3] and Schmidt et al.[4], which also rely on single-use targets to accumulate lineage-specific edits. The second approach uses self-targeting (or homing) guide RNAs to record information at a locus (Kalhor et al.[5] and Perli et al.[6]). These systems have the advantage of recording a wider diversity of accumulated edits at a single site by repeatedly editing the same genomic locus. Both approaches use different methods for target design and location and are worth exploring.

All of these approaches, in addition to GESTALT, demonstrate the exciting potential of CRISPR/Cas9 genome editing for lineage tracing. As these techniques improve, so too will our ability to trace the complete lineages of complex living systems in a high-throughput manner.

I want to try it out. How do I start?

The first step would be to integrate a barcode sequence into the organism you’re studying. Our barcode plasmids for GESTALT are available from the Shendure lab’s Addgene page. The sequences of each are also available from Benchling: the plasmids used in cell culture are V1, V2, V3, V4, and V5, and for zebrafish we used the V6 and V7 plasmids. For a more detailed protocol see the supplementary materials section of our paper. The software used to process sequencing data and generate figures for our paper is available on our GitHub page.

For a full explanation of how we developed GESTALT, please see our manuscript in Science.

References

[1] Aaron McKenna, Gregory M. Findlay, James A. Gagnon, et al. “Whole-organism lineage tracing by combinatorial and cumulative genome editing.” Science (2016)

[2] Shengdar Tsai et al. “GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases.” Nature Biotechnology (2015)

[3] Jan Philipp Junker et al. “Massively parallel whole-organism lineage tracing using CRISPR/Cas9 induced genetic scars.” bioRxiv (2016)

[4] Stephanie Tzouanas Schmidt et al. “Cell lineage tracing using nuclease barcoding.” arXiv (2016)

[5] Reza Kalhor et al. “Rapidly evolving homing CRISPR barcodes.” bioRxiv (2016)

[6] Samuel Perli et al. “Continuous Genetic Recording with Self-Targeting CRISPR-Cas in Human Cells.” bioRxiv (2016)