Parsing 10X Genomics V(D)J data

New

We have an updated tutorial covering the processing of 10x Genomics VDJ data with Change-O and SCOPer. You can also follow the steps below to process 10x VDJ data using methods available in Change-O.

Example data

10X Genomics provides an example data set of Ig V(D)J processed by the Cell Ranger pipeline, which is available for download from their Single Cell Immune Profiling support site.

Converting 10X V(D)J data into the AIRR Community standardized format

To process 10X V(D)J data, AssignGenes.py and MakeDb.py can be used together to generate a TSV file that is compliant with the AIRR Community Rearrangement schema and incorporates the annotation information provided by the Cell Ranger pipeline. The --10x filtered_contig_annotations.csv argument specifies the path to the contig annotations file generated by cellranger vdj, which can be found in the outs directory.

Generate AIRR Rearrangement data from the 10X V(D)J FASTA files using the steps below:

AssignGenes.py igblast -s filtered_contig.fasta -b ~/share/igblast \
   --organism human --loci ig --format blast
MakeDb.py igblast -i filtered_contig_igblast.fmt7 -s filtered_contig.fasta \
   -r IMGT_Human_*.fasta --10x filtered_contig_annotations.csv --extended

To process all contigs rather than only the filtered contigs, substitute all_contig.fasta for filtered_contig.fasta and all_contig_annotations.csv for filtered_contig_annotations.csv.

Warning

The resulting table overwrites the V, D and J gene assignments generated by Cell Ranger and uses those generated by IgBLAST or IMGT/HighV-QUEST instead.

Identifying clones from B cells in AIRR formatted 10X V(D)J data

Splitting into separate light and heavy chain files

To group B cells into clones from AIRR Rearrangement data, the output from MakeDb.py (named 10x_igblast_db-pass.tsv in the commands below) must first be split into a heavy chain file and a light chain file:

ParseDb.py select -d 10x_igblast_db-pass.tsv -f locus -u "IGH" \
        --logic all --regex --outname heavy
ParseDb.py select -d 10x_igblast_db-pass.tsv -f locus -u "IG[LK]" \
        --logic all --regex --outname light
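
The selection logic of the two commands above can be sketched in Python for readers who want to see what the split amounts to. This is a minimal illustration, not the ParseDb.py implementation: the function name split_by_locus, the in-memory example table and its values are hypothetical, and it assumes the regular expressions are matched with search semantics against the locus column.

```python
import csv
import io
import re


def split_by_locus(rows, heavy_pattern=r"IGH", light_pattern=r"IG[LK]"):
    """Partition rearrangement records into heavy and light chain lists.

    Hypothetical helper: rows is an iterable of dicts with a 'locus' key.
    """
    heavy_re = re.compile(heavy_pattern)
    light_re = re.compile(light_pattern)
    heavy, light = [], []
    for row in rows:
        locus = row.get("locus", "")
        if heavy_re.search(locus):
            heavy.append(row)
        elif light_re.search(locus):
            light.append(row)
    return heavy, light


# Tiny in-memory stand-in for a MakeDb.py output table (made-up values)
example_tsv = (
    "sequence_id\tlocus\tcell_id\n"
    "contig_1\tIGH\tcell_A\n"
    "contig_2\tIGK\tcell_A\n"
    "contig_3\tIGL\tcell_B\n"
)
rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
heavy_rows, light_rows = split_by_locus(rows)
print(len(heavy_rows), "heavy;", len(light_rows), "light")
```

Records whose locus matches neither pattern are left out of both lists in this sketch.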

Assign clonal groups to the heavy chain data

The heavy chain file must then be clonally clustered separately. See Clustering sequences into clonal groups for how to use DefineClones.py to assign clonal cluster annotations to the IGH file.

Correct clonal groups based on light chain data

DefineClones.py currently does not support light chain cloning. However, light chain information can be incorporated after heavy chain cloning using light_cluster.py, which is provided in the scripts directory of the Immcantation Bitbucket repository:

light_cluster.py -d heavy_select-pass_clone-pass.tsv -e light_select-pass.tsv \
        -o 10X_clone-pass.tsv

Here, heavy_select-pass_clone-pass.tsv refers to the cloned heavy chain AIRR Rearrangement file, light_select-pass.tsv refers to the light chain file, and 10X_clone-pass.tsv is the resulting output file.

The algorithm will (1) remove cells associated with more than one heavy chain and (2) correct heavy chain clone definitions based on an analysis of the light chain partners associated with the heavy chain clone.
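
The two steps can be illustrated with a simplified Python sketch. It is not the light_cluster.py implementation; it only shows the general idea under loudly stated assumptions: one light chain is kept per cell (the one with the highest umi_count), a heavy chain clone is split whenever its cells carry light chains that differ in v_call, j_call or junction_length (compared as exact strings), and cells without a light chain keep their original clone_id. The function names and example records are hypothetical.

```python
from collections import Counter, defaultdict


def drop_multi_heavy_cells(heavy_rows):
    """Step 1: keep only cells associated with exactly one heavy chain."""
    counts = Counter(row["cell_id"] for row in heavy_rows)
    return [row for row in heavy_rows if counts[row["cell_id"]] == 1]


def split_clones_by_light(heavy_rows, light_rows):
    """Step 2 (simplified): subdivide heavy chain clones by light chain partner."""
    # Assumption: represent each cell by its light chain with the highest UMI count
    best_light = {}
    for row in light_rows:
        cell = row["cell_id"]
        current = best_light.get(cell)
        if current is None or int(row["umi_count"]) > int(current["umi_count"]):
            best_light[cell] = row

    subclone_index = defaultdict(dict)  # clone_id -> {light chain key: index}
    corrected = []
    for row in heavy_rows:
        light = best_light.get(row["cell_id"])
        if light is None:
            # Assumption: cells with no light chain keep their original clone_id
            corrected.append(dict(row))
            continue
        key = (light["v_call"], light["j_call"], light["junction_length"])
        seen = subclone_index[row["clone_id"]]
        index = seen.setdefault(key, len(seen) + 1)
        corrected.append({**row, "clone_id": f"{row['clone_id']}_{index}"})
    return corrected


# Made-up example records
heavy = [
    {"cell_id": "A", "clone_id": "1"},
    {"cell_id": "B", "clone_id": "1"},
    {"cell_id": "C", "clone_id": "1"},
    {"cell_id": "C", "clone_id": "2"},  # cell C carries two heavy chains
]
light = [
    {"cell_id": "A", "v_call": "IGKV1-5", "j_call": "IGKJ1", "junction_length": "33", "umi_count": "10"},
    {"cell_id": "A", "v_call": "IGLV2-14", "j_call": "IGLJ2", "junction_length": "36", "umi_count": "2"},
    {"cell_id": "B", "v_call": "IGLV2-14", "j_call": "IGLJ2", "junction_length": "36", "umi_count": "5"},
]

single_heavy = drop_multi_heavy_cells(heavy)
result = split_clones_by_light(single_heavy, light)
for row in result:
    print(row["cell_id"], row["clone_id"])
```

In this example, cell C is removed in the first step, and cells A and B end up in different subclones because their selected light chains differ.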

Note

By default, light_cluster.py expects the following AIRR Rearrangement columns:

v_call
j_call
junction_length
umi_count
cell_id
clone_id

To process legacy Change-O formatted data, add the --format changeo argument:

light_cluster.py -d heavy_select-pass_clone-pass.tab -e light_select-pass.tab \
    -o 10X_clone-pass.tab --format changeo

In this mode, the script expects the following Change-O columns:

V_CALL
J_CALL
JUNCTION_LENGTH
UMICOUNT
CELL
CLONE
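
Before running the script, it can be useful to confirm that a table's header contains the expected columns. The following Python sketch does this for both formats, using only the column names listed above; the helper name missing_columns and the example header are hypothetical and not part of Change-O.

```python
# Column names as listed in this section for each format
REQUIRED_COLUMNS = {
    "airr": ["v_call", "j_call", "junction_length", "umi_count", "cell_id", "clone_id"],
    "changeo": ["V_CALL", "J_CALL", "JUNCTION_LENGTH", "UMICOUNT", "CELL", "CLONE"],
}


def missing_columns(header, fmt="airr"):
    """Return the required columns for fmt that are absent from header."""
    present = set(header)
    return [name for name in REQUIRED_COLUMNS[fmt] if name not in present]


# Made-up header line from a tab-delimited file that lacks umi_count
header_line = "sequence_id\tv_call\tj_call\tjunction_length\tcell_id\tclone_id"
print(missing_columns(header_line.split("\t"), fmt="airr"))
```

An empty list means every listed column is present.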