Parsing 10X Genomics V(D)J data
New
We have an updated tutorial covering the processing of 10x Genomics VDJ data with Change-O and SCOPer. You can also follow the steps below to process 10x VDJ data using methods available in Change-O.
Example data
10X Genomics provides an example data set of Ig V(D)J processed by the Cell Ranger pipeline, which is available for download from their Single Cell Immune Profiling support site.
Converting 10X V(D)J data into the AIRR Community standardized format
To process 10X V(D)J data, a combination of AssignGenes.py and MakeDb.py
can be used to generate a TSV file compliant with the
AIRR Community Rearrangement
schema that incorporates annotation information provided by the Cell Ranger pipeline. The
--10x filtered_contig_annotations.csv
specifies the path of the contig annotations file generated by cellranger vdj
,
which can be found in the outs
directory.
Generate AIRR Rearrangement data from the 10X V(D)J FASTA files using the steps below:
AssignGenes.py igblast -s filtered_contig.fasta -b ~/share/igblast \
--organism human --loci ig --format blast
MakeDb.py igblast -i filtered_contig_igblast.fmt7 -s filtered_contig.fasta \
-r IMGT_Human_*.fasta --10x filtered_contig_annotations.csv --extended
all_contig.fasta
can be exchanged for filtered_contig.fasta
, and
all_contig_annotations.csv
can be exchanged for filtered_contig_annotations.csv
.
Warning
The resulting table overwrites the V, D and J gene assignments generated by Cell Ranger and uses those generated by IgBLAST or IMGT/HighV-QUEST instead.
See also
To process mouse data and/or TCR data alter the --organism
and --loci
arguments to AssignGenes.py accordingly
(e.g., --organism mouse
,
--loci tcr
) and use the appropriate V, D and J IMGT
reference databases (e.g., IMGT_Mouse_TR*.fasta
)
See the IgBLAST usage guide for further details regarding the setup and use of IgBLAST with Change-O.
Identifying clones from B cells in AIRR formatted 10X V(D)J data
Splitting into separate light and heavy chain files
To group B cells into clones from AIRR Rearrangement data, the output from MakeDb.py must be parsed into a light chain file and a heavy chain file:
ParseDb.py select -d 10x_igblast_db-pass.tsv -f locus -u "IGH" \
--logic all --regex --outname heavy
ParseDb.py select -d 10x_igblast_db-pass.tsv -f locus -u "IG[LK]" \
--logic all --regex --outname light
Assign clonal groups to the heavy chain data
The heavy chain file must then be clonally clustered separately. See Clustering sequences into clonal groups for how to use DefineClones.py to assign clonal cluster annotations to the IGH file.
Correct clonal groups based on light chain data
DefineClones.py currently does not support light chain cloning. However,
cloning can be performed after heavy chain cloning using
light_cluster.py
provided on the Immcantation Bitbucket repository
in the scripts
directory:
light_cluster.py -d heavy_select-pass_clone-pass.tsv -e light_select-pass.tsv \
-o 10X_clone-pass.tsv
Here, heavy_select-pass_clone-pass.tsv
refers to the cloned heavy chain
AIRR Rearrangement file, light_select-pass.tsv
refers to the light chain file,
and 10X_clone-pass.tsv
is the resulting output file.
The algorithm will (1) remove cells associated with more than one heavy chain and (2) correct heavy chain clone definitions based on an analysis of the light chain partners associated with the heavy chain clone.
Note
By default, light_chain.py
expects the
AIRR Rearrangement columns:
v_call
j_call
junction_length
umi_count
cell_id
clone_id
To process legacy Change-O formatted data add the --format changeo
argument:
light_cluster.py -d heavy_select-pass_clone-pass.tab -e light_select-pass.tab \
-o 10X_clone-pass.tab --format changeo
Which expects the following Change-O columns:
V_CALL
J_CALL
JUNCTION_LENGTH
UMICOUNT
CELL
CLONE