Package 'immuneSIM' reference manual

Title:	Tunable Simulation of B- And T-Cell Receptor Repertoires
Description:	Simulate full B-cell and T-cell receptor repertoires using an in silico recombination process that includes a wide variety of tunable parameters to introduce noise and biases. Additional post-simulation modification functions allow the user to implant motifs or codon biases as well as remodeling sequence similarity architecture. The output repertoires contain records of all relevant repertoire dimensions and can be analyzed using provided repertoire analysis functions. Preprint is available at bioRxiv (Weber et al., 2019 <doi:10.1101/759795>).
Authors:	Cédric R. Weber [aut, cre], Victor Greiff [aut]
Maintainer:	Cédric R. Weber <[email protected]>
License:	GPL-3
Version:	0.8.7
Built:	2025-02-19 06:30:52 UTC
Source:	https://github.com/greifflab/immunesim

Replaces codons with synonymous codons

Description

Replaces codons with synonymous codons

Usage

codon_replacement(repertoire, mode = "both", codon_replacement_list,
  skip_probability = 0)
codon_replacement(repertoire, mode = "both", codon_replacement_list,
  skip_probability = 0)

Arguments

`repertoire`	An annotated AIRR compliant immuneSIM repertoire. (http://docs.airr-community.org/en/latest/)
`mode`	Defines whether codons should be replaced in the nt or AA sequence or in both ("nt","AA","both")
`codon_replacement_list`	List containing instructions for which codons should be replaced and how
`skip_probability`	Probability with which a sequence gets skipped in the codon replacement process between 0,1

Value

immuneSIM repertoire with replaced codons

Examples

repertoire <- list_example_repertoires[["example_repertoire_A"]]
rep_codon_repl <- codon_replacement(repertoire, "both",
list(tat = "tac", agt = "agc", gtt = "gtg"), 0)
repertoire <- list_example_repertoires[["example_repertoire_A"]]
rep_codon_repl <- codon_replacement(repertoire, "both",
list(tat = "tac", agt = "agc", gtt = "gtg"), 0)

Decodes immuneSIM repertoire codon replacements events.

Description

Decodes immuneSIM repertoire codon replacements events.

Usage

codon_replacement_reconstruction(codon_replacement_vec)
codon_replacement_reconstruction(codon_replacement_vec)

Arguments

codon_replacement_vec

An vector containing strings describing codon replacement events as generated by codon_replacement() function. The string contains information on every replacement event in the form:

"initial_codon:replacement_codon:number_of_occurrences"

which is combined into: "Replacement1|Replacement2|Replacement3".

(For example: "tac,tat:3|agc,agt:1|gtg,gtt:0".)

Value

List of dataframes. Each entry contains replacement info including count of occurrences for each simulated sequence.

Examples

codon_replacement_example <- c("tat,tac:3|agt,agc:3|gtt,gtg:0", "tat,tac:1|agt,agc:1|gtt,gtg:1")
codon_replacement_list <- codon_replacement_reconstruction(codon_replacement_example)
codon_replacement_example <- c("tat,tac:3|agt,agc:3|gtt,gtg:0", "tat,tac:1|agt,agc:1|gtt,gtg:1")
codon_replacement_list <- codon_replacement_reconstruction(codon_replacement_example)

Generates a dataframe from separate heavy and light or beta and alpha chain dataframes

Description

Generates a dataframe from separate heavy and light or beta and alpha chain dataframes

Usage

combine_into_paired(repertoire_heavy, repertoire_light)
combine_into_paired(repertoire_heavy, repertoire_light)

Arguments

`repertoire_heavy`	A repertoire containing heavy/beta chain data
`repertoire_light`	A repertoire containing light/alpha chain data

Value

immuneSIM repertoire containing heavy/beta and light/alpha chain data.

Examples

repertoire_heavy <- immuneSIM(number_of_seqs = 5,species = "mm",receptor = "ig", chain = "h")
repertoire_light <- immuneSIM(number_of_seqs = 5,species = "mm",receptor = "ig", chain = "kl")
paired_repertoire <- combine_into_paired(repertoire_heavy,repertoire_light)
repertoire_heavy <- immuneSIM(number_of_seqs = 5,species = "mm",receptor = "ig", chain = "h")
repertoire_light <- immuneSIM(number_of_seqs = 5,species = "mm",receptor = "ig", chain = "kl")
paired_repertoire <- combine_into_paired(repertoire_heavy,repertoire_light)

Translation dictionary amino acid <-> nucleotide codon

Description

A dataframe containing a mapping from each of 64 codons to amino acids.

Usage

gen_code
gen_code

Format

A data frame with 64 rows and variables:

aa: amino acid
codon: nucleotide codon

Source

https://www.genscript.com/tools/codon-table

Hotspot dataframe for SHM

Description

A dataframe containing mutation probabilities for every possible 5mer pattern

Usage

hotspot_df
hotspot_df

Format

A data frame with 1024 rows and variables:

pattern: amino acid
toA: probability of mutation to adenine
toC: probability of mutation to cytosine
toG: probability of mutation to guanine
toT: probability of mutation to thymine
Source: source of probability

Source

https://cran.r-project.org/package=AbSim

Deletes top hub sequences from repertoire, changing the network architecture.

Description

Deletes top hub sequences from repertoire, changing the network architecture.

Usage

hub_seqs_exclusion(repertoire, top_x = 0.005, report = FALSE,
  output_dir = "", verbose = TRUE)
hub_seqs_exclusion(repertoire, top_x = 0.005, report = FALSE,
  output_dir = "", verbose = TRUE)

Arguments

`repertoire`	An annotated AIRR compliant repertoire. (http://docs.airr-community.org/en/latest/)
`top_x`	Determines what percentage of hub sequences get excluded (Default: 0.005, i.e. Top 0.5 percent)
`report`	The user can choose to output a report csv file containing the excluded sequences. (Default: FALSE)
`output_dir`	If user specifies and output directory a csv file containing the excluded sequences is saved at that path, otherwise it will be saved in tempdir().
`verbose`	Determines whether messages on plot locations are output to user. (Default: TRUE)

Value

Repertoire reduced by hub sequence (new network architecture)

Examples

repertoire <- list_example_repertoires[["example_repertoire_A"]]
rep_excluded_hubs <- hub_seqs_exclusion(repertoire, top_x = 0.005, output_dir = "")
repertoire <- list_example_repertoires[["example_repertoire_A"]]
rep_excluded_hubs <- hub_seqs_exclusion(repertoire, top_x = 0.005, output_dir = "")

Simulates an immune repertoire based on user-defined parameters

Description

Simulates an immune repertoire based on user-defined parameters

Usage

immuneSIM(number_of_seqs = 1000,
  vdj_list = list_germline_genes_allele_01, species = "mm",
  receptor = "ig", chain = "h",
  insertions_and_deletion_lengths = insertions_and_deletion_lengths_df,
  user_defined_alpha = 2, name_repertoire = "sim_rep",
  length_distribution_rand = length_dist_simulation, random = FALSE,
  shm.mode = "none", shm.prob = 15/350, vdj_noise = 0,
  vdj_dropout = c(V = 0, D = 0, J = 0), ins_del_dropout = c(""),
  equal_cc = FALSE, freq_update_time = round(0.5 * number_of_seqs),
  max_cdr3_length = 100, min_cdr3_length = 6, verbose = TRUE,
  airr_compliant = TRUE)
immuneSIM(number_of_seqs = 1000,
  vdj_list = list_germline_genes_allele_01, species = "mm",
  receptor = "ig", chain = "h",
  insertions_and_deletion_lengths = insertions_and_deletion_lengths_df,
  user_defined_alpha = 2, name_repertoire = "sim_rep",
  length_distribution_rand = length_dist_simulation, random = FALSE,
  shm.mode = "none", shm.prob = 15/350, vdj_noise = 0,
  vdj_dropout = c(V = 0, D = 0, J = 0), ins_del_dropout = c(""),
  equal_cc = FALSE, freq_update_time = round(0.5 * number_of_seqs),
  max_cdr3_length = 100, min_cdr3_length = 6, verbose = TRUE,
  airr_compliant = TRUE)

Arguments

`number_of_seqs`	Integer defining the number of sequences that should be simulated
`vdj_list`	List containing germline genes and their frequencies
`species`	String defining species for which repertoire should be simulated ("mm": mouse, "hs": human. Default: "mm").
`receptor`	String defining receptor type ("ig" or "tr". Default: "ig")
`chain`	String defining chain (for ig: "h","k","l", for tr: "b" or "a". Default: "h")
`insertions_and_deletion_lengths`	Data.frame containing np1, np2 sequences as well as deletion lengths. (Pooled from murine repertoire data, Greiff,2017) Note: This is a subset of 500000 observations of the dataframe used in the paper. The full dataframe which can be introduced here can be found on: (Git-Link)
`user_defined_alpha`	Numeric. Scaling parameter used for the simulation of powerlaw distribution (recommended range 2-5. Default: 2, https://en.wikipedia.org/wiki/Power_law)
`name_repertoire`	String defining chosen repertoire name recorded in the name_repertoire column of the output for identification.
`length_distribution_rand`	Vector containing lengths of immune receptor sequences based on immune repertoire data (Greiff, 2017).
`random`	Boolean. If TRUE repertoire will consist of fully random sequences, independent of germline genes.
`shm.mode`	String defining mode of somatic hypermutation simulation based on AbSim (options: 'none', 'data','poisson', 'naive', 'motif', 'wrc'. Default: 'none'). See AbSim documentation.
`shm.prob`	Numeric defining probability of a SHM (somatic hypermutation) occurring at each position.
`vdj_noise`	Numeric between 0,1, setting noise level to be introduced in provided V,D,J germline frequencies. 0 denotes no noise. (Default: 0)
`vdj_dropout`	Named vector containing entries V,D,J setting the number of germline genes to be dropped out. (Default: c("V"=0,"D"=0,"J"=0))
`ins_del_dropout`	String determining whether insertions and deletions should occur. Options: "", "no_insertions", "no_insertions_n1", "no_insertions_n2", "no_deletions_v", "no_deletions_d_5", "no_deletions_d_3", "no_deletions_j", "no_deletions_vd", "no_deletions". Default: "")
`equal_cc`	Boolean that if set TRUE will override user_defined_alpha and generate a clone count distribution that is equal for all sequences. Default: FALSE.
`freq_update_time`	Numeric determining whether simulated VDJ frequencies agree with input after set amount of sequences to correct for VDJ bias. Default: Update after 50 percent of sequences.
`max_cdr3_length`	Numeric defining maximal length of cdr3. (Default: 100)
`min_cdr3_length`	Numeric defining minimal length of cdr3. (Default: 6)
`verbose`	Boolean toggling printing of progress on and off (Default: FALSE)
`airr_compliant`	Boolean determining whether output repertoire should be named in an AIRR compliant manner (Default: TRUE). (http://docs.airr-community.org/en/latest/)

Value

An annotated AIRR-compliant immuneSIM repertoire. (http://docs.airr-community.org/en/latest/)

Examples

sim_rep <- immuneSIM(number_of_seqs = 10, vdj_list = list_germline_genes_allele_01,
species = "mm", receptor = "ig", chain = "h",
insertions_and_deletion_lengths = insertions_and_deletion_lengths_df,
user_defined_alpha = 2,name_repertoire = "mm_igh_sim",
shm.mode = "data",shm.prob=15/350,vdj_noise = 0, vdj_dropout = c(V=0,D=0,J=0),
ins_del_dropout = "",min_cdr3_length = 6)
sim_rep <- immuneSIM(number_of_seqs = 10, vdj_list = list_germline_genes_allele_01,
species = "mm", receptor = "ig", chain = "h",
insertions_and_deletion_lengths = insertions_and_deletion_lengths_df,
user_defined_alpha = 2,name_repertoire = "mm_igh_sim",
shm.mode = "data",shm.prob=15/350,vdj_noise = 0, vdj_dropout = c(V=0,D=0,J=0),
ins_del_dropout = "",min_cdr3_length = 6)

Dataframe containing insertion sequences and deletion lengths

Description

A dataframe containing all insertions and deletions observed in experimental data (pooled across all samples, Greiff, 2017) This dataframe is a subset of the dataframe used in the application note. The original dataframe which contains 11363603 rows can be downloaded from:

Usage

insertions_and_deletion_lengths_df
insertions_and_deletion_lengths_df

Format

A data frame with 500000 rows and variables:

n1: np1 insertions
n2: np2 insertions
del_v: lengths of V gene deletions
del_d_5: lengths of 5' end D gene deletions
del_d_3: lengths of 3' end D gene deletions
del_j: lengths of J gene deletions

Details

https://github.com/GreiffLab/immuneSIM or using the provided function: load_insdel_data()

Source

https://doi.org/10.1016/j.celrep.2017.04.054

Vector containing VDJ length distributions

Description

A vector containing 10000 VDJ lengths for simulating of fully random sequences (independent of germline genes)

Usage

length_dist_simulation
length_dist_simulation

Format

A vector with 10000 entries:

length: VDJ nucleotide lengths sampled from murine naive follicular B-cell data, Greiff 2017

Source

https://doi.org/10.1016/j.celrep.2017.04.054

Example repertoires

Description

A list containing two example repertoires (100 sequences each) simulated with immuneSIM using default parameters. These repertoires are used in the examples.

Usage

list_example_repertoires
list_example_repertoires

Format

A list with 2 entries:

example_repertoire_A: Repertoire simulated using standard parameters (A)
example_repertoire_A: Repertoire simulated using standard parameters (B)

Source

https://immunesim.readthedocs.io

Collection of germline genes and frequencies

Description

A list containing sublists for species ("hs","mm") which in turn contain sublists for receptors ("ig","tr") which are subset in chains ("h", "k", "l" and "b", "a", respectively). Each entry contains a list of three dataframes ("V","D" and "J") with the major IMGT annotated germline genes including name, sequence based on IMGT and frequencies based on experimental data from DeWitt(2017), Emerson (2017), Greiff (2017) and Madi (2017)

Usage

list_germline_genes_allele_01
list_germline_genes_allele_01

Format

A list of lists containing dataframes with up to 126 entries:

gene: name of germline gene
allele: allele number (presently restricted to allele 01)
sequence: nucleotide sequence of germline gene
species: name of species
frequency: Frequencies of germline genes based on experimental data

Source

http://www.imgt.org/vquest/refseqh.html

https://doi.org/10.1371/journal.pone.0160853

https://doi.org/10.1038/ng.3822

https://doi.org/10.1016/j.celrep.2017.04.054

https://doi.org/10.7554/eLife.22057

Loads full insertion/deletion data from GitHub

Description

Loads full insertion/deletion data from GitHub

Usage

load_insdel_data()
load_insdel_data()

Value

Dataframe containing insertions and deletions (11363603 rows, 6 columns)

Examples


full_insertions_and_deletion_df <- load_insdel_data()

full_insertions_and_deletion_df <- load_insdel_data()

Implant random or predefined motifs into CDR3

Description

Implant random or predefined motifs into CDR3

Usage

motif_implantation(sim_repertoire, motif, fixed_pos = 0)
motif_implantation(sim_repertoire, motif, fixed_pos = 0)

Arguments

`sim_repertoire`	An annotated AIRR compliant immuneSIM repertoire.
`motif`	Either a list that contains number, length and frequencies of motifs or dataframe that contains predefined motifs and their frequencies
`fixed_pos`	defines position at which motif is to be introduced. if 0 motif will be introduced at random position

Value

Repertoire with modified sequences containing implanted motifs in CDR3.

Examples

sim_repertoire <- list_example_repertoires[["example_repertoire_A"]]
sim_rep_motifs <- motif_implantation(sim_repertoire,list("n"=2,"k"=3,"freq"=c(0.1,0.1)),0)
sim_repertoire <- list_example_repertoires[["example_repertoire_A"]]
sim_rep_motifs <- motif_implantation(sim_repertoire,list("n"=2,"k"=3,"freq"=c(0.1,0.1)),0)

One Spot

Description

A dataframe containing a mutation probabilities to base per 5mer (inherited from AbSim package)

Usage

one_spot_df
one_spot_df

Format

A dataframe with 32 entries:

pattern: amino acid
toA: probability of mutation to adenine
toC: probability of mutation to cytosine
toG: probability of mutation to guanine
toT: probability of mutation to thymine
Source: source of probability

Source

https://cran.r-project.org/package=AbSim

https://doi.org/10.1093/bioinformatics/btx533

Comparative plots of main repertoire features of two input repertoires (length distribution, amino acid frequency, VDJ usage, kmer occurrence)

Description

Comparative plots of main repertoire features of two input repertoires (length distribution, amino acid frequency, VDJ usage, kmer occurrence)

Usage

plot_repertoire_A_vs_B(repertoire_A, repertoire_B,
  names_repertoires = c("Repertoire_A", "Repertoire_B"),
  length_aa_plot = 14, output_dir = "", verbose = TRUE)
plot_repertoire_A_vs_B(repertoire_A, repertoire_B,
  names_repertoires = c("Repertoire_A", "Repertoire_B"),
  length_aa_plot = 14, output_dir = "", verbose = TRUE)

Arguments

`repertoire_A`	An annotated AIRR-compliant immuneSIM repertoire. (http://docs.airr-community.org/en/latest/)
`repertoire_B`	An annotated AIRR-compliant immuneSIM repertoire.
`names_repertoires`	A vector containing two strings denoting the names of the repertoires / repertoire descriptions.
`length_aa_plot`	Defines sequence length for which the amino acid frequency plot will be made.
`output_dir`	String containing full path of desired output folder. If empty, figures will be output in tempdir().
`verbose`	Determines whether messages on plot locations are output to user. (Default: TRUE)

Value

TRUE (plots saved as pdfs into subfolder 'figures')

Examples

repertoire_A <- list_example_repertoires[["example_repertoire_A"]]
repertoire_B <- list_example_repertoires[["example_repertoire_B"]]
plot_repertoire_A_vs_B(
repertoire_A,
repertoire_B,
c("Sim_repertoire_1","Sim_repertoire_2"),
length_aa_plot = 14,
output_dir="")
repertoire_A <- list_example_repertoires[["example_repertoire_A"]]
repertoire_B <- list_example_repertoires[["example_repertoire_B"]]
plot_repertoire_A_vs_B(
repertoire_A,
repertoire_B,
c("Sim_repertoire_1","Sim_repertoire_2"),
length_aa_plot = 14,
output_dir="")

Plots main repertoire features (length distribution,amino acid frequencies and VDJ usage)

Description

Plots main repertoire features (length distribution,amino acid frequencies and VDJ usage)

Usage

plot_report_repertoire(repertoire, output_dir = "", verbose = TRUE)
plot_report_repertoire(repertoire, output_dir = "", verbose = TRUE)

Arguments

repertoire

An annotated AIRR-compliant immuneSIM repertoire.

(http://docs.airr-community.org/en/latest/)

output_dir

String containing full path of desired output folder. If empty figures will be output in tempdir().

verbose

Determines whether messages on plot locations are output to user. (Default: TRUE)

Value

TRUE (plots saved as pdfs into subfolder 'figures')

Examples

repertoire <- list_example_repertoires[["example_repertoire_A"]]
plot_report_repertoire(repertoire,output_dir="")
repertoire <- list_example_repertoires[["example_repertoire_A"]]
plot_report_repertoire(repertoire,output_dir="")

Decodes immuneSIM repertoire shm_events column.

Description

Decodes immuneSIM repertoire shm_events column.

Usage

shm_event_reconstruction(shm_event_vec)
shm_event_reconstruction(shm_event_vec)

Arguments

shm_event_vec

An vector containing strings describing SHM events as output in shm_events column of immuneSIM repertoires. The string contains information on every mutation event in the form:

"Position:pre_mutation_nucleotide,post_mutation_nucleotide" combined as: "Mutation1|Mutation2|Mutation3". For example: "171:t,a|186:g,a".

Value

List of dataframes. Each entry contains location and shm mutation info for a simulated sequence

Examples

shm_events_example<-c("171:t,a|186:g,a|287:g,a|310:t,c","","294:c,g|316:t,c|330:c,t")
shm_list<-shm_event_reconstruction(shm_events_example)
shm_events_example<-c("171:t,a|186:g,a|287:g,a|310:t,c","","294:c,g|316:t,c|330:c,t")
shm_list<-shm_event_reconstruction(shm_events_example)

Package 'immuneSIM'

Help Index

Replaces codons with synonymous codons

Description

Usage

Arguments

Value

Examples

Decodes immuneSIM repertoire codon replacements events.

Description

Usage

Arguments

Value

Examples

Generates a dataframe from separate heavy and light or beta and alpha chain dataframes

Description

Usage

Arguments

Value

Examples

Translation dictionary amino acid <-> nucleotide codon

Description

Usage

Format

Source

Hotspot dataframe for SHM

Description

Usage

Format

Source

Deletes top hub sequences from repertoire, changing the network architecture.

Description

Usage

Arguments

Value

Examples

Simulates an immune repertoire based on user-defined parameters

Description

Usage

Arguments

Value

Examples

Dataframe containing insertion sequences and deletion lengths

Description

Usage

Format

Details

Source

Vector containing VDJ length distributions

Description

Usage

Format

Source

Example repertoires

Description

Usage

Format

Source

Collection of germline genes and frequencies

Description

Usage

Format

Source

Loads full insertion/deletion data from GitHub

Description

Usage

Value

Examples

Implant random or predefined motifs into CDR3

Description

Usage

Arguments

Value

Examples

One Spot

Description

Usage

Format

Source

Comparative plots of main repertoire features of two input repertoires (length distribution, amino acid frequency, VDJ usage, kmer occurrence)