Simulate genetic data from the same model used in the MALECOT inference step.

sim_data(n = 100, L = 24, K = 3, data_format = "biallelic",
  pop_col_on = TRUE, alleles = 2, lambda = 1,
  COI_model = "poisson", COI_max = 20, COI_manual = rep(-1, n),
  COI_mean = 3, COI_dispersion = 2, e1 = 0, e2 = 0,
  prop_missing = 0)

Arguments

n

the number of samples

L

the number of loci per sample

K

the number of subpopulations

data_format

whether to produce data in "biallelic" or "multiallelic" format. Note that if biallelic format is chosen then alleles is always set to 2

pop_col_on

TODO

alleles

the number of alleles at each locus. Can be a vector of length L specifying the number of alleles at each locus, or a single scalar value specifying the number of alleles at all loci

lambda

the shape parameter(s) of the prior on allele frequencies. This prior is Beta in the bi-allelic case, and Dirichlet in the multi-allelic case. lambda can be:

  • a single scalar value, in which case the same value is used for every allele and every locus (i.e. the prior is symmetric)

  • a vector of values, in which case the same vector is used for every locus. Only works if the same number of alleles applies at every locus

  • a list of vectors specifying the shape parameter separately for each allele of each locus. The list must be of length L, and must contain vectors of length equal to the number of alleles at that locus
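
As a rough illustration of the prior described above, the following base-R sketch draws allele frequencies for a scalar lambda. It is not MALECOT's internal code: the helper `rdirichlet1` and the objects `p_biallelic` and `p_multi` are hypothetical names, and the layout (one row per subpopulation, one column per locus) is an assumption.

```r
# Hypothetical sketch, not taken from the MALECOT source.
set.seed(1)
L <- 24      # number of loci
K <- 3       # number of subpopulations
lambda <- 1  # scalar shape parameter, giving a symmetric prior

# Bi-allelic case: one Beta(lambda, lambda) draw per subpopulation and locus,
# read here as the frequency of the first allele (assumed layout: K x L).
p_biallelic <- matrix(rbeta(K * L, lambda, lambda), nrow = K, ncol = L)

# Multi-allelic case: a Dirichlet draw can be built from independent
# Gamma draws divided by their sum.
rdirichlet1 <- function(shape) {
  g <- rgamma(length(shape), shape = shape, rate = 1)
  g / sum(g)
}
alleles <- 4
p_multi <- rdirichlet1(rep(lambda, alleles))  # frequencies at a single locus
```

With lambda = 1 both priors are flat; values below 1 push frequencies towards 0 and 1, and values above 1 pull them towards equal frequencies.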

COI_model

the distribution from which COIs are drawn. Options include a uniform distribution ("uniform"), a Poisson distribution ("poisson"), or a negative binomial distribution ("nb")

COI_max

the maximum allowed COI. Any COI that is initially drawn larger than this value is reduced to this value

COI_manual

option to set the COI of one or more samples manually, in which case those COIs are not drawn from COI_model. Vector of length n specifying the integer-valued COI of each sample, with -1 indicating that the COI of that sample should be drawn at random

COI_mean

the mean of the distribution from which COIs are drawn. Only applies under the Poisson and negative binomial models (under the uniform model the mean is (COI_max+1)/2 by definition)

COI_dispersion

Only used under the negative binomial model. Defines how much larger the variance is than the mean. Must be > 1
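
The COI arguments above can be pictured with a short base-R sketch. This is only one plausible reading, not the package's implementation: `draw_COI` is a hypothetical helper, and the shift by 1 (so that every COI is at least 1) and the negative binomial parameterisation (variance of the shifted part equal to COI_dispersion times its mean) are assumptions.

```r
# Hypothetical sketch, not taken from the MALECOT source.
set.seed(1)
n <- 100
COI_max <- 20
COI_mean <- 3
COI_dispersion <- 2
COI_manual <- rep(-1, n)
COI_manual[1:2] <- c(1, 5)  # fix the COI of the first two samples

draw_COI <- function(n, model, COI_max, COI_mean, COI_dispersion) {
  coi <- switch(model,
    uniform = sample.int(COI_max, n, replace = TRUE),
    # Assumption: shift by 1 so that COI >= 1 and the mean is COI_mean
    poisson = 1 + rpois(n, lambda = COI_mean - 1),
    nb = {
      mu <- COI_mean - 1
      # Assumption: variance = mu * COI_dispersion, hence this size parameter
      1 + rnbinom(n, size = mu / (COI_dispersion - 1), mu = mu)
    },
    stop("unknown COI model: ", model)
  )
  pmin(coi, COI_max)  # draws above COI_max are reduced to COI_max
}

COI <- draw_COI(n, "poisson", COI_max, COI_mean, COI_dispersion)

# Entries of COI_manual other than -1 replace the random draws
fixed <- COI_manual != -1
COI[fixed] <- COI_manual[fixed]
```

Under this parameterisation a COI_dispersion of exactly 1 would give a division by zero, which is consistent with the requirement that it be greater than 1.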

e1

the probability of a true homozygote being incorrectly called as a heterozygote

e2

the probability of a true heterozygote being incorrectly called as a homozygote
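
To make the roles of e1 and e2 concrete, here is a minimal base-R sketch of an error step applied to bi-allelic calls. The encoding (1 and 0 for the two homozygous calls, 0.5 for a heterozygous call), the helper `apply_errors`, and the choice to send a miscalled heterozygote to either homozygous state with equal probability are all assumptions made for illustration.

```r
# Hypothetical sketch, not taken from the MALECOT source.
set.seed(1)
e1 <- 0.1  # P(true homozygote is called as a heterozygote)
e2 <- 0.2  # P(true heterozygote is called as a homozygote)

# Assumed encoding: 0 and 1 are homozygous calls, 0.5 is a heterozygous call
true_calls <- sample(c(0, 0.5, 1), 1000, replace = TRUE)

apply_errors <- function(calls, e1, e2) {
  out <- calls
  is_het <- calls == 0.5
  hom_to_het <- !is_het & (runif(length(calls)) < e1)
  het_to_hom <- is_het & (runif(length(calls)) < e2)
  out[hom_to_het] <- 0.5
  # Assumption: a miscalled heterozygote becomes either homozygote with equal probability
  out[het_to_hom] <- sample(c(0, 1), sum(het_to_hom), replace = TRUE)
  out
}

observed <- apply_errors(true_calls, e1, e2)
```

With the defaults e1 = 0 and e2 = 0 the observed calls are identical to the true calls.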

prop_missing

the proportion of the data that is missing. Note that data are masked out at random, meaning in some rare cases (and when the proportion of missing data is large) an entire sample or locus can end up being masked out, which will throw an error when loaded into a project
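
The caveat about fully masked samples or loci can be checked before loading data into a project. The base-R sketch below masks cells at random and then looks for empty rows and columns; the missing-data code of -1, the use of an exact count of masked cells, and the object names are assumptions, not the package's documented behaviour.

```r
# Hypothetical sketch, not taken from the MALECOT source.
set.seed(1)
n <- 100
L <- 24
prop_missing <- 0.2

# Stand-in data matrix: one row per sample, one column per locus
dat <- matrix(sample(c(0, 0.5, 1), n * L, replace = TRUE), nrow = n, ncol = L)

# Assumption: mask an exact number of cells, chosen uniformly, coded as -1
n_missing <- round(prop_missing * n * L)
masked <- dat
masked[sample.int(n * L, n_missing)] <- -1

# Samples (rows) or loci (columns) with no observed data left
is_missing <- masked == -1
empty_samples <- which(rowSums(is_missing) == L)
empty_loci <- which(colSums(is_missing) == n)
```

If either `empty_samples` or `empty_loci` is non-empty, re-simulating with a different seed or a smaller prop_missing avoids the error described above.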

Details

TODO

Examples

# TODO