sim_data.Rd
Simulate genetic data from the same model used in the MALECOT inference step.
sim_data(n = 100, L = 24, K = 3, data_format = "biallelic", pop_col_on = TRUE, alleles = 2, lambda = 1, COI_model = "poisson", COI_max = 20, COI_manual = rep(-1, n), COI_mean = 3, COI_dispersion = 2, e1 = 0, e2 = 0, prop_missing = 0)
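A minimal sketch of a call, assuming the package is installed under the name MALECOT and exports sim_data with the signature above. Only arguments from the usage line are used; the values are illustrative, and because the structure of the returned object is not documented here, the sketch does not inspect it.

```r
# Illustrative arguments for sim_data(); every name comes from the usage line.
sim_args <- list(
  n = 50,                  # 50 samples
  L = 24,                  # 24 loci per sample
  K = 3,                   # 3 subpopulations
  data_format = "biallelic",
  COI_model = "poisson",
  COI_mean = 2,
  COI_max = 10,
  e1 = 0.01,               # 1% of true homozygotes called as heterozygotes
  e2 = 0.05,               # 5% of true heterozygotes called as homozygotes
  prop_missing = 0.02
)

# COI_manual defaults to rep(-1, n); here the first two samples are fixed at COI 1
# (assumption: a positive entry sets that sample's COI directly).
sim_args$COI_manual <- c(1, 1, rep(-1, sim_args$n - 2))

# Only attempt the call if the (assumed) MALECOT package is available.
if (requireNamespace("MALECOT", quietly = TRUE)) {
  sim <- do.call(MALECOT::sim_data, sim_args)
}
```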
Argument | Description |
---|---|
n | the number of samples |
L | the number of loci per sample |
K | the number of subpopulations |
data_format | whether to produce data in "biallelic" or "multiallelic" format. Note that if biallelic format is chosen then … |
pop_col_on | TODO |
alleles | the number of alleles at each locus. Can be a single value applied to all loci, or a vector of length L giving a separate value for each locus. |
lambda | the shape parameter(s) of the prior on allele frequencies. This prior is Beta in the biallelic case and Dirichlet in the multiallelic case. |
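COI_model | the distribution from which COIs (complexities of infection) are drawn. Options include a uniform distribution, a Poisson distribution ("poisson", the default) and a negative binomial distribution. |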
COI_max | the maximum allowed COI. Any COI initially drawn above this value is set down to this value. |
COI_manual | option to set the COI of one or more samples manually instead of drawing it from the COI model. Vector of length n; an entry of -1 (the default for every sample) means that sample's COI is drawn at random. |
COI_mean | the mean of the distribution from which COIs are drawn. Only applies under the Poisson and negative binomial models (under the uniform model the mean is fixed by COI_max). |
COI_dispersion | only used under the negative binomial model. Defines the factor by which the variance exceeds the mean, and so must be greater than 1. |
e1 | the probability of a true homozygote being incorrectly called as a heterozygote |
e2 | the probability of a true heterozygote being incorrectly called as a homozygote |
prop_missing | the proportion of the data that is missing. Data are masked out at random, so in rare cases (particularly when prop_missing is large) an entire sample or locus can end up masked out, which will throw an error when the data are loaded into a project. |
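The COI arguments interact: a model is chosen with COI_model, draws are capped at COI_max, and COI_manual overrides individual samples. Below is a minimal sketch of one way to draw COIs consistent with the descriptions above; it is not MALECOT's internal code. The helper draw_coi is hypothetical, and so are the option strings "uniform" and "nb" (only "poisson" appears in the usage line). It also assumes the uniform model draws from 1 to COI_max and that zero draws are raised to 1. The negative binomial branch shows why COI_dispersion must exceed 1: R's rnbinom with mean mu and size has variance mu + mu^2 / size, so requiring variance = COI_dispersion * mu gives size = mu / (COI_dispersion - 1), which is only positive when COI_dispersion > 1.

```r
# Hypothetical helper (not part of MALECOT): draw one COI per sample.
# Assumptions: "uniform" draws from 1..COI_max; "poisson" and "nb" have mean
# COI_mean, and "nb" has variance COI_dispersion * COI_mean. Zero draws are
# raised to 1 so that every sample carries at least one strain.
draw_coi <- function(n, COI_model = "poisson", COI_max = 20,
                     COI_manual = rep(-1, n), COI_mean = 3, COI_dispersion = 2) {
  coi <- switch(COI_model,
    uniform = sample.int(COI_max, n, replace = TRUE),
    poisson = rpois(n, lambda = COI_mean),
    nb = {
      stopifnot(COI_dispersion > 1)
      # variance mu + mu^2 / size equals COI_dispersion * mu when
      # size = mu / (COI_dispersion - 1)
      rnbinom(n, size = COI_mean / (COI_dispersion - 1), mu = COI_mean)
    },
    stop("unknown COI_model: ", COI_model)
  )
  coi <- pmax(coi, 1)          # no sample may have zero strains
  coi <- pmin(coi, COI_max)    # draws above COI_max are set down to COI_max
  fixed <- COI_manual > 0      # positive entries override the random draw
  coi[fixed] <- COI_manual[fixed]
  as.integer(coi)
}

set.seed(1)
coi <- draw_coi(n = 8, COI_model = "nb", COI_max = 5,
                COI_manual = c(2, rep(-1, 7)), COI_mean = 3, COI_dispersion = 2)
```

Raising zeros to 1 slightly inflates the realised mean; the package may instead shift or truncate the distribution, so treat this as one plausible design rather than the documented behaviour.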
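The error and missingness arguments can be pictured as two passes over the true calls. The sketch below is hypothetical and not MALECOT's code: it covers only the biallelic case and assumes calls are stored as an n x L logical matrix in which TRUE marks a heterozygous call. It flips calls with probabilities e1 and e2, masks entries at random with probability prop_missing, and then checks for the failure case described under prop_missing (a sample or locus with every entry masked).

```r
# Hypothetical sketch (not MALECOT's code): apply the e1/e2 calling errors and
# random missingness to an n x L matrix of true biallelic calls, where TRUE marks
# a heterozygous call and FALSE a homozygous one.
add_errors_and_missing <- function(true_het, e1 = 0, e2 = 0, prop_missing = 0) {
  obs <- true_het
  u <- matrix(runif(length(true_het)), nrow(true_het))
  obs[!true_het & u < e1] <- TRUE    # true homozygote called heterozygote
  obs[true_het & u < e2] <- FALSE    # true heterozygote called homozygote
  # mask each entry independently with probability prop_missing
  obs[matrix(runif(length(obs)), nrow(obs)) < prop_missing] <- NA
  obs
}

set.seed(2)
truth <- matrix(runif(5 * 24) < 0.3, nrow = 5)   # 5 samples x 24 loci
obs <- add_errors_and_missing(truth, e1 = 0.01, e2 = 0.05, prop_missing = 0.1)

# A sample (row) or locus (column) with no observed entries is the case that
# would throw an error when loaded into a project.
all_masked_sample <- any(rowSums(!is.na(obs)) == 0)
all_masked_locus  <- any(colSums(!is.na(obs)) == 0)
```

With 24 loci and a small prop_missing, a fully masked sample is very unlikely, which matches the description of the problem as rare; it becomes realistic only as prop_missing approaches 1 or when n or L is small.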
TODO
# TODO