sim_biallelic.Rd
Simulate biallelic data from a simple statistical model. Inputs include the complexity of infection (COI), population-level allele frequencies (PLAF) and some parameters dicating skew and error distributions. Outputs include the phased haplotypes and the un-phased read count and coverage data.
sim_biallelic(COI = 3, PLAF = runif(10, 0, 0.5), coverage = 100, prob_missing = 0.1, alpha = 1, overdispersion = 0, epsilon = 0)
COI | complexity of infection. |
---|---|
PLAF | vector of population-level allele frequencies at each locus. |
coverage | coverage at each locus. If a single value then the same coverage is applied over all loci. |
prob_missing | the probabilty of each locus failing. If a locus fails
then coverage is set to |
alpha | shape parameter of the symmetric Dirichlet prior on strain proportions. |
overdispersion | the extent to which counts are over-dispersed relative
to the binomial distribution. Counts are Beta-binomially distributed, with
the beta distribution having shape parameters |
epsilon | the probability of a single read being mis-called as the other allele. Applies in both directions. |
Simulated data are drawn from a simple statistical model:
Strain proportions are drawn from a symmetric Dirichlet
distribution with shape parameter alpha
.
Phased haplotypes are drawn at every locus, one for each
COI
. The allele at each locus is drawn from a Bernoulli
distribution with probability given by the PLAF
.
The "true" within-sample allele frequency at every locus is
obtained by multiplying haplotypes by their strain proportions, and
summing over haplotypes. Errors are introduced through the equation
$$wsaf_error = wsaf*(1-e) + (1-wsaf)*e$$where \(wsaf\) is the WSAF
without error and \(e\) is the error parameter epsilon
.
Final read counts are drawn from a beta-binomial distribution with
expectation \(w_error\). The raw number of draws is given by the
coverage
, and the skew of the distribution is given by the
overdispersion
parameter. If overdispersion = 0
then the
distribution is binomial, rather than beta-binomial.