
M2 Internship Report
1 Apr, 2026

- Restricted number of loci
- Low error rate
- Low cost

- Low coverage (<3X)
- Missing data
- Low cost

- Exhaustive
- Expensive
- Low number of samples
Low-pass \(\rightarrow\) Requires imputation algorithms

Penetrance: likelihood of the observed data under each genotype
| \(u_i\) | \(0\) | \(1\) | \(2\) |
|---|---|---|---|
| \(P(D \mid G)\) | 0.01 | 0.23 | 0.81 |
In a .vcf file, this is stored as the Phred-scaled likelihood (PL):
\[ PL = -10 \log_{10}{P(D | G)} \]
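As a worked example, the penetrance values from the table above can be converted to PL values. This is a minimal sketch: the function name `phred_likelihoods` is hypothetical, and shifting the values so that the most likely genotype gets 0 follows a convention used by common variant callers, which is assumed here rather than taken from the slides.

```python
import numpy as np

def phred_likelihoods(p):
    """Phred-scale genotype likelihoods P(D|G) and round to integers.

    Assumption: values are shifted so that the best genotype has PL = 0.
    """
    raw = -10.0 * np.log10(np.asarray(p, dtype=float))
    return np.rint(raw - raw.min()).astype(int)

# Likelihoods for genotypes 0, 1, 2 taken from the table above
pl = phred_likelihoods([0.01, 0.23, 0.81])
print(pl.tolist())  # [19, 5, 0]
```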

Pedigree

\[ \textcolor{orange}{Pr(u_i|y)} \propto \textcolor{red}{a_i(u_i)}\textcolor{green}{g(y_i|u_i)}\textcolor{blue}{\prod_{j \in S_i} p_{ij}(u_i)} \]
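With \(S_i\) the set of mates of \(i\), \(C_{ij}\) the set of offspring of \(i\) and \(j\), \(m\) and \(f\) the mother and father of \(i\), \(g\) the penetrance and \(tr\) the transmission probability:
Posterior function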
\[ \begin{aligned} \textcolor{blue}{p_{ij}(u_i)} = \sum_{u_j}\Bigg[ & \textcolor{red}{a_j(u_j)}\textcolor{green}{g(y_j|u_j)}\textcolor{blue}{\prod_{\substack{k\in S_j \\ k\ne i}}p_{jk}(u_j)} \\ & \times \prod_{k\in C_{ij}}\Bigg[\sum_{u_k}\textcolor{magenta}{tr(u_k|u_i ,u_j)}\textcolor{green}{g(y_k|u_k)} \textcolor{blue}{\prod_{l \in S_k}p_{kl}(u_k)}\Bigg]\Bigg] \end{aligned} \]
Anterior function
\[ \begin{aligned} \textcolor{red}{a_i(u_i)} = \sum_{u_m}\Bigg\{ & \textcolor{red}{a_m(u_m)}\textcolor{green}{g(y_m|u_m)}\textcolor{blue}{\prod_{\substack{j\in S_m \\ j\ne f}}p_{mj}(u_m)} \\ & \times \sum_{u_f}\Big\{\textcolor{red}{a_f(u_f)}\textcolor{green}{g(y_f|u_f)}\textcolor{blue}{\prod_{\substack{j\in S_f \\ j\ne m}}p_{fj}(u_f)} \\ & \times \textcolor{magenta}{tr(u_i|u_m,u_f)} \\ & \times \prod_{\substack{j\in C_{mf} \\ j\ne i}}\Big[\sum_{u_j}\textcolor{magenta}{tr(u_j|u_m ,u_f)}\textcolor{green}{g(y_j|u_j)} \textcolor{blue}{\prod_{k \in S_j}p_{jk}(u_j)}\Big]\Big\}\Bigg\} \end{aligned} \]
Loops in the pedigree make the recursion impossible
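For a loop-free pedigree the recursion is direct. The sketch below applies the anterior and posterior formulas to a single trio (mother, father, one child with no mates), where every product over mates or siblings is empty and equals 1. The function names, the Hardy-Weinberg anterior for founders and the allele frequency `q` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def transmission():
    """tr[u_k, u_m, u_f]: Mendelian probability of child genotype u_k given both parents."""
    t = np.array([0.0, 0.5, 1.0])  # P(parent with genotype u transmits allele 1)
    tm, tf = t[:, None], t[None, :]
    return np.stack([(1 - tm) * (1 - tf),
                     tm * (1 - tf) + (1 - tm) * tf,
                     tm * tf])

def hwe_prior(q):
    """Assumed founder anterior: Hardy-Weinberg genotype frequencies for allele frequency q."""
    return np.array([(1 - q) ** 2, 2 * q * (1 - q), q ** 2])

def child_posterior(g_m, g_f, g_c, q=0.5):
    """Pr(u_c | y) for the child of a trio; g_* are penetrance vectors over genotypes 0, 1, 2."""
    a_founder = hwe_prior(q)
    # Anterior of the child: sum over both parental genotypes
    a_c = np.einsum("m,f,cmf->c", a_founder * g_m, a_founder * g_f, transmission())
    post = a_c * g_c  # the child has no mates, so the product of p_ij terms is 1
    return post / post.sum()

# Mother observed as 0, father as 2, child unobserved (penetrance 1 everywhere)
post = child_posterior(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]), np.ones(3))
print(post)  # [0. 1. 0.]
```

With homozygous parents of opposite type, the child can only be heterozygous, which is what the posterior returns.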



| Observed \(y_i\) | Formula |
|---|---|
| 0 (AA) | \((1-\epsilon)^2\) |
| 1 (Aa) | \(2 \cdot (1-\epsilon) \cdot \epsilon\) |
| 2 (aa) | \(\epsilon^2\) |
| NA | \(1\) |
Problem: \(a_i(u_i)\) and \(p_{ij}(u_i)\) become \(\simeq 0\) (numerical underflow)
Solution:
Scaling: \[ \begin{cases} a_i(u_i)=C_i \cdot a_i^* (u_i) \\ g_i(y_i|u_i)=E_i \cdot g_i^*(y_i|u_i) \\ p_{ij}(u_i) = D_{ij} \cdot p_{ij}^*(u_i) \end{cases} \]
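The effect of scaling can be illustrated on a plain product of many small likelihood vectors. This is a sketch under stated assumptions: the scaling constant is taken to be the sum of the vector (the slides do not specify how \(C_i\), \(E_i\) and \(D_{ij}\) are chosen), the helper `scale` is hypothetical, and the random factors only stand in for real anterior and posterior terms.

```python
import numpy as np

def scale(v):
    """Split v into a scaled vector summing to 1 and the log of its scaling constant."""
    c = v.sum()
    return v / c, np.log(c)

rng = np.random.default_rng(0)
# 200 small factors per genotype, standing in for a long chain of peeling terms
factors = rng.uniform(1e-4, 1e-3, size=(200, 3))

# Naive product: every entry is below 1e-600 and underflows to exactly 0.0
naive = np.prod(factors, axis=0)

# Scaled product: renormalise at each step and accumulate the log constants
v, log_c = np.ones(3), 0.0
for f in factors:
    v, step_log_c = scale(v * f)
    log_c += step_log_c

print(naive)     # [0. 0. 0.]
print(v, log_c)  # scaled vector sums to 1; log_c stays finite
```

The unscaled value is recovered in log space as `log_c + np.log(v)`, so nothing is lost by scaling.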




For an array of known genotypes \(G\) of shape \((I,L)\)
Simulate \(P(Y = \{\textcolor{Emerald}{D},\textcolor{Apricot}{R}\}|G)\):
For each pair \((i,l)\), with loci \(l \in L\) and individuals \(i \in I\):
| \(G_{i,l}\) | \(0/0\) | \(0/1\) | \(1/1\) |
|---|---|---|---|
| \(f(G_{i,l},\textcolor{BrickRed}{\epsilon})\) | \(\textcolor{BrickRed}{\epsilon}\) | \(1\over2\) | \(1 - \textcolor{BrickRed}{\epsilon}\) |
Low-pass + array data

Test candidate parents against an imputed parent

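```python
import numpy as np
from scipy import stats

# One shared generator so that genotypes, depths and reads are all reproducible
rng = np.random.default_rng(43)
N, L, Dm, epsilon = 10, 1000, 5, 0.01  # individuals, loci, mean depth, error rate

# True genotypes (counts of allele 1), shape (L, N)
G = rng.choice((0, 1, 2), size=(L, N), replace=True)
# Read depth per locus and individual, Poisson with mean Dm
D = stats.poisson.rvs(Dm, size=(L, N), random_state=rng)
# Probability that a read carries allele 1, f(G, epsilon)
pf = np.array((epsilon, 0.5, 1 - epsilon))
pf_G = pf[G]
# Number of reads carrying allele 1 among the D reads
R1 = stats.binom.rvs(D, pf_G, size=(L, N), random_state=rng)

# Log-penetrance log P(R1 | D, G = u) for u = 0, 1, 2, up to the
# genotype-independent binomial coefficient; stacked to shape (L, 3, N)
penetrance = np.stack([
    R1 * np.log(epsilon) + (D - R1) * np.log(1 - epsilon),
    D * np.log(0.5),
    R1 * np.log(1 - epsilon) + (D - R1) * np.log(epsilon),
], axis=1)
```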
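A possible follow-up is to turn the simulated log-penetrance into per-genotype probabilities and a best-guess genotype. The sketch below is self-contained and uses a toy input of one locus and three individuals. The function names are hypothetical, and normalising under a flat prior over genotypes is an assumption made for illustration only.

```python
import numpy as np

def log_penetrance(D, R1, epsilon=0.01):
    """log P(R1 | D, G = u) for u = 0, 1, 2 (binomial coefficient omitted), shape (L, 3, N)."""
    return np.stack([
        R1 * np.log(epsilon) + (D - R1) * np.log(1 - epsilon),
        D * np.log(0.5),
        R1 * np.log(1 - epsilon) + (D - R1) * np.log(epsilon),
    ], axis=1)

def normalise(logp):
    """Genotype probabilities under a flat prior (assumption), computed stably in log space."""
    shifted = logp - logp.max(axis=1, keepdims=True)
    p = np.exp(shifted)
    return p / p.sum(axis=1, keepdims=True)

# Toy data: depth 4 with 0 reads of allele 1, depth 4 with 2 such reads, and no reads at all
D = np.array([[4, 4, 0]])
R1 = np.array([[0, 2, 0]])

probs = normalise(log_penetrance(D, R1))
calls = probs.argmax(axis=1)  # most likely genotype per locus and individual
print(calls[0, :2])   # [0 1]
print(probs[0, :, 2])  # uniform: depth 0 carries no information
```

The third individual has depth 0, so all three genotypes stay equally likely, which is the missing-data case that imputation has to resolve.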