Similar Articles
20 similar articles found
1.
Testing for deviations from Hardy–Weinberg equilibrium (HWE) is a common quality-control practice in genetic studies. Variable sites violating HWE may be identified as technical errors in the sequencing or genotyping process, or they may be of particular evolutionary interest. Large-scale genetic studies based on next-generation sequencing (NGS) methods have become more prevalent as costs decrease, but these methods are still associated with statistical uncertainty. Such large-scale studies usually comprise samples of diverse ancestries, which makes some degree of population structure almost inevitable. Precautions are therefore needed when analysing these data sets, as population structure itself causes deviations from HWE. Here we propose a method that takes population structure into account when testing for HWE, so that other factors causing deviations from HWE can be detected. We show the effectiveness of PCAngsd on low-depth NGS data, as well as on genotype data, for both simulated and real data sets, where the use of genotype likelihoods enables us to model the uncertainty.
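For orientation, the baseline that structure-aware tests generalise is the single-site goodness-of-fit test against HWE proportions. A minimal Python sketch (assuming hard genotype calls at one biallelic site; this is not the PCAngsd implementation, which works on genotype likelihoods and a model of individual ancestry):

    from scipy.stats import chi2

    def hwe_chi2(n_AA, n_Aa, n_aa):
        # Allele frequency of A estimated from the genotype counts.
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2.0 * n)
        q = 1.0 - p
        # Expected counts under the HWE proportions p^2 : 2pq : q^2.
        expected = [p * p * n, 2 * p * q * n, q * q * n]
        observed = [n_AA, n_Aa, n_aa]
        stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # 3 genotype classes - 1 - 1 estimated allele frequency = 1 df.
        return stat, chi2.sf(stat, df=1)

In a structured sample this test rejects even at error-free sites, which is the failure mode the proposed method removes by accounting for structure before testing, so that remaining deviations point to other causes.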

2.
Efficiency and robustness of pedigree segregation analysis.
Different pedigree structures and likelihoods are examined to determine their efficiency for parameter estimation under one-locus models. For the cases simulated, family size has little effect; estimates based on unconditional likelihoods are generally more efficient than those based on conditional likelihoods. The proposed method of pedigree analysis under a one-locus model is found to be robust in the analysis of nuclear families: skewness of the data and polygenic inheritance will not lead to the spurious detection of major loci unless they occur simultaneously and are accompanied by a moderate amount of environmental correlation among sibs.

3.
Centromeric-mapping methods have been used to investigate the association between altered recombination and meiotic nondisjunction in humans. For trisomies, current methods are based on the genotypes of a trisomic offspring and both parents. Because it is sometimes difficult to obtain samples from both parents, and because previously unavailable sources of DNA (e.g., stored paraffin-embedded pathological samples) can now be used, we have been interested in creating similar maps for trisomic populations in which one parent of the trisomic individual is unavailable for genotyping. In this paper, we derive multipoint likelihoods for both missing-parent data and conventional two-parent data. We find that likelihoods for two-parent data and for data generated without a sample from the correctly disjoining parent can be maximized in exactly the same way, but also that missing-parent data have a high frequency of the same sort of partial information that intercross matings produce. Previously published centromeric-mapping methods use incorrect likelihoods for intercross matings and thus can perform poorly on missing-parent data. We wrote a FORTRAN program to maximize our multipoint likelihoods and used it in simulation studies to demonstrate the biases in the previous methods.

4.
D. Gianola, R. L. Fernando, S. Im, J. L. Foulley. Génome, 1989, 31(2): 768-777
Conceptual aspects of estimation of genetic components of variance and covariance under selection are discussed, with special attention to likelihood methods. Certain selection processes are described and alternative likelihoods that can be used for analysis are specified. There is a mathematical relationship between the likelihoods that permits comparing the relative amount of information contained in them. Theoretical arguments and evidence indicate that point inferences made from likelihood functions are not affected by some forms of selection.

5.
We describe a new multivariate gamma distribution and discuss its implications for a Poisson-correlated gamma-frailty model. This model is introduced to account for between-subject correlation occurring in longitudinal count data. For likelihood-based inference involving distributions in which high-dimensional dependencies are present, it may be useful to approximate likelihoods based on the univariate or bivariate marginal distributions. The merit of composite likelihood is that it reduces the computational complexity of the full likelihood. A two-stage composite-likelihood procedure is developed for estimating the model parameters. The suggested method is applied to a meta-analysis of survival curves.
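In generic form (my notation, not necessarily the paper's), a pairwise composite log-likelihood for subjects s = 1, ..., S with counts y_{s1}, ..., y_{sT} replaces the full joint density by a sum of bivariate log-margins:

    \ell_{CL}(\theta) \;=\; \sum_{s=1}^{S} \sum_{1 \le i < j \le T} \log f\!\left(y_{si},\, y_{sj};\, \theta\right)

Only the bivariate densities f are required, which is what keeps the computation tractable when the full T-dimensional joint distribution is not; the usual price is some loss of statistical efficiency relative to full maximum likelihood.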

6.
The program, written in FORTRAN, estimates haplotype frequencies in two-locus and three-locus genetic systems from population diploid data. It is based on the gene-counting method, which leads to maximum-likelihood estimates, and can be used whenever the possible antigens (one or more) on each chromosome can be specified for each person and for each locus; i.e., ABO-like systems and inclusions are permitted. The number of alleles per locus may be rather large, and both grouped and ungrouped data can be used. Log-likelihoods are calculated under various assumptions, so that likelihood-ratio tests can be carried out.
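Gene counting is the EM algorithm specialised to allele and haplotype frequency estimation. A minimal Python sketch for two biallelic loci (far simpler than the program described above, which handles multiple alleles, ABO-like dominance, and three loci): only the double heterozygote has ambiguous phase, and it is allocated between the two possible haplotype pairs in proportion to the current frequency estimates.

    def em_haplotypes(counts, n_double_het, n_iter=200):
        """counts: unambiguous haplotype counts {'AB':., 'Ab':., 'aB':., 'ab':.}
        tallied from every genotype except the double heterozygote AaBb,
        whose phase (AB/ab vs Ab/aB) is unobserved."""
        total = sum(counts.values()) + 2 * n_double_het
        f = {h: 0.25 for h in ('AB', 'Ab', 'aB', 'ab')}   # start uniform
        for _ in range(n_iter):
            # E-step: probability that a double heterozygote is phase AB/ab.
            num = f['AB'] * f['ab']
            den = num + f['Ab'] * f['aB']
            w = num / den if den > 0 else 0.5
            # M-step: re-count chromosomes, splitting AaBb genotypes by w.
            f = {'AB': (counts['AB'] + w * n_double_het) / total,
                 'ab': (counts['ab'] + w * n_double_het) / total,
                 'Ab': (counts['Ab'] + (1 - w) * n_double_het) / total,
                 'aB': (counts['aB'] + (1 - w) * n_double_het) / total}
        return f

Each iteration increases the likelihood, so the fixed point is the maximum-likelihood estimate the abstract refers to.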

7.
Here I consider the question of when to formulate a likelihood over the whole data set, as opposed to conditioning the likelihood on subsets of the data (i.e., joint vs. conditional likelihoods). I show that when certain conditions are met, these two likelihoods are guaranteed to be equivalent, and thus that it is generally preferable to condition on subsets, since that likelihood is mathematically and computationally simpler. However, I show that when these conditions are not met, conditioning on subsets of the data is equivalent to introducing additional df into our genetic model, df that we may not have been aware of. I discuss the implications of these facts for ascertainment corrections and other genetic problems.
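The factorisation at issue can be written down directly (a standard identity, stated in my notation): splitting the data into a conditioning part x and the remainder y,

    L(\theta;\, x, y) \;=\; L(\theta;\, y \mid x)\; L(\theta;\, x)

Maximising the joint and the conditional likelihoods gives the same answer exactly when the marginal factor L(θ; x) carries no information about θ; when it does, dropping that factor amounts to letting extra, unmodelled degrees of freedom absorb its information, which is the hidden cost the abstract warns about.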

8.
Understanding and characterising biochemical processes inside single cells requires experimental platforms that allow one to perturb and observe the dynamics of such processes as well as computational methods to build and parameterise models from the collected data. Recent progress with experimental platforms and optogenetics has made it possible to expose each cell in an experiment to an individualised input and automatically record cellular responses over days with fine time resolution. However, methods to infer parameters of stochastic kinetic models from single-cell longitudinal data have generally been developed under the assumption that experimental data is sparse and that responses of cells to at most a few different input perturbations can be observed. Here, we investigate and compare different approaches for calculating parameter likelihoods of single-cell longitudinal data based on approximations of the chemical master equation (CME), with a particular focus on coupling the linear noise approximation (LNA) or moment closure methods to a Kalman filter. We show that, as long as cells are measured sufficiently frequently, coupling the LNA to a Kalman filter allows one to accurately approximate likelihoods and to infer model parameters from data even in cases where the LNA provides poor approximations of the CME. Furthermore, the computational cost of filtering-based iterative likelihood evaluation scales advantageously in the number of measurement times and different input perturbations and is thus ideally suited for data obtained from modern experimental platforms. To demonstrate the practical usefulness of these results, we perform an experiment in which single cells, equipped with an optogenetic gene expression system, are exposed to various different light-input sequences and measured at several hundred time points, and we use parameter inference based on iterative likelihood evaluation to parameterise a stochastic model of the system.
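The core computational object is the filtering-based likelihood: given Gaussian state dynamics, which the LNA supplies as an approximation to the CME, the log-likelihood accumulates one Gaussian innovation term per measurement. A minimal time-invariant sketch in Python (the LNA actually yields time-varying moments; this is not the authors' code):

    import numpy as np

    def kalman_loglik(ys, A, C, Q, R, m, P):
        """Log-likelihood of measurements ys[0..T-1] under the linear-Gaussian
        model x_t = A x_{t-1} + w_t, y_t = C x_t + v_t, w~N(0,Q), v~N(0,R).
        m, P: prior state mean/covariance one step before the first measurement."""
        ll = 0.0
        for y in ys:
            # Predict the state forward one step.
            m = A @ m
            P = A @ P @ A.T + Q
            # Innovation (prediction error) and its covariance.
            v = y - C @ m
            S = C @ P @ C.T + R
            ll += -0.5 * (v @ np.linalg.solve(S, v)
                          + np.linalg.slogdet(2.0 * np.pi * S)[1])
            # Measurement update.
            K = P @ C.T @ np.linalg.inv(S)
            m = m + K @ v
            P = P - K @ S @ K.T
        return ll

Each measurement contributes one term, so the cost grows linearly with the number of measurement times, and cells under different input perturbations are independent likelihood evaluations; this is the favourable scaling the abstract highlights.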

9.
Genetic assignment methods use genotype likelihoods to draw inference about where individuals were or were not born, potentially allowing direct, real-time estimates of dispersal. We used simulated data sets to test the power and accuracy of Monte Carlo resampling methods in generating statistical thresholds for identifying F0 immigrants in populations with ongoing gene flow, and hence for providing direct, real-time estimates of migration rates. The identification of accurate critical values required that resampling methods preserved the linkage disequilibrium deriving from recent generations of immigrants and reflected the sampling variance present in the data set being analysed. A novel Monte Carlo resampling method taking into account these aspects was proposed and its efficiency was evaluated. Power and error were relatively insensitive to the frequency assumed for missing alleles. Power to identify F0 immigrants was improved by using large sample size (up to about 50 individuals) and by sampling all populations from which migrants may have originated. A combination of plotting genotype likelihoods and calculating mean genotype likelihood ratios (DLR) appeared to be an effective way to predict whether F0 immigrants could be identified for a particular pair of populations using a given set of markers.
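The genotype likelihoods at the heart of assignment tests are simple to compute. A minimal sketch (assuming HWE and linkage equilibrium within each candidate population, and omitting the missing-allele adjustment mentioned above; the function name is illustrative, not from the paper's software):

    import math

    def genotype_loglik(genotype, freqs):
        """Log-likelihood that a multilocus genotype was drawn from a
        population with the given allele frequencies.
        genotype: one (allele1, allele2) pair per locus.
        freqs: one {allele: frequency} dict per locus."""
        ll = 0.0
        for (a, b), f in zip(genotype, freqs):
            if a == b:
                ll += math.log(f[a] ** 2)        # homozygote: p^2
            else:
                ll += math.log(2 * f[a] * f[b])  # heterozygote: 2pq
        return ll

Roughly, an individual's log-likelihood ratio between home and candidate source populations underlies the DLR statistic, and the Monte Carlo resampling step generates the null distribution of such ratios for residents, from which the critical values for calling F0 immigrants are taken.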

10.
Longitudinal data usually consist of a number of short time series. One or more groups of subjects are followed over time, with observations often taken at unequally spaced time points that may differ between subjects. When the errors and random effects are Gaussian, the likelihood of these unbalanced linear mixed models can be calculated directly, and nonlinear optimization used to obtain maximum-likelihood estimates of the fixed regression coefficients and of the parameters in the variance components. For binary longitudinal data, a two-state, non-homogeneous continuous-time Markov process is used to model serial correlation within subjects. Formulating the model as a continuous-time Markov process allows the observations to be equally or unequally spaced. Fixed and time-varying covariates can be included in the model, and the continuous-time formulation allows estimation of the odds ratio for an exposure variable based on the steady-state distribution. Exact likelihoods can be calculated. The initial probability distribution for the first observation on each subject is estimated by logistic regression, which can involve covariates, and this estimation is embedded in the overall estimation. These models are applied to an intervention study designed to reduce children's sun exposure.
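The reason a continuous-time formulation handles unequal spacing is that the transition probabilities are a closed-form function of the gap length. A minimal sketch for the homogeneous two-state case (the model above additionally lets the rates depend on time and covariates):

    import math

    def two_state_ctmc(lam, mu, t):
        """Transition matrix P[i][j] over a gap of length t for a two-state
        continuous-time Markov chain with rates lam (0 -> 1) and mu (1 -> 0).
        Unequally spaced observations simply use different values of t."""
        s = lam + mu
        e = math.exp(-s * t)
        p01 = (lam / s) * (1.0 - e)
        p10 = (mu / s) * (1.0 - e)
        return [[1.0 - p01, p01],
                [p10, 1.0 - p10]]

A subject's likelihood is the initial-state probability (from the embedded logistic regression) times the product of these transition probabilities over successive gaps; the steady-state probability of state 1 is lam / (lam + mu), which is what the steady-state odds ratio is built from.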

11.
Few hypotheses or predictions of food-web theory can readily be tested using the likelihood of reproducing the data. Simple probabilistic models for food-web structure, however, are an exception, as their likelihoods were recently derived. Here I test the performance of a more complex model for food-web structure that is grounded in the allometric scaling of interactions with body size and in the theory of optimal foraging (the Allometric Diet Breadth Model, ADBM). This deterministic model has previously been evaluated by measuring the fraction of trophic relations it correctly predicts. I contrasted this value with that produced by simpler models based on body sizes and found that the quantitative information on allometric scaling and optimal foraging does not significantly increase model fit. I also present a method to compute the p-value for the fraction of trophic interactions correctly predicted by the ADBM, or any other model, with respect to three probabilistic models. I find that the ADBM predicts significantly more links than random graphs, but other models can outperform it. Although optimal foraging and allometric scaling may improve our understanding of food webs, the ADBM needs to be modified or replaced to find support in the data.

12.
Gene mapping and genetic epidemiology require large-scale computation of likelihoods based on human pedigree data. Although computation of such likelihoods has become increasingly sophisticated, fast calculation is still impeded by complex pedigree structures, by models with many underlying loci, and by missing observations on key family members. The current paper introduces a new method of array factorization that substantially accelerates linkage calculations with large numbers of markers. This method is not limited to nuclear families or to families with complete phenotyping. Vectorization and parallelization are two general-purpose hardware techniques for accelerating computations, and both can assist in the rapid calculation of genetic likelihoods. We describe our experience using both methods with the existing program MENDEL. A vectorized version of MENDEL was run on an IBM 3090 supercomputer; a parallelized version was run on parallel machines of different architectures and on a network of workstations. Applying these revised versions of MENDEL to two challenging linkage problems yields substantial improvements in computational speed.

13.
Feed-forward loops (FFLs) are gene regulatory network motifs. They exist in different types, defined by the signs of the effects the genes in the motif have on one another. We examine 36 feed-forward loops in Escherichia coli, using evolutionary simulations to predict the forms of FFL expected to evolve to generate the observed expression pattern of the output gene. These predictions are tested using likelihood ratios, comparing the likelihoods of the observed FFL structures with their likelihoods under null models. The very high likelihood ratios generated, of over 10^11, suggest that evolutionary simulation is a valuable component in the explanation of FFL structure.

14.
The distribution of epimastigote forms of Trypanosoma cruzi in the microcirculatory network and the vessel alterations were observed using an intravital microscopy technique. Immediately after intravenous inoculation of a 2 × 10^6 epimastigote suspension into normal mice, parasites were seen as circulating clumps, and their retention at some sites of the endothelium of venules and capillaries was observed. Injection of 2 × 10^7 and 2 × 10^8 parasite suspensions induced, respectively, intermittent or total stasis of venules and capillaries, probably via obstruction by clumping. The mobility of epimastigotes in the clumps indicates that parasites were alive in the lumen of vessels. The retention of clumps in the capillaries, although intense, could only be observed when labeled parasites were inoculated. These results suggest that the rapid clearance of epimastigote forms of T. cruzi from the blood circulation of mice may be due to the retention of parasites at the endothelium of venules and capillaries, which, in turn, may facilitate phagocytosis. This may be a mechanism by which mice are able to eliminate epimastigote forms from the circulation. These findings are consistent with our previous observations showing that epimastigotes are not lysed by complement activation but are phagocytosed and destroyed by a distinct population of blood cells.

15.
For a particular chemical, one can treat the chemical-by-chemical variation in the relative doses that produce equal toxicity in experimental animals and humans as a characterization of the likelihoods of extrapolation factors of different magnitudes. An emerging approach to noncancer risk assessment is to use such empirical distributions in place of fixed Uncertainty Factors. This paper discusses dividing the overall variation into two sub-distributions representing the pharmacokinetic (PK) and pharmacodynamic (PD) contributions to the variation among chemicals in the animal-to-human toxicologically equivalent dose. If a physiologically based pharmacokinetic (PBPK) model is used to derive a compound-specific adjustment factor (CSAF) for the pharmacokinetic component, the deconvolution of the PK and PD components allows one to remove the PK component (to be replaced with the CSAF) while retaining the uncertainty in pharmacodynamics, which PBPK models do not address. One must then add back the uncertainty in the PBPK determination of the CSAF (which may be considerable). A candidate criterion for whether one can use an uncertain PBPK model is whether the generic uncertainty about cross-species pharmacokinetics (reflected in the PK component of the overall empirical distribution) is larger than the chemical-specific uncertainty in the determination of kinetically equivalent doses in experimental animals and humans.
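If the PK and PD contributions are treated as independent lognormal factors (an assumption on my part, though it is the natural reading of the decomposition described), the bookkeeping on the log scale is simple:

    \sigma^2_{\text{total}} \;=\; \sigma^2_{PK} + \sigma^2_{PD}
    \qquad\Longrightarrow\qquad
    \sigma^2_{\text{adjusted}} \;=\; \sigma^2_{CSAF} + \sigma^2_{PD}

The candidate criterion in the last sentence then amounts to requiring \sigma^2_{CSAF} < \sigma^2_{PK}: the chemical-specific kinetic determination must be less uncertain than the generic cross-species PK distribution it replaces.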

16.
In an affected-sib-pair study, the parents are often unavailable for typing, particularly for diseases of late onset. In many cases, however, it is possible to sample unaffected siblings. It is therefore desirable to assess the contribution of such siblings to the power of such a study. The likelihood ratio introduced by Risch and improved by Holmans was extended to incorporate data from unaffected siblings. Tests based on two likelihoods were considered: the full likelihood of the data, based on the identity-by-descent (IBD) sharing states of the entire sibship, and a pseudolikelihood based on the IBD sharing states of the affected pair only, using the unaffected siblings to infer parental genotypes. The latter approach was found to be more powerful, except when penetrance was high. Typing an unaffected sibling, or just one parent, was found to give only a small increase in power except when the PIC of the marker was low. Even then, typing an unaffected relative increased the overall number of individuals that had to be typed to achieve a given power. If there is no highly informative marker locus in the area under study, it may be possible to "build" one by combining the alleles from two or more neighboring tightly linked loci into haplotypes. Typing two loci gave a sizeable power increase over a single locus, but typing further loci gave much smaller gains. Building haplotypes will introduce phase uncertainties, with the result that such a system will yield less power than will a single locus with the same number of alleles. This power loss was small, however, and did not affect the conclusions regarding the worth of typing unaffected relatives.
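For orientation, the Risch-Holmans likelihood ratio being extended has a compact form (stated here from memory, in my notation, so treat the details as an assumption): with null IBD-sharing prior π = (1/4, 1/2, 1/4) for a sib pair, alternative sharing probabilities z = (z_0, z_1, z_2), and ŵ_i the posterior probability that the affected pair shares i alleles IBD given the marker data,

    \mathrm{LR} \;=\; \prod_{\text{pairs}} \; \sum_{i=0}^{2} \frac{z_i\, \hat{w}_i}{\pi_i}

Holmans' refinement maximises over z restricted to the set of values attainable under a genetic model (the "possible triangle"); the extension studied above either enlarges ŵ to the IBD configuration of the whole sibship (the full likelihood) or uses the unaffected siblings only to sharpen the inference of parental genotypes (the pseudolikelihood found to be more powerful except under high penetrance).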

17.
Several programs are currently available for the detection of genotyping error that may or may not be Mendelianly inconsistent. However, no systematic study exists that evaluates their performance under varying pedigree structures and sizes, marker spacing, and allele frequencies. Our simulation study compares four multipoint methods: Merlin, Mendel4, SimWalk2, and Sibmed. We look at empirical thresholds, power, and false-positive rates on 7 small pedigree structures that included sibships with and without genotyped parents, and a three-generation pedigree, using 11 microsatellite markers with 3 different map spacings. Simulated data includes 5,000 replicates of each pedigree structure and marker map, with random genotyping errors in about 4% of the middle marker's genotypes. We found that the default thresholds used by these programs provide low power (47-72%). Power is improved more by adding genotyped siblings than by using more closely spaced markers. Some mistyping methods are sensitive to the frequencies of the observed alleles. Siblings of mistyped individuals have elevated false-positive rates, as do markers close to the mistyped marker. We conclude that thresholds should be decided based on the pedigree and marker data and that greater focus should be placed on modeling genotyping error when computing likelihoods, rather than on detecting and eliminating genotyping errors.

18.
The interaction between S-layer protein SbsB and the secondary cell wall polymer (SCWP) of Geobacillus stearothermophilus PV72/p2 was investigated by real-time surface plasmon resonance biosensor technology. The SCWP is an acidic polysaccharide that contains N-acetylglucosamine, N-acetylmannosamine, and pyruvic acid. For interaction studies, recombinant SbsB (rSbsB) and two truncated forms consisting of either the S-layer-like homology (SLH) domain (3SLH) or the residual part of SbsB were used. Independent of the setup, the data showed that the SLH domain was exclusively responsible for SCWP binding. The interaction was found to be highly specific, since neither the peptidoglycan nor SCWPs from other organisms nor other polysaccharides were recognized. Data analysis from that setup in which 3SLH was immobilized on a sensor chip and SCWP represented the soluble analyte was done in accordance with a model that describes binding of a bivalent analyte to a fixed ligand in terms of an overall affinity for all binding sites. The measured data revealed the presence of at least two binding sites on a single SCWP molecule with a distance of about 14 nm and an overall Kd of 7.7 × 10^-7 M. Analysis of data from the inverted setup in which the SCWP was immobilized on a sensor chip was done in accordance with an extension of the heterogeneous-ligand model, which indicated the existence of three binding sites with low (Kd = 2.6 × 10^-5 M), medium (Kd = 6.1 × 10^-8 M), and high (Kd = 6.7 × 10^-11 M) affinities. Since in this setup 3SLH was the soluble analyte and the presence of small amounts of oligomers in even monomeric protein solutions cannot be excluded, the high-affinity binding site may result from avidity effects caused by binding of at least dimeric 3SLH. Solution competition assays performed with both setups confirmed the specificity of the protein-carbohydrate interaction investigated.

19.
It is now generally accepted that the human visual system consists of subsystems (channels) that may be activated in parallel. According to some models of detection, detection occurs by probability summation among channels, while other models assume that detection is by a single channel that may even be tuned specifically to the stimulus pattern (detection by a matched filter). So far, arguments for the hypothesis of probability summation in particular are based on plausibility considerations and on demonstrations that the data from certain detection experiments are compatible with this hypothesis. In this paper it is shown that linear contrast-interrelationship functions, together with a property of a large class of distribution functions (strict log-concavity or log-convexity on the relevant set of contrasts/intensities), uniquely point to detection by a single channel. In particular, models of detection by probability summation based on Quick's model are incompatible with linear contrast-interrelationship functions. Sufficient (and observable) conditions for the strict log-concavity/log-convexity of distribution functions are presented.
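For background (standard psychophysics, not derived from this paper): in Quick's model each channel i detects a pattern of contrast c with probability P_i(c) = 1 - 2^{-(c/\alpha_i)^{\beta}}, and probability summation over independent channels gives

    P(c) \;=\; 1 - \prod_i \bigl(1 - P_i(c)\bigr) \;=\; 1 - 2^{-\sum_i (c/\alpha_i)^{\beta}}

The paper's result is that pooling rules of this form are incompatible with linear contrast-interrelationship functions under the stated log-concavity/log-convexity conditions, which instead single out detection by one channel.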

20.
We present an alternative method for calculating likelihoods in molecular phylogenetics. Our method is based on partial likelihood tensors, which are generalizations of the partial likelihood vectors used in Felsenstein's approach. By exploiting a lexicographic sorting together with partial likelihood tensors, significant computational savings can be obtained. We show this on a range of simulated data by enumerating all numerical calculations required by our method and by the standard approach.
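For reference, the vector recursion being generalised is Felsenstein's pruning algorithm: each node carries a vector of conditional likelihoods of the tip data below it, given each possible state at the node. A minimal one-site sketch (assuming, for brevity, a single shared transition matrix P for every branch rather than per-branch matrices):

    import numpy as np

    def partial_likelihood(node, P, n_states):
        """Conditional likelihood vector L[s] = P(tip data below node | state s).
        A tip is encoded as its observed state index; an internal node as a
        (left, right) tuple."""
        if isinstance(node, int):
            L = np.zeros(n_states)
            L[node] = 1.0                 # tip: indicator on the observed state
            return L
        left, right = node
        # Each child contributes P @ L_child; the node multiplies them elementwise.
        return (P @ partial_likelihood(left, P, n_states)) * \
               (P @ partial_likelihood(right, P, n_states))

    # Site log-likelihood: average the root vector over stationary frequencies pi,
    # e.g. loglik = np.log(pi @ partial_likelihood(tree, P, n_states)).

The tensor method generalises these per-node vectors to higher-order arrays and, combined with the lexicographic sorting of the data, avoids redundant evaluations of this recursion.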
