Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
jModelTest: phylogenetic model averaging   Total citations: 15, self-citations: 0, citations by others: 15
jModelTest is a new program for the statistical selection of models of nucleotide substitution based on "Phyml" (Guindon and Gascuel 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696-704.). It implements 5 different selection strategies, including "hierarchical and dynamical likelihood ratio tests," the "Akaike information criterion," the "Bayesian information criterion," and a "decision-theoretic performance-based" approach. This program also calculates the relative importance and model-averaged estimates of substitution parameters, including a model-averaged estimate of the phylogeny. jModelTest is written in Java and runs under Mac OSX, Windows, and Unix systems with a Java Runtime Environment installed. The program, including documentation, can be freely downloaded from the software section at http://darwin.uvigo.es.
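
The information criteria named in the abstract above can be illustrated with a minimal Python sketch. It is not jModelTest code: the function names, the log-likelihoods, the parameter counts, and the alignment length are all made-up values chosen for illustration, and the parameter counts ignore branch lengths. The sketch computes AIC and BIC scores and turns AIC differences into Akaike weights, which are the usual ingredient for model averaging.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion; n is the sample size (here, alignment sites)."""
    return k * math.log(n) - 2 * log_lik

def akaike_weights(scores):
    """Convert a dict of criterion scores into normalized weights (lower score = more weight)."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical fits: model name -> (maximized log-likelihood, free substitution parameters)
fits = {"JC": (-2050.0, 0), "HKY": (-2010.0, 4), "GTR": (-2008.5, 8)}
n_sites = 1000  # hypothetical alignment length

aic_scores = {m: aic(ll, k) for m, (ll, k) in fits.items()}
bic_scores = {m: bic(ll, k, n_sites) for m, (ll, k) in fits.items()}
weights = akaike_weights(aic_scores)
best_model = min(aic_scores, key=aic_scores.get)

for m in fits:
    print(m, round(aic_scores[m], 2), round(bic_scores[m], 2), round(weights[m], 4))
print("best by AIC:", best_model)
```

A model-averaged estimate of a parameter would then be the weight-weighted sum of the per-model estimates, restricted to the models that contain that parameter.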

2.
We present two programs: gafs, for optimal selection of loci for use in individual assignment tests, and mlc, a program for individual classification using maximum likelihood and k-nearest neighbour decision rules. The gafs software employs a genetic algorithm to heuristically search multilocus subsets with several objective functions to maximize predictive accuracy of the assignments.

3.
SUMMARY: PopHist is a computer program that uses the frequency spectrum of alleles to: (a) estimate maximum likelihood parameters describing a population's history; and (b) compare alternative hypotheses about population history using likelihood ratio tests. The program uses the matrix coalescent, a method for calculating theoretical frequency spectra that can be applied to sets of unlinked sites. AVAILABILITY: Source code and documentation are available at http://mombasa.anthro.utah.edu/wooding/PopHist

4.
Phenotypes in an ABO-like system are assumed to be observed for a number of genetically independent persons from a number of populations. The program, which is written in FORTRAN, calculates maximum likelihood estimates of gene frequencies and their standard errors in each population and in the populations taken together. Furthermore, the program calculates expected values and likelihood ratio and goodness-of-fit chi-square tests of Hardy-Weinberg equilibrium. If several subpopulations are pooled together, a likelihood ratio test of homogeneity is performed.

5.
Unequally spaced longitudinal data with AR(1) serial correlation   Total citations: 3, self-citations: 0, citations by others: 3
This paper discusses longitudinal data analysis when each subject is observed at different unequally spaced time points. Observations within subjects are assumed to be either uncorrelated or to have a continuous-time first-order autoregressive structure, possibly with observation error. The random coefficients are assumed to have an arbitrary between-subject covariance matrix. Covariates can be included in the fixed effects part of the model. Exact maximum likelihood estimates of the unknown parameters are computed using the Kalman filter to evaluate the likelihood, which is then maximized with a nonlinear optimization program. An example is presented where a large number of subjects are each observed at a small number of observation times. Hypothesis tests for selecting the best model are carried out using Wald's test on contrasts or likelihood ratio tests based on fitting full and restricted models.
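
As a rough illustration of the within-subject structure described above, the following Python sketch builds the covariance matrix implied by a continuous-time AR(1) process observed at unequally spaced times, with optional observation error added on the diagonal. It assumes the common parameterization Cov(t_i, t_j) = sigma2 * exp(-phi * |t_i - t_j|); the function name, parameter names, and numeric values are hypothetical. The sketch omits the random coefficients and does not use the Kalman filter that the paper relies on to evaluate the likelihood.

```python
import math

def car1_covariance(times, sigma2, phi, obs_error_var=0.0):
    """Covariance matrix for one subject under a continuous-time AR(1) process.

    times          observation times (any spacing)
    sigma2         process variance
    phi            decay rate (> 0); correlation is exp(-phi * time gap)
    obs_error_var  independent observation-error variance added to the diagonal
    """
    n = len(times)
    cov = [[sigma2 * math.exp(-phi * abs(times[i] - times[j])) for j in range(n)]
           for i in range(n)]
    for i in range(n):
        cov[i][i] += obs_error_var
    return cov

# Hypothetical subject observed at four unequally spaced times
times = [0.0, 0.5, 2.0, 2.25]
cov = car1_covariance(times, sigma2=1.0, phi=0.8, obs_error_var=0.1)
for row in cov:
    print([round(v, 4) for v in row])
```

Because the correlation depends only on the time gap, closely spaced observations (2.0 and 2.25) are more strongly correlated than widely spaced ones (0.0 and 2.0).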

6.
Liu W, Zhao W, Chase GA. Human Heredity 2006, 61(1):31-44
OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotypes or genotyping errors on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotypes. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had a larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase. In both sets of simulations, with the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.

7.
Detecting positive Darwinian selection at the DNA sequence level has been a subject of considerable interest. However, positive selection is difficult to detect because it often operates episodically on a few amino acid sites, and the signal may be masked by negative selection. Several methods have been developed to test positive selection that acts on given branches (branch methods) or on a subset of sites (site methods). Recently, Yang, Z., and R. Nielsen (2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917) developed likelihood ratio tests (LRTs) based on branch-site models to detect positive selection that affects a small number of sites along prespecified lineages. However, computer simulations suggested that the tests were sensitive to the model assumptions and were unable to distinguish between relaxation of selective constraint and positive selection (Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332-1339). Here, we describe a modified branch-site model and use it to construct two LRTs, called branch-site tests 1 and 2. We applied the new tests to reanalyze several real data sets and used computer simulation to examine the performance of the two tests by examining their false-positive rate, power, and robustness. We found that test 1 was unable to distinguish relaxed constraint from positive selection affecting the lineages of interest, while test 2 had acceptable false-positive rates and appeared robust against violations of model assumptions. As test 2 is a direct test of positive selection on the lineages of interest, it is referred to as the branch-site test of positive selection and is recommended for use in real data analysis. The test appeared conservative overall, but exhibited better power in detecting positive selection than the branch-based test. Bayes empirical Bayes identification of amino acid sites under positive selection along the foreground branches was found to be reliable, but lacked power.

8.
Selecting the best-fit model of nucleotide substitution   Total citations: 2, self-citations: 0, citations by others: 2
Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best fits the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting best-fit models of nucleotide substitution. We specifically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution and recording how often this true model is recovered by the different methods. Our results suggest that model selection is reasonably accurate and indicate that some likelihood ratio test methods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not influence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, influence model selection in some cases. Model fitting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented in standard computer programs for phylogenetic estimation. We show here that a best-fit model can be readily identified. Consequently, given the relevance of models, model fitting should be routine in any phylogenetic analysis that uses models of evolution.

9.
We describe a mathematical technique and an associated computer program for comparing, evaluating and optimizing diagnostic tests. The technique combines receiver operating characteristic (ROC) analysis with information theory and cost-benefit analysis to accomplish this. The program is menu driven and highly interactive; it generates 13 possible user-determined ASCII disk files which can be easily converted to graphs. These graphs allow the user to make detailed comparisons among various diagnostic tests for all values of disorder prevalence, and also provide guidelines for cut-off selection in order to optimize tests. These techniques are applied to three published studies of the enzyme screening assay for diagnosis of infection with HIV. We show how graphs produced by this program can be used to compare and optimize these diagnostic tests. The program is written for an IBM-compatible microcomputer running on a DOS operating system.

10.
An excess of nonsynonymous substitutions over synonymous ones is an important indicator of positive selection at the molecular level. A lineage that underwent Darwinian selection may have a nonsynonymous/synonymous rate ratio (dN/dS) that is different from those of other lineages or greater than one. In this paper, several codon-based likelihood models that allow for variable dN/dS ratios among lineages were developed. They were then used to construct likelihood ratio tests to examine whether the dN/dS ratio is variable among evolutionary lineages, whether the ratio for a few lineages of interest is different from the background ratio for other lineages in the phylogeny, and whether the dN/dS ratio for the lineages of interest is greater than one. The tests were applied to the lysozyme genes of 24 primate species. The dN/dS ratios were found to differ significantly among lineages, indicating that the evolution of primate lysozymes is episodic, which is incompatible with the neutral theory. Maximum-likelihood estimates of parameters suggested that about nine nonsynonymous and zero synonymous nucleotide substitutions occurred in the lineage leading to hominoids, and the dN/dS ratio for that lineage is significantly greater than one. The corresponding estimates for the lineage ancestral to colobine monkeys were nine and one, and the dN/dS ratio for the lineage is not significantly greater than one, although it is significantly higher than the background ratio. The likelihood analysis thus confirmed most, but not all, conclusions Messier and Stewart reached using reconstructed ancestral sequences to estimate synonymous and nonsynonymous rates for different lineages.
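
The likelihood ratio tests described above compare nested models by twice the difference in maximized log-likelihood. The Python sketch below shows that calculation for the simplest case, where the alternative model has one extra free parameter (for example, a second dN/dS ratio for the lineages of interest). It assumes the standard chi-square approximation with 1 degree of freedom; the function names and the log-likelihood values are invented for illustration and are not taken from the paper.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null) for nested models."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid for df = 1 only.
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical maximized log-likelihoods
lnl_one_ratio = -1000.0   # same dN/dS on every branch
lnl_two_ratio = -996.5    # separate dN/dS for the lineages of interest

stat = lrt_statistic(lnl_one_ratio, lnl_two_ratio)
p_value = chi2_sf_df1(stat)
print(f"2*delta lnL = {stat:.3f}, p = {p_value:.4f}")
```

Comparisons with more than one extra parameter need the chi-square tail for the corresponding degrees of freedom, which this sketch does not provide.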

11.
To aid physicians who may be having difficulty applying the principles of decision analysis to diagnostic data according to the methods published in the past several years, the authors of this paper set out a few principles and schemes for using and interpreting diagnostic data obtained from dichotomous tests. They also present a simple BASIC program for calculating post-test probabilities from likelihood ratios and pretest probabilities that a particular disease is present in a particular patient; the program can be adapted for use on microcomputers.
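
The calculation that the abstract above attributes to the BASIC program can be sketched in a few lines of Python. This is not the authors' program: the function name and the example numbers are assumptions. It applies the standard odds form of Bayes' theorem, converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back to a probability.

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    """Post-test probability of disease from a pretest probability and a likelihood ratio."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical patient: 20% pretest probability, positive test with LR+ = 9
p_pos = post_test_probability(0.20, 9.0)
# Same patient, negative test with LR- = 0.1
p_neg = post_test_probability(0.20, 0.1)
print(round(p_pos, 3), round(p_neg, 3))
```

For a dichotomous test, the positive likelihood ratio is sensitivity / (1 - specificity) and the negative likelihood ratio is (1 - sensitivity) / specificity.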

12.
The power of maximum likelihood tests of positive selection on protein-coding genes depends heavily on detecting and accounting for potential biases in the studied data set. Although the influence of transition:transversion and codon biases has been investigated in detail, little is known about how inaccuracy in the phylogeny used during the calculations affects the performance of these tests. In this study, 3 empirical data sets are analyzed using sets of simulated topologies corresponding to low, intermediate, and high levels of phylogenetic uncertainty. The detection of positive selection was largely unaffected by errors in the underlying phylogeny. However, the number of sites identified as being under positive selection tended to be overestimated.

13.
Knowledge about the forces generating and conserving linkage disequilibrium (LD) is important for drawing conclusions about the prospects and limitations of association mapping. The objectives of our research were to examine the importance of (1) selection, (2) mutation, and (3) genetic drift for generating LD in a typical maize breeding program. We conducted computer simulations based on genotypic data of Central European maize open-pollinated varieties which have played an important role as founders of the European flint heterotic group. The breeding scheme and the dimensioning underlying our simulations reflect essentially the maize breeding program of the University of Hohenheim. Results suggested that in a plant breeding program of the examined dimension and breeding scheme, genetic drift and selection are major forces generating LD. The currently used population-based association mapping tests do not explicitly correct for LD caused by these two forces. Therefore, increased type I error rates are expected if these tests are applied to plant breeding populations. As a consequence, we recommend using family-based association tests for association mapping approaches in plant breeding populations.

14.
Genes involved in host-pathogen interactions are often strongly affected by positive natural selection. The Duffy antigen, coded by the Duffy antigen receptor for chemokines (DARC) gene, serves as a receptor for Plasmodium vivax in humans and for Plasmodium knowlesi in some nonhuman primates. In the majority of sub-Saharan Africans, a nucleic acid variant in GATA-1 of the gene promoter is responsible for the nonexpression of the Duffy antigen on red blood cells and consequently resistance to invasion by P. vivax. The Duffy antigen also acts as a receptor for chemokines and is expressed in red blood cells and many other tissues of the body. Because of this dual role, we sequenced a ~3,000-bp region encompassing the entire DARC gene as well as part of its 5' and 3' flanking regions in a phylogenetic sample of primates and used statistical methods to evaluate the nature of selection pressures acting on the gene during its evolution. We analyzed both coding and regulatory regions of the DARC gene. The regulatory analysis showed accelerated rates of substitution at several sites near known motifs. Our tests of positive selection in the coding region using maximum likelihood by branch sites and maximum likelihood by codon sites did not yield statistically significant evidence for the action of positive selection. However, the maximum likelihood test in which the gene was subdivided into different structural regions showed that the known binding region for P. vivax/P. knowlesi is under very different selective pressures than the remainder of the gene. In fact, most of the gene appears to be under strong purifying selection, but this is not evident in the binding region. We suggest that the binding region is under the influence of two opposing selective pressures, positive selection possibly exerted by the parasite and purifying selection exerted by chemokines.

15.
In this article we describe the construction of a general computer program for the iterative calculation of maximum likelihood estimators. The program is general in the sense that it allows the maximization of any given likelihood function. The user only has to write a subroutine LKLHD, in which the specific likelihood function and its first and second derivatives are calculated. This subroutine is an input parameter of the optimization program. This enables the user to employ one main program for the maximization of various likelihood functions. This advantage will be shown for the evaluation of qualitative dose response relationships (quantal assays: probit-, logit-analysis).
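
The design described above, one generic optimizer that receives the likelihood routine as an input, can be sketched in Python for a single parameter. The abstract does not say which iterative scheme the program uses, so Newton-Raphson is an assumption here, chosen because it uses exactly the first and second derivatives that the user routine supplies. The name `lklhd` echoes the subroutine LKLHD mentioned in the abstract; everything else (signatures, the exponential example, the data) is hypothetical.

```python
import math

def newton_mle(lklhd, theta0, tol=1e-10, max_iter=50):
    """Maximize a one-parameter log-likelihood by Newton-Raphson.

    lklhd(theta) must return (log_likelihood, first_derivative, second_derivative).
    """
    theta = theta0
    for _ in range(max_iter):
        _, d1, d2 = lklhd(theta)
        step = d1 / d2
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

def make_exponential_lklhd(data):
    """User-supplied routine: log-likelihood of an exponential rate and its derivatives."""
    n, s = len(data), sum(data)
    def lklhd(rate):
        ll = n * math.log(rate) - rate * s
        d1 = n / rate - s
        d2 = -n / rate ** 2
        return ll, d1, d2
    return lklhd

# Hypothetical waiting times; the closed-form MLE is n / sum(data) = 0.8
data = [0.5, 1.5, 1.0, 2.0]
rate_hat = newton_mle(make_exponential_lklhd(data), theta0=0.5)
print(round(rate_hat, 6))
```

Swapping in a different `lklhd` closure, for instance one for a probit or logit dose-response model, would reuse `newton_mle` unchanged, which is the advantage the abstract points to. A multi-parameter version would replace the scalar division with a linear solve against the matrix of second derivatives.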

16.
Aceto S, Montieri S, Sica M, Gaudio L. Gene 2007, 392(1-2):299-305
The evolutionary analysis of OrcPI, the orchid homologue to the PISTILLATA/GLOBOSA gene, was conducted on some Mediterranean orchid species, measuring mean pairwise Ka/Ks ratios and nucleotide variability. Evidence for positive selection was tested using a maximum likelihood approach implemented in PAML, and neutrality tests were conducted to assess deviation from neutral evolution. Data were also examined partitioning the coding region into four regions, corresponding to different functional domains of the protein. The results show that OrcPI is subjected to different evolutionary forces: diffuse purifying selection, localized positive selection or selective sweep, and different partitions of selective constraints.

17.
The program, which is written in FORTRAN, estimates haplotype frequencies in two-locus and three-locus genetic systems from population diploid data. It is based on the gene counting method, which leads to maximum likelihood estimates, and can be used whenever the possible antigens (one or more) on each chromosome can be specified for each person and for each locus, i.e., ABO-like systems and inclusions are permitted. The number of alleles per locus may be rather large, and both grouped and ungrouped data can be used. Log likelihoods are calculated on the basis of various assumptions, so that likelihood ratio tests can be carried out.
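
The gene counting method mentioned above is an expectation-maximization iteration. The Python sketch below illustrates it in a much simpler setting than the FORTRAN program handles: two loci with two codominant alleles each (A/a and B/b), where the only phase-ambiguous individuals are double heterozygotes. The function name, the input layout, and the counts are hypothetical, and random mating is assumed.

```python
def em_haplotypes(known, n_double_het, tol=1e-10, max_iter=500):
    """Gene-counting (EM) estimates of two-locus haplotype frequencies.

    known         dict with haplotype counts "AB", "Ab", "aB", "ab" taken from
                  individuals whose phase is unambiguous
    n_double_het  number of AaBb individuals (phase AB/ab or Ab/aB unknown)
    """
    total = sum(known.values()) + 2 * n_double_het
    freqs = {h: 0.25 for h in ("AB", "Ab", "aB", "ab")}
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes carrying AB/ab
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: count haplotypes, including the expected ones, and renormalize
        new = {
            "AB": (known["AB"] + n_double_het * w) / total,
            "ab": (known["ab"] + n_double_het * w) / total,
            "Ab": (known["Ab"] + n_double_het * (1 - w)) / total,
            "aB": (known["aB"] + n_double_het * (1 - w)) / total,
        }
        if max(abs(new[h] - freqs[h]) for h in new) < tol:
            return new
        freqs = new
    return freqs

# Hypothetical sample: 100 resolved haplotypes plus 20 double heterozygotes
known = {"AB": 40, "Ab": 10, "aB": 10, "ab": 40}
freqs = em_haplotypes(known, n_double_het=20)
print({h: round(f, 4) for h, f in freqs.items()})
```

Handling dominance (as in ABO-like systems), more than two alleles, or a third locus requires enumerating every haplotype pair compatible with each phenotype in the E-step, which this sketch leaves out.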

18.
The software package COANCESTRY implements seven relatedness estimators and three inbreeding estimators to estimate relatedness and inbreeding coefficients from multilocus genotype data. Two likelihood estimators that allow for inbred individuals and account for genotyping errors are included for the first time in this user-friendly program for PCs running the Windows operating system. A simulation module is built into the program to simulate multilocus genotype data of individuals with a predefined relationship, and to compare the estimators and the simulated relatedness values to facilitate the selection of the best estimator in a particular situation. Bootstrapping and permutations are used to obtain the 95% confidence intervals of each relatedness or inbreeding estimate, and to test the difference in averages between groups.

19.
The need to consider random effects in capture-recapture models, besides fixed effects such as those of environmental covariates, has been widely recognized in recent years. However, formal approaches require involved likelihood integrations, and conceptual and technical difficulties have slowed down the spread of capture-recapture mixed models among biologists. In this article, we evaluate simple procedures to test for the effect of an environmental covariate on parameters such as time-varying survival probabilities in the presence of a random effect corresponding to unexplained environmental variation. We show that the usual likelihood ratio test between fixed models is strongly biased, and tends to detect a covariate effect too often. Permutation and analysis of deviance tests are shown to behave properly and are recommended. Permutation tests are implemented in the latest version of program E-SURGE. Our approach also applies to generalized linear mixed models.

20.
When family data are ascertained through single selection based on truncation, a prevailing method of analysis is to condition the likelihood function on the proband's actual phenotypic value. An alternative method conditions the likelihood function on the event that the proband's measurement lies in the truncation region. Both methods are contrasted here by using Monte Carlo simulations; identical sets of data were analyzed using both methods. The results suggest that, under either method, (1) parameter estimates are nearly unbiased and (2) likelihood-ratio tests of null hypotheses are approximately distributed as chi-square. However, conditioning on the proband's actual phenotypic value yields considerably less efficient estimates and reduced power for hypothesis tests. A corresponding result also holds under complete ascertainment. It is argued, therefore, that whenever sufficient information is available on the nature of truncation, the alternative approach should be used.
