Found 20 similar articles; search time: 15 ms
1.
jModelTest: phylogenetic model averaging (total citations: 15; self-citations: 0; citations by others: 15)
Posada D 《Molecular biology and evolution》2008,25(7):1253-1256
jModelTest is a new program for the statistical selection of models of nucleotide substitution based on "Phyml" (Guindon and Gascuel 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696-704.). It implements 5 different selection strategies, including "hierarchical and dynamical likelihood ratio tests," the "Akaike information criterion," the "Bayesian information criterion," and a "decision-theoretic performance-based" approach. This program also calculates the relative importance and model-averaged estimates of substitution parameters, including a model-averaged estimate of the phylogeny. jModelTest is written in Java and runs under Mac OSX, Windows, and Unix systems with a Java Runtime Environment installed. The program, including documentation, can be freely downloaded from the software section at http://darwin.uvigo.es.
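The information-criterion strategies named in this abstract can be illustrated with a minimal Python sketch. It is not jModelTest code: the log-likelihoods, parameter counts, alignment length, and kappa estimates are hypothetical values chosen for illustration, and the helper names `information_criteria` and `akaike_weights` are assumptions. The sketch computes AIC and BIC for each model, converts the AIC scores into Akaike weights, and uses those weights to form a model-averaged estimate of one parameter.

```python
import math

def information_criteria(log_lik, n_params, n_sites):
    """Return (AIC, BIC) for one fitted model; lower is better."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_sites)
    return aic, bic

def akaike_weights(scores):
    """Turn criterion scores into weights that sum to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical fits: model -> (log-likelihood, number of free parameters).
fits = {"JC": (-2500.0, 0), "K80": (-2460.0, 1), "HKY": (-2450.0, 4)}
n_sites = 1000  # assumed alignment length, used as the BIC sample size

aic = {m: information_criteria(lnl, k, n_sites)[0] for m, (lnl, k) in fits.items()}
bic = {m: information_criteria(lnl, k, n_sites)[1] for m, (lnl, k) in fits.items()}
best_aic = min(aic, key=aic.get)
best_bic = min(bic, key=bic.get)
weights = akaike_weights(aic)

# Hypothetical transition/transversion ratio estimates; JC has no such parameter,
# so the average is taken over the models that include it.
kappa = {"K80": 2.3, "HKY": 2.1}
norm = sum(weights[m] for m in kappa)
kappa_avg = sum(weights[m] * kappa[m] for m in kappa) / norm
```

With these made-up numbers the two criteria disagree: AIC prefers the richer model, while the heavier BIC penalty favors the simpler one. This kind of disagreement is one reason a program might report several criteria side by side.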
2.
We present two programs: gafs, for optimal selection of loci for use in individual assignment tests, and mlc, a program for individual classification using maximum likelihood and k-nearest neighbour decision rules. The gafs software employs a genetic algorithm to heuristically search multilocus subsets with several objective functions to maximize predictive accuracy of the assignments.
3.
Wooding S 《Bioinformatics (Oxford, England)》2003,19(4):539-540
SUMMARY: PopHist is a computer program that uses the frequency spectrum of alleles to: (a) estimate maximum likelihood parameters describing a population's history; and (b) compare alternative hypotheses about population history using likelihood ratio tests. The program uses the matrix coalescent, a method for calculating theoretical frequency spectra that can be applied to sets of unlinked sites. AVAILABILITY: Source code and documentation are available at http://mombasa.anthro.utah.edu/wooding/PopHist
4.
S O Larsen 《Computer programs in biomedicine》1979,9(3):213-217
Phenotypes in an ABO-like system are assumed to have been observed for a number of genetically independent persons from a number of populations. The program, which is written in FORTRAN, calculates maximum likelihood estimates of gene frequencies and their standard errors in each population and in the populations taken together. Furthermore, the program calculates expected values and likelihood ratio and goodness-of-fit chi-square tests of Hardy-Weinberg equilibrium. If several subpopulations are pooled together, a likelihood ratio test of homogeneity is performed.
5.
Unequally spaced longitudinal data with AR(1) serial correlation (total citations: 3; self-citations: 0; citations by others: 3)
This paper discusses longitudinal data analysis when each subject is observed at different unequally spaced time points. Observations within subjects are assumed to be either uncorrelated or to have a continuous-time first-order autoregressive structure, possibly with observation error. The random coefficients are assumed to have an arbitrary between-subject covariance matrix. Covariates can be included in the fixed effects part of the model. Exact maximum likelihood estimates of the unknown parameters are computed using the Kalman filter to evaluate the likelihood, which is then maximized with a nonlinear optimization program. An example is presented where a large number of subjects are each observed at a small number of observation times. Hypothesis tests for selecting the best model are carried out using Wald's test on contrasts or likelihood ratio tests based on fitting full and restricted models.
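To make the continuous-time AR(1) structure concrete, here is a minimal Python sketch of the within-subject covariance matrix it implies for unequally spaced observation times. The exponential-decay parameterization `sigma2 * exp(-phi * |t_i - t_j|)`, the function name `car1_covariance`, and the example values are assumptions for illustration; the paper itself evaluates the likelihood with a Kalman filter rather than by building this matrix explicitly.

```python
import math

def car1_covariance(times, sigma2, phi, obs_var=0.0):
    """Covariance matrix for one subject under an assumed continuous-time AR(1).

    times   : observation times (any spacing)
    sigma2  : variance of the AR(1) process
    phi     : decay rate (> 0); correlation is exp(-phi * time gap)
    obs_var : optional observation-error variance added to the diagonal
    """
    n = len(times)
    cov = [[sigma2 * math.exp(-phi * abs(times[i] - times[j]))
            for j in range(n)] for i in range(n)]
    for i in range(n):
        cov[i][i] += obs_var
    return cov

# Hypothetical subject observed at three unequally spaced times.
times = [0.0, 0.5, 2.0]
cov = car1_covariance(times, sigma2=1.0, phi=1.0, obs_var=0.1)
```

The correlation between two observations depends only on the time gap between them, so closely spaced visits are more strongly correlated than widely spaced ones.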
6.
OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotype or genotyping error on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotype. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had a larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase. In both sets of simulations, with the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.
7.
Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level (total citations: 21; self-citations: 0; citations by others: 21)
Detecting positive Darwinian selection at the DNA sequence level has been a subject of considerable interest. However, positive selection is difficult to detect because it often operates episodically on a few amino acid sites, and the signal may be masked by negative selection. Several methods have been developed to test positive selection that acts on given branches (branch methods) or on a subset of sites (site methods). Recently, Yang, Z., and R. Nielsen (2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19:908-917) developed likelihood ratio tests (LRTs) based on branch-site models to detect positive selection that affects a small number of sites along prespecified lineages. However, computer simulations suggested that the tests were sensitive to the model assumptions and were unable to distinguish between relaxation of selective constraint and positive selection (Zhang, J. 2004. Frequent false detection of positive selection by the likelihood method with branch-site models. Mol. Biol. Evol. 21:1332-1339). Here, we describe a modified branch-site model and use it to construct two LRTs, called branch-site tests 1 and 2. We applied the new tests to reanalyze several real data sets and used computer simulation to examine the performance of the two tests by examining their false-positive rate, power, and robustness. We found that test 1 was unable to distinguish relaxed constraint from positive selection affecting the lineages of interest, while test 2 had acceptable false-positive rates and appeared robust against violations of model assumptions. As test 2 is a direct test of positive selection on the lineages of interest, it is referred to as the branch-site test of positive selection and is recommended for use in real data analysis. The test appeared conservative overall, but exhibited better power in detecting positive selection than the branch-based test. Bayes empirical Bayes identification of amino acid sites under positive selection along the foreground branches was found to be reliable, but lacked power.
8.
Selecting the best-fit model of nucleotide substitution (total citations: 2; self-citations: 0; citations by others: 2)
Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best fits the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting best-fit models of nucleotide substitution. We specifically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution and recording how often this true model is recovered by the different methods. Our results suggest that model selection is reasonably accurate and indicate that some likelihood ratio test methods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not influence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, influence model selection in some cases. Model fitting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented in standard computer programs for phylogenetic estimation. We show here that a best-fit model can be readily identified. Consequently, given the relevance of models, model fitting should be routine in any phylogenetic analysis that uses models of evolution.
9.
《International journal of bio-medical computing》1989,24(3):153-189
We describe a mathematical technique and an associated computer program for comparing, evaluating and optimizing diagnostic tests. The technique combines receiver operating characteristic (ROC) analysis with information theory and cost-benefit analysis. The program is menu driven and highly interactive; it generates 13 possible user-determined ASCII disk files which can be easily converted to graphs. These graphs allow the user to make detailed comparisons among various diagnostic tests for all values of disorder prevalence, and also provide guidelines for cut-off selection in order to optimize tests. These techniques are applied to three published studies of the enzyme screening assay for diagnosis of infection with HIV. We show how graphs produced by this program can be used to compare and optimize these diagnostic tests. The program is written for an IBM-compatible microcomputer running on a DOS operating system.
10.
Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution (total citations: 35; self-citations: 16; citations by others: 19)
An excess of nonsynonymous substitutions over synonymous ones is an important indicator of positive selection at the molecular level. A lineage that underwent Darwinian selection may have a nonsynonymous/synonymous rate ratio (dN/dS) that is different from those of other lineages or greater than one. In this paper, several codon-based likelihood models that allow for variable dN/dS ratios among lineages were developed. They were then used to construct likelihood ratio tests to examine whether the dN/dS ratio is variable among evolutionary lineages, whether the ratio for a few lineages of interest is different from the background ratio for other lineages in the phylogeny, and whether the dN/dS ratio for the lineages of interest is greater than one. The tests were applied to the lysozyme genes of 24 primate species. The dN/dS ratios were found to differ significantly among lineages, indicating that the evolution of primate lysozymes is episodic, which is incompatible with the neutral theory. Maximum-likelihood estimates of parameters suggested that about nine nonsynonymous and zero synonymous nucleotide substitutions occurred in the lineage leading to hominoids, and the dN/dS ratio for that lineage is significantly greater than one. The corresponding estimates for the lineage ancestral to colobine monkeys were nine and one, and the dN/dS ratio for the lineage is not significantly greater than one, although it is significantly higher than the background ratio. The likelihood analysis thus confirmed most, but not all, conclusions Messier and Stewart reached using reconstructed ancestral sequences to estimate synonymous and nonsynonymous rates for different lineages.
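The likelihood ratio tests described in this abstract compare nested models through twice the difference in maximized log-likelihoods. The following Python sketch shows that calculation for the simplest case, where the two models differ by one free parameter and the statistic is referred to a chi-square distribution with 1 degree of freedom. The log-likelihood values and the function name `lrt_df1` are hypothetical, and the sketch does not reproduce the codon models themselves.

```python
import math

def lrt_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), with the p-value taken from the
    chi-square distribution with 1 degree of freedom.
    """
    # Guard against tiny negative values caused by optimizer noise.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods: a one-ratio model (null) versus a model with
# a separate dN/dS ratio for one lineage of interest (alternative).
stat, p_value = lrt_df1(lnl_null=-1000.0, lnl_alt=-998.0)
```

The closed-form p-value only holds for 1 degree of freedom; comparisons involving more parameters, or parameters constrained to a boundary, would need a different reference distribution.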
11.
To aid physicians who may be having difficulty applying the principles of decision analysis to diagnostic data according to the methods published in the past several years, the authors of this paper set out a few principles and schemes for using and interpreting diagnostic data obtained from dichotomous tests. They also present a simple BASIC program for calculating post-test probabilities from likelihood ratios and pretest probabilities that a particular disease is present in a particular patient; the program can be adapted for use on microcomputers.
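The calculation this abstract refers to follows the odds form of Bayes' theorem: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back. A minimal Python sketch is given below; it is an illustration of the standard formula rather than a port of the authors' BASIC program, and the function name `post_test_probability` and the example numbers are assumptions.

```python
def post_test_probability(pretest_prob, likelihood_ratio):
    """Post-test probability of disease from a pretest probability and a likelihood ratio."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Hypothetical example: 10% pretest probability and a positive result
# from a test whose positive likelihood ratio is 9.
p_post = post_test_probability(0.10, 9.0)
```

A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on any implementation.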
12.
Pie MR 《Molecular biology and evolution》2006,23(12):2274-2278
The power of maximum likelihood tests of positive selection on protein-coding genes depends heavily on detecting and accounting for potential biases in the studied data set. Although the influence of transition:transversion and codon biases has been investigated in detail, little is known about how inaccuracy in the phylogeny used during the calculations affects the performance of these tests. In this study, 3 empirical data sets are analyzed using sets of simulated topologies corresponding to low, intermediate, and high levels of phylogenetic uncertainty. The detection of positive selection was largely unaffected by errors in the underlying phylogeny. However, the number of sites identified as being under positive selection tended to be overestimated.
13.
Stich B Melchinger AE Piepho HP Hamrit S Schipprack W Maurer HP Reif JC 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2007,115(4):529-536
Knowledge about the forces generating and conserving linkage disequilibrium (LD) is important for drawing conclusions about the prospects and limitations of association mapping. The objectives of our research were to examine the importance of (1) selection, (2) mutation, and (3) genetic drift for generating LD in a typical maize breeding program. We conducted computer simulations based on genotypic data of Central European maize open-pollinated varieties which have played an important role as founders of the European flint heterotic group. The breeding scheme and the dimensioning underlying our simulations essentially reflect the maize breeding program of the University of Hohenheim. Results suggested that in a plant breeding program of the examined dimension and breeding scheme, genetic drift and selection are major forces generating LD. The currently used population-based association mapping tests do not explicitly correct for LD caused by these two forces. Therefore, increased type I error rates are expected if these tests are applied to plant breeding populations. As a consequence, we recommend using family-based association tests for association mapping approaches in plant breeding populations.
14.
Genes involved in host-pathogen interactions are often strongly affected by positive natural selection. The Duffy antigen, coded by the Duffy antigen receptor for chemokines (DARC) gene, serves as a receptor for Plasmodium vivax in humans and for Plasmodium knowlesi in some nonhuman primates. In the majority of sub-Saharan Africans, a nucleic acid variant in GATA-1 of the gene promoter is responsible for the nonexpression of the Duffy antigen on red blood cells and consequently resistance to invasion by P. vivax. The Duffy antigen also acts as a receptor for chemokines and is expressed in red blood cells and many other tissues of the body. Because of this dual role, we sequenced a ~3,000-bp region encompassing the entire DARC gene as well as part of its 5' and 3' flanking regions in a phylogenetic sample of primates and used statistical methods to evaluate the nature of selection pressures acting on the gene during its evolution. We analyzed both coding and regulatory regions of the DARC gene. The regulatory analysis showed accelerated rates of substitution at several sites near known motifs. Our tests of positive selection in the coding region using maximum likelihood by branch sites and maximum likelihood by codon sites did not yield statistically significant evidence for the action of positive selection. However, the maximum likelihood test in which the gene was subdivided into different structural regions showed that the known binding region for P. vivax/P. knowlesi is under very different selective pressures than the remainder of the gene. In fact, most of the gene appears to be under strong purifying selection, but this is not evident in the binding region. We suggest that the binding region is under the influence of two opposing selective pressures, positive selection possibly exerted by the parasite and purifying selection exerted by chemokines.
15.
In this article we describe the construction of a general computer program for the iterative calculation of maximum likelihood estimators. The program is general in the sense that it allows the maximization of any given likelihood function. The user only has to write a subroutine LKLHD, in which the specific likelihood function and its first and second derivatives are calculated. This subroutine is an input parameter of the optimization program. This enables the user to employ one main program for the maximization of various likelihood functions. This advantage is demonstrated for the evaluation of qualitative dose-response relationships (quantal assays: probit and logit analysis).
16.
The evolutionary analysis of OrcPI, the orchid homologue to the PISTILLATA/GLOBOSA gene, was conducted on some Mediterranean orchid species, measuring mean pairwise Ka/Ks ratios and nucleotide variability. Evidence for positive selection was tested using a maximum likelihood approach implemented in PAML, and neutrality tests were conducted to assess deviation from neutral evolution. Data were also examined partitioning the coding region into four regions, corresponding to different functional domains of the protein. The results show that OrcPI is subjected to different evolutionary forces: diffuse purifying selection, localized positive selection or selective sweep, and different partitions of selective constraints.
17.
S O Larsen 《Computer programs in biomedicine》1979,10(1):48-54
The program, which is written in FORTRAN, estimates haplotype frequencies in two-locus and three-locus genetic systems from population diploid data. It is based on the gene counting method, which leads to maximum likelihood estimates, and can be used whenever the possible antigens (one or more) on each chromosome can be specified for each person and for each locus, i.e., ABO-like systems and inclusions are permitted. The number of alleles per locus may be rather large, and both grouped and ungrouped data can be used. Log likelihoods are calculated on the basis of various assumptions, so that likelihood ratio tests can be carried out.
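The gene counting idea can be sketched in Python for the simplest case only: two loci with two codominant alleles each, where the only phase-ambiguous genotype is the double heterozygote. This is an illustrative simplification under stated assumptions, not the FORTRAN program described above, which also handles more alleles, three loci, and ABO-like dominance. The function name `em_haplotype_frequencies`, the genotype encoding, and the counts are hypothetical.

```python
def em_haplotype_frequencies(counts, n_iter=200, tol=1e-10):
    """Gene counting (EM) for haplotypes AB, Ab, aB, ab at two biallelic loci.

    counts[(i, j)] = number of individuals carrying i copies of allele A
    and j copies of allele B (i, j in 0..2).
    """
    # Haplotype pairs implied by every phase-unambiguous genotype.
    unambiguous = {
        (2, 2): ("AB", "AB"), (2, 1): ("AB", "Ab"), (2, 0): ("Ab", "Ab"),
        (1, 2): ("AB", "aB"), (1, 0): ("Ab", "ab"),
        (0, 2): ("aB", "aB"), (0, 1): ("aB", "ab"), (0, 0): ("ab", "ab"),
    }
    fixed = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}
    for genotype, haps in unambiguous.items():
        for h in haps:
            fixed[h] += counts.get(genotype, 0)
    n_double_het = counts.get((1, 1), 0)
    total = 2.0 * sum(counts.values())  # number of chromosomes sampled

    freq = {h: 0.25 for h in fixed}  # start from equal frequencies
    for _ in range(n_iter):
        # E-step: expected share of double heterozygotes with phase AB/ab.
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        share = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: recount haplotypes using the expected phases.
        new = {
            "AB": (fixed["AB"] + n_double_het * share) / total,
            "ab": (fixed["ab"] + n_double_het * share) / total,
            "Ab": (fixed["Ab"] + n_double_het * (1.0 - share)) / total,
            "aB": (fixed["aB"] + n_double_het * (1.0 - share)) / total,
        }
        change = max(abs(new[h] - freq[h]) for h in new)
        freq = new
        if change < tol:
            break
    return freq

# Hypothetical sample: 30 AABB, 30 aabb, and 40 double heterozygotes.
freq = em_haplotype_frequencies({(2, 2): 30, (0, 0): 30, (1, 1): 40})
```

In this made-up sample every unambiguous individual carries only AB or ab haplotypes, so the iterations assign the double heterozygotes almost entirely to the AB/ab phase.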
18.
COANCESTRY: a program for simulating, estimating and analysing relatedness and inbreeding coefficients (total citations: 2; self-citations: 0; citations by others: 2)
Wang J 《Molecular ecology resources》2011,11(1):141-145
The software package COANCESTRY implements seven relatedness estimators and three inbreeding estimators to estimate relatedness and inbreeding coefficients from multilocus genotype data. Two likelihood estimators that allow for inbred individuals and account for genotyping errors are for the first time included in this user-friendly program for PCs running the Windows operating system. A simulation module is built into the program to simulate multilocus genotype data of individuals with a predefined relationship, and to compare the estimators and the simulated relatedness values to facilitate the selection of the best estimator in a particular situation. Bootstrapping and permutations are used to obtain the 95% confidence intervals of each relatedness or inbreeding estimate, and to test the difference in averages between groups.
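As a rough illustration of how bootstrapping can yield a 95% confidence interval, the Python sketch below resamples a list of values with replacement and takes percentiles of the resampled means. It is not COANCESTRY's procedure: treating per-locus contributions as the resampling unit, the function name `bootstrap_ci`, and the data are all assumptions made for the example.

```python
import random

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2.0) * n_boot)]
    upper = means[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper

# Hypothetical per-locus relatedness contributions for one pair of individuals.
per_locus = [0.42, 0.55, 0.31, 0.60, 0.48, 0.52, 0.39, 0.58, 0.45, 0.50]
lower, upper = bootstrap_ci(per_locus)
```

The percentile method is only one of several ways to turn bootstrap replicates into an interval; it is used here because it needs nothing beyond sorting.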
19.
The need to consider random effects in capture-recapture models, in addition to fixed effects such as those of environmental covariates, has been widely recognized in recent years. However, formal approaches require involved likelihood integrations, and conceptual and technical difficulties have slowed the spread of capture-recapture mixed models among biologists. In this article, we evaluate simple procedures to test for the effect of an environmental covariate on parameters such as time-varying survival probabilities in the presence of a random effect corresponding to unexplained environmental variation. We show that the usual likelihood ratio test between fixed models is strongly biased, and tends to detect a covariate effect too often. Permutation and analysis of deviance tests are shown to behave properly and are recommended. Permutation tests are implemented in the latest version of program E-SURGE. Our approach also applies to generalized linear mixed models.
20.
Multifactorial analysis of family data ascertained through truncation: a comparative evaluation of two methods of statistical inference. (total citations: 4; self-citations: 3; citations by others: 1)
When family data are ascertained through single selection based on truncation, a prevailing method of analysis is to condition the likelihood function on the proband's actual phenotypic value. An alternative method conditions the likelihood function on the event that the proband's measurement lies in the truncation region. Both methods are contrasted here by using Monte Carlo simulations; identical sets of data were analyzed using both methods. The results suggest that, under either method, (1) parameter estimates are nearly unbiased and (2) likelihood-ratio tests of null hypotheses are approximately distributed as chi-square. However, conditioning on the proband's actual phenotypic value yields considerably less efficient estimates and reduced power for hypothesis tests. A corresponding result also holds under complete ascertainment. It is argued, therefore, that whenever sufficient information is available on the nature of truncation, the alternative approach should be used.