Similar articles
20 similar articles found (search time: 15 ms)
1.
Estimating species trees using multiple-allele DNA sequence data   Cited by: 3 (self-citations: 0, citations by others: 3)
Several techniques, such as concatenation and consensus methods, are available for combining data from multiple loci to produce a single statement of phylogenetic relationships. However, when multiple alleles are sampled from individual species, it becomes more challenging to estimate relationships at the level of species, either because concatenation becomes inappropriate due to conflicts among individual gene trees, or because the species from which multiple alleles have been sampled may not form monophyletic groups in the estimated tree. We propose a Bayesian hierarchical model to reconstruct species trees from multiple-allele, multilocus sequence data, building on a recently proposed method for estimating species trees from single-allele multilocus data. A two-step Markov chain Monte Carlo (MCMC) algorithm is adopted to estimate the posterior distribution of the species tree. The model is applied to estimate the posterior distribution of species trees for two multiple-allele datasets: yeast (Saccharomyces) and birds (Manacus, manakins). The estimates of the species trees using our method are consistent with those inferred from other methods and genetic markers, but in contrast to other species tree methods, our method provides credible regions for the species tree. The Bayesian approach described here provides a powerful framework for statistical testing and integration of population genetics and phylogenetics.

2.
Summary: We have compared the amino acid sequences of cytochrome c's from 45 species of organisms representing all five kingdoms, including one species each for the Protista and Monera. We have made a phylogeny for these data by reconstructing probable ancestral sequences which generate the present descendants through a minimum number of mutations. Several trials with different data sets produced the same minimal configuration. Assuming the occurrence of no major shifts in mutation acceptance rate, we find an early differentiation between prokaryote and eukaryote stocks. Afterward the eukaryote stem gave rise first to the protozoan flagellate branch and later to the multicellular green plant branch; after this the fungi and multicellular animal stems diverged from each other. A probable ancestral sequence was estimated for each kingdom of multicellular organisms. The basic eukaryote ancestor was probably a non-photosynthetic, heterotrophic flagellate. The photosynthetic apparatus could have been a later symbiotic acquisition in the plant ancestry. The dicotyledons had differentiated into two stocks before the emergence of a monocotyledon line, as did the Ascomycetes before the emergence of the Basidiomycetes. The mollusc and chordate lines may have had a common acoelomate ancestor at the divergence of the arthropod stock. The numbers of mutations on all of the branches of the phylogenetic tree were calculated, as well as the numbers of mutations and repeated mutations at each amino acid position.

3.
Summary: Three new methods for constructing evolutionary trees from molecular sequence data are presented. These methods are based on a theory for correcting for non-constant evolutionary rates (Klotz et al. 1979; Klotz and Blanken 1981). Extensive computer simulations were run to compare these new methods to the commonly used criteria of Dayhoff (1978) and Fitch and Margoliash (1967). The results of these simulations showed that two of the new methods performed as well as Dayhoff's criterion, significantly better than that of Fitch and Margoliash, and as well as a simple variation of the latter (Prager and Wilson 1978) where any topology containing negative branch mutations is discarded. However, no method yielded the correct topology all of the time, which demonstrated the need to determine confidence estimates in a particular result when evolutionary trees are determined from sequence data.

4.
Accuracy of estimated phylogenetic trees from molecular data   Cited by: 2 (self-citations: 0, citations by others: 2)
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.
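
For readers unfamiliar with UPGMA, one of the four distance methods compared in this abstract, the following is a minimal sketch of the standard algorithm (repeatedly merge the two closest clusters and average distances weighted by cluster size). It is not the simulation code of the paper; the function name `upgma`, the Newick-style output, and the example distance matrix are illustrative assumptions.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of taxon labels.
    dist:  symmetric matrix (list of lists) of pairwise distances.
    Returns (topology in Newick-like form, list of merge heights).
    Hypothetical helper for illustration only.
    """
    # Active clusters: id -> (label, number of taxa in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by the pair of cluster ids
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    heights = []
    while len(clusters) > 1:
        # Find the closest pair of active clusters
        pair = min(d, key=lambda p: d[p])
        a, b = sorted(pair)
        (label_a, n_a), (label_b, n_b) = clusters[a], clusters[b]
        # Under UPGMA the new node sits at half the merge distance
        heights.append(d[pair] / 2.0)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            new_d[c] = (n_a * d[frozenset((a, c))]
                        + n_b * d[frozenset((b, c))]) / (n_a + n_b)
        # Drop all entries involving the two merged clusters
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        del clusters[a], clusters[b]
        for c, v in new_d.items():
            d[frozenset((next_id, c))] = v
        clusters[next_id] = ("(%s,%s)" % (label_a, label_b), n_a + n_b)
        next_id += 1
    label, _ = next(iter(clusters.values()))
    return label + ";", heights


# Example with an ultrametric (clock-like) distance matrix
taxa = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, node_heights = upgma(taxa, matrix)
print(tree, node_heights)
```

Because UPGMA assumes a constant rate along all lineages, it recovers the tree exactly for ultrametric input like the matrix above; the abstract's finding that it does best when branch-length variation is small is consistent with that assumption.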

5.
Genome-scale sequence data have become increasingly available in phylogenetic studies for understanding the evolutionary histories of species. However, it is challenging to develop probabilistic models to account for heterogeneity of phylogenomic data. The multispecies coalescent model describes gene trees as independent random variables generated from a coalescence process occurring along the lineages of the species tree. Since the multispecies coalescent model allows gene trees to vary across genes, coalescent-based methods have been widely used to account for heterogeneous gene trees in phylogenomic data analysis. In this paper, we summarize and evaluate the performance of coalescent-based methods for estimating species trees from genome-scale sequence data. We investigate the effects of deep coalescence and mutation on the performance of species tree estimation methods. We found that the coalescent-based methods perform well in estimating species trees for a large number of genes, regardless of the degree of deep coalescence and mutation. The performance of the coalescent methods is negatively correlated with the lengths of internal branches of the species tree.
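
To make the idea of gene trees varying under the multispecies coalescent concrete, here is a small sketch of the classical three-species result: with one sampled lineage per species and an internal branch of length t in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). This is a textbook special case, not the evaluation procedure of the paper; the function names are illustrative assumptions.

```python
import math


def concordance_probability(t):
    """Probability that a gene tree matches a three-species tree.

    t: length of the internal species-tree branch in coalescent units.
    Assumes one sampled lineage per species and no gene flow.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def topology_probabilities(t):
    """Return (concordant, discordant_1, discordant_2) probabilities."""
    discordant = math.exp(-t) / 3.0
    return concordance_probability(t), discordant, discordant


# Shorter internal branches leave more room for deep coalescence
for t in (0.0, 0.1, 1.0, 5.0):
    print(t, topology_probabilities(t))
```

At t = 0 the three topologies are equally likely, and as t grows almost every gene tree agrees with the species tree.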

6.
Atopic dermatitis is a skin disease which affects mainly children, has a very strong genetic component, and manifests itself clinically as flexural eczema accompanied by tormenting itching. The course of the disease is notoriously changeable and runs in phases, so it is difficult to predict its future course. To improve prediction it would be interesting to identify clusters of children with different disease histories, because this would shed light on common genetic and environmental risk factors. Relying on previous work of Nagin, we use a latent class mixture model to estimate, in a data-dependent and model-based fashion, a clustering of typical binary atopic dermatitis disease histories in children. The data were collected from 1990 to 1997 in the so-called MAS study, a prospective cohort study of 1314 children in five German cities. The original method of Nagin is extended in two respects: first, we use bootstrap confidence intervals to account for uncertainty in curve fitting, and second, we propose to model covariates for cluster membership by Anderson's stereotype regression model. We feel that the latent class mixture model is a valuable tool for assessing the course of atopic dermatitis, yielding a wealth of communicable and graphically displayable results.

7.
One of the main goals in spatial epidemiology is to study the geographical pattern of disease risks. For this purpose, the convolution model composed of correlated and uncorrelated components is often used. However, one of the two components could be predominant in some regions. To investigate the predominance of the correlated or uncorrelated component for multiple scale data, we propose four different spatial mixture multiscale models by mixing spatially varying probability weights of correlated (CH) and uncorrelated heterogeneities (UH). The first model assumes that there is no linkage between the different scales and, hence, we consider independent mixture convolution models at each scale. The second model introduces linkage between finer and coarser scales via a shared uncorrelated component of the mixture convolution model. The third model is similar to the second model but the linkage between the scales is introduced through the correlated component. Finally, the fourth model accommodates a scale effect by sharing both CH and UH simultaneously. We applied these models to real and simulated data, and found that the fourth model is the best model followed by the second model.

8.
Summary: In order to clarify some controversial phylogenies, such as those regarding the triplet of human, rodent, and cow and the evolutionary position of Lagomorpha with respect to other mammals, we have analyzed both nuclear and mitochondrial genes using the stationary Markov model developed in our laboratory. We found that the two sets of genes give different results. In particular, the mitochondrial tree showed rabbit linked first to rodents and the rabbit-rodents branch linked to artiodactyls, with human as the outgroup. The most favored nuclear tree showed human linked first to artiodactyls and the human-artiodactyls branch linked to rabbit, with rodents as the outgroup. The obvious questions, (1) which tree is the correct one, (2) whether both trees can be incorrect, and (3) how such an evolutionary pattern can be explained, are discussed on the basis of our limited knowledge of factors that influence the clocklike behavior of biological macromolecules.

9.
Labeling‐based proteomics is a powerful method for detection of differentially expressed proteins (DEPs). The current data analysis platform typically relies on protein‐level ratios, which are obtained by summarizing peptide‐level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework EBprot that directly models the peptide‐protein hierarchy and rewards the proteins with reproducible evidence of DE over multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study to show that the peptide‐level analysis of EBprot provides better receiver‐operating characteristic and more accurate estimation of the false discovery rates than the methods based on protein‐level ratios. We also demonstrate superior classification performance of peptide‐level EBprot analysis in a spike‐in dataset. To illustrate the wide applicability of EBprot in different experimental designs, we applied EBprot to a dataset for lung cancer subtype analysis with biological replicates and another dataset for time course phosphoproteome analysis of EGF‐stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide‐level analysis of EBprot is a robust alternative to the existing statistical methods for the DE analysis of labeling‐based quantitative datasets. The software suite is freely available on the Sourceforge website http://ebprot.sourceforge.net/ . All MS data have been deposited in the ProteomeXchange with identifier PXD001426 ( http://proteomecentral.proteomexchange.org/dataset/PXD001426/ ).

10.
Two techniques for obtaining information about population structure from nucleotide sequences in DNA are summarized. The first focuses on the selection or neutrality of enzyme polymorphisms, the second on the detection of recombination. Neither method requires phylogeny estimation.

11.
Lin Lin, Wei Shi, Jianbo Ye, Jia Li. Biometrics 2023, 79(2): 866-877
One key challenge encountered in single-cell data clustering is to combine clustering results of data sets acquired from multiple sources. We propose to represent the clustering result of each data set by a Gaussian mixture model (GMM) and produce an integrated result based on the notion of Wasserstein barycenter. However, the precise barycenter of GMMs, a distribution on the same sample space, is computationally infeasible to solve. Importantly, the barycenter of GMMs may not be a GMM containing a reasonable number of components. We thus propose to use the minimized aggregated Wasserstein (MAW) distance to approximate the Wasserstein metric and develop a new algorithm for computing the barycenter of GMMs under MAW. Recent theoretical advances further justify using the MAW distance as an approximation for the Wasserstein metric between GMMs. We also prove that the MAW barycenter of GMMs has the same expectation as the Wasserstein barycenter. Our proposed algorithm for clustering integration scales well with the data dimension and the number of mixture components, with complexity independent of data size. We demonstrate that the new method achieves better clustering results on several single-cell RNA-seq data sets than some other popular methods.
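
As background for the Wasserstein quantities mentioned in this abstract, the sketch below uses the closed form of the 2-Wasserstein distance between two univariate Gaussians, W2² = (μ1 - μ2)² + (σ1 - σ2)², to build a matrix of pairwise costs between the components of two mixtures. It is only an illustration of the kind of component-level distance such approximations can build on, not the authors' MAW algorithm; the restriction to one dimension, the `(weight, mean, std)` tuple layout, and the function names are assumptions.

```python
def w2_squared_gaussian_1d(mean_a, std_a, mean_b, std_b):
    """Squared 2-Wasserstein distance between N(mean_a, std_a^2) and N(mean_b, std_b^2)."""
    return (mean_a - mean_b) ** 2 + (std_a - std_b) ** 2


def component_cost_matrix(gmm_a, gmm_b):
    """Pairwise squared W2 costs between the components of two 1-D GMMs.

    Each GMM is a list of (weight, mean, std) tuples (hypothetical layout).
    Weights are ignored here; they would enter a later transport step.
    """
    return [
        [w2_squared_gaussian_1d(mean_a, std_a, mean_b, std_b)
         for (_, mean_b, std_b) in gmm_b]
        for (_, mean_a, std_a) in gmm_a
    ]


# Two toy clustering results, each summarized as a two-component mixture
gmm_1 = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
gmm_2 = [(0.4, 0.5, 1.0), (0.6, 4.0, 2.0)]
for row in component_cost_matrix(gmm_1, gmm_2):
    print(row)
```

Small entries in the cost matrix indicate components (clusters) from the two sources that are natural candidates to be matched.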

12.
Multilocus genealogical approaches are still uncommon in phylogeography and historical demography, fields which have been dominated by microsatellite markers and mitochondrial DNA, particularly for vertebrates. Using 30 newly developed anonymous nuclear loci, we estimated population divergence times and ancestral population sizes of three closely related species of Australian grass finches (Poephila) distributed across two barriers in northern Australia. We verified that substitution rates were generally constant both among lineages and among loci, and that intralocus recombination was uncommon in our dataset, thereby satisfying two assumptions of our multilocus analysis. The reconstructed gene trees exhibited all three possible tree topologies and displayed considerable variation in coalescent times, yet this information provided the raw data for maximum likelihood and Bayesian estimation of population divergence times and ancestral population sizes. Estimates of these parameters were in close agreement with each other regardless of statistical approach and our Bayesian estimates were robust to prior assumptions. Our results suggest that black-throated finches (Poephila cincta) diverged from long-tailed finches (P. acuticauda and P. hecki) across the Carpentarian Barrier in northeastern Australia around 0.6 million years ago (mya), and that P. acuticauda diverged from P. hecki across the Kimberley Plateau-Arnhem Land Barrier in northwestern Australia approximately 0.3 mya. Bayesian 95% credibility intervals around these estimates strongly support Pleistocene timing for both speciation events, despite the fact that many gene divergences across the Carpentarian region clearly predated the Pleistocene. Estimates of ancestral effective population sizes for the basal ancestor and long-tailed finch ancestor were large (about 521,000 and about 384,000, respectively). Although the errors around the population size parameter estimates are considerable, they are the first for birds taking into account multiple sources of variance.

13.
14.
Multilocus genomic data sets can be used to infer a rich set of information about the evolutionary history of a lineage, including gene trees, species trees, and phylogenetic networks. However, user‐friendly tools to run such integrated analyses are lacking, and workflows often require tedious reformatting and handling time to shepherd data through a series of individual programs. Here, we present a tool written in Python, TREEasy, that performs automated sequence alignment (with MAFFT), gene tree inference (with IQ‐Tree), species tree inference from concatenated data (with IQ‐Tree and RAxML‐NG), species tree inference from gene trees (with ASTRAL, MP‐EST, and STELLS2), and phylogenetic network inference (with SNaQ and PhyloNet). The tool only requires FASTA files and nine parameters as inputs. The tool can be run from the command line or through a Graphical User Interface (GUI). As examples, we reproduced a recent analysis of staghorn coral evolution, and performed a new analysis on the evolution of the “WGD clade” of yeast. The latter revealed novel patterns that were not identified by previous analyses. TREEasy represents a reliable and simple tool to accelerate research in systematic biology ( https://github.com/MaoYafei/TREEasy ).

15.
Background and Aims: Within extending urban areas, trees serve a multitude of functions (e.g. carbon storage, suppression of air pollution, mitigation of the ‘heat island’ effect, oxygen, shade and recreation). Many of these services are positively correlated with tree size and structure. The quantification of above-ground biomass (AGB) is of especial importance to assess its carbon storage potential. However, quantification of AGB is difficult and the allometries applied are often based on forest trees, which are subject to very different growing conditions, competition and form. In this article we highlight the potential of terrestrial laser scanning (TLS) techniques to extract highly detailed information on urban tree structure and AGB. Methods: Fifty-five urban trees distributed over seven cities in Switzerland were measured using TLS and traditional forest inventory techniques before they were felled and weighed. Tree structure, volume and AGB from the TLS point clouds were extracted using quantitative structure modelling. TLS-derived AGB estimates were compared with AGB estimates based on forest tree allometries dependent on diameter at breast height only. The correlations of various tree metrics as AGB predictors were assessed. Key Results: Estimates of AGB derived by TLS showed good performance when compared with destructively harvested references, with an R² of 0.954 (RMSE = 556 kg) compared with 0.837 (RMSE = 1159 kg) for allometrically derived AGB estimates. A correlation analysis showed that different TLS-derived wood volume estimates as well as trunk diameters and tree crown metrics show high correlation in describing total wood AGB, outperforming tree height. Conclusions: Wood volume estimates based on TLS show high potential to estimate tree AGB independent of tree species, size and form. This allows us to retrieve highly accurate non-destructive AGB estimates that could be used to establish new allometric equations without the need for extensive destructive harvesting.
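
The two agreement measures quoted in this abstract, R² and RMSE, can be computed as in the sketch below. It uses one common definition of R² (one minus the ratio of residual to total sum of squares); the abstract does not say which definition the authors used, and the biomass values in the example are made up for illustration, not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between reference and estimated values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (one common definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical AGB values in kg: weighed reference vs. a non-destructive estimate
weighed_agb = [100.0, 200.0, 300.0, 400.0]
estimated_agb = [110.0, 190.0, 310.0, 390.0]
print(rmse(weighed_agb, estimated_agb))
print(r_squared(weighed_agb, estimated_agb))
```

RMSE is in the units of the measurement (here kg), which is why the abstract reports it alongside the unitless R².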

16.
17.
Recently, a lot of concern has been raised about assumptions needed in order to fit statistical models to incomplete multivariate and longitudinal data. In response, research efforts are being devoted to the development of tools that assess the sensitivity of such models to often strong but always, at least in part, unverifiable assumptions. Many efforts have been devoted to longitudinal data, primarily in the selection model context, although some researchers have expressed interest in the pattern-mixture setting as well. A promising tool, proposed by Verbeke et al. (2001, Biometrics 57, 43-50), is based on local influence (Cook, 1986, Journal of the Royal Statistical Society, Series B 48, 133-169). These authors considered the Diggle and Kenward (1994, Applied Statistics 43, 49-93) model, which is based on a selection model, integrating a linear mixed model for continuous outcomes with logistic regression for dropout. In this article, we show that a similar idea can be developed for multivariate and longitudinal binary data, subject to nonmonotone missingness. We focus on the model proposed by Baker, Rosenberger, and DerSimonian (1992, Statistics in Medicine 11, 643-657). The original model is first extended to allow for (possibly continuous) covariates, whereafter a local influence strategy is developed to support the model-building process. The model is able to deal with nonmonotone missingness but has some limitations as well, stemming from the conditional nature of the model parameters. Some analytical insight is provided into the behavior of the local influence graphs.

18.
Wang B, Chen P, Huang DS, Li JJ, Lok TM, Lyu MR. FEBS Letters 2006, 580(2): 380-384
This paper proposes a novel method that can predict protein interaction sites in heterocomplexes using residue spatial sequence profile and evolution rate approaches. The former represents the information of multiple sequence alignments while the latter corresponds to a residue's evolutionary conservation score based on a phylogenetic tree. Three predictors using a support vector machine algorithm are constructed to predict whether a surface residue is a part of a protein-protein interface. The efficiency and the effectiveness of our proposed approach are verified by its better prediction performance compared with other models. The study is based on a non-redundant data set of heterodimers consisting of 69 protein chains.

19.
Summary: Many longitudinal studies generate both the time to some event of interest and repeated measures data. This article is motivated by a study on patients with a renal allograft, in which interest lies in the association between longitudinal proteinuria (a dichotomous variable) measurements and the time to renal graft failure. An interesting feature of the sample at hand is that nearly half of the patients never tested positive for proteinuria (≥1g/day) during follow-up, which introduces a degenerate part in the random-effects density for the longitudinal process. In this article we propose a two-part shared parameter model framework that effectively takes this feature into account, and we investigate sensitivity to the various dependence structures used to describe the association between the longitudinal measurements of proteinuria and the time to renal graft failure.

20.
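With the rapid development of remote sensing technology, methods that combine remote sensing imagery with ground sample plots have become a common means of estimating forest carbon density. However, the presence of mixed pixels severely limits improvements in the accuracy of regional forest carbon density retrieval, especially for low-spatial-resolution imagery such as MODIS. Using MODIS imagery and permanent sample plots as data sources, this study carried out forest carbon density retrieval. First, three methods (unconstrained linear unmixing, constrained linear unmixing, and nonlinear unmixing) were used to decompose mixed pixels and derive abundance maps of the different land use/land cover types; then sequential Gaussian co-simulation, with and without the abundance maps, was used to retrieve forest carbon density in Hunan Province. The results showed that, among the three mixed-pixel decomposition models, constrained linear unmixing estimated land-cover abundances with the highest accuracy (mean root-mean-square error 0.002), clearly outperforming the unconstrained linear and nonlinear unmixing models; combining the mixed-pixel decomposition model with sequential Gaussian co-simulation raised the retrieval accuracy of forest carbon density from 74.1% to 81.5% and reduced the root-mean-square error from 7.26 to 5.18; the mean forest carbon density of Hunan Province in 2009 was 30.06 t·hm⁻², ranging from 0.00 to 67.35 t·hm⁻². This indicates that mixed-pixel decomposition shows great potential for improving the accuracy of forest carbon density retrieval at regional and global scales.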
