Similar literature
 20 similar records found
1.
We implement a Bayesian Markov chain Monte Carlo algorithm for estimating species divergence times that uses heterogeneous data from multiple gene loci and accommodates multiple fossil calibration nodes. A birth-death process with species sampling is used to specify a prior for divergence times, which allows easy assessment of the effects of that prior on posterior time estimates. We propose a new approach for specifying calibration points on the phylogeny, which allows the use of arbitrary and flexible statistical distributions to describe uncertainties in fossil dates. In particular, we use soft bounds, so that the probability that the true divergence time is outside the bounds is small but nonzero. A strict molecular clock is assumed in the current implementation, although this assumption may be relaxed. We apply our new algorithm to two data sets concerning divergences of several primate species, to examine the effects of the substitution model and of the prior for divergence times on Bayesian time estimation. We also conduct computer simulation to examine the differences between soft and hard bounds. We demonstrate that divergence time estimation is intrinsically hampered by uncertainties in fossil calibrations, and the error in Bayesian time estimates will not go to zero with increased amounts of sequence data. Our analyses of both real and simulated data demonstrate potentially large differences between divergence time estimates obtained using soft versus hard bounds and a general superiority of soft bounds. Our main findings are as follows. (1) When the fossils are consistent with each other and with the molecular data, and the posterior time estimates are well within the prior bounds, soft and hard bounds produce similar results. 
(2) When the fossils are in conflict with each other or with the molecules, soft and hard bounds behave very differently; soft bounds allow sequence data to correct poor calibrations, while poor hard bounds are impossible to overcome by any amount of data. (3) Soft bounds eliminate the need for "safe" but unrealistically high upper bounds, which may bias posterior time estimates. (4) Soft bounds allow more reliable assessment of estimation errors, while hard bounds generate misleadingly high precisions when fossils and molecules are in conflict.
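
The contrast between hard and soft bounds described above can be illustrated with a minimal sketch. It is not the calibration density implemented by the authors: the function names, the 2.5% probability placed in each tail, and the tail shapes (a power-law tail below the lower bound, an exponential tail above the upper bound) are all assumptions chosen so that the density stays continuous and integrates to one.

```python
import math

def hard_bound_density(t, lower, upper):
    """Uniform prior on [lower, upper]; zero density outside the bounds."""
    if lower <= t <= upper:
        return 1.0 / (upper - lower)
    return 0.0

def soft_bound_density(t, lower, upper, tail=0.025):
    """Flat between the bounds, with probability `tail` in each tail.

    Assumed shapes (illustrative only): a power-law tail on (0, lower)
    and an exponential tail above `upper`, both continuous at the bounds.
    """
    if t <= 0.0:
        return 0.0
    height = (1.0 - 2.0 * tail) / (upper - lower)
    if t < lower:
        # Exponent chosen so the lower tail integrates to `tail`.
        power = height * lower / tail - 1.0
        return height * (t / lower) ** power
    if t <= upper:
        return height
    # Decay rate chosen so the upper tail integrates to `tail`.
    rate = height / tail
    return height * math.exp(-rate * (t - upper))

if __name__ == "__main__":
    # Hypothetical calibration: node age between 6 and 8 (arbitrary time units).
    for age in (5.5, 7.0, 9.0):
        print(age, hard_bound_density(age, 6.0, 8.0), soft_bound_density(age, 6.0, 8.0))
```

Under the hard bound, an age of 9.0 has prior density exactly zero, so no amount of sequence data can move the posterior there; under the soft bound the density is small but positive, which is what lets the data override a poor calibration.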

2.
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
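
The idea of replacing the exact likelihood with a Taylor expansion around its peak can be sketched on a toy problem. The example below is an assumption-laden illustration, not the authors' implementation: it uses a single branch length under the JC69 substitution model, a finite-difference estimate of the curvature, and no parameter transform. The function names and the example counts are hypothetical.

```python
import math

def jc69_loglik(d, x, n):
    """Exact log-likelihood of distance d given x differences at n sites (JC69)."""
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

def jc69_mle(x, n):
    """Closed-form maximum likelihood estimate of the JC69 distance."""
    return -0.75 * math.log(1.0 - 4.0 * x / (3.0 * n))

def taylor_loglik(x, n, h=1e-4):
    """Second-order Taylor approximation of the log-likelihood around the MLE.

    The gradient is zero at the MLE, so only the peak value and the
    curvature (estimated by central finite differences) are needed.
    """
    d_hat = jc69_mle(x, n)
    peak = jc69_loglik(d_hat, x, n)
    curvature = (jc69_loglik(d_hat + h, x, n) - 2.0 * peak
                 + jc69_loglik(d_hat - h, x, n)) / h ** 2

    def approx(d):
        return peak + 0.5 * curvature * (d - d_hat) ** 2

    return d_hat, approx

if __name__ == "__main__":
    x, n = 100, 1000  # hypothetical data: 100 differences at 1000 sites
    d_hat, approx = taylor_loglik(x, n)
    for d in (d_hat + 0.01, d_hat + 0.2):
        print(d, jc69_loglik(d, x, n), approx(d))
```

Evaluating `approx` costs a few arithmetic operations regardless of data size, which is the source of the speed-up; the printed values also show the error growing as the proposed distance moves away from the peak, the weakness that the parameter transforms in the abstract are meant to reduce.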

3.
Comparison of methods for estimating the spread of a non-indigenous species   Cited by: 1 (self-citations: 0, citations by others: 1)
Aim  To compare different quantitative approaches for estimating rates of spread in the exotic species gypsy moth, Lymantria dispar L., using county-level presence/absence data and spatially extensive trapping grids.
Location  USA
Methods  We used county-level presence/absence records of the gypsy moth's distribution in the USA, which are available beginning in 1900, and extensive grids of pheromone-baited traps, which are available in selected areas beginning in 1981. We compared a regression approach and a boundary displacement approach for estimating gypsy moth spread based on these sources of data.
Results  We observed relative congruence between methods and data sources in estimating overall rates of gypsy moth spread through time, and among regions.
Main conclusions  The ability to estimate spread in exotic invasive species is a primary concern in management programmes and one for which there is a lack of information on the reliability of methods. Also, in most invading species, there is generally a lack of data to explore methods of estimating spread. Extensive data available on gypsy moth in the USA allowed for such a comparison. We show that, even with spatially crude records of presence/absence, overall rates of spread do not differ substantially from estimates obtained from the more costly deployment of extensive trapping grids. Moreover, these methods can also be applied to the general study of species distributional changes, such as range expansion or retraction, in response to climate change or other environmental effects.
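
A regression approach to spread rate can be sketched as follows. The abstract does not say exactly which quantities were regressed, so this assumes the simplest reading: the distance of each newly infested county from a reference point is regressed on the year of first record, and the slope is taken as the spread rate. The function name and the numbers in the usage example are hypothetical.

```python
def spread_rate(years, distances_km):
    """Ordinary least-squares slope of distance on year (km per year)."""
    n = len(years)
    mean_year = sum(years) / n
    mean_dist = sum(distances_km) / n
    sxy = sum((y - mean_year) * (d - mean_dist)
              for y, d in zip(years, distances_km))
    sxx = sum((y - mean_year) ** 2 for y in years)
    return sxy / sxx

if __name__ == "__main__":
    # Made-up illustration: year of first record and distance from the origin.
    years = [1900, 1910, 1920, 1930]
    distances_km = [0.0, 100.0, 200.0, 300.0]
    print(spread_rate(years, distances_km))
```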

4.
Maximum likelihood and Bayesian approaches are presented for analyzing hierarchical statistical models of natural selection operating on DNA polymorphism within a panmictic population. For analyzing Bayesian models, we present Markov chain Monte-Carlo (MCMC) methods for sampling from the joint posterior distribution of parameters. For frequentist analysis, an Expectation-Maximization (EM) algorithm is presented for finding the maximum likelihood estimate of the genome-wide mean and variance in selection intensity among classes of mutations. The framework presented here provides an ideal setting for modeling mutations dispersed through the genome and, in particular, for the analysis of how natural selection operates on different classes of single nucleotide polymorphisms (SNPs).

5.
Rannala B  Yang Z 《Genetics》2003,164(4):1645-1656
The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be approximately 20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.

6.
Exposure measurement error can result in a biased estimate of the association between an exposure and outcome. When the exposure–outcome relationship is linear on the appropriate scale (e.g. linear, logistic) and the measurement error is classical, that is, the result of random noise, the result is attenuation of the effect. When the relationship is non‐linear, measurement error distorts the true shape of the association. Regression calibration is a commonly used method for correcting for measurement error, in which each individual's unknown true exposure in the outcome regression model is replaced by its expectation conditional on the error‐prone measure and any fully measured covariates. Regression calibration is simple to execute when the exposure is untransformed in the linear predictor of the outcome regression model, but less straightforward when non‐linear transformations of the exposure are used. We describe a method for applying regression calibration in models in which a non‐linear association is modelled by transforming the exposure using a fractional polynomial model. It is shown that taking a Bayesian estimation approach is advantageous. By use of Markov chain Monte Carlo algorithms, one can sample from the distribution of the true exposure for each individual. Transformations of the sampled values can then be performed directly and used to find the expectation of the transformed exposure required for regression calibration. A simulation study shows that the proposed approach performs well. We apply the method to investigate the relationship between usual alcohol intake and subsequent all‐cause mortality using an error model that adjusts for the episodic nature of alcohol consumption.

7.
Hey J  Nielsen R 《Genetics》2004,167(2):747-760
The genetic study of diverging, closely related populations is required for basic questions on demography and speciation, as well as for biodiversity and conservation research. However, it is often unclear whether divergence is due simply to separation or whether populations have also experienced gene flow. These questions can be addressed with a full model of population separation with gene flow, by applying a Markov chain Monte Carlo method for estimating the posterior probability distribution of model parameters. We have generalized this method and made it applicable to data from multiple unlinked loci. These loci can vary in their modes of inheritance, and inheritance scalars can be implemented either as constants or as parameters to be estimated. By treating inheritance scalars as parameters it is also possible to address variation among loci in the impact via linkage of recurrent selective sweeps or background selection. These methods are applied to a large multilocus data set from Drosophila pseudoobscura and D. persimilis. The species are estimated to have diverged approximately 500,000 years ago. Several loci have nonzero estimates of gene flow since the initial separation of the species, with considerable variation in gene flow estimates among loci, in both directions between the species.

8.
Recent advances in technology facilitated development of large sets of genetic markers for many taxa, though most often model or domestic organisms. Cross‐species application of genomic technologies may allow for rapid marker discovery in wild relatives of taxa with well‐developed resources. We investigated returns from cross‐species application of three commercially available SNP chips (the OvineSNP50, BovineSNP50 and EquineSNP50 BeadChips) as a function of divergence time between the domestic source species and wild target species. Across all three chips, we observed a consistent linear decrease in call rate (~1.5% per million years), while retention of polymorphisms showed an exponential decay. These results will allow researchers to predict the expected amplification rate and polymorphism of cross‐species application for their taxa of interest, as well as provide a resource for estimating divergence times.
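
The linear relationship reported above lends itself to a back-of-the-envelope calculation. The sketch below uses only the ~1.5% per million years slope from the abstract; the baseline call rate in the source species is not given there, so it is left as a required argument, and the function names and the example value of 99% are assumptions.

```python
def expected_call_rate(divergence_my, baseline_pct, loss_per_my=1.5):
    """Linear decline in call rate (%), clamped to the range 0-100."""
    rate = baseline_pct - loss_per_my * divergence_my
    return max(0.0, min(100.0, rate))

def divergence_from_call_rate(call_rate_pct, baseline_pct, loss_per_my=1.5):
    """Invert the linear relationship to get a rough divergence time (My)."""
    return (baseline_pct - call_rate_pct) / loss_per_my

if __name__ == "__main__":
    baseline = 99.0  # assumed call rate in the source species, not from the abstract
    print(expected_call_rate(10.0, baseline))
    print(divergence_from_call_rate(84.0, baseline))
```

The inverse function is only meaningful while the call rate is above zero, since the clamped forward relationship is no longer invertible beyond that point.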

9.

Key message

Proof of concept of Bayesian integrated QTL analyses across pedigree-related families from breeding programs of an outbreeding species. Results include QTL confidence intervals, individuals’ genotype probabilities and genomic breeding values.

Abstract

Bayesian QTL linkage mapping approaches offer the flexibility to study multiple full sib families with known pedigrees simultaneously. Such a joint analysis increases the probability of detecting these quantitative trait loci (QTL) and provides insight into the magnitude of QTL across different genetic backgrounds. Here, we present an improved Bayesian multi-QTL pedigree-based approach on an outcrossing species using progenies with different (complex) genetic relationships. Different modeling assumptions were studied in the QTL analyses, i.e., the a priori expected number of QTL varied and polygenic effects were considered. The inferences include number of QTL, additive QTL effect sizes and supporting credible intervals, posterior probabilities of QTL genotypes for all individuals in the dataset, and QTL-based as well as genome-wide breeding values. All these features have been implemented in the FlexQTL™ software. We analyzed fruit firmness in a large apple dataset that comprised 1,347 individuals forming 27 full sib families and their known ancestral pedigrees, with genotypes for 87 SSR markers on 17 chromosomes. We report strong or positive evidence for 14 QTL for fruit firmness on eight chromosomes, validating our approach as several of these QTL were reported previously, though dispersed over a series of studies based on single mapping populations. Interpretation of linked QTL was possible via individuals’ QTL genotypes. The correlation between the genomic breeding values and phenotypes was on average 90%, but varied with the number of detected QTL in a family. The detailed posterior knowledge on QTL of potential parents is critical for the efficiency of marker-assisted breeding.

10.
Restriction site-associated DNA sequencing (RAD-seq) and related methods have become relatively common approaches to resolve species-level phylogeny. It is not clear, however, whether RAD-seq data matrices are well suited to relaxed clock inference of divergence times, given the size of the matrices and the abundance of missing data. We investigated the sensitivity of Bayesian relaxed clock estimates of divergence times to alternative analytical decisions on an empirical RAD-seq phylogenetic matrix. We explored the relative contribution of secondary calibration strategies, amount of missing data, and the data partition analyzed to overall variance in divergence times inferred using BEAST MCMC analyses of Carex section Schoenoxiphium (Cyperaceae), a recent radiation for which we have nearly complete species sampling of RAD-seq data. The crown node for Schoenoxiphium was estimated to be 15.22 (9.56–21.18) Ma using a single calibration point and low missing data, 11.93 (8.07–16.03) Ma using multiple calibration points and low missing data, and 8.34 (5.41–11.22) Ma using multiple calibrations but high missing data. We found that using matrices with more than half of the individuals with missing data inferred younger mean ages for all nodes. Moreover, we have found that our molecular clock estimates are sensitive to the positions of the calibration(s) in our phylogenetic tree (using matrices with low missing data), especially when only a single calibration was applied to estimate divergence times. These results argue for sensitivity analyses and caution in interpreting divergence time estimates from RAD-seq data.

11.
12.

Background

Advances in mass spectrometry have accelerated biomarker discovery in many areas of medicine. The purpose of this study was to compare two mass spectrometry (MS) methods, isobaric tags for relative and absolute quantitation (iTRAQ) and sequential window acquisition of all theoretical fragment ion spectra (SWATH), for analytical efficiency in biomarker discovery when there are multiple methodological constraints such as limited sample size and several time points for each patient to be analyzed.

Methods

A total of 140 tear samples were collected from 28 glaucoma patients at 5 time points in a glaucoma drug switch study. Samples were analyzed with iTRAQ and SWATH methods using NanoLC-MSTOF mass spectrometry.

Results

We discovered that even though iTRAQ is faster than SWATH with respect to analysis time per sample, it loses in sensitivity, reliability and robustness. While SWATH analysis yielded complete data of 456 proteins in all samples, with iTRAQ we were able to quantify 477 proteins in total but on average only 125 proteins were quantified in a sample. 283 proteins were common in the datasets produced by the two methods. Repeatability of the methods was assessed by calculating percent relative standard deviation (% RSD) between replicate MS analyses: SWATH was more repeatable (56% of proteins < 20% RSD), compared to iTRAQ (43% of proteins < 20% RSD). Despite the overall benefits of SWATH, both methods showed less than 1 log fold change difference in the expression of 74% of common proteins. In addition, in a comparison with MS/MS peptide results using 8 isotopically labeled peptide standards, SWATH and iTRAQ showed similar results in terms of accuracy. Moreover, both methods detected similar trends in a longitudinal analysis of protein expression of two known tear biomarkers.
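
The repeatability measure used in these results, percent relative standard deviation, is the sample standard deviation of replicate measurements divided by their mean, times 100. A minimal sketch of that calculation and of the "share of proteins below 20% RSD" summary follows; the function names, protein labels and intensity values are hypothetical and are not taken from the study.

```python
import statistics

def percent_rsd(replicates):
    """Percent relative standard deviation of replicate measurements."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)

def fraction_below(rsd_by_protein, threshold=20.0):
    """Share of proteins whose % RSD falls below the threshold."""
    values = list(rsd_by_protein.values())
    return sum(v < threshold for v in values) / len(values)

if __name__ == "__main__":
    # Made-up replicate intensities for two placeholder proteins.
    replicates = {
        "protein_A": [9.0, 10.0, 11.0],
        "protein_B": [5.0, 10.0, 15.0],
    }
    rsd = {name: percent_rsd(vals) for name, vals in replicates.items()}
    print(rsd, fraction_below(rsd))
```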

Conclusions

Overall, we conclude that SWATH should be preferred for biomarker discovery studies when analyzing limited volumes of clinical samples collected at multiple time points.

Trial Registration

The study was approved by the Ethics Committee at Tampere University Hospital and was registered in EU clinical trials register (EudraCT Number: 2010-021039-14).

13.
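Comparison of three regression analysis methods for LAI retrieval from Hyperion imagery   Cited by: 2 (self-citations: 0, citations by others: 2)
Sun Hua  Ju Hongbo  Zhang Huaiqing  Lin Hui  Ling Chengxing 《Acta Ecologica Sinica》2012,32(24):7781-7790
Using GPS for precise ground positioning, leaf area index (LAI) was measured with an LAI-2000 canopy analyzer in 130 sample plots (60 m × 60 m) at the Huangfengqiao Forest Farm in You County. Hyperion data were atmospherically corrected with the FLAASH module and fitted to concurrent ground-based canopy observations. By examining the correlations between ground-measured LAI and the Hyperion image bands and the vegetation indices derived from them (NDVI, RVI, etc.), vegetation index factors for estimating LAI were selected. Three regression techniques (curve estimation, stepwise regression and partial least squares) were each used to build an optimal LAI estimation model. The results show that, among the factors used in modelling, the ratio vegetation index (RVI) had the strongest correlation with LAI and the highest sensitivity, followed by SARVI0.1, NDVI705, NDVI, SARVI0.1 and SARVI0.25. Among the six regression models built with the three methods, partial least squares regression gave the best fit, with a coefficient of determination R2 of 0.84 between predicted and measured values, and curve estimation gave the poorest fit, with an R2 of 0.64. Analysis of modelling accuracy shows that using 5-6 independent variables for LAI modelling is reliable, and that the partial least squares regression model built with six vegetation factors had the highest prediction accuracy.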

14.
B Rosner 《Biometrics》1992,48(3):721-731
Clustered binary data occur frequently in biostatistical work. Several approaches have been proposed for the analysis of clustered binary data. In Rosner (1984, Biometrics 40, 1025-1035), a polychotomous logistic regression model was proposed that is a generalization of the beta-binomial distribution and allows for unit- and subunit-specific covariates, while controlling for clustering effects. One assumption of this model is that all pairs of subunits within a cluster are equally correlated. This is appropriate for ophthalmologic work where clusters are generally of size 2, but may be inappropriate for larger cluster sizes. A beta-binomial mixture model is introduced to allow for multiple subclasses within a cluster and to estimate odds ratios relating outcomes for pairs of subunits within a subclass as well as in different subclasses. To include covariates, an extension of the polychotomous logistic regression model is proposed, which allows one to estimate effects of unit-, class-, and subunit-specific covariates, while controlling for clustering using the beta-binomial mixture model. This model is applied to the analysis of respiratory symptom data in children collected over a 14-year period in East Boston, Massachusetts, in relation to maternal and child smoking, where the unit is the child and symptom history is divided into early-adolescent and late-adolescent symptom experience.

15.

Background and aims

Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree–species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera.

Methods

DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree–species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted.

Key Results

Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia–Lepidozamia–Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia.

Conclusions

A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. This phylogeny can contribute to an accurate infrafamilial classification of Zamiaceae.

16.
We present a revised molecular phylogeny of the Drosophila repleta group including 62 repleta group taxa and nine outgroup species based on four mitochondrial and six nuclear DNA sequence fragments. With ca. 100 species endemic to the New World, the repleta species group represents one of the major species radiations in the genus Drosophila. Most repleta group species are associated with cacti in arid or semiarid regions. Contrary to previous results, maximum likelihood and Bayesian phylogenies of the 10-gene dataset strongly support the monophyly of the repleta group. Several previously described subdivisions in the group were also recovered, despite poorly resolved relationships between these clades. Divergence time estimates suggested that the repleta group split from its sister group about 21 million years ago (Mya), although diversification of the crown group began ca. 16 Mya. Character mapping of patterns of host plant use showed that flat leaf Opuntia use is common throughout the phylogeny and that shifts in host use from Opuntia to the more chemically complex columnar cacti occurred several times independently during the history of this group. Although some species retained the use of Opuntia after acquiring the use of columnar cacti, there were multiple, phylogenetically independent instances of columnar cactus specialization with loss of Opuntia as a host. Concordant with our proposed timing of host use shifts, these dates are consistent with the suggested times when the Opuntioideae originated in South America. We discuss the generally accepted South American origin of the repleta group.

17.
Statistical methods for estimating divergence times by using multiprotein gamma distances are discussed. When a large number of proteins are used, even a small degree of deviation from the molecular clock hypothesis can be detected. In this case, one may use the stem-lineage method for estimating divergence times. However, the estimates obtained by this method are often similar to those obtained by the linearized tree method. Application of these methods to a dataset of 104 proteins from several vertebrate species indicated that the divergence times between humans and mice and between mice and rats are about 96 and 33 million years (MY) ago, respectively. These estimates were obtained by assuming that birds and mammals diverged 310 MY ago. Similarly, application of the methods to the protein sequence data from primate species indicated that the human lineage separated from the chimpanzee, gorilla, Old World monkeys, and New World monkeys about 6.0, 7.0, 23.0, and 33.0 MY ago, respectively. In this case the use of two calibration points, that is, the divergence time (13 MY ago) between humans and orangutans and between primates and artiodactyls (90 MY ago), gave essentially the same estimates.

18.
Photocopying was found to be a rapid method of making a permanent record of a root sample. The method used produced a copy with white roots against a black background. Manual estimates of root length were made from photocopies using a light box. The number of intersections visible when laid over a copy of a white on black regular square grid was counted. Automated estimates of root length were made by scanning a photocopy with a bar code reader in place of a pen in a computer-driven graph plotter. Roots >0.2 mm diameter were resolved with precision and speed.

19.
Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose 2 model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.

20.