Similar articles
1.
Phylogenetic mixtures model the inhomogeneous molecular evolution commonly observed in data. The performance of phylogenetic reconstruction methods where the underlying data are generated by a mixture model has stimulated considerable recent debate. Much of the controversy stems from simulations of mixture model data on a given tree topology for which reconstruction algorithms output a tree of a different topology; these findings were held up to show the shortcomings of particular tree reconstruction methods. In so doing, the underlying assumption was that mixture model data on one topology can be distinguished from data evolved on an unmixed tree of another topology given enough data and the "correct" method. Here we show that this assumption can be false. For biologists, our results imply that, for example, the combined data from two genes whose phylogenetic trees differ only in terms of branch lengths can perfectly fit a tree of a different topology.

2.
Aim: To develop a physiologically based model of the plant niche for use in species distribution modelling. Location: Europe. Methods: We link the Thornley transport resistance (TTR) model with functions which describe how the TTR model's parameters are influenced by abiotic environmental factors. The TTR model considers how carbon and nutrient uptake, and the allocation of these assimilates, influence growth. We use indirect statistical methods to estimate the model parameters from a high resolution data set on tree distribution for 22 European tree species. Results: We infer, from distribution data and abiotic forcing data, the physiological niche dimensions of 22 European tree species. We found that the model fits were reasonable (AUC: 0.79–0.964). The projected distributions were characterized by a false positive rate of 0.19 and a false negative rate of 0.12. The fitted models are used to generate projections of the environmental factors that limit the range boundaries of the study species. Main conclusions: We show that physiological models can be used to derive physiological niche dimensions from species distribution data. Future work should focus on including prior information on physiological rates into the parameter estimation process. Application of the TTR model to species distribution modelling suggests new avenues for establishing explicit links between distribution and physiology, and for generating hypotheses about how ecophysiological processes influence the distribution of plants.
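The fit statistics quoted in this abstract (AUC, false positive rate, false negative rate) can be illustrated with a minimal sketch. The suitability scores, the 0.5 threshold, and the function names below are hypothetical and are not taken from the study; the sketch only shows how such statistics are commonly computed from presence/absence data.

```python
def auc(presence_scores, absence_scores):
    """Probability that a random presence site outscores a random absence site.

    Ties count as half a win (the rank-based definition of AUC).
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def error_rates(presence_scores, absence_scores, threshold):
    """Return (false positive rate, false negative rate) at a score threshold."""
    false_pos = sum(s >= threshold for s in absence_scores)
    false_neg = sum(s < threshold for s in presence_scores)
    return false_pos / len(absence_scores), false_neg / len(presence_scores)


# Hypothetical modelled suitability at sites where a species is present/absent.
presence = [0.9, 0.8, 0.4]
absence = [0.7, 0.3, 0.2, 0.1]

model_auc = auc(presence, absence)
fpr, fnr = error_rates(presence, absence, threshold=0.5)
print(model_auc, fpr, fnr)
```

The AUC is threshold-free, whereas the two error rates depend on the cut-off chosen to turn a continuous suitability score into a predicted presence or absence.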

3.
We compared the ability of three machine learning algorithms (linear discriminant analysis, decision tree, and support vector machines) to automate the classification of calls of nine frog species and three bird species. In addition, we tested two ways of characterizing each call to train/test the system. Calls were characterized with four standard call variables (minimum and maximum frequencies, call duration and maximum power) or eleven variables that included three standard call variables (minimum and maximum frequencies, call duration) and a coarse representation of call structure (frequency of maximum power in eight segments of the call). A total of 10,061 isolated calls were used to train/test the system. The average true positive rates for the three methods were: 94.95% for support vector machine (0.94% average false positive rate), 89.20% for decision tree (1.25% average false positive rate) and 71.45% for linear discriminant analysis (1.98% average false positive rate). There was no statistical difference in classification accuracy based on 4 or 11 call variables, but this efficient data reduction technique in conjunction with the high classification accuracy of the SVM is a promising combination for automated species identification by sound. By combining automated digital recording systems with our automated classification technique, we can greatly increase the temporal and spatial coverage of biodiversity data collection.
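Average true positive and false positive rates of the kind reported above are usually computed per class, one-vs-rest, from the predicted and true labels. The following is a minimal sketch under that assumption; the species labels and predictions are hypothetical and the function name is illustrative, not part of the study.

```python
from collections import Counter


def per_class_rates(true_labels, predicted_labels):
    """Return {label: (true_positive_rate, false_positive_rate)}, one-vs-rest."""
    pairs = Counter(zip(true_labels, predicted_labels))
    labels = sorted(set(true_labels) | set(predicted_labels))
    total = len(true_labels)
    rates = {}
    for label in labels:
        tp = pairs[(label, label)]
        fn = sum(c for (t, p), c in pairs.items() if t == label and p != label)
        fp = sum(c for (t, p), c in pairs.items() if t != label and p == label)
        tn = total - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else float("nan")
        fpr = fp / (fp + tn) if fp + tn else float("nan")
        rates[label] = (tpr, fpr)
    return rates


# Hypothetical calls: the species that made each call and the classifier output.
true_calls = ["frog_a"] * 4 + ["frog_b"] * 4 + ["bird_a"] * 2
predicted = ["frog_a", "frog_a", "frog_a", "frog_b",
             "frog_b", "frog_b", "frog_b", "frog_b",
             "bird_a", "frog_a"]

rates = per_class_rates(true_calls, predicted)
mean_tpr = sum(tpr for tpr, _ in rates.values()) / len(rates)
mean_fpr = sum(fpr for _, fpr in rates.values()) / len(rates)
print(rates, mean_tpr, mean_fpr)
```

Averaging the per-class rates (a macro average) weights every species equally, regardless of how many calls each contributed.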

4.
Vallat BK, Pillardy J, Elber R. Proteins, 2008, 72(3): 910-928
The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank (PDB) a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is used to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50 and 100%) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with MODELLER. The probability that a true match does not yield an acceptable structural model (within 6 Å RMSD from the native structure) decays linearly as a function of the TM structural-alignment score.

5.
Analysis of sequence data using time-reversible substitution models and maximum likelihood (ML) algorithms is currently the most popular method to infer phylogenies, despite the fact that results often contradict each other. Searching for sources of error we focus on a hitherto neglected feature of these methods: character polarity is usually thought to be irrelevant in ML analyses. Mechanisms that lead to wrong tree topologies were analysed at the level of split-supporting site patterns. In simulations, plesiomorphic site patterns can be identified by comparison with known root sequences. These patterns cause some surprising effects: Using data sets generated with simulations of sequence evolution along a variety of topologies and inferring trees using the same (correct) model, we show for cases of branch-length heterogeneity that (i) as already known, ML analyses can fail to recover the correct tree even when the correct substitution model is used, but also that (ii) plesiomorphic character states cause substantial mistakes and therefore character polarity is relevant, and (iii) accumulating chance similarities on long branches are far less misleading than plesiomorphic states accumulating on shorter branches. The artefacts occur when branch lengths are heterogeneous. The systematic errors disappear for the most part when the sites with symplesiomorphies supporting false clades are deleted from the data set. We conclude that many of the phylogenies published during the past decades may be false due to the neglected effects of symplesiomorphies.

6.
Coalescent theory is commonly used to perform population genetic inference at the nucleotide level. Here, we examine the procedure that fixes the number of segregating sites (henceforth the FS procedure). In this approach a fixed number of segregating sites (S) are placed on a coalescent tree (independently of the total and internode lengths of the tree). Thus, although widely used, the FS procedure does not strictly follow the assumptions of coalescent theory and must be considered an approximation of (i) the standard procedure that uses a fixed population mutation parameter theta, and (ii) procedures that condition on the number of segregating sites. We study the differences in the false positive rate for nine statistics by comparing the FS procedure with the procedures (i) and (ii), using several evolutionary models with single-locus and multilocus data. Our results indicate that for single-locus data the FS procedure is accurate for the equilibrium neutral model, but problems arise under the alternative models studied; furthermore, for multilocus data, the FS procedure becomes inaccurate even for the standard neutral model. Therefore, we recommend a procedure that fixes the theta value (or alternatively, procedures that condition on S and take into account the uncertainty of theta) for analysing evolutionary models with multilocus data. With single-locus data, the FS procedure should not be employed for models other than the standard neutral model.

7.
Fire disturbance patterns influence forest communities at a range of spatial scales. Forest community structure may also influence fire disturbance patterns, because tree species vary in their fuel value and in their tolerance to fire damage. However, the influence of community structure on fire disturbance likely depends on latent ecological differences between fires and on the spatial scale at which patterns are observed. Using data on fire intensity, community structure, and post-fire tree survival in four systematically sampled boreal forest fires, we tested the hypotheses that: (1) patterns in post-fire tree survival reflect interactions between fire intensity and community structure; (2) these relationships change with the spatial scale of observation. To test the first hypothesis, we used information theoretic methods to compare eight generalized linear mixed effects models describing the influence of community structure and fire intensity on tree survival in a 500 m² sample plot, accounting for latent fire-to-fire differences in response. To test the scaling hypothesis, we reaveraged the data at nine successively larger spatial resolutions up to approximately 2 km², at each resolution tracking the parameter values of the best model. When fit to the plot-level data, the dominant feature of the best model was a strong intensity–survival correlation which varied from fire to fire, and depended on plot-level community structure. In some fires, community structure and survival became more tightly coupled at larger scales, whereas fire intensity became less important. These results support the view that fire disturbance patterns are influenced by cross-scale interactions between community structure and fire intensity.

8.
Concerns have been raised that posterior probabilities on phylogenetic trees can be unreliable when the true tree is unresolved or has very short internal branches, because existing methods for Bayesian phylogenetic analysis do not explicitly evaluate unresolved trees. Two recent papers have proposed that evaluating only resolved trees results in a "star tree paradox": when the true tree is unresolved or close to it, posterior probabilities were predicted to become increasingly unpredictable as sequence length grows, resulting in inflated confidence in one resolved tree or another and an increasing risk of false-positive inferences. Here we show that this is not the case; existing Bayesian methods do not lead to an inflation of statistical confidence, provided the evolutionary model is correct and uninformative priors are assumed. Posterior probabilities do not become increasingly unpredictable with increasing sequence length, and they exhibit conservative type I error rates, leading to a low rate of false-positive inferences. With infinite data, posterior probabilities give equal support for all resolved trees, and the rate of false inferences falls to zero. We conclude that there is no star tree paradox caused by not sampling unresolved trees.

9.
Structure-function relationships in the pulmonary arterial tree
Knowledge of the relationship between structure and function of the normal pulmonary arterial tree is necessary for understanding normal pulmonary hemodynamics and the functional consequences of the vascular remodeling that accompanies pulmonary vascular diseases. In an effort to provide a means for relating the measurable vascular geometry and vessel mechanics data to the mean pressure-flow relationship and longitudinal pressure profile, we present a mathematical model of the pulmonary arterial tree. The model is based on the observation that the normal pulmonary arterial tree is a bifurcating tree in which the parent-to-daughter diameter ratios at a bifurcation and vessel distensibility are independent of vessel diameter, and although the actual arterial tree is quite heterogeneous, the diameter of each route, through which the blood flows, tapers from the arterial inlet to essentially the same terminal arteriolar diameter. In the model the average route is represented as a tapered tube through which the blood flow decreases with distance from the inlet because of the diversion of flow at the many bifurcations along the route. The taper and flow diversion are expressed in terms of morphometric parameters obtained using various methods for summarizing morphometric data. To help put the model parameter values in perspective, we applied one such method to morphometric data obtained from perfused dog lungs. Model simulations demonstrate the sensitivity of model pressure-flow relationships to variations in the morphometric parameters. Comparisons of simulations with experimental data also raise questions as to the "hemodynamically" appropriate ways to summarize morphometric data.

10.
A comparison was made of four statistically based schemes for classifying epithelial cells from 243 fine needle aspirates of breast masses as benign or malignant. Two schemes were computer-generated decision trees and two were user generated. Eleven cytologic characteristics described in the literature as being useful in distinguishing benign from malignant breast aspirates were assessed on a scale of 1 to 10, with 1 being closest to that described as benign and 10 to that described as malignant. The original computer-generated dichotomous decision tree gave 6 false negatives and 12 false positives on the data set; another tree generated from the current data improved performance slightly, with 5 false negatives and 10 false positives. Maximum diagnostic overlap occurred at the cut-point of the original dichotomous tree. The insertion of a third node evaluating additional parameters resulted in one false negative and seven false positives. This performance was matched by summing the scores of the eight characteristics that individually were most effective in separating benign from malignant. We conclude that, while statistically designed, computer-generated dichotomous decision trees identify a starting sequence for applying cytologic characteristics to distinguish between benign and malignant breast aspirates, modifications based on human expert knowledge may result in schemes that improve diagnostic performance.

11.
Constructing a functional-structural model of mature Chinese pine (Pinus tabulaeformis) trees based on GreenLab principles
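Functional-structural tree models (FSTMs) are a class of models, built from organ-level components, that combine plant structure with plant function. Applying them to mature trees requires dealing with the complexity of the topological structure and with the generality of the ring allocation pattern. Taking 18-year-old and 41-year-old mature Chinese pine (Pinus tabulaeformis) trees as study objects, this paper applies the GreenLab model to the simulation of mature trees. Using destructive sampling, we measured the morphological structure of two mature Chinese pine trees. A substructure model was used to handle the topological complexity of mature trees, and a ring influence coefficient λ was introduced to combine the global allocation mode with the Pressler mode, addressing the fact that ring allocation patterns differ with age and environmental conditions. The model's direct parameters were obtained from measured data, and its hidden parameters were obtained by inverse fitting with nonlinear least squares. The simulation performance of the model was evaluated by comparing measured data with simulated data, and simulated data with the output of an empirical model. The regression equations between observed and simulated values of total internode weight, total needle weight, tree height, and trunk internode weight had coefficients of determination of 0.84–0.98, and the coefficient of determination between the functional-structural model and the empirical model for total biomass was 0.95, indicating that the model reflects the structure and growth process of Chinese pine fairly realistically.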

12.
Tree growth varies closely with high-frequency climate variability. Since the 1930s detrending climate data prior to comparing them with tree growth data has been shown to better capture tree growth sensitivity to climate. However, in a context of increasingly pronounced trends in climate, this practice remains surprisingly rare in dendroecology. In a review of Dendrochronologia over the 2018–2021 period, we found that less than 20% of dendroecological studies detrended climate data prior to climate-growth analyses. With an illustrative study, we want to remind the dendroecology community that such a procedure is still, if not more than ever, rational and relevant. We investigated the effects of detrending climate data on climate–growth relationships across North America over the 1951–2000 period. We used a network of 2536 individual tree ring-width series from the Canadian and Western US forest inventories. We compared correlations between tree growth and seasonal climate data (Tmin, Tmax, Prec) both raw and detrended. Detrending approaches included a linear regression and 30-yr and 100-yr cubic smoothing splines. Our results indicate that on average the detrending of climate data increased climate–growth correlations. In addition, we observed that strong trends in climate data translated to higher variability in inferred correlations based on raw vs. detrended climate data. We provide further evidence that our results hold true for the entire spectrum of dendroecological studies using either mean site chronologies and correlation coefficients, or individual tree time series within a mixed-effects model framework where regression coefficients are used more commonly. We show that even without a change in correlation, regression coefficients can change substantially, and that the true climate impact on growth tends to be underestimated when climate variables contain trends. This study demonstrates that treating climate and tree-ring time series "like-for-like" is a necessary procedure to reduce false negatives and positives in dendroecological studies. In conclusion, we recommend using the same detrending for climate and tree growth data when tree-ring time series are detrended with splines or similar frequency-based filters.
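The effect described in this abstract can be sketched with the simplest of the detrending approaches it names, a linear regression on time. The series below are synthetic and purely illustrative (a hypothetical temperature record with an imposed trend and a ring-width index that responds only to the year-to-year signal); they are not data from the study.

```python
import math


def linear_detrend(values):
    """Subtract the least-squares straight line fitted against the time index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return [y - (mean_y + slope * (x - mean_x)) for x, y in enumerate(values)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic year-to-year climate signal (hypothetical values).
signal = [0.3, -0.5, 0.1, 0.6, -0.4, -0.2, 0.5, -0.1, -0.6, 0.3]
# Temperature = warming trend + signal; the ring-width index is assumed to be
# already detrended and to respond only to the high-frequency signal.
temperature = [10 + 0.4 * year + s for year, s in enumerate(signal)]
ring_width_index = [1 + 0.5 * s for s in signal]

r_raw = pearson(temperature, ring_width_index)
r_detrended = pearson(linear_detrend(temperature), ring_width_index)
print(round(r_raw, 3), round(r_detrended, 3))
```

In this toy case the trend in the raw temperature series masks a nearly perfect high-frequency relationship, which is the "like-for-like" argument in miniature. Spline detrending, also used in the study, would need a smoothing-spline implementation and is not shown here.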

13.
Peak detection is one of the most important steps in mass spectrometry (MS) analysis. However, the detection result is greatly affected by severe spectrum variations. Unfortunately, most current peak detection methods are neither flexible enough to revise false detection results nor robust enough to resist spectrum variations. To improve flexibility, we introduce the peak tree to represent the peak information in MS spectra. Each tree node is a peak judgment on a range of scales, and each tree decomposition, as a set of nodes, is a candidate peak detection result. To improve robustness, we combine peak detection and common peak alignment into a closed-loop framework, which finds the optimal decomposition via both peak intensity and common peak information. The common peak information is derived and iteratively refined from the density clustering of the latest peak detection result. Finally, we present an improved ant colony optimization biomarker selection method to build a whole MS analysis system. Experiments show that our peak detection method can better resist spectrum variations and provide higher sensitivity and lower false detection rates than conventional methods. The benefits of our peak-tree-based system for MS disease analysis are also demonstrated on real SELDI data.

14.
Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene/protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information in analyzing high dimensional genomics data. Specifically, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data, and at the same time identifying disease susceptibility genes. We employ the gradient descent boosting procedure to build an additive tree model and propose a new algorithm to utilize the network structure in fitting small tree weak learners. We illustrate by simulation studies and a real data example that, by making use of the network information, NetBoosting outperforms a few existing methods in terms of accuracy of prediction and variable selection.

15.
Adaptive evolution frequently occurs in episodic bursts, localized to a few sites in a gene, and to a small number of lineages in a phylogenetic tree. A popular class of "branch-site" evolutionary models provides a statistical framework to search for evidence of such episodic selection. For computational tractability, current branch-site models unrealistically assume that all branches in the tree can be partitioned a priori into two rigid classes: "foreground" branches that are allowed to undergo diversifying selective bursts and "background" branches that are negatively selected or neutral. We demonstrate that this assumption leads to unacceptably high rates of false positives or false negatives when the evolutionary process along background branches strongly deviates from modeling assumptions. To address this problem, we extend Felsenstein's pruning algorithm to allow efficient likelihood computations for models in which variation over branches (and not just sites) is described in the random effects likelihood framework. This enables us to model the process at every branch-site combination as a mixture of three Markov substitution models; our model treats the selective class of every branch at a particular site as an unobserved state that is chosen independently of that at any other branch. When benchmarked on a previously published set of simulated sequences, our method consistently matched or outperformed existing branch-site tests in terms of power and error rates. Using three empirical data sets, previously analyzed for episodic selection, we discuss how modeling assumptions can influence inference in practical situations.

16.
We propose a model-based approach that combines Bayesian variable selection tools, a novel spatial kernel convolution structure, and autoregressive processes for detecting a subject's brain activation at the voxel level in complex-valued functional magnetic resonance imaging (CV-fMRI) data. A computationally efficient Markov chain Monte Carlo algorithm for posterior inference is developed by taking advantage of the dimension reduction of the kernel-based structure. The proposed spatiotemporal model leads to more accurate posterior probability activation maps and fewer false positives than alternative spatial approaches based on Gaussian process models, and other complex-valued models that do not incorporate spatial and/or temporal structure. This is illustrated in the analysis of simulated data and human task-related CV-fMRI data. In addition, we show that complex-valued approaches dominate magnitude-only approaches and that the kernel structure in our proposed model considerably improves sensitivity rates when detecting activation at the voxel level.

17.
Given a collection of discrete characters (e.g., aligned DNA sites, gene adjacencies), a common measure of distance between taxa is the proportion of characters for which taxa have different character states. Tree reconstruction based on these (uncorrected) distances can be statistically inconsistent and can lead to trees different from those obtained using character-based methods such as maximum likelihood or maximum parsimony. However, in these cases the distance data often reveal their unreliability by some deviation from additivity, as indicated by conflicting support for more than one tree. We describe two results that show how uncorrected (and miscorrected) distance data can be simultaneously perfectly additive and misleading. First, multistate character data can be perfectly compatible and define one tree, and yet the uncorrected distances derived from these characters are perfectly treelike (and obey a molecular clock), only for a completely different tree. Second, under a Markov model of character evolution a similar phenomenon can occur; not only is there statistical inconsistency using uncorrected distances, but there is no evidence of this inconsistency because the distances look perfectly treelike (this does not occur in the classic two-parameter Felsenstein zone). We characterize precisely when uncorrected distances are additive on the true (and on a false) tree for four taxa. We also extend this result to a more general setting that applies to distances corrected according to an incorrect model.
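The two notions this abstract relies on, the uncorrected distance (proportion of differing characters) and additivity on four taxa, can be sketched as follows. Additivity is tested here with the standard four-point condition: of the three pairwise sums for a quartet, the two largest must be equal, and the smallest sum identifies the supported split. The four toy sequences and the function names are hypothetical, chosen only so that the distances come out exactly additive.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected distance: proportion of aligned characters that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(alignment):
    """Map each unordered pair of taxa to its uncorrected distance."""
    return {frozenset((x, y)): p_distance(alignment[x], alignment[y])
            for x, y in combinations(alignment, 2)}


def quartet_split(dist, a, b, c, d, tol=1e-9):
    """Return (supported split, whether the four-point condition holds)."""
    sums = sorted([
        (dist[frozenset((a, b))] + dist[frozenset((c, d))], f"{a}{b}|{c}{d}"),
        (dist[frozenset((a, c))] + dist[frozenset((b, d))], f"{a}{c}|{b}{d}"),
        (dist[frozenset((a, d))] + dist[frozenset((b, c))], f"{a}{d}|{b}{c}"),
    ])
    additive = abs(sums[2][0] - sums[1][0]) <= tol
    return sums[0][1], additive


# Hypothetical alignment: four sites are unique to one taxon, two sites
# separate {A, B} from {C, D}.
alignment = {"A": "TAAAGG", "B": "ATAAGG", "C": "AATAAA", "D": "AAATAA"}
dist = distance_matrix(alignment)
split, additive = quartet_split(dist, "A", "B", "C", "D")
print(split, additive)
```

As the abstract stresses, passing this check shows only that the distances are treelike; it does not by itself establish that the supported split belongs to the true tree.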

18.
Recombination can negatively impact methods designed to detect divergent gene function that rely on explicit knowledge of a gene tree. However, we know little about how recombination detection methods perform under evolutionary scenarios encountered in studies of functional molecular divergence. We use simulation to evaluate false positive rates for six recombination detection methods (GENECONV, MaxChi, Chimera, RDP, GARD-SBP, GARD-MBP) under evolutionary scenarios that might increase false positives. Broadly, these scenarios address: (i) asymmetric tree topology and sequence divergence, (ii) non-stationary codon bias and selection pressure, and (iii) positive selection. We also evaluate power to detect recombination under truly recombinant history. As with previous studies, we find that power increases with sequence divergence. However, we also find that accuracy to correctly infer the number of breakpoints is extremely low. When recombination is absent, increased sequence divergence leads to increased false positives. Furthermore, one method (GARD-SBP) is sensitive to tree shape, with higher false positive rates under an asymmetric tree topology. Somewhat surprisingly, all methods are robust to the simulated heterogeneity in codon bias, shifts in selection pressure and presence of positive selection. Based on these findings, we recommend that studies of functional divergence in systems where recombination is plausible can, and should, include a pre-test for recombination. Application of all methods to the core genome of Prochlorococcus reveals a substantial lack of concordance among results. Based on analysis of both real and simulated datasets we present some guidelines for the investigation of recombination in genes that may have experienced functional divergence.

19.
MOTIVATION: The study of carbohydrate sugar chains, or glycans, has progressed slowly, mainly due to the difficulty in establishing standard methods for analyzing their structures and biosynthesis. Glycans are generally tree structures that are more complex than linear DNA or protein sequences, and evidence shows that patterns in glycans may be present that spread across siblings and into further regions that are not limited by the edges in the actual tree structure itself. Current models were not able to capture such patterns. RESULTS: We have applied a new probabilistic model, called probabilistic sibling-dependent tree Markov model (PSTMM), which is able to inherently capture such complex patterns of glycans. Not only is the ability to capture such patterns important in itself, but this also implies that PSTMM is capable of performing multiple tree structure alignments efficiently. We show through experiments on actual glycan data that this new model is extremely useful for gaining insight into the hidden, complex patterns of glycans, which are so crucial for the development and functioning of higher level organisms. Furthermore, we also show that this model can be additionally utilized as an innovative approach to multiple tree alignment, which has not been applied to glycan chains before. This extension of the usage of PSTMM may be a major step forward not only for the structural analysis of glycans, but may consequently also prove useful for discovering clues to their function.

20.
A GreenLab-based functional-structural model of Chinese pine (Pinus tabulaeformis)
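Functional-structural models (FSMs) of plants combine structural models with process models to describe plant growth driven by environmental mechanisms and to output the three-dimensional structure of the plant. GreenLab is a generic functional-structural plant model based on source-sink relationships that has been under continuous development in recent years; it has mostly been applied to crops, and applications to trees remain few. Taking young Chinese pine (Pinus tabulaeformis) as the study object, this paper applies the GreenLab model to the study of virtual tree growth for the first time. Using destructive sampling, we measured the morphological structure, topological structure, and organ biomass of nine young Chinese pine trees, and organized the data according to a topological coding system. The model's direct parameters were obtained from the measured data, and its hidden parameters were obtained by inverse fitting with nonlinear least squares. The model's assumptions were verified and its simulation performance was evaluated. The results show that the regression equations between observed and simulated values of total internode fresh mass, total leaf fresh mass, internode fresh mass, and internode length had coefficients of determination between 0.78 and 0.91, so the model reflects the structure and growth process of Chinese pine fairly realistically. The proposed methods for measuring and coding tree structure and biomass can serve as a reference for building functional-structural models of conifers.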
