Similar articles
 20 similar articles found
1.
We present a new multilocus method for the fine-scale mapping of genes contributing to human diseases. The method is designed for use with multiple biallelic markers, in particular single-nucleotide polymorphisms, for which high-density genetic maps will soon be available. We model disease-marker association in a candidate region via a hidden Markov process and allow for correlation between linked marker loci. Using Markov chain Monte Carlo simulation methods, we obtain posterior distributions of model parameter estimates, including disease-gene location and the age of the disease-predisposing mutation. In addition, we allow for heterogeneity in recombination rates across the candidate region, to account for recombination hot and cold spots. We also obtain, for the ancestral marker haplotype, a posterior distribution that is unique to our method and that, unlike maximum-likelihood estimation, can properly account for uncertainty. We apply the method to data for cystic fibrosis and Huntington disease, for which mutations in disease genes have already been identified. The new method performs well compared with existing multilocus mapping methods.

2.
Aim: Concerns over how global change will influence species distributions, in conjunction with increased emphasis on understanding niche dynamics in evolutionary and community contexts, highlight the growing need for robust methods to quantify niche differences between or within taxa. We propose a statistical framework to describe and compare environmental niches from occurrence and spatial environmental data. Location: Europe, North America and South America. Methods: The framework applies kernel smoothers to densities of species occurrence in gridded environmental space to calculate metrics of niche overlap and test hypotheses regarding niche conservatism. We use this framework and simulated species with pre-defined distributions and amounts of niche overlap to evaluate several ordination and species distribution modelling techniques for quantifying niche overlap. We illustrate the approach with data on two well-studied invasive species. Results: We show that niche overlap can be accurately detected with the framework when variables driving the distributions are known. The method is robust to known and previously undocumented biases related to the dependence of species occurrences on the frequency of environmental conditions that occur across geographical space. The use of a kernel smoother makes the process of moving from geographical space to multivariate environmental space independent of both sampling effort and arbitrary choice of resolution in environmental space. However, the use of ordination and species distribution model techniques for selecting, combining and weighting variables on which niche overlap is calculated provides contrasting results. Main conclusions: The framework meets the increasing need for robust methods to quantify niche differences. It is appropriate for studying niche differences between species, subspecies or intra-specific lineages that differ in their geographical distributions. Alternatively, it can be used to measure the degree to which the environmental niche of a species or intra-specific lineage has changed over time.
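
The kernel-smoothing step described in entry 2 can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: a single environmental gradient instead of a multivariate space, a Gaussian kernel with a hand-picked bandwidth, and Schoener's D as the overlap metric (the abstract does not name its metrics). It also omits the correction for how frequently each environmental condition occurs in geographical space. All function names are hypothetical.

```python
import math


def kernel_density(values, grid, bandwidth):
    """Gaussian-kernel occurrence density on `grid`, normalised to sum to 1."""
    dens = [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]
    total = sum(dens)
    return [d / total for d in dens]


def schoener_d(p1, p2):
    """Schoener's D: 1 means identical densities, 0 means no overlap."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))


def niche_overlap(occ_a, occ_b, lo, hi, cells=100, bandwidth=1.0):
    """Overlap of two species along one gridded environmental gradient."""
    step = (hi - lo) / (cells - 1)
    grid = [lo + i * step for i in range(cells)]
    dens_a = kernel_density(occ_a, grid, bandwidth)
    dens_b = kernel_density(occ_b, grid, bandwidth)
    return schoener_d(dens_a, dens_b)
```

Calling `niche_overlap(temps_a, temps_b, 0.0, 30.0)` with two lists of, say, temperatures at occurrence points returns a value between 0 and 1. Because the densities are smoothed before comparison, the result depends less on the grid resolution than a raw histogram comparison would.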

3.
Classification of species into different functional groups based on biological criteria has been a difficult problem in ecology. The difficulty mainly arises because natural classification patterns are not necessarily mutually exclusive. The more group characteristics overlap, the more difficult it is to identify the membership of a species in the overlapping portions of any two groups. In this paper, we present an application of discriminant analysis by creating classification models from life history and morphological data for two specialist and two generalist life-style types of predaceous phytoseiid mites. Two stages can be distinguished in our method: life-style group membership assignment and trait variable evaluation. We use a Bayesian framework to create a classifier system to locate or assign species within a mixture of trait distributions. The method assumes that a mixture of trait distributions can represent the multiple dimensions of biological data. The mixture is most evident near the boundaries between groups. Because of the complexity of the analytical solution, an iterative method is used to estimate the unknown means, variances, and mixing proportion between groups. We also developed a criterion based on information theory to evaluate model performance with different combinations of input variables and different hypotheses. We present a working example of our proposed methods. We apply these methods to the problem of selecting key species for inoculative release and for classical introductions of biological pest control agents.

4.
Estimating the degree of sexual dimorphism is difficult in fossil species because most specimens lack indicators of sex. We present a procedure that estimates sexual dimorphism in samples of unknown sex using the method of moments. We assume that the distribution of a metric trait is composed of two underlying normal distributions, one for males and one for females. We use three moments around the mean of the combined-sex distribution to estimate the means and the common standard deviation of the two underlying distributions. This procedure has advantages over previous methods: it is relatively simple to use, specimens need not be assigned to sex a priori, no reference to living species analogs is required, and the method provides conservative estimates of dimorphism under a variety of conditions. The method performs best when the male and female distributions overlap minimally but also works well when overlap is substantial. Simulations indicate that this relatively simple method is more accurate and reliable than previous methods for estimating dimorphism. © 1996 Wiley-Liss, Inc.
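
A hedged sketch of how a method-of-moments estimate of this kind can work is given below. It assumes a 50:50 sex ratio and uses the mean together with the second and fourth central moments; the abstract does not say which three moments the authors use, so this is one possible instance, not their procedure. Under the assumed model, with the two means at m - d and m + d and a common standard deviation s, the central moments are m2 = d^2 + s^2 and m4 = d^4 + 6 d^2 s^2 + 3 s^4, so 3 m2^2 - m4 = 2 d^4. The function names are hypothetical.

```python
import math


def central_moments(xs):
    """Return the mean and the 2nd and 4th central moments of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m2, m4


def dimorphism_from_moments(mean, m2, m4):
    """Solve for (lower mean, upper mean, common sd) of a 50:50 normal mixture.

    Assumes equal mixing proportions and a shared standard deviation.
    Negative intermediate values (possible with sampling noise) are
    clamped to zero, which yields a conservative estimate.
    """
    excess = 3.0 * m2 ** 2 - m4          # equals 2 * d**4 under the model
    d = (max(excess, 0.0) / 2.0) ** 0.25
    sd = math.sqrt(max(m2 - d ** 2, 0.0))
    return mean - d, mean + d, sd
```

For a sample `xs` of a metric trait, `dimorphism_from_moments(*central_moments(xs))` returns the two estimated sex means and the shared standard deviation without any specimen being assigned to a sex beforehand.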

5.
Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.

6.
Summary: In 2002, Ker-Chau Li introduced the liquid association measure to characterize three-way interactions between genes, and developed a computationally efficient estimator that can be used to screen gene expression microarray data for such interactions. That study, and others published since then, have established the biological validity of the method, and clearly demonstrated it to be a useful tool for the analysis of genomic data sets. To build on this work, we have sought a parametric family of multivariate distributions with the flexibility to model the full range of trivariate dependencies encompassed by liquid association. Such a model could situate liquid association within a formal inferential theory. In this article, we describe such a family of distributions: a trivariate, conditional normal model that has Gaussian univariate marginal distributions and includes the trivariate Gaussian family as a special case. Perhaps the most interesting feature of the distribution is that the parameterization naturally parses the three-way dependence structure into a number of distinct, interpretable components. One of these components is very closely aligned to liquid association, and is developed as a measure we call modified liquid association. We develop two methods for estimating this quantity, and propose statistical tests for the existence of this type of dependence. We evaluate these inferential methods in a set of simulations and illustrate their use in the analysis of publicly available experimental data.

7.
Xie W  Lewis PO  Fan Y  Kuo L  Chen MH 《Systematic biology》2011,60(2):150-160
The marginal likelihood is commonly used for comparing different evolutionary models in Bayesian phylogenetics and is the central quantity used in computing Bayes Factors for comparing model fit. A popular method for estimating marginal likelihoods, the harmonic mean (HM) method, can be easily computed from the output of a Markov chain Monte Carlo analysis but often greatly overestimates the marginal likelihood. The thermodynamic integration (TI) method is much more accurate than the HM method but requires more computation. In this paper, we introduce a new method, steppingstone sampling (SS), which uses importance sampling to estimate each ratio in a series (the "stepping stones") bridging the posterior and prior distributions. We compare the performance of the SS approach to the TI and HM methods in simulation and using real data. We conclude that the greatly increased accuracy of the SS and TI methods argues for their use instead of the HM method, despite the extra computation needed.
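
The idea of bridging prior and posterior through a series of importance-sampling ratios can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the authors' phylogenetic implementation: the data are normal with unit variance and unknown mean, the prior on the mean is normal with mean zero, and each intermediate "power posterior" (prior times likelihood raised to a power beta) is therefore normal and can be sampled exactly, so no MCMC is needed. The schedule of beta values, the number of stones and all function names are choices made for this sketch. The toy model also has a closed-form marginal likelihood, which makes the estimate checkable.

```python
import math
import random


def log_likelihood(theta, ys):
    """Log-likelihood of N(theta, 1) data."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in ys)


def exact_log_marginal(ys, prior_sd):
    """Closed-form log marginal likelihood for the conjugate toy model."""
    n, s, ss = len(ys), sum(ys), sum(y * y for y in ys)
    t2 = prior_sd ** 2
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n * t2)
            - 0.5 * (ss - t2 * s * s / (1 + n * t2)))


def steppingstone_log_marginal(ys, prior_sd, n_stones=20, n_draws=2000, seed=1):
    """Sum of log ratio estimates, one per pair of adjacent beta values."""
    rng = random.Random(seed)
    n, s = len(ys), sum(ys)
    # Assumed schedule: beta values crowded near 0 (the prior end).
    betas = [(k / n_stones) ** (1 / 0.3) for k in range(n_stones + 1)]
    total = 0.0
    for b_prev, b_next in zip(betas, betas[1:]):
        # Power posterior at b_prev is normal in this toy model.
        prec = 1 / prior_sd ** 2 + b_prev * n
        mean, sd = b_prev * s / prec, math.sqrt(1 / prec)
        terms = [(b_next - b_prev) * log_likelihood(rng.gauss(mean, sd), ys)
                 for _ in range(n_draws)]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms) / n_draws)
    return total
```

Comparing `steppingstone_log_marginal(ys, 3.0)` with `exact_log_marginal(ys, 3.0)` for a short list `ys` shows how closely the stepping stones recover the true value. In a real analysis each power posterior would be sampled by MCMC, which is where the extra computation mentioned in the abstract comes from.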

8.
Studies investigating the consequences of future climate changes on species distributions usually start with the assumption that species respond to climate changes in an individualistic fashion. This assumption has led researchers to use bioclimate envelope models that use present climate-range relationships to characterize species' limits of tolerance to climate, and then apply climate-change scenarios to enable projections of altered species distributions. However, there are techniques that combine climate variables together with information on the composition of assemblages to enable projections that are expected to mimic community dynamics. Here, we compare, for the first time, the performance of GLM (generalized linear model) and CQO (canonical quadratic ordination; a type of community-based GLM) for projecting distributions of species under climate change scenarios. We found that projections from these two methods varied both in terms of accuracy (GLM providing generally more accurate projections than CQO) and in the broad diversity patterns yielded (higher species richness values projected with CQO). Model outputs were also affected by species-specific traits, such as species range size and species geographical positions, supporting the view that methods are sensitive to different degrees of equilibrium of species distributions with climate. This study reveals differences in projections between individual- and community-based approaches that require further scrutiny, but it does not find support for unsupervised use of community-based models for investigating climate change impacts on species distributions. Reasons for this lack of support are discussed.

9.
The total deviation index of Lin and Lin et al. is an intuitive approach for the assessment of agreement between two methods of measurement. It assumes that the differences of the paired measurements are a random sample from a normal distribution and works essentially by constructing a probability content tolerance interval for this distribution. We generalize this approach to the case when differences may not have identical distributions, a common scenario in applications. In particular, we use the regression approach to model the mean and the variance of differences as functions of observed values of the average of the paired measurements, and describe two methods based on asymptotic theory of maximum likelihood estimators for constructing a simultaneous probability content tolerance band. The first method uses bootstrap to approximate the critical point and the second method is an analytical approximation. Simulation shows that the first method works well for sample sizes as small as 30 and the second method is preferable for large sample sizes. We also extend the methodology to the case when the mean function is modeled using penalized splines via a mixed model representation. Two real data applications are presented.
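
For the basic setting that entry 9 starts from (differences that are normal with a single mean and standard deviation), the total deviation index at probability p is the value q such that a proportion p of absolute differences fall within q. The sketch below computes that point value by bisection, treating the mean and standard deviation as known. It does not construct the tolerance bound or the regression-based band that the abstract develops, and the function names are hypothetical.

```python
from statistics import NormalDist


def coverage(q, mu, sigma):
    """P(|D| <= q) for D ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(q) - dist.cdf(-q)


def total_deviation_index(mu, sigma, p=0.9, tol=1e-10):
    """Smallest q with P(|D| <= q) >= p, found by bisection."""
    lo, hi = 0.0, abs(mu) + 10.0 * sigma  # coverage at hi is essentially 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid, mu, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With no systematic bias (`mu = 0`) the result reduces to the familiar normal quantile, so `total_deviation_index(0.0, 1.0, 0.9)` matches `NormalDist().inv_cdf(0.95)`. A nonzero mean difference widens the index.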

10.
Using multiple detection methods can increase the number, kind, and distribution of individuals sampled, which may increase accuracy and precision and reduce cost of population abundance estimates. However, when variables influencing abundance are of interest, if individuals detected via different methods are influenced by the landscape differently, separate analysis of multiple detection methods may be more appropriate. We evaluated the effects of combining two detection methods on the identification of variables important to local abundance using detections of grizzly bears with hair traps (systematic) and bear rubs (opportunistic). We used hierarchical abundance models (N-mixture models) with separate model components for each detection method. If both methods sample the same population, the use of either data set alone should (1) lead to the selection of the same variables as important and (2) provide similar estimates of relative local abundance. We hypothesized that the inclusion of two detection methods versus either method alone should (3) yield more support for variables identified in single-method analyses (i.e. fewer variables and models with greater weight), and (4) improve precision of covariate estimates for variables selected in both separate and combined analyses because sample size is larger. As expected, joint analysis of both methods increased precision as well as certainty in variable and model selection. However, the single-method analyses identified different variables and the resulting predicted abundances had different spatial distributions. We recommend comparing single-method and jointly modeled results to identify the presence of individual heterogeneity between detection methods in N-mixture models, along with consideration of detection probabilities, correlations among variables, and tolerance to risk of failing to identify variables important to a subset of the population. The benefits of increased precision should be weighed against those risks. The analysis framework presented here will be useful for other species exhibiting heterogeneity by detection method.
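
As background for the N-mixture models mentioned in entry 10, the sketch below evaluates the likelihood of repeated counts at one site in the simplest version of such a model: latent abundance is Poisson with mean `lam`, and each visit detects every individual independently with probability `p`. This is the basic single-method form only. The separate components for hair traps and bear rubs, and any covariates, are not modelled here, and the function name and the truncation point `n_max` are assumptions of this sketch.

```python
import math


def n_mixture_site_likelihood(counts, lam, p, n_max=200):
    """Likelihood of repeated counts at one site, summing out abundance N.

    Requires lam > 0 and 0 < p < 1. The infinite sum over N is
    truncated at n_max, which should be far above any plausible abundance.
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        # log Poisson(N = n | lam)
        log_term = -lam + n * math.log(lam) - math.lgamma(n + 1)
        # log Binomial(y | n, p) for each visit
        for y in counts:
            log_term += (math.log(math.comb(n, y))
                         + y * math.log(p)
                         + (n - y) * math.log(1 - p))
        total += math.exp(log_term)
    return total
```

Multiplying this quantity across sites (or summing its logarithm) gives the objective that would be maximised over `lam` and `p`. A useful sanity check is that a single visit yields counts that are Poisson with mean `lam * p`.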

11.
We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by estimating probability distributions over subsequences of amino acids from the protein. Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wild-cards in the conditioning sequences. Since substitutions of amino acids are common in protein families, incorporating wild-cards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. As protein databases become larger, data-driven learning algorithms for probabilistic models such as SMTs will require vast amounts of memory. We therefore describe and use efficient data structures to improve the memory usage of SMTs. We evaluate SMTs by building protein family classifiers using the Pfam and SCOP databases and compare our results to previously published results and state-of-the-art protein homology detection methods. SMTs outperform previous probabilistic suffix tree methods and under certain conditions perform comparably to state-of-the-art protein homology methods.

12.
Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. As an example, for the sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.

13.
The usual analysis of quantal response data occurring in diverse fields such as economics, medicine, psychology and toxicology uses probit and logit models or their extensions, with generalized least squares or the principle of likelihood as the method of statistical inference. The symmetric alternative models lead to practically comparable results, and the choice of model or method is determined by considerations of familiarity and computational convenience. Recent attempts at improvement involve larger parametric families of tolerance distributions and employ the method of maximum likelihood in analysis. In this paper we consider models with tolerance distributions based upon the Tukey-lambda distributions, which are described in terms of their quantile functions. The likelihood methods for fitting the models and testing their adequacies are developed and illustrated using classical data due to BLISS (1935) and ASHFORD and SMITH (1964).
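
Because the Tukey-lambda family is defined through its quantile function rather than a closed-form distribution function, a response probability has to be obtained by inverting the quantile function numerically. The sketch below shows one way to do that for the symmetric Tukey-lambda distribution, Q(u) = (u^lambda - (1 - u)^lambda) / lambda, with the logistic quantile as the limit at lambda = 0. The location and scale parameterisation of the dose and all function names are assumptions of this sketch, not the paper's notation.

```python
import math


def tukey_lambda_quantile(u, lam):
    """Quantile function of the symmetric Tukey-lambda distribution, 0 < u < 1."""
    if abs(lam) < 1e-12:
        return math.log(u / (1 - u))  # logistic limit as lam -> 0
    return (u ** lam - (1 - u) ** lam) / lam


def tukey_lambda_cdf(x, lam, tol=1e-12):
    """Invert the (strictly increasing) quantile function by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tukey_lambda_quantile(mid, lam) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def response_probability(dose, location, scale, lam):
    """P(response at `dose`) when tolerances follow a Tukey-lambda law."""
    return tukey_lambda_cdf((dose - location) / scale, lam)
```

Setting `lam = 0` reproduces the logit model, and `lam = 1` gives a uniform tolerance distribution on [-1, 1], so the single shape parameter spans several of the familiar symmetric alternatives. A likelihood for quantal data would then combine `response_probability` with binomial counts at each dose.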

14.
Xu S  Yonash N  Vallejo RL  Cheng HH 《Genetica》1998,104(2):171-178
A typical problem in mapping quantitative trait loci (QTLs) comes from missing QTL genotypes. A routine method for parameter estimation involving missing data is the mixture model maximum likelihood method. We developed an alternative QTL mapping method that describes a mixture of several distributions by a single model with a heterogeneous residual variance. The two methods produce similar results, but the heterogeneous residual variance method is computationally much faster than the mixture model approach. In addition, the new method can automatically generate sampling variances of the estimated parameters. We derive the new method in the context of QTL mapping for binary traits in an F2 population. Using the heterogeneous residual variance model, we identified a QTL on chromosome IV that controls Marek's disease susceptibility in chickens. The QTL alone explains 7.2% of the total disease variation. This revised version was published online in July 2006 with corrections to the Cover Date.

15.
16.
In these two companion papers (henceforth called Part I and Part II), we develop and evaluate a variational Bayesian expectation maximization (VBEM) method for model inversion of our multi-area extended neural mass model (MEN). In this paper, we develop the VBEM method to estimate posterior distributions of parameters of MEN. We choose suitable prior distributions for the model parameters in order to use properties of a conjugate-exponential model in implementing VBEM. Consequently, VBEM leads to analytically tractable forms. The proposed VBEM algorithm starts with initialization and consists of repeated iterations of a variational Bayesian expectation step (VB E-step) and a variational Bayesian maximization step (VB M-step). Posterior distributions of the model parameters are updated in the VB M-step. The distribution of the hidden state is updated in the VB E-step. We develop a variational extended Kalman smoother (VEKS) to infer the distribution of the hidden state in the VB E-step and derive the forward and backward passes of VEKS, analogous to the Kalman smoother. In Part I, we evaluate and validate the VBEM method using simulation studies.

17.
The clinical serial interval of an infectious disease is the time between the date of symptom onset in an index case and the date of symptom onset in one of its secondary cases. It is a quantity which is commonly collected during a pandemic and is of fundamental importance to public health policy and mathematical modelling. In this paper we present a novel method for calculating the serial interval distribution for a Markovian model of household transmission dynamics. This allows the use of Bayesian MCMC methods, with explicit evaluation of the likelihood, to fit to serial interval data and infer parameters of the underlying model. We use simulated and real data to verify the accuracy of our methodology and illustrate the importance of accounting for household size. The output of our approach can be used to produce posterior distributions of population-level epidemic characteristics.

18.
With the increasing use of survival models in animal breeding to address the genetic aspects of mainly longevity of livestock but also disease traits, the need for methods to infer genetic correlations and to do multivariate evaluations of survival traits and other types of traits has become increasingly important. In this study we derived and implemented a bivariate quantitative genetic model for a linear Gaussian and a survival trait that are genetically and environmentally correlated. For the survival trait, we considered the Weibull log-normal animal frailty model. A Bayesian approach using Gibbs sampling was adopted. Model parameters were inferred from their marginal posterior distributions. The required fully conditional posterior distributions were derived and issues on implementation are discussed. The two Weibull baseline parameters were updated jointly using a Metropolis-Hastings step. The remaining model parameters with non-normalized fully conditional distributions were updated univariately using adaptive rejection sampling. Simulation results showed that the estimated marginal posterior distributions covered the true parameter values used in the simulation of data well and placed high density on them. In conclusion, the proposed method allows inferring additive genetic and environmental correlations, and doing multivariate genetic evaluation of a linear Gaussian trait and a survival trait.

19.
Research on endozoochorous seed dispersal is needed to further understand plant ecology and evolution. There are several methods for calculating the distribution of seed dispersal distances, although many studies use the “combination of gut retention time and movement data” (CGM) method to determine the potential seed dispersal distance distribution (PSD). However, no study has compared PSD values acquired by CGM with seed dispersal distance distributions calculated using other methods. The main purpose of this study was to compare methods of determining seed dispersal distance distributions using raccoon dogs (Nyctereutes procyonoides). We calculated the estimated seed dispersal distance distribution (ESD) using the bait-marker method and the PSD using the CGM method. There were no differences between the ESD and PSD results with regard to basic dispersal distance distributions. The results indicate that if the region from which animal movement data were acquired and the region from which markers for the bait-marker method were collected are the same, the distance distributions from the two methods may match. Additionally, though there were differences in seed mimic gut retention times (GRTs) between the two baits used (median GRT, fruits: 8 h 50 min, animal materials: 12 h 55 min), there were no differences in PSD between the two baits. This indicates that disperser movement has a stronger effect on dispersal distance distribution than GRT when using the CGM method.

20.
Much research has centered on determining which habitat model best predicts species occurrence. However, previous work typically used data sets that are inherently biased for evaluation. The use of simulated data provides a way of testing model performance using unbiased data where the true relationships between species occurrence and population processes are predefined using sound ecological theory. We used a process-based habitat model to generate simulated occurrence data to evaluate presence-absence and presence-only methods: generalized linear and generalized additive models (GLM, GAM), maximum entropy model (Maxent), and discrete choice models (DCM). This is the first study to use a DCM for predicting species distributions. We varied the effect that habitat quality had on fecundity and reported the model responses to these changes. When the effect of habitat quality on fecundity was weak, model performance was no better than random for all methods; however, performance increased as the habitat/fecundity relationship became stronger. For each level of habitat quality effect, there was little variation in performance between the presence-absence and presence-only methods. The use of a process-based habitat model to generate occurrence data for evaluating model performance has a distinct advantage over other testing methods, because no errors are made during sampling and the true ecological relationships between population process and species occurrence are known. This leads to unbiased results and increased confidence in assessing model performance and making management recommendations.
