20 similar documents found (search time: 15 ms)
2.
Introduction
With the renewed drive towards malaria elimination, there is a need for improved surveillance tools. While time series analysis is an important tool for surveillance, for prediction and for measuring the impact of interventions, approximations by commonly used Gaussian methods are prone to inaccuracies when case counts are low. Therefore, statistical methods appropriate for count data are required, especially during the “consolidation” and “pre-elimination” phases.
Methods
Generalized autoregressive moving average (GARMA) models were extended to generalized seasonal autoregressive integrated moving average (GSARIMA) models for parsimonious, observation-driven modelling of non-Gaussian, non-stationary and/or seasonal time series of count data. The models were applied to monthly malaria case time series in a district in Sri Lanka, where malaria has decreased dramatically in recent years.
Results
The malaria series showed long-term changes in the mean, unstable variance and seasonality. After fitting negative-binomial Bayesian models, both a GSARIMA model and a GARIMA model with deterministic seasonality were selected, on the basis of different criteria. Posterior predictive distributions indicated that negative-binomial models provided better predictions than Gaussian models, especially when counts were low. The G(S)ARIMA models were able to capture the autocorrelation in the series.
Conclusions
G(S)ARIMA models may be particularly useful in the drive towards malaria elimination, since episode count series are often seasonal and non-stationary, especially when control is increased. Although building and fitting GSARIMA models is laborious, they may provide more realistic prediction distributions than do Gaussian methods and may be more suitable when counts are low.
3.
Chiara Polce Mette Termansen Jesus Aguirre-Gutiérrez Nigel D. Boatman Giles E. Budge Andrew Crowe Michael P. Garratt Stéphane Pietravalle Simon G. Potts Jorge A. Ramirez Kate E. Somerwill Jacobus C. Biesmeijer 《PloS one》2013,8(10)
Insect pollination benefits over three quarters of the world's major crops. There is growing concern that observed declines in pollinators may affect production and revenues from animal-pollinated crops. Knowing the distribution of pollinators is therefore crucial for estimating their availability to pollinate crops; however, in general, we have incomplete knowledge of where these pollinators occur. We propose a method to predict geographical patterns of pollination service to crops that is novel in two respects: the use of pollinator records rather than expert knowledge to predict pollinator occurrence, and the inclusion of the managed pollinator supply. We integrated a maximum entropy species distribution model (SDM) with an existing pollination service model (PSM) to derive the availability of pollinators for crop pollination. We used nation-wide records of wild and managed pollinators (honey bees) as well as agricultural data from Great Britain. We first calibrated the SDM on a representative sample of bee and hoverfly crop pollinator species, evaluating the effects of different settings on model performance and on its capacity to identify the most important predictors. The importance of the different predictors was better resolved by SDMs derived from simpler functions, with consistent results for bees and hoverflies. We then used the species distributions from the calibrated model to predict the pollination service of wild and managed pollinators, using field beans as a test case. The PSM allowed us to spatially characterize the contribution of wild and managed pollinators and also to identify areas potentially vulnerable to low pollination service provision, which can help direct local-scale interventions. This approach can be extended to investigate geographical mismatches between crop pollination demand and the availability of pollinators resulting from environmental change or policy scenarios.
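The step of turning predicted pollinator occurrence into pollination service at crop locations can be illustrated with a minimal Python sketch. It assumes an exponential distance-decay kernel (a common choice in pollination service models): each crop cell receives a distance-weighted average of the SDM-derived pollinator values of the surrounding source cells, plus an optional managed (honey bee) supply expressed on the same relative scale. The function name `pollination_service`, the kernel and the foraging-distance parameter `alpha` are hypothetical choices for illustration, not the authors' PSM.

```python
import math


def pollination_service(crop_cells, source_cells, alpha, managed_supply=None):
    """Relative pollination service at each crop cell (illustrative sketch).

    crop_cells: list of (x, y) crop locations.
    source_cells: list of (x, y, value), where value is a hypothetical
        SDM-derived relative occurrence of wild pollinators.
    alpha: assumed typical foraging distance, in the same units as x and y.
    managed_supply: optional list with one managed-pollinator value per crop
        cell, assumed to be on the same relative scale as the wild values.
    """
    service = []
    for i, (cx, cy) in enumerate(crop_cells):
        weighted_sum = weight_total = 0.0
        for sx, sy, value in source_cells:
            # Exponential distance decay: nearby habitat contributes most.
            w = math.exp(-math.hypot(cx - sx, cy - sy) / alpha)
            weighted_sum += w * value
            weight_total += w
        wild = weighted_sum / weight_total if weight_total > 0 else 0.0
        managed = managed_supply[i] if managed_supply is not None else 0.0
        service.append(wild + managed)
    return service
```

Normalizing by the total kernel weight keeps the wild component on the same scale as the SDM values, so a landscape with uniform pollinator occurrence yields uniform service.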
5.
Paul Scholefield Les Firbank Simon Butler Ken Norris Laurence M. Jones Sandrine Petit 《Ecological Indicators》2011,11(1):46-51
The European Farmland Bird Indicator (EFBI) has been adopted as a Structural and Sustainable Development Indicator by the European Union. It is an aggregated index integrating the population trends of 33 common bird species associated with farmland habitats across 21 countries. We describe a modelling method for predicting this indicator from land-use characteristics. Using yearly historical land-use data on crop areas derived from the FAO databases (1990–2007) and published national-level population data for farmland birds over the same period, we developed a series of multiple regression models to predict the trend of the state-specific indicators and of the EFBI. These models incorporated up to 4 parameters and were selected on the basis of the significance (p < 0.05) of the model inputs with respect to the predicted variable. In total, 17 separate models were developed: one for each of 14 EU countries plus Norway and Switzerland, and one for the EU-level indicator. The selected models were then used to predict the EFBI in the year 2025, using scenarios of land-use change generated by the CAPRI agricultural model. The uncertainty of using the regression models is discussed with respect to predicting the likely impacts of land-use change on bird populations. This work lays the framework for future modelling of farmland birds at the international scale.
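The regression step (fit an indicator series on a few land-use predictors, then plug in scenario values) can be sketched in plain Python. This is a minimal illustration assuming ordinary least squares via the normal equations; the p-value-based selection of inputs described above is not reproduced, and the helper names `solve`, `fit_ols` and `predict` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bp] from the normal equations.

    X: one row per year, each row holding the (hypothetical) land-use predictors,
       e.g. areas of selected crops; y: the bird indicator for the same years.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    """Predicted indicator value for one scenario row of predictors."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))
```

For example, if the data follow `y = 1 + 2*x1 - 0.5*x2` exactly, `fit_ols` recovers `[1.0, 2.0, -0.5]` up to rounding, and `predict` then gives the indicator for any land-use scenario row.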
7.
The multilocus conditional sampling distribution (CSD) describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. The CSD has a wide range of applications in both computational biology and population genomics analysis, including phasing genotype data into haplotype data, imputing missing data, estimating recombination rates, inferring local ancestry in admixed populations, and importance sampling of coalescent genealogies. Unfortunately, the true CSD under the coalescent with recombination is not known, so approximations, formulated as hidden Markov models, have been proposed in the past. These approximations have led to a number of useful statistical tools, but it is important to recognize that they were not derived from, though were certainly motivated by, principles underlying the coalescent process. The goal of this article is to develop a principled approach to derive improved CSDs directly from the underlying population genetics model. Our approach is based on the diffusion process approximation, and the resulting mathematical expressions admit intuitive genealogical interpretations, which we use to introduce further approximations and make our method scalable in the number of loci. The general algorithm presented here applies to an arbitrary number of loci and an arbitrary finite-alleles recurrent mutation model. Empirical results demonstrate that our new CSDs are in general substantially more accurate than previously proposed approximations.

The probability of observing a sample of DNA sequences under a given population genetics model (referred to as the sampling probability or likelihood) plays an important role in a wide range of problems in genetic variation studies.
When recombination is involved, however, obtaining an analytic formula for the sampling probability has hitherto remained a challenging open problem (see Jenkins and Song 2009, 2010 for recent progress on this problem). As such, much research (Griffiths and Marjoram 1996; Kuhner et al. 2000; Nielsen 2000; Stephens and Donnelly 2000; Fearnhead and Donnelly 2001; De Iorio and Griffiths 2004a,b; Fearnhead and Smith 2005; Griffiths et al. 2008; Wang and Rannala 2008) has focused on developing Monte Carlo methods based on the coalescent with recombination (Griffiths 1981; Kingman 1982a,b; Hudson 1983), a well-established mathematical framework that models the genealogical history of sample chromosomes. These Monte Carlo-based full-likelihood methods mark an important development in population genetics analysis, but a well-known obstacle to their utility is that they tend to be computationally intensive. For a whole-genome variation study, approximations are often unavoidable, and it is therefore important to find ways to minimize the trade-off between scalability and accuracy.

A popular likelihood-based approximation method that has had a significant impact on population genetics analysis is the following approach introduced by Li and Stephens (2003). Given a set Φ of model parameters (e.g., mutation rate, recombination rate), the joint probability $p(h_1, \ldots, h_n \mid \Phi)$ of observing a set $\{h_1, \ldots, h_n\}$ of haplotypes sampled from a population can be decomposed as a product of conditional sampling distributions (CSDs), denoted by π:

$$p(h_1, \ldots, h_n \mid \Phi) = \pi(h_1 \mid \Phi) \prod_{k=1}^{n-1} \pi(h_{k+1} \mid h_1, \ldots, h_k, \Phi), \qquad (1)$$

where $\pi(h_{k+1} \mid h_1, \ldots, h_k, \Phi)$ is the probability that an additionally sampled haplotype is of type $h_{k+1}$, given a set of already observed haplotypes $h_1, \ldots, h_k$.
In the presence of recombination, the true CSD π is unknown, so Li and Stephens proposed using an approximate CSD $\hat{\pi}$ in place of π, thus obtaining the following approximation of the joint probability:

$$p(h_1, \ldots, h_n \mid \Phi) \approx \hat{\pi}(h_1 \mid \Phi) \prod_{k=1}^{n-1} \hat{\pi}(h_{k+1} \mid h_1, \ldots, h_k, \Phi). \qquad (2)$$

Li and Stephens referred to this approximation as the product of approximate conditionals (PAC) model. In general, the closer $\hat{\pi}$ is to the true CSD π, the more accurate the PAC model becomes. Notable applications and extensions of this framework include estimating crossover rates (Li and Stephens 2003; Crawford et al. 2004) and gene conversion parameters (Gay et al. 2007; Yin et al. 2009), phasing genotype data into haplotype data (Stephens and Scheet 2005; Scheet and Stephens 2006), imputing missing data to improve power in association mapping (Stephens and Scheet 2005; Li and Abecasis 2006; Marchini et al. 2007; Howie et al. 2009), inferring local ancestry in admixed populations (Price et al. 2009), inferring human colonization history (Hellenthal et al. 2008), and inferring demography (Davison et al. 2009), among others.

Another problem in which the CSD plays a fundamental role is importance sampling of genealogies under the coalescent process (Stephens and Donnelly 2000; Fearnhead and Donnelly 2001; De Iorio and Griffiths 2004a,b; Fearnhead and Smith 2005; Griffiths et al. 2008). In this context, the optimal proposal distribution can be written in terms of the CSD π (Stephens and Donnelly 2000), and, as in the PAC model, an approximate CSD $\hat{\pi}$ may be used in place of π. The performance of an importance sampling scheme depends critically on the proposal distribution and therefore on the accuracy of the approximation $\hat{\pi}$. Often in conjunction with composite-likelihood frameworks (Hudson 2001; Fearnhead and Donnelly 2002), importance sampling has been used to estimate fine-scale recombination rates (McVean et al.
2004; Fearnhead and Smith 2005; Johnson and Slatkin 2009).

So far, considerable intuition has gone into choosing the approximate CSDs used in these problems (Marjoram and Tavaré 2006). In the case of completely linked loci, Stephens and Donnelly (2000) suggested constructing an approximation by assuming that the additional haplotype $h_{k+1}$ is an imperfect copy of one of the first k haplotypes, with copying errors corresponding to mutation. Fearnhead and Donnelly (2001) generalized this construction to include crossover recombination, assuming that the haplotype $h_{k+1}$ is an imperfect mosaic of the first k haplotypes (i.e., $h_{k+1}$ is obtained by copying segments from $h_1, \ldots, h_k$, where crossover recombination can change the haplotype from which copying is performed). The associated CSD, which we denote by $\hat{\pi}_{\mathrm{FD}}$, can be interpreted as a hidden Markov model and so admits an efficient dynamic programming solution. Finally, Li and Stephens (2003) proposed a modification to Fearnhead and Donnelly's model that limits the hidden state space, thereby providing a computational simplification; we denote the corresponding approximate CSD by $\hat{\pi}_{\mathrm{LS}}$.

Although these approaches are computationally appealing, it is important to note that they are not derived from, though are certainly motivated by, principles underlying typical population genetics models, in particular the coalescent process (Griffiths 1981; Kingman 1982a,b; Hudson 1983). The main objective of this article is to develop a principled technique to derive an improved CSD directly from the underlying population genetics model. Rather than relying on intuition, we base our work on a rigorous mathematical foundation: the diffusion process. De Iorio and Griffiths (2004a,b) first introduced the diffusion-generator approximation technique to obtain an approximate CSD in the case of a single locus (i.e., no recombination). Griffiths et al.
(2008) later extended the approach to two loci to include crossover recombination, assuming a parent-independent mutation model at each locus. In this article, we extend the framework to develop a general algorithm that applies to an arbitrary number of loci and an arbitrary finite-alleles recurrent mutation model.

Our work can be summarized as follows. Using the diffusion-generator approximation technique, we derive a recursion relation satisfied by an approximate CSD. This recursion can be used to construct a closed system of coupled linear equations, in which the conditional sampling probability of interest appears as one of the unknown variables. The system of equations can be solved using standard numerical analysis techniques. However, the size of the system grows superexponentially with the number of loci, and so does the running time. To remedy this drawback, we introduce additional approximations to make our approach scalable in the number of loci. Specifically, the recursion admits an intuitive genealogical interpretation, on the basis of which we propose modifications to the recursion that can then be solved easily using dynamic programming. The computational complexity of the modified algorithm is polynomial in the number of loci, and, importantly, the resulting CSD loses little accuracy compared with the CSD obtained from the full recursion.

The accuracy of approximate CSDs has rarely been discussed in the literature, except in the specific applications for which they are employed. In this article, we carry out an empirical study to explicitly test the accuracy of various CSDs and demonstrate that our new CSDs are in general substantially more accurate than previously proposed approximations. We also consider the PAC framework and show that our approximations also produce more accurate PAC-likelihood estimates.
We note that for the maximum-likelihood estimation of recombination rates, the actual value of the likelihood may not be so important, as long as it is maximized near the true recombination rate. However, in many other applications (e.g., phasing genotype data into haplotype data, imputing missing data, and importance sampling), the accuracy of the CSD and of the PAC-likelihood function over a wide range of parameter values may be important. Thus, we believe that the theoretical work presented here will have several practical implications: our method can be applied in a wide range of statistical tools that use CSDs, improving their accuracy.

The remainder of this article is organized as follows. To provide intuition for the ensuing mathematics, we first describe a genealogical process that gives rise to our CSD. Using our genealogical interpretation, we consider two additional approximations and relate these to previously proposed CSDs. Then, in the following section, we derive our CSD using the diffusion-generator approach and provide mathematical statements for the additional approximations; some interesting limiting behavior is also described there. This section is self-contained and may be skipped by readers uninterested in mathematical details. Finally, in the subsequent section, we carry out a simulation study to compare the accuracy of various approximate CSDs and demonstrate that ours are generally the most accurate.
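The copying-model CSDs discussed in this entry (an additional haplotype built as an imperfect mosaic of the already observed ones, solved by dynamic programming) and the PAC product of equation (2) can be sketched in Python. This is a simplified illustration, not the diffusion-based CSD developed in the article: it assumes biallelic sites, a constant recombination parameter between adjacent sites, the emission probabilities k/(k+θ) + θ/(2(k+θ)) for a match and θ/(2(k+θ)) for a mismatch, and a uniform distribution for the first haplotype. The names `ls_csd`, `pac_log_likelihood` and `total_probability` are hypothetical.

```python
import math
from itertools import product


def ls_csd(h, panel, theta, rho):
    """Approximate CSD P(h | panel) under a simplified Li-Stephens-style copying HMM.

    h: list of 0/1 alleles for the additionally sampled haplotype.
    panel: the k already-observed haplotypes (lists of 0/1, same length as h).
    theta: scaled mutation parameter.
    rho: scaled recombination rate between adjacent sites (assumed constant).
    """
    k = len(panel)
    mismatch = 0.5 * theta / (k + theta)   # mutate to the other allele
    match = 1.0 - mismatch                 # copy (or mutate back to the same allele)
    p_jump = 1.0 - math.exp(-rho / k)      # chance of re-drawing the copied haplotype

    def emit(j, site):
        return match if panel[j][site] == h[site] else mismatch

    # Forward variables: f[j] = P(h[0..site], copying haplotype j at this site).
    f = [emit(j, 0) / k for j in range(k)]
    for site in range(1, len(h)):
        total = sum(f)
        f = [emit(j, site) * ((1.0 - p_jump) * f[j] + p_jump * total / k)
             for j in range(k)]
    return sum(f)


def pac_log_likelihood(haplotypes, theta, rho):
    """Log of the PAC approximation (2), using ls_csd for every conditional."""
    # Assumption: the first haplotype is uniform over all 2^L biallelic types.
    loglik = len(haplotypes[0]) * math.log(0.5)
    for k in range(1, len(haplotypes)):
        loglik += math.log(ls_csd(haplotypes[k], haplotypes[:k], theta, rho))
    return loglik


def total_probability(panel, theta, rho):
    """Sanity check: a proper CSD sums to 1 over all 2^L possible haplotypes."""
    n_sites = len(panel[0])
    return sum(ls_csd(list(h), panel, theta, rho)
               for h in product((0, 1), repeat=n_sites))
```

The forward recursion costs O(Lk) per conditional because a jump lands on any of the k haplotypes uniformly, so the sum over previous states collapses to a single total. Note that the PAC value depends on the order in which haplotypes are added.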
10.
Héctor García Martín Vinay Satish Kumar Daniel Weaver Amit Ghosh Victor Chubukov Aindrila Mukhopadhyay Adam Arkin Jay D. Keasling 《PLoS computational biology》2015,11(9)
Current limitations in quantitatively predicting biological behavior hinder our efforts to engineer biological systems to produce biofuels and other desired chemicals. Here, we present a new method for calculating metabolic fluxes (key targets in metabolic engineering) that incorporates data from ¹³C labeling experiments and genome-scale models. The data from ¹³C labeling experiments provide strong flux constraints that eliminate the need to assume an evolutionary optimization principle such as the growth rate optimization assumption used in Flux Balance Analysis (FBA). This effective constraining is achieved by making the simple but biologically relevant assumption that flux flows from core to peripheral metabolism and does not flow back. The new method is significantly more robust than FBA with respect to errors in genome-scale model reconstruction. Furthermore, it can provide a comprehensive picture of metabolite balancing and predictions for unmeasured extracellular fluxes as constrained by ¹³C labeling data. A comparison shows that the results of this new method are similar to those found through ¹³C Metabolic Flux Analysis (¹³C MFA) for central carbon metabolism, but it additionally provides flux estimates for peripheral metabolism. The extra validation gained by matching 48 relative labeling measurements is used to identify where and why several existing COnstraint Based Reconstruction and Analysis (COBRA) flux prediction algorithms fail. We demonstrate how to use this knowledge to refine these methods and improve their predictive capabilities. This method provides a reliable base upon which to improve the design of biological systems.
11.
Nikolaos Sfakianakis Niklas Kolbe Nadja Hellmann Mária Lukáčová-Medvid’ová 《Bulletin of mathematical biology》2017,79(1):209-235
We propose a multiscale model for the invasion of the extracellular matrix by two types of cancer cells: differentiated cancer cells and cancer stem cells. We investigate the epithelial-mesenchymal-like transition between them, driven primarily by the epidermal growth factors. We moreover take into account the transdifferentiation program of the cancer stem cells towards cancer-associated fibroblasts, as well as the fibroblast-driven remodelling of the extracellular matrix. The proposed haptotaxis model combines the macroscopic phenomenon of the invasion of the extracellular matrix by both types of cancer cells with the microscopic dynamics of the epidermal growth factors. We analyse our model component-wise and compare our findings with the literature. We investigate pathological situations regarding the epidermal growth factors and accordingly propose “mathematical-treatment” scenarios to control the aggressiveness of the tumour.
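The haptotaxis mechanism at the core of such models (cells diffuse, move up gradients of the extracellular matrix, and degrade it) can be illustrated with a one-dimensional finite-volume sketch in Python. This is a deliberately simplified, generic system, ∂c/∂t = D ∂²c/∂x² − ∂/∂x(χ c ∂v/∂x) and ∂v/∂t = −δ c v, assumed for illustration only; it has a single cell population and no growth-factor dynamics, phenotype switching or fibroblasts, so it is not the authors' multiscale model. The function name and all parameter values are hypothetical.

```python
import math


def simulate_haptotaxis(n=50, length=1.0, D=1e-3, chi=5e-3, delta=0.1,
                        dt=0.01, steps=500):
    """Explicit finite-volume scheme for a generic 1D haptotaxis system.

    c: cancer cell density; v: extracellular matrix (ECM) density.
    Zero-flux boundaries make the total cell mass exactly conserved.
    """
    dx = length / n
    x = [(i + 0.5) * dx for i in range(n)]          # cell-centred grid
    c = [math.exp(-(xi / 0.1) ** 2) for xi in x]   # cells start near x = 0
    v = [xi / length for xi in x]                  # ECM density increasing to the right
    for _ in range(steps):
        flux = [0.0] * (n + 1)                     # flux[0] = flux[n] = 0 (no-flux walls)
        for i in range(n - 1):
            u = chi * (v[i + 1] - v[i]) / dx       # haptotactic velocity up the ECM gradient
            c_up = c[i] if u > 0 else c[i + 1]     # first-order upwind value
            flux[i + 1] = -D * (c[i + 1] - c[i]) / dx + u * c_up
        c = [c[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(n)]
        v = [v[i] * (1.0 - dt * delta * c[i]) for i in range(n)]   # cells degrade ECM
    return x, c, v
```

Writing the update in flux form means the interior fluxes cancel in pairs when summed, which is why the total cell mass is conserved to rounding error; the time step is chosen well inside the explicit stability limits for both diffusion and advection. Comparing runs with `chi > 0` and `chi = 0` isolates the haptotactic drift from pure diffusion.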
13.
Species distribution models (SDMs) are widespread in ecology and conservation biology, but their accuracy can be lowered by non-environmental (noisy) absences that are common in species occurrence data. Here we propose an iterative ensemble modelling (IEM) method to deal with noisy absences and hence improve the predictive reliability of ensemble modelling of species distributions. In the IEM approach, outputs of a classical ensemble model (EM) were used to update the raw occurrence data. The revised data were then used as input for a new EM run. This process was iterated until the predictions stabilized. The outputs of the iterative method were compared with those of the classical EM using virtual species. The IEM process tended to converge rapidly. It increased the consensus between predictions provided by the different methods as well as between those provided by different learning data sets. Comparing IEM and EM showed that, for high levels of non-environmental absences, iterations significantly increased prediction reliability as measured by the Kappa and TSS indices, as well as the percentage of well-predicted sites. Compared with EM, IEM also reduced biases in estimates of species prevalence. Compared with the classical EM method, IEM improves the reliability of species predictions. In particular, it handles noisy absences, which are replaced in the data matrices by simulated presences during the iterative modelling process. IEM thus constitutes a promising way to increase the accuracy of EM predictions for difficult-to-detect species, as well as for species that are not in equilibrium with their environment.
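The IEM loop described above (run an ensemble, replace absences that the ensemble predicts as presences, refit, and stop when the data no longer change) can be sketched in Python. The sketch assumes a single environmental gradient, majority-vote consensus, and two toy component models (an environmental envelope and a k-nearest-neighbour vote); these models and all function names are hypothetical stand-ins for the real SDM algorithms in an ensemble.

```python
def envelope_model(xs, ys):
    """Toy SDM: presence predicted inside the environmental range of presences."""
    pres = [x for x, y in zip(xs, ys) if y == 1]
    lo, hi = min(pres), max(pres)
    return [1 if lo <= x <= hi else 0 for x in xs]


def nearest_neighbour_model(xs, ys, k=3):
    """Toy SDM: majority vote of the k sites with the closest environmental values."""
    preds = []
    for x in xs:
        order = sorted(range(len(xs)), key=lambda j: abs(xs[j] - x))
        votes = sum(ys[j] for j in order[:k])
        preds.append(1 if 2 * votes > k else 0)
    return preds


def iterative_ensemble(xs, ys, models, max_iter=20):
    """Iterate the ensemble, turning absences predicted as presences into presences.

    Returns the revised occurrence data and the iteration at which it stabilized.
    """
    ys = list(ys)
    for iteration in range(1, max_iter + 1):
        votes = [model(xs, ys) for model in models]
        consensus = [1 if sum(col) / len(models) >= 0.5 else 0 for col in zip(*votes)]
        # Only absences are revised; observed presences are never removed.
        updated = [1 if (y == 0 and p == 1) else y for y, p in zip(ys, consensus)]
        if updated == ys:
            return ys, iteration
        ys = updated
    return ys, max_iter
```

For example, with sites `xs = list(range(20))`, true presences at 5 to 14, and two of them recorded as absences, an ensemble of `envelope_model` and the neighbour model with `k=3` and `k=5` restores both presences in the first iteration and stops at the second, when nothing changes.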
14.
In this paper, we present a model of cell cycle progression and apply it to cells of the MCF-7 breast cancer cell line. We
consider cells existing in the three typical cell cycle phases determined using flow cytometry: the G1, S, and G2/M phases. We further break each phase up into model phases in order to capture certain features such as cells remaining in
phases for a minimum amount of time. The model is also able to capture the environmentally responsive part of the G1 phase, allowing for quantification of the number of environmentally responsive cells at each point in time. The model parameters
are carefully chosen using data from various sources in the biological literature. The model is then validated against a variety
of experiments, and the excellent fit with experimental results allows for insight into the mechanisms that influence observed
biological phenomena. In particular, the model is used to question the common assumption that a ‘slow cycling population’
is necessary to explain some results. Finally, an extension is proposed, where cell death is included in order to accurately
model the effects of tamoxifen, a common first-line anticancer drug for breast cancer patients. We conclude that the model
has strong potential to be used as an aid in future experiments to gain further insight into cell cycle progression and cell
death.
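Splitting each observed phase into several model sub-phases, so that cells cannot leave a phase almost immediately, can be illustrated with the standard linear chain trick: each sub-phase is exited at a constant rate, so the total time spent in a phase follows an Erlang rather than an exponential distribution. The Python sketch below integrates such a compartmental model with Euler steps. The sub-phase counts, phase durations and the function name are hypothetical values for illustration; the sketch does not include the environmentally responsive part of G1 or cell death described above, and it is not the authors' parameterized model.

```python
def simulate_cell_cycle(n_g1=4, n_s=4, n_g2m=4, t_g1=12.0, t_s=8.0, t_g2m=4.0,
                        dt=0.01, t_end=48.0, initial_cells=1000.0):
    """Euler integration of a linear-chain cell cycle model (times in hours).

    Each phase is split into n sub-phases with exit rate n / duration, so the
    mean phase duration is preserved while its variance shrinks as n grows.
    """
    stages = [("G1", n_g1, t_g1), ("S", n_s, t_s), ("G2/M", n_g2m, t_g2m)]
    rates, labels = [], []
    for name, n_sub, duration in stages:
        rates += [n_sub / duration] * n_sub
        labels += [name] * n_sub
    x = [0.0] * len(rates)
    x[0] = initial_cells                       # all cells start at the beginning of G1
    for _ in range(int(round(t_end / dt))):
        out = [r * xi for r, xi in zip(rates, x)]
        new = x[:]
        for i in range(len(x)):
            new[i] -= dt * out[i]              # cells leaving sub-phase i
            if i > 0:
                new[i] += dt * out[i - 1]      # ... enter sub-phase i + 1
        new[0] += 2.0 * dt * out[-1]           # division: each cell leaving G2/M gives two G1 cells
        x = new
    totals = {}
    for name, xi in zip(labels, x):
        totals[name] = totals.get(name, 0.0) + xi
    return totals
```

With four G1 sub-phases, almost no cells have left G1 after one hour, whereas a single exponential G1 compartment with the same mean duration already loses roughly 8% of its cells; this is the kind of minimum-residence-time behaviour that motivates subdividing phases.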
15.
《Cell cycle (Georgetown, Tex.)》2013,12(6):822-830
v-ErbB is an oncogene related to the Epidermal Growth Factor Receptor (EGFR). EGFR overexpression has been observed in many pathological situations. A truncated form of EGFR, referred to as EGFRvIII, resembles v-ErbB in its biological properties and is often expressed in certain human tumors. Aberrant EGFR expression in human cancers is often constitutive and may occur in the presence of mutated oncogenes or tumor suppressor genes. To circumvent these problems, we subcloned v-ErbB into a vector containing the estrogen receptor hormone-binding domain (ER), which renders the v-ErbB:ER protein dependent on β-estradiol for activity. v-ErbB:ER conditionally abrogated the cytokine dependence of hematopoietic cells more efficiently than activated v-Ha-Ras, v-Src, Raf or Akt. Abrogation of cytokine dependence by v-ErbB:ER was not due to the synthesis of autocrine growth factors. Treatment of v-ErbB:ER cells with the EGFR inhibitor AG1478 efficiently induced apoptosis. Induction of apoptosis and prevention of cell cycle progression by the EGFR inhibitor were observed only when the cells were grown in response to v-ErbB:ER activation, demonstrating specificity. In contrast, the other inhibitors suppressed cell cycle progression when the cells were grown in response to either v-ErbB:ER or the cytokine interleukin-3. When MEK inhibitors were combined with either EGFR or PI3K/mTOR inhibitors, an enhanced apoptotic response was observed. Thus, this conditional ErbB construct is useful for elucidating EGFR signaling and anti-apoptotic pathways in the absence of autocrine cytokine expression.
16.
This paper reports the outcomes of a novel intertwining of long-term monitoring data and population modelling to assess the
accuracy of predictions of a Population Viability Analysis (PVA) model. In particular, the relative effectiveness of different
management options for reserving areas from timber harvesting was assessed for two forest-dependent arboreal marsupials, the
greater glider (Petauroides volans) and Leadbeater's Possum (Gymnobelideus leadbeateri). We used data from 7 years of monitoring conducted at 161 sites to assess and modify, where appropriate, previous population
models of the two species of arboreal marsupials. The results indicated that the importance of food resources for both had
been underestimated in past work. Despite this, the modified models that included increased importance of food availability
did not change the predicted risks of decline substantially, particularly for Leadbeater's Possum. Importantly, past conclusions
about the optimal sizes of patches to reserve within the forests used for wood production were robust to changes in the model.
This is a valuable finding because the work we report is one of the first to empirically test the robustness of the relative
predictions of a PVA model. Nevertheless, the new insights we derived from this study have implications for both (1) the
implementation of our ongoing long-term monitoring study, in particular the value of what can be termed 'adaptive monitoring',
and, (2) the establishment of a new silvicultural experiment designed to better create habitat for arboreal marsupials within
logged and regenerated sites.
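The kind of quantity a PVA model reports, the risk that a population declines below some threshold over a planning horizon, can be illustrated with a small Monte Carlo sketch in Python. It assumes a generic stochastic Ricker model in which environmental noise perturbs the growth rate and food availability enters only through the carrying capacity; this swapped-in model, its parameters and the function name are hypothetical, and it is not the PVA model used in the study.

```python
import math
import random


def decline_risk(n0, growth_rate, sd, carrying_capacity, years=50,
                 threshold=0.5, runs=2000, seed=1):
    """Fraction of simulated trajectories that fall below threshold * n0.

    Assumed model (illustrative only): stochastic Ricker growth
        n[t+1] = n[t] * exp(r_t * (1 - n[t] / K)),  r_t ~ Normal(growth_rate, sd),
    where K could be scaled by a hypothetical habitat or food-availability index.
    """
    rng = random.Random(seed)                 # fixed seed for reproducible risk estimates
    declines = 0
    for _ in range(runs):
        n = float(n0)
        for _ in range(years):
            r = rng.gauss(growth_rate, sd)    # environmental stochasticity
            n *= math.exp(r * (1.0 - n / carrying_capacity))
            if n < threshold * n0:
                declines += 1
                break
    return declines / runs
```

Comparing management options then amounts to comparing risks under different carrying capacities (for example, different reserve patch sizes), which is where the robustness of relative predictions, as tested in the study, matters more than the absolute risk values.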
17.
Mi Kyung Kwak Daniel T. Johnson Chunfang Zhu Suk Hyung Lee Ding-Wei Ye Richard Luong Zijie Sun 《PloS one》2013,8(1)
The PTEN tumor suppressor gene is frequently inactivated in human prostate cancer. Using Osr1 (odd skipped related 1)-Cre mice, we generated a novel conditional Pten knockout mouse strain, PtenLoxP:Osr1-Cre. Conditional biallelic and monoallelic Pten knockout mice were viable. Deletion of Pten expression was detected in the prostate of PtenLoxP/LoxP:Osr1-Cre mice as early as 2 weeks of age. Intriguingly, PtenLoxP/LoxP:Osr1-Cre mice develop high-grade prostatic intraepithelial neoplasia (PIN) with high penetrance as early as one month of age, and locally invasive prostatic tumors after 12 months of age. PtenLoxP/+:Osr1-Cre mice show only mild oncogenic changes after 8 weeks of age. Castration of PtenLoxP/LoxP:Osr1-Cre mice causes no significant regression of prostate tumors, although a shift of androgen receptor (AR) staining from the nucleus to the cytoplasm is observed in Pten-null tumor cells of castrated mice. Enhanced Akt activity is observed in Pten-null tumor cells of castrated PtenLoxP/LoxP:Osr1-Cre mice. This study provides a novel mouse model that can be used to investigate a primary role of Pten in initiating oncogenic transformation in the prostate and to examine other genetic and epigenetic changes that are required for tumor progression in the mouse prostate.
18.
André Yoshiaki Kashiwabara Ígor Bonadio Vitor Onuchic Felipe Amado Rafael Mathias Alan Mitchell Durham 《PLoS computational biology》2013,9(10)
Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to implement different applications in bioinformatics analysis by combining eight kinds of models: (i) independent and identically distributed process; (ii) variable-length Markov chain; (iii) inhomogeneous Markov chain; (iv) hidden Markov model; (v) profile hidden Markov model; (vi) pair hidden Markov model; (vii) generalized hidden Markov model; and (viii) similarity-based sequence weighting. The framework includes functionality for training, simulation and decoding of the models. Additionally, it provides two methods to help with parameter setting: the Akaike and Bayesian information criteria (AIC and BIC). The models can be used stand-alone, combined in Bayesian classifiers, or included in more complex, multi-model, probabilistic architectures using GHMMs. In particular, the framework provides a novel, flexible implementation of decoding in GHMMs that detects when the architecture can be traversed efficiently.
This is a PLOS Computational Biology Software Article.
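How information criteria help with parameter setting for discrete Markovian models can be illustrated by choosing the order of a fixed-order Markov chain for a DNA sequence. The Python sketch below is a standalone illustration, not ToPS code or its API. It assumes maximum-likelihood transition estimates, counts (a − 1)·a^k free parameters for order k over an alphabet of size a, and conditions every order on the same initial symbols so that the likelihoods are comparable. The function names are hypothetical.

```python
import math
from collections import Counter


def markov_log_likelihood(seq, order, start):
    """Maximized log-likelihood of seq[start:] under a fixed-order Markov chain."""
    ctx_counts, pair_counts = Counter(), Counter()
    for i in range(start, len(seq)):
        ctx = seq[i - order:i]                 # empty context for order 0
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[i])] += 1
    return sum(n * math.log(n / ctx_counts[ctx])
               for (ctx, _), n in pair_counts.items())


def select_order(seq, max_order=3, alphabet="ACGT"):
    """Return the orders minimizing AIC and BIC, plus all scores."""
    a = len(alphabet)
    n_obs = len(seq) - max_order               # same conditioning for every order
    scores = {}
    for k in range(max_order + 1):
        ll = markov_log_likelihood(seq, k, max_order)
        n_params = (a ** k) * (a - 1)          # free transition probabilities
        scores[k] = {"AIC": -2 * ll + 2 * n_params,
                     "BIC": -2 * ll + n_params * math.log(n_obs)}
    best = {crit: min(scores, key=lambda k: scores[k][crit]) for crit in ("AIC", "BIC")}
    return best, scores
```

For a perfectly periodic sequence such as `"ACGT" * 50`, every order of at least 1 fits exactly, and both criteria pick order 1 because it has the fewest parameters among the perfect fits.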
19.
D. G. Bonett P. M. Bentler J. A. Woodward 《Biometrical journal. Biometrische Zeitschrift》1986,28(6):759-762
The asymptotic covariance matrix of the maximum likelihood estimator for the log-linear model is given for a general class of conditional Poisson distributions that includes the unconditional Poisson, multinomial and product-multinomial distributions as special cases. The general conditions are given under which the maximum likelihood covariance matrix equals the covariance matrix of an equivalent closed-form weighted least squares estimator.
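For the unconditional Poisson special case, the maximum likelihood estimate of a log-linear model can be computed by iteratively reweighted least squares, and its asymptotic covariance is the inverse Fisher information (XᵀŴX)⁻¹ with Ŵ = diag(μ̂). The Python sketch below illustrates only that case; the multinomial and product-multinomial cases and the equivalence with a closed-form weighted least squares estimator are not shown. The starting values follow the usual GLM convention μ = y + 0.5, and the function names are hypothetical.

```python
import math


def invert(a):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def _wls(X, w, z):
    """Weighted least squares: returns beta and (X' W X)^-1."""
    p = len(X[0])
    xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, X)) for j in range(p)]
            for i in range(p)]
    xtwz = [sum(wi * row[i] * zi for wi, row, zi in zip(w, X, z)) for i in range(p)]
    inv = invert(xtwx)
    return [sum(inv[i][j] * xtwz[j] for j in range(p)) for i in range(p)], inv


def fit_poisson_loglinear(X, y, tol=1e-10, max_iter=100):
    """IRLS for log E[y] = X beta; returns beta and its asymptotic covariance."""
    mu = [yi + 0.5 for yi in y]                          # conventional starting values
    eta = [math.log(m) for m in mu]
    for _ in range(max_iter):
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working response
        beta, _ = _wls(X, mu, z)                         # log link: weights W = diag(mu)
        new_eta = [sum(b * xj for b, xj in zip(beta, row)) for row in X]
        converged = max(abs(a - b) for a, b in zip(new_eta, eta)) < tol
        eta, mu = new_eta, [math.exp(e) for e in new_eta]
        if converged:
            break
    _, cov = _wls(X, mu, eta)                            # (X' diag(mu_hat) X)^-1
    return beta, cov
```

For an intercept-only model the fitted mean is the sample mean, so the variance of the intercept reduces to 1/Σŷᵢ = 1/Σyᵢ; with counts 3, 5, 4, 6, 2 that is 1/20.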