Found 20 similar articles; search took 15 ms
1.
Perrin BE Ralaivola L Mazurie A Bottani S Mallet J d'Alché-Buc F 《Bioinformatics (Oxford, England)》2003,19(Z2):ii138-ii148
This article deals with the identification of gene regulatory networks from experimental data using a statistical machine learning approach. A stochastic model of gene interactions capable of handling missing variables is proposed. It can be described as a dynamic Bayesian network particularly well suited to tackle the stochastic nature of gene regulation and gene expression measurement. Parameters of the model are learned through a penalized likelihood maximization implemented through an extended version of EM algorithm. Our approach is tested against experimental data relative to the S.O.S. DNA Repair network of the Escherichia coli bacterium. It appears to be able to extract the main regulations between the genes involved in this network. An added missing variable is found to model the main protein of the network. Good prediction abilities on unlearned data are observed. These first results are very promising: they show the power of the learning algorithm and the ability of the model to capture gene interactions.
2.
A statistical analysis of a weighted averaging procedure for the estimation of small signals buried in noise (Hoke et al. 1984a) is given. The weighting factor used by this method is in inverse proportion to the variance estimated for the noise. It is shown that, compared to conventional averaging, weighted averaging can improve the signal-to-noise ratio to a high extent if the variance of the noise changes as a function of time. On the other hand, uncritical application of the method involves the danger that the signal amplitude is underestimated. How serious this effect is depends on the number of degrees of freedom available for the estimation of the weighting factor. The effect can be neglected if this number is sufficiently increased by means of an appropriate preprocessing.
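The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the procedure of Hoke et al.: the function name `weighted_average`, the simulated sine signal, and the assumption that each epoch's noise variance is known exactly are all illustrative choices. In practice the variance has to be estimated from the data, which is where the amplitude underestimation mentioned above can arise.

```python
import numpy as np

def weighted_average(epochs, noise_var):
    """Average epochs (n_epochs x n_samples) with weights proportional to 1 / noise variance."""
    epochs = np.asarray(epochs, dtype=float)
    weights = 1.0 / np.asarray(noise_var, dtype=float)
    weights /= weights.sum()          # normalise so the weights sum to 1
    return weights @ epochs           # weighted sum over epochs

# Hypothetical simulation: the same signal recorded in 100 epochs,
# where the noise level changes from low to high over the recording.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 200))
noise_sd = np.array([0.5] * 50 + [5.0] * 50)
epochs = signal + rng.normal(size=(100, 200)) * noise_sd[:, None]

plain = epochs.mean(axis=0)                          # conventional averaging
weighted = weighted_average(epochs, noise_sd ** 2)   # variance assumed known here

plain_err = np.mean((plain - signal) ** 2)
weighted_err = np.mean((weighted - signal) ** 2)
print(plain_err, weighted_err)
```

With equal variances the weights are all identical and the result reduces to the conventional average.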
3.
Bayesian inference of recent migration rates using multilocus genotypes (cited by: 25 total; self-citations: 0; cited by others: 25)
A new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented. The method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest. The method is implemented in a computer program that relies on Markov chain Monte Carlo techniques to carry out the estimation of posterior probabilities. The program can be used with allozyme, microsatellite, RFLP, SNP, and other kinds of genotype data. We relax several assumptions of early methods for detecting recent immigrants, using genotype data; most significantly, we allow genotype frequencies to deviate from Hardy-Weinberg equilibrium proportions within populations. The program is demonstrated by applying it to two recently published microsatellite data sets for populations of the plant species Centaurea corymbosa and the gray wolf species Canis lupus. A computer simulation study suggests that the program can provide highly accurate estimates of migration rates and individual migrant ancestries, given sufficient genetic differentiation among populations and sufficient numbers of marker loci.
4.
5.
Polytomies and Bayesian phylogenetic inference (cited by: 16 total; self-citations: 0; cited by others: 16)
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short branch lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.
6.
Wang Y Rannala B 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2008,363(1512):3921-3930
Recently, several statistical methods for estimating fine-scale recombination rates using population samples have been developed. However, currently available methods that can be applied to large-scale data are limited to approximated likelihoods. Here, we developed a full-likelihood Markov chain Monte Carlo method for estimating recombination rate under a Bayesian framework. Genealogies underlying a sampling of chromosomes are effectively modelled by using marginal individual single nucleotide polymorphism genealogies related through an ancestral recombination graph. The method is compared with two existing composite-likelihood methods using simulated data. Simulation studies show that our method performs well for different simulation scenarios. The method is applied to two human population genetic variation datasets that have been studied by sperm typing. Our results are consistent with the estimates from sperm crossover analysis.
7.
8.
Inferring the minimum pH of streams from macroinvertebrates using weighted averaging regression and calibration (cited by: 1 total; self-citations: 0; cited by others: 1)
1. Data on macroinvertebrates and stream chemistry were collected from sixty-four streams in Finland. Weighted averaging (WA) regression and calibration models were constructed to infer the minimum pH of streams from their invertebrate assemblages. The purpose was to develop an instrument for biological assessment and monitoring of stream acidification. The WA method was compared with simpler approaches, based on qualitative invertebrate data and pH tolerance limits, that are widely used.
2. Performance of the two approaches was assessed in terms of correlation between the inferred and observed minimum pH within the 'training set', and in terms of root mean squared differences (predicted – observed) (RMSEP) estimated by cross-validation or bootstrap resampling techniques. The models were further tested using independent data from the literature representative of a wide geographical range.
3. The predictive power of the WA models was reasonable (RMSEP 0.40–0.44 pH units) in the training set and consistently better than that of the tolerance limit method. In contrast to the latter, the WA models were able to infer a minimum pH above 5.5, suggesting they could detect the early stages of acidification.
4. The WA models performed better than the tolerance limit method in inferring pH from the independent literature, further demonstrating the superiority and generality of the WA approach.
5. The weighted averaging technique could be an effective and widely applicable tool for contemporary biological monitoring and assessment using aquatic invertebrates.
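As a rough illustration of weighted averaging (WA) regression and calibration with a cross-validated RMSEP, here is a minimal sketch. It assumes the basic textbook form of WA and omits the deshrinking step that WA models commonly include. The function names and the toy abundance and pH values are invented for illustration and are not taken from the study.

```python
import numpy as np

def wa_fit(abund, env):
    """WA regression: each taxon's optimum is the abundance-weighted mean of env over sites."""
    abund = np.asarray(abund, dtype=float)
    env = np.asarray(env, dtype=float)
    return abund.T @ env / abund.sum(axis=0)

def wa_predict(abund, optima):
    """WA calibration: each site's inferred value is the abundance-weighted mean of taxon optima."""
    abund = np.asarray(abund, dtype=float)
    return abund @ optima / abund.sum(axis=1)

def loo_rmsep(abund, env):
    """Leave-one-out RMSEP: refit without site i, predict site i, pool the errors."""
    abund = np.asarray(abund, dtype=float)
    env = np.asarray(env, dtype=float)
    errors = []
    for i in range(len(env)):
        keep = np.arange(len(env)) != i
        optima = wa_fit(abund[keep], env[keep])
        errors.append(wa_predict(abund[i:i + 1], optima)[0] - env[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical training set: 5 sites (rows) x 3 taxa (columns), with observed minimum pH.
abund = np.array([[10, 2, 0],
                  [6, 5, 1],
                  [2, 8, 3],
                  [0, 4, 9],
                  [1, 1, 10]])
min_ph = np.array([4.8, 5.2, 5.9, 6.6, 6.9])

optima = wa_fit(abund, min_ph)
inferred = wa_predict(abund, optima)
rmsep = loo_rmsep(abund, min_ph)
print(optima, inferred, rmsep)
```

Because each inferred value is an average of taxon optima, predictions are pulled toward the middle of the gradient; this shrinkage is what a deshrinking regression would correct.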
9.
HEIKKI HÄMÄLÄINEN & PERTTI HUTTUNEN 《The Plant journal : for cell and molecular biology》2003,36(3):697-709
1. Data on macroinvertebrates and stream chemistry were collected from sixty-four streams in Finland. Weighted averaging (WA) regression and calibration models were constructed to infer the minimum pH of streams from their invertebrate assemblages. The purpose was to develop an instrument for biological assessment and monitoring of stream acidification. The WA method was compared with simpler approaches, based on qualitative invertebrate data and pH tolerance limits, that are widely used.
2. Performance of the two approaches was assessed in terms of correlation between the inferred and observed minimum pH within the 'training set', and in terms of root mean squared differences (predicted – observed) (RMSEP) estimated by cross-validation or bootstrap resampling techniques. The models were further tested using independent data from the literature representative of a wide geographical range.
3. The predictive power of the WA models was reasonable (RMSEP 0.40–0.44 pH units) in the training set and consistently better than that of the tolerance limit method. In contrast to the latter, the WA models were able to infer a minimum pH above 5.5, suggesting they could detect the early stages of acidification.
4. The WA models performed better than the tolerance limit method in inferring pH from the independent literature, further demonstrating the superiority and generality of the WA approach.
5. The weighted averaging technique could be an effective and widely applicable tool for contemporary biological monitoring and assessment using aquatic invertebrates.
10.
Background
Information for mapping of quantitative trait loci (QTL) comes from two sources: linkage disequilibrium (non-random association of allele states) and cosegregation (non-random association of allele origin). Information from LD can be captured by modeling conditional means and variances at the QTL given marker information. Similarly, information from cosegregation can be captured by modeling conditional covariances. Here, we consider a Bayesian model based on gene frequency (BGF) where both conditional means and variances are modeled as a function of the conditional gene frequencies at the QTL. The parameters in this model include these gene frequencies, additive effect of the QTL, its location, and the residual variance. Bayesian methodology was used to estimate these parameters. The priors used were: logit-normal for gene frequencies, normal for the additive effect, uniform for location, and inverse chi-square for the residual variance. Computer simulation was used to compare the power to detect and accuracy to map QTL by this method with those from least squares analysis using a regression model (LSR).
Results
To simplify the analysis, data from unrelated individuals in a purebred population were simulated, where only LD information contributes to map the QTL. LD was simulated in a chromosomal segment of 1 cM with one QTL by random mating in a population of size 500 for 1000 generations and in a population of size 100 for 50 generations. The comparison was studied under a range of conditions, which included SNP density of 0.1, 0.05 or 0.02 cM, sample size of 500 or 1000, and phenotypic variance explained by QTL of 2 or 5%. Both 1 and 2-SNP models were considered. Power to detect the QTL with BGF ranged from 0.4 to 0.99 and was close or equal to that of least squares regression (LSR). Precision of the mapped QTL position, quantified by the mean absolute error, ranged from 0.11 to 0.21 cM for BGF and was better than that of LSR, which ranged from 0.12 to 0.25 cM.
Conclusions
In conclusion, given a high SNP density, the gene frequency model can be used to map QTL with considerable accuracy even within a 1 cM region.
11.
Ronquist F 《Trends in ecology & evolution》2004,19(9):475-481
Much recent progress in evolutionary biology is based on the inference of ancestral states and past transformations in important traits on phylogenetic trees. These exercises often assume that the tree is known without error and that ancestral states and character change can be mapped onto it exactly. In reality, there is often considerable uncertainty about both the tree and the character mapping. Recently introduced Bayesian statistical methods enable the study of character evolution while simultaneously accounting for both phylogenetic and mapping uncertainty, adding much needed credibility to the reconstruction of evolutionary history.
12.
Background
Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors.
13.
Michał Komorowski Bärbel Finkenstädt Claire V Harper David A Rand 《BMC bioinformatics》2009,10(1):343
Background
Fluorescent and luminescent gene reporters allow us to dynamically quantify changes in molecular species concentration over time on the single cell level. The mathematical modeling of their interaction through multivariate dynamical models requires the development of effective statistical methods to calibrate such models against available data. Given the prevalence of stochasticity and noise in biochemical systems, inference for stochastic models is of special interest. In this paper we present a simple and computationally efficient algorithm for the estimation of biochemical kinetic parameters from gene reporter data.
14.
Background
The minute, finely-tuned ear ossicles of mammals arose through a spectacular evolutionary transformation from their origins as a load-bearing jaw joint. This involved detachment from the postdentary trough of the mandible, and final separation from the dentary through resorption of Meckel’s cartilage. Recent parsimony analyses of modern and fossil mammals imply up to seven independent postdentary trough losses or even reversals, which is unexpected given the complexity of these transformations. Here we employ the first model-based, probabilistic analysis of the evolution of the definitive mammalian middle ear, supported by virtual 3D erosion simulations to assess for potential fossil preservation artifacts.
Results
Our results support a simple, biologically plausible scenario without reversals. The middle ear bones detach from the postdentary trough only twice among mammals, once each in the ancestors of therians and monotremes. Disappearance of Meckel’s cartilage occurred independently in numerous lineages from the Late Jurassic to the Late Cretaceous. This final separation is recapitulated during early development of extant mammals, while the earlier-occurring disappearance of a postdentary trough is not.
Conclusions
Our results therefore suggest a developmentally congruent and directional two-step scenario, in which the parallel uncoupling of the auditory and feeding systems in northern and southern hemisphere mammals underpinned further specialization in both lineages. Until ~168 Ma, all known mammals retained attached middle ear bones, yet all groups that diversified from ~163 Ma onwards had lost the postdentary trough, emphasizing the adaptive significance of this transformation.
15.
In time-resolved spectroscopy, composite signal sequences representing energy transfer in fluorescence materials are measured, and the physical characteristics of the materials are analyzed. Each signal sequence is represented by a sum of non-negative signal components, which are expressed by model functions. For analyzing the physical characteristics of a measured signal sequence, the parameters of the model functions are estimated. Furthermore, in order to quantitatively analyze real measurement data and to reduce the risk of improper decisions, it is necessary to obtain the statistical characteristics from several sequences rather than just a single sequence. In the present paper, we propose an automatic method by which to analyze composite signals using non-negative factorization and an information criterion. The proposed method decomposes the composite signal sequences using non-negative factorization subjected to parametric base functions. The number of components (i.e., rank) is also estimated using Akaike's information criterion. Experiments using simulated and real data reveal that the proposed method automatically estimates the acceptable ranks and parameters.
16.
Recently, the use of the Bayesian network as an alternative to existing tools for similarity-based virtual screening has received noticeable attention from researchers in the chemoinformatics field. The main aim of the Bayesian network model is to improve the retrieval effectiveness of similarity-based virtual screening. To this end, different models of the Bayesian network have been developed. In our previous works, the retrieval performance of the Bayesian network was observed to improve significantly when multiple reference structures or fragment weightings were used. In this article, the authors enhance the Bayesian inference network (BIN) using the relevance feedback information. In this approach, a few high-ranking structures of unknown activity were filtered from the outputs of BIN, based on a single active reference structure, to form a set of active reference structures. This set of active reference structures was used in two distinct techniques for carrying out such BIN searching: reweighting the fragments in the reference structures and group fusion techniques. Simulated virtual screening experiments with three MDL Drug Data Report data sets showed that the proposed techniques provide simple ways of enhancing the cost-effectiveness of ligand-based virtual screening searches, especially for higher diversity data sets.
17.
Alessandro Haiduck Padilha Aroni Sattler Jaime Araújo Cobuci Concepta Margaret McManus 《Genetics and molecular biology》2013,36(2):207-213
Heritability and genetic correlations for honey (HP) and propolis production (PP), hygienic behavior (HB), syrup-collection rate (SCR) and percentage of mites on adult bees (PMAB) of a population of Africanized honeybees were estimated. Data from 110 queen bees over three generations were evaluated. Single and multi-trait models were analyzed by Bayesian Inference using MTGSAM. The localization of the hive was significant for SCR and HB and highly significant for PP. Season-year was highly significant only for SCR. The number of frames with bees was significant for HP and PP, including SCR. The heritability estimates were 0.16 for HP, 0.23 for SCR, 0.52 for HB, 0.66 for PP, and 0.13 for PMAB. The genetic correlations were positive among productive traits (PP, HP and SCR) and negative between productive traits and HB, except between PP and HB. Genetic correlations between PMAB and other traits, in general, were negative, except with PP. The study made it possible to identify honeybees for improved propolis and honey production. Hygienic behavior may be improved as a consequence of selecting for improved propolis production. The rate of syrup consumption and propolis production may be included in a selection index to enhance honeybee traits.
18.
Mann RP Perna A Strömbom D Garnett R Herbert-Read JE Sumpter DJ Ward AJ 《PLoS computational biology》2012,8(1):e1002308
Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture fine scale rules of interaction, which are primarily mediated by physical contact. Conversely, the Markovian self-propelled particle model captures the fine scale rules of interaction but fails to reproduce global dynamics. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.
19.
20.
Raphael B. Costa Gregório MF Camargo Iara DPS Diaz Natalia Irano Marina M. Dias Roberto Carvalheiro Arione A. Boligon Fernando Baldi Henrique N. Oliveira Humberto Tonhati Lucia G. Albuquerque 《Genetics Selection Evolution》2015,47(1)