Similar articles
20 similar articles found (search time: 15 ms)
1.
Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework, which we refer to as PosetSMC, to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present an experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining it with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.

2.
3.
V-Xtractor (http://www.cmde.science.ubc.ca/mohn/software.html) uses Hidden Markov Models to locate, verify, and extract defined hypervariable sequence segments (V1-V9) from bacterial, archaeal, and fungal small-subunit rRNA sequences. With a detection efficiency of 99.6% and low susceptibility to false positives, this tool improves data reliability and facilitates subsequent analysis in community assays.

4.
Modeling recurrent DNA copy number alterations in array CGH data   (Total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: Recurrent DNA copy number alterations (CNA) measured with array comparative genomic hybridization (aCGH) reveal important molecular features of human genetics and disease. Studying aCGH profiles from a phenotypic group of individuals can determine important recurrent CNA patterns that suggest a strong correlation to the phenotype. Computational approaches to detecting recurrent CNAs from a set of aCGH experiments have typically relied on discretizing the noisy log ratios and subsequently inferring patterns. We demonstrate that this can have the effect of filtering out important signals present in the raw data. In this article we develop statistical models that jointly infer CNA patterns and the discrete labels by borrowing statistical strength across samples. RESULTS: We propose extending single sample aCGH HMMs to the multiple sample case in order to infer shared CNAs. We model recurrent CNAs as a profile encoded by a master sequence of states that generates the samples. We show how to improve on two basic models by performing joint inference of the discrete labels and providing sparsity in the output. We demonstrate on synthetic ground truth data and real data from lung cancer cell lines how these two important features of our model improve results over baseline models. We include standard quantitative metrics and a qualitative assessment on which to base our conclusions. AVAILABILITY: http://www.cs.ubc.ca/~sshah/acgh.

5.
6.
BEST implements a Bayesian hierarchical model to jointly estimate gene trees and the species tree from multilocus sequences. It provides a new option for estimating species phylogenies within the popular Bayesian phylogenetic program MrBayes. The technique of simulated annealing is adopted along with Metropolis coupling as performed in MrBayes to improve the convergence rate of the Markov Chain Monte Carlo algorithm. AVAILABILITY: http://www.stat.osu.edu/~dkp/BEST.

7.
MOTIVATION: Microarray technology enables large-scale inference of the participation of genes in biological process from similar expression profiles. Our aim is to induce classificatory models from expression data and biological knowledge that can automatically associate genes with novel hypotheses of biological process. RESULTS: We report a systematic supervised learning approach to predicting biological process from time series of gene expression data and biological knowledge. Biological knowledge is expressed using gene ontology and this knowledge is associated with discriminatory expression-based features to form minimal decision rules. The resulting rule model is first evaluated on genes coding for proteins with known biological process roles using cross validation. Then it is used to generate hypotheses for genes for which no knowledge of participation in biological process could be found. The theoretical foundation for the methodology based on rough sets is outlined in the paper, and its practical application demonstrated on a data set previously published by Cho et al. (Nat. Genet., 27, 48-54, 2001). AVAILABILITY: The Rosetta system is available at http://www.idi.ntnu.no/~aleks/rosetta. SUPPLEMENTARY INFORMATION: http://www.lcb.uu.se/~hvidsten/bioinf_cho/

8.
SUMMARY: Differential Identification using Mixtures Ensemble (DIME) is a package for identification of biologically significant differential binding sites between two conditions using ChIP-seq data. It considers a collection of finite mixture models combined with a false discovery rate (FDR) criterion to find statistically significant regions. This leads to a more reliable assessment of differential binding sites based on a statistical approach. In addition to ChIP-seq, DIME is also applicable to data from other high-throughput platforms. Availability and implementation: DIME is implemented as an R-package, which is available at http://www.stat.osu.edu/~statgen/SOFTWARE/DIME. It may also be downloaded from http://cran.r-project.org/web/packages/DIME/.
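
Illustrative note on entry 8: the abstract does not say how DIME computes its FDR criterion, so the sketch below swaps in the standard Benjamini-Hochberg step-up procedure purely to show what an FDR cutoff does to a list of per-region p-values. The function name, the `alpha` default, and the toy p-values are all assumptions for illustration and are not taken from the DIME package.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five candidate regions.
toy_p_values = [0.039, 0.001, 0.6, 0.008, 0.041]
toy_rejected = benjamini_hochberg(toy_p_values, alpha=0.05)
```

With these toy values only the two smallest p-values pass the cutoff; a mixture-model approach such as the one the abstract describes would typically derive its FDR from fitted component probabilities instead of from p-values.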

9.
10.
In this paper, we describe an algorithm which can be used to generate the collection of networks, in order of increasing size, that are compatible with a list of chemical reactions and that satisfy a constraint. Our algorithm has been encoded and the software, called Netscan, can be freely downloaded from ftp://ftp.stat.ubc.ca/pub/riffraff/Netscanfiles, along with a manual, for general scientific use. Our algorithm may require pre-processing to ensure that the quantities it acts on are physically relevant and because it outputs sets of reactions, which we call canonical networks, that must be elaborated into full networks.

11.
Introduction: Application of systems biology/systems medicine approaches is promising for proteomics/biomedical research, but requires selection of an adequate modeling type.

Areas covered: This article reviews the existing Boolean network modeling approaches, which offer several advantages over alternative modeling techniques for the processing of proteomics data. Application of methods for inference, reduction and validation of protein co-expression networks that are derived from quantitative high-throughput proteomics measurements is presented. It is also shown how Boolean models can be used to derive system-theoretic characteristics that describe both the dynamical behavior of such networks as a whole and the properties of different cell states (e.g. healthy or diseased cell states). Furthermore, application of methods derived from control theory is proposed in order to simulate the effects of therapeutic interventions on such networks, which is a promising approach for the computer-assisted discovery of biomarkers and drug targets. Finally, the clinical application of Boolean modeling analyses is discussed.

Expert commentary: Boolean modeling of proteomics data is still in its infancy. Progress in this field strongly depends on provision of a repository with public access to relevant reference models. Also required are community-supported standards that facilitate input of both proteomics and patient-related data (e.g. age, gender, laboratory results, etc.).
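
Illustrative note on entry 11: the idea that attractors of a Boolean network can stand for distinct cell states can be sketched with a toy model. The two-node mutual-inhibition network below, its node names, and the synchronous update scheme are assumptions chosen for brevity; they are not taken from the reviewed article.

```python
from itertools import product

# Hypothetical toy network: two proteins that repress each other.
RULES = {
    "A": lambda s: not s["B"],  # A is active when B is inactive
    "B": lambda s: not s["A"],  # B is active when A is inactive
}
NODES = sorted(RULES)


def step(state):
    """Synchronous update: every node reads the same previous state."""
    current = dict(zip(NODES, state))
    return tuple(bool(RULES[node](current)) for node in NODES)


def attractor_from(state):
    """Follow the trajectory from `state` until a state repeats, then
    return the repeating cycle in a canonical rotation."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    cycle = seen[seen.index(state):]
    start = cycle.index(min(cycle))  # rotate so equal cycles compare equal
    return tuple(cycle[start:] + cycle[:start])


def all_attractors():
    """Enumerate attractors by starting from every possible state."""
    states = product([False, True], repeat=len(NODES))
    return {attractor_from(s) for s in states}


attractors = all_attractors()
fixed_points = sorted(a[0] for a in attractors if len(a) == 1)
```

In this toy model the two fixed points (A on, B off; A off, B on) play the role of two alternative stable cell states, while the remaining attractor is an oscillation that appears only because of the synchronous update assumption. Exhaustive enumeration is feasible only for small networks, since the state space doubles with each node.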


12.
Context: Osteoporosis (OP) is a progressive systemic bone disease. Dual-energy X-ray absorptiometry (DXA) is routinely employed and is considered the gold standard method for the diagnosis of OP.

Objective: We aimed to investigate the potential use of combined information from multiple bone turnover markers (BTMs) as a clinical diagnostic tool for OP.

Materials and methods: A total of 9053 Chinese postmenopausal women (2464 primary OP patients and 6589 healthy controls) were recruited. Serum levels of six common BTMs (BAP, BSP, CTX, OPG, OST and sRANKL) were assayed. Models based on support vector machines (SVMs) were constructed to explore the efficiency of different combinations of multiple BTMs for OP diagnosis.

Results: Increasing the number of BTMs used in generating the models increased the predictive power of the SVM models for determining the disease status of study subjects. The highest kappa coefficient for the model with one BTM (BAP) compared to DXA was 0.7783. The full model incorporating all six BTMs resulted in a high kappa coefficient of 0.9786.

Conclusion: Our findings showed that although single BTMs were not sufficient for OP diagnosis, appropriate combinations of multiple BTMs incorporated into the SVM models showed almost perfect agreement with DXA.
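
Illustrative note on entry 12: assuming the kappa coefficient reported here is Cohen's kappa, which is an assumption since the abstract does not say, agreement between a model's calls and the DXA reference can be computed as sketched below. The labels are invented toy data, not the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of class labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of subjects given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels for ten subjects ("OP" = osteoporosis, "ctl" = control).
dxa = ["OP", "OP", "OP", "OP", "ctl", "ctl", "ctl", "ctl", "ctl", "ctl"]
svm = ["OP", "OP", "OP", "ctl", "ctl", "ctl", "ctl", "ctl", "ctl", "OP"]
kappa = cohen_kappa(dxa, svm)
```

Here raw agreement is 0.8 but chance agreement is 0.52, so kappa is about 0.58; the statistic discounts agreement that the class proportions alone would produce.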


13.
DataBiNS is a custom-designed BioMoby Web Service workflow that integrates non-synonymous coding single nucleotide polymorphisms (nsSNPs) data with structure/function and pathway data for the relevant protein. A KEGG Pathway Identifier representing a specific human biological pathway initializes the DataBiNS workflow. The workflow retrieves a list of publications, gene ontology annotations and nsSNP information for each gene involved in the biological pathway. Manual inspection of output data from several trial runs confirms that all expected information is appropriately retrieved by the workflow services. The use of an automated BioMoby workflow, rather than manual 'surfing', to retrieve the necessary data significantly reduces the effort required for functional interpretation of SNP data, and thus encourages more speculative investigation. Moreover, the modular nature of the individual BioMoby Services enables fine-grained reuse of each service in other workflows, thus reducing the effort required to achieve similar investigations in the future. AVAILABILITY: The workflow is freely available as a Taverna SCUFL XML document at the iCAPTURE Centre web site, http://www.mrl.ubc.ca/who/who_bios_scott_tebbutt.shtml.

14.
Methods for efficient and accurate prediction of RNA structure are increasingly valuable, given the current rapid advances in understanding the diverse functions of RNA molecules in the cell. To enhance the accuracy of secondary structure predictions, we developed and refined optimization techniques for the estimation of energy parameters. We build on two previous approaches to RNA free-energy parameter estimation: (1) the Constraint Generation (CG) method, which iteratively generates constraints that enforce known structures to have energies lower than other structures for the same molecule; and (2) the Boltzmann Likelihood (BL) method, which infers a set of RNA free-energy parameters that maximize the conditional likelihood of a set of reference RNA structures. Here, we extend these approaches in two main ways: We propose (1) a max-margin extension of CG, and (2) a novel linear Gaussian Bayesian network that models feature relationships, which effectively makes use of sparse data by sharing statistical strength between parameters. We obtain significant improvements in the accuracy of RNA minimum free-energy pseudoknot-free secondary structure prediction when measured on a comprehensive set of 2518 RNA molecules with reference structures. Our parameters can be used in conjunction with software that predicts RNA secondary structures, RNA hybridization, or ensembles of structures. Our data, software, results, and parameter sets in various formats are freely available at http://www.cs.ubc.ca/labs/beta/Projects/RNA-Params.

15.
Capsule: Trail cameras monitoring clutches of ground-nesting birds in Australia revealed survival rates and new causes of egg loss. We also show that nests with artificial eggs versus real eggs do not reveal the same information on predators.

Aims: We describe the application of trail cameras for monitoring real and artificial clutches of ground-nesting birds through a series of case studies. We rate the degree of inference used when defining nest outcomes and assigning fates.

Methods: Four case studies are presented, based on 326 deployments of cameras on real and artificial nests.

Results: The probability of hatching varied between species and populations (40.0–83.3% hatched), but not between urban and rural habitats. The ‘degree of inference’ scores did not differ between species and contexts. Two case studies which examined habitat-mediated survival (ecological hypotheses) found no difference in survival between urban and rural habitats, nor between open and covered microhabitats. Another case study (a management hypothesis) found that predator exclusion cages increased clutch survival even though predators sometimes breached the cages and cages altered the assemblage of predators visiting the area. A fourth study revealed that the assemblage of predators eating eggs differed between real and artificial nests.

Conclusion: Cameras enabled the survival and fate of most nests to be determined.


16.
17.
MOTIVATION: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. RESULTS: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. AVAILABILITY: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. SUPPLEMENTARY INFORMATION: Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
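
Illustrative note on entry 17: the abstract does not name its three complete data methods. Quantile normalization is used below only as one widely used example of a normalization that draws on all arrays at once instead of on a baseline array; the function name, the tie handling, and the toy intensities are assumptions, and this is not the Affy package's implementation.

```python
def quantile_normalize(arrays):
    """Quantile-normalize probe intensities.

    `arrays` is a list of equal-length lists, one list per array.
    Every array ends up with the same distribution: the rank-wise
    mean of the sorted intensities across all arrays.
    """
    if not arrays or not arrays[0]:
        raise ValueError("need at least one non-empty array")
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value on each array.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity; ties keep their original order.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: two arrays, three probes each.
toy_arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 9.0]]
toy_normalized = quantile_normalize(toy_arrays)
```

After normalization both toy arrays contain the values 1.5, 3.5 and 7.0, each assigned according to the probe's original rank within its array. Because the mapping depends only on ranks, it can correct non-linear relations between arrays that a single scaling factor cannot.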

18.
Capsule: Carolina Wrens Thryothorus ludovicianus in urban and rural environments responded most intensely to predators common to their environments.

Aims: To determine the role of experience in predator recognition and response among Carolina Wrens. We predicted that wrens in the urban environment, where domestic cats are common, would respond more intensely to mounts of cats than snakes (a less common nest predator) placed near their nests. In the rural (forested) environment, we predicted a greater response to snakes than cats, because snakes are the more common predator in that environment.

Methods: We placed mounted specimens of a snake and a cat near wren nests at the late nestling stage and quantified responses. We used a Rock Dove Columba livia mount as the control because it is a non-threatening species to the nestlings, and should be familiar to the wrens.

Results: Carolina Wrens in the urban area responded most intensely to the cat mounts, whereas those in the rural environment responded more to the snake, with indications of innate predator recognition and defence. Cats were more common in the urban environment. Wrens used different alarm calls in the two habitats, but further study is needed to understand the significance of this variation.

Conclusions: Birds may have the ability to adapt their responses to local predators, both native and non-native, which may be especially important for their success in urbanized habitats.


19.
Chromatin interactions mediated by a protein of interest are of great scientific interest. Recent studies show that protein-mediated chromatin interactions can have different intensities in different types of cells or in different developmental stages of a cell. Such differences can be associated with a disease or with the development of a cell. Thus, it is of great importance to detect protein-mediated chromatin interactions with different intensities in different cells. A recent molecular technique, Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET), which uses formaldehyde cross-linking and paired-end sequencing, is able to detect genome-wide chromatin interactions mediated by a protein of interest. Here we propose two models (One-Step Model and Two-Step Model) for two-sample ChIA-PET count data (one biological replicate in each sample) to identify differential chromatin interactions mediated by a protein of interest. Both models incorporate the data dependency and the extent to which a fragment pair is related to a pair of DNA loci of interest to make accurate identifications. The One-Step Model makes use of the data more efficiently but is more computationally intensive. An extensive simulation study showed that the models can detect differential chromatin interactions and that there is good agreement between each classification result and the truth. Application of the method to a two-sample ChIA-PET data set illustrates its utility. The two models are implemented as an R package MDM (available at http://www.stat.osu.edu/~statgen/SOFTWARE/MDM).

20.
The combination of molecular sequence data and bioinformatics has revolutionized phylogenetic inference over the past decade, vastly increasing the scope of the evolutionary trees that we are able to infer. A recent paper in BMC Biology describing a new phylogenomic pipeline to help automate the inference of evolutionary trees from public sequence databases provides another important tool in our efforts to derive the Tree of Life. See research article: http://www.biomedcentral.com/1741-7007/9/55

