Similar Articles
1.
2.
BACKGROUND: A prerequisite for the mechanistic simulation of a biochemical system is detailed knowledge of its kinetic parameters. Despite recent experimental advances, the estimation of unknown parameter values from observed data is still a bottleneck for obtaining accurate simulation results. Many methods exist for parameter estimation in deterministic biochemical systems; methods for discrete stochastic systems are less well developed. Given the probabilistic nature of stochastic biochemical models, a natural approach is to choose parameter values that maximize the probability of the observed data with respect to the unknown parameters, a.k.a. the maximum likelihood parameter estimates (MLEs). MLE computation for all but the simplest models requires the simulation of many system trajectories that are consistent with experimental data. For models with unknown parameters, this presents a computational challenge, as the generation of consistent trajectories can be an extremely rare occurrence. RESULTS: We have developed Monte Carlo Expectation-Maximization with Modified Cross-Entropy Method (MCEM2): an accelerated method for calculating MLEs that combines advances in rare event simulation with a computationally efficient version of the Monte Carlo expectation-maximization (MCEM) algorithm. Our method requires no prior knowledge regarding parameter values, and it automatically provides a multivariate parameter uncertainty estimate. We applied the method to five stochastic systems of increasing complexity, progressing from an analytically tractable pure-birth model to a computationally demanding model of yeast polarization. Our results demonstrate that MCEM2 substantially accelerates MLE computation on all tested models when compared to a stand-alone version of MCEM. Additionally, we show how our method identifies parameter values for certain classes of models more accurately than two recently proposed computationally efficient methods. CONCLUSIONS: This work provides a novel, accelerated version of a likelihood-based parameter estimation method that can be readily applied to stochastic biochemical systems. In addition, our results suggest opportunities for added efficiency improvements that will further enhance our ability to mechanistically simulate biological processes.
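The pure-birth model mentioned in this abstract is the one case where the likelihood can be written down and maximized directly. As a hedged illustration of the maximum-likelihood idea only (not of the MCEM2 algorithm), the Python sketch below simulates a pure-birth trajectory with the Gillespie algorithm and recovers the rate constant, assuming the trajectory is fully observed; all rate values and simulation settings are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def gillespie_pure_birth(k, x0, t_end):
    """Simulate a pure-birth process X -> X+1 with propensity k*X (Gillespie SSA)."""
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        dt = rng.exponential(1.0 / (k * x))   # waiting time to the next birth
        if t + dt > t_end:
            break
        t, x = t + dt, x + 1
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

# Simulate one fully observed trajectory with a known rate, then recover it by MLE.
k_true, x0, t_end = 0.5, 10, 5.0
times, states = gillespie_pure_birth(k_true, x0, t_end)

# For a completely observed pure-birth trajectory the log-likelihood is
#   l(k) = B*log(k) + const - k * integral_0^T X(t) dt,
# so the MLE is the number of births divided by the time-integrated population.
births = len(times) - 1
holding_times = np.diff(np.append(times, t_end))   # time spent at each state
integrated_population = np.sum(states * holding_times)
k_mle = births / integrated_population
print(f"true k = {k_true}, maximum likelihood estimate = {k_mle:.3f}")
```

With partially observed data this closed form disappears, which is exactly the situation the MCEM-based approach above is designed for.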

3.
Alterovitz G, Liu J, Afkhami E, Ramoni MF. Proteomics 2007, 7(16):2843-2855
Biological and medical data have been growing exponentially over the past several years [1, 2]. In particular, proteomics has seen automation dramatically change the rate at which data are generated [3]. Analysis that systematically incorporates prior information is becoming essential to making inferences about the myriad, complex data [4-6]. A Bayesian approach can help capture such information and incorporate it seamlessly through a rigorous, probabilistic framework. This paper starts with a review of the background mathematics behind the Bayesian methodology: from parameter estimation to Bayesian networks. The article then goes on to discuss how emerging Bayesian approaches have already been successfully applied to research across proteomics, a field for which Bayesian methods are particularly well suited [7-9]. After reviewing the literature on the subject of Bayesian methods in biological contexts, the article discusses some of the recent applications in proteomics and emerging directions in the field.
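As a minimal, self-contained illustration of the Bayesian parameter estimation this review builds on (not an example taken from the paper), the sketch below computes a conjugate Beta posterior for a hypothetical peptide-identification accuracy; the counts are invented.

```python
from scipy import stats

# Hypothetical data: of 40 manually validated peptide identifications, 31 were correct.
n_validated, n_correct = 40, 31

# A uniform Beta(1, 1) prior is conjugate to the binomial likelihood, so the
# posterior over the accuracy p is Beta(1 + successes, 1 + failures).
posterior = stats.beta(1 + n_correct, 1 + n_validated - n_correct)

print("posterior mean accuracy:", round(posterior.mean(), 3))
print("95% credible interval:", posterior.interval(0.95))
```

The same prior-times-likelihood logic, scaled up to networks of variables, is what the Bayesian-network methods discussed in the article rely on.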

4.
Computational analysis of shotgun proteomics data
Proteomics technology is progressing at an incredible rate. The latest generation of tandem mass spectrometers can now acquire tens of thousands of fragmentation spectra in a matter of hours. Furthermore, quantitative proteomics methods have been developed that incorporate a stable isotope-labeled internal standard for every peptide within a complex protein mixture for the measurement of relative protein abundances. These developments have opened the doors for 'shotgun' proteomics, yet have also placed a burden on the computational approaches that manage the data. With each new method that is developed, the quantity of data that can be derived from a single experiment increases. To deal with this increase, new computational approaches are being developed to manage the data and assess false positives. This review discusses current approaches for analyzing proteomics data by mass spectrometry and identifies present computational limitations and bottlenecks.
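One widely used way to assess false positives in shotgun proteomics, though not one prescribed by this review, is the target-decoy strategy: spectra are searched against the real (target) database and a reversed or shuffled (decoy) database, and the false discovery rate is estimated from decoy hit counts. The sketch below is a hedged illustration on simulated scores only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated search-engine scores: most target peptide-spectrum matches are
# correct (high scores); the rest and all decoy matches follow a null distribution.
target_scores = np.concatenate([rng.normal(3.0, 1.0, 800),
                                rng.normal(0.0, 1.0, 200)])
decoy_scores = rng.normal(0.0, 1.0, 1000)

def estimated_fdr(threshold):
    """Estimate FDR as (# decoy hits) / (# target hits) above a score threshold."""
    n_target = np.count_nonzero(target_scores >= threshold)
    n_decoy = np.count_nonzero(decoy_scores >= threshold)
    return n_decoy / max(n_target, 1)

for threshold in (1.0, 2.0, 3.0):
    print(f"score >= {threshold}: estimated FDR = {estimated_fdr(threshold):.3f}")
```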

5.
Targeted proteomics has gained significant popularity in mass spectrometry-based protein quantification as a method to detect proteins of interest with high sensitivity, quantitative accuracy and reproducibility. However, with the emergence of a wide variety of targeted proteomics methods, some of them with high-throughput capabilities, it is easy to overlook the essence of each method and difficult to determine what makes each of them a targeted proteomics method. In this viewpoint, we revisit the main targeted proteomics methods and classify them into four categories, differentiating methods that perform targeted data acquisition from those that perform targeted data analysis, and methods based on peptide precursor-ion data (MS1-targeted methods) from those that rely on peptide fragment ions (MS2-targeted methods).

6.
Vasco DA. Genetics 2008, 179(2):951-963
The estimation of ancestral and current effective population sizes in expanding populations is a fundamental problem in population genetics. Recently it has become possible to scan entire genomes of several individuals within a population. These genomic data sets can be used to estimate basic population parameters such as the effective population size and population growth rate. Full-data-likelihood methods potentially offer a powerful statistical framework for inferring population genetic parameters. However, for large data sets, computationally intensive methods based upon full-likelihood estimates may encounter difficulties. First, the computational method may be prohibitively slow or difficult to implement for large data sets. Second, estimation bias may markedly affect the accuracy and reliability of parameter estimates, as suggested from past work on coalescent methods. To address these problems, a fast and computationally efficient least-squares method for estimating population parameters from genomic data is presented here. Instead of modeling genomic data using a full likelihood, this new approach uses an analogous function, in which the full data are replaced with a vector of summary statistics. Furthermore, these least-squares estimators may show significantly less estimation bias for growth rate and genetic diversity than a corresponding maximum-likelihood estimator for the same coalescent process. The least-squares statistics also scale up to genome-sized data sets with many nucleotides and loci. These results demonstrate that least-squares statistics will likely prove useful for nonlinear parameter estimation when the underlying population genomic processes have complex evolutionary dynamics involving interactions between mutation, selection, demography, and recombination.
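To make the idea of replacing the full likelihood with a vector of summary statistics concrete, here is a hedged Python sketch. The function expected_summaries() is a purely hypothetical stand-in for coalescent-based expectations of diversity-type statistics, and the parameter values and noise level are invented; the actual method derives these expectations from the underlying population genetic model.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical stand-in for the expected summary-statistic vector as a function
# of theta = (effective-size scaler, growth rate).  Purely illustrative.
def expected_summaries(theta):
    ne, growth = theta
    return np.array([ne,                     # diversity scales with Ne
                     ne * np.exp(-growth),   # a statistic shrunk by growth
                     ne / (1.0 + growth)])   # another growth-sensitive summary

# "Observed" summaries, simulated here from known parameters plus noise.
rng = np.random.default_rng(42)
theta_true = np.array([2.0, 0.8])
observed = expected_summaries(theta_true) + rng.normal(0, 0.05, 3)

# Least-squares estimation: minimize the squared residuals between the
# observed summary vector and its model expectation.
fit = least_squares(lambda th: expected_summaries(th) - observed,
                    x0=[1.0, 0.1], bounds=([1e-6, 0.0], [np.inf, np.inf]))
print("estimated (Ne scaler, growth rate):", fit.x)
```

The appeal, as the abstract notes, is that the summary vector stays small even as the number of loci grows, so the fit scales to genome-sized data.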

7.
A crucial part of a successful systems biology experiment is an assay that provides reliable, quantitative measurements for each of the components in the system being studied. For proteomics to be a key part of such studies, it must deliver accurate quantification of all the components in the system for each tested perturbation without any gaps in the data. This will require a new approach to proteomics that is based on emerging targeted quantitative mass spectrometry techniques. The PeptideAtlas Project comprises a growing, publicly accessible database of peptides identified in many tandem mass spectrometry proteomics studies and software tools that allow the building of PeptideAtlas, as well as its use by the research community. Here, we describe the PeptideAtlas Project, its contents and components, and show how together they provide a unique platform to select and validate mass spectrometry targets, thereby allowing the next revolution in proteomics.

8.
MOTIVATION: Experimental techniques in proteomics have seen rapid development over the last few years. Volume and complexity of the data have both been growing at a similar rate. Accordingly, data management and analysis are among the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. In order to facilitate these studies, it would be desirable to have a flexible 'toolbox' of versatile and user-friendly applications allowing for rapid construction of computational workflows in proteomics. RESULTS: We describe a set of tools for proteomics data analysis: TOPP, The OpenMS Proteomics Pipeline. TOPP provides a set of computational tools which can be easily combined into analysis pipelines even by non-experts and can be used in proteomics workflows. These applications range from useful utilities (file format conversion, peak picking) and wrappers for established tools (e.g. Mascot) to completely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. As such, we describe the basic concepts and the current abilities of TOPP and illustrate these concepts in the context of two example applications: the identification of peptides from a raw dataset through database search and the complex analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups. AVAILABILITY: The TOPP components are available as open-source software under the GNU Lesser General Public License (LGPL). Source code is available from the project website at www.OpenMS.de

9.
A probability-based quantification framework is presented for the calculation of relative peptide and protein abundance in label-free and label-dependent LC-MS proteomics data. The results are accompanied by credible intervals and regulation probabilities. The algorithm takes into account data uncertainties via Poisson statistics modified by a noise contribution that is determined automatically during an initial normalization stage. Protein quantification relies on assignments of component peptides to the acquired data. These assignments are generally of variable reliability and may not be present across all of the experiments comprising an analysis. It is also possible for a peptide to be identified to more than one protein in a given mixture. For these reasons the algorithm accepts a prior probability of peptide assignment for each intensity measurement. The model is constructed in such a way that outliers of any type can be automatically reweighted. Two discrete normalization methods can be employed. The first method is based on a user-defined subset of peptides, while the second method relies on the presence of a dominant background of endogenous peptides for which the concentration is assumed to be unaffected. Normalization is performed using the same computational and statistical procedures employed by the main quantification algorithm. The performance of the algorithm will be illustrated on example data sets, and its utility demonstrated for typical proteomics applications. The quantification algorithm supports relative protein quantification based on precursor and product ion intensities acquired by means of data-dependent methods, originating from all common isotopically-labeled approaches, as well as label-free ion intensity-based data-independent methods.
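As a hedged toy version of the Poisson-based reasoning (omitting the noise term, normalization and peptide-assignment priors described above), the sketch below derives a credible interval and a regulation probability for a protein fold change from two summed intensity counts; the counts and the Jeffreys prior are illustrative choices, not the published algorithm.

```python
import numpy as np
from scipy import stats

# Hypothetical summed peptide intensities for one protein in two conditions,
# treated here as plain Poisson counts.
count_a, count_b = 5200, 3900

# With a Jeffreys Gamma prior, the posterior of each Poisson rate is
# Gamma(count + 0.5, scale=1); sample both to get a posterior on the ratio.
rng = np.random.default_rng(7)
lam_a = stats.gamma.rvs(count_a + 0.5, size=20_000, random_state=rng)
lam_b = stats.gamma.rvs(count_b + 0.5, size=20_000, random_state=rng)
ratio = lam_a / lam_b

print("posterior median fold change:", np.median(ratio))
print("95% credible interval:", np.percentile(ratio, [2.5, 97.5]))
print("P(up-regulated, ratio > 1):", np.mean(ratio > 1))
```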

10.
The advances in high-resolution mass spectrometry instrumentation, capable of accurate mass measurement and fast acquisition, have enabled new approaches for targeted quantitative proteomics. More specifically, analyses performed on quadrupole-orbitrap mass spectrometers operated in parallel reaction monitoring (PRM) mode leverage the intrinsic high resolving power and trapping capabilities. The PRM technique offers unmatched degrees of selectivity and analytical sensitivity, typically required to analyze peptides in complex samples, such as those encountered in biomedical research or clinical studies. The features of PRM have provoked a paradigm change in targeted experiments by decoupling acquisition and data processing. This has resulted in a new analytical workflow comprising distinct methods for each step, thus enabling much greater flexibility. The PRM technique was further enhanced by a new data acquisition scheme allowing dynamic parameter settings. The technique has the potential to radically impact future quantitative proteomics studies.

11.
SELDI-TOF-MS is rapidly gaining popularity as a screening tool for clinical applications of proteomics. Application of adequate statistical techniques in all the stages from measurement to information is obligatory. One of the statistical methods often used in proteomics is classification: the assignment of subjects to discrete categories, for example healthy or diseased. Lately, many new classification methods have been developed, often specifically for the analysis of X-omics data. For proteomics studies a good strategy for evaluating classification results is of prime importance, because usually the number of objects will be small and it would be wasteful to set aside part of these as a 'mere' test set. The present paper offers such a strategy in the form of a protocol which can be used for choosing among different statistical classification methods and obtaining figures of merit of their performance. This paper also illustrates the usefulness of proteomics in a clinical setting, using serum samples from Gaucher disease patients in combination with an appropriate classification method.
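In the spirit of evaluating classifiers without sacrificing scarce samples to a separate test set, here is a hedged scikit-learn sketch (not the paper's protocol): synthetic data stand in for a small clinical proteomics study, and repeated stratified cross-validation scores two candidate classifiers on identical folds.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-in for a small clinical proteomics study:
# 40 subjects, 150 m/z-derived features, only a few of them informative.
X, y = make_classification(n_samples=40, n_features=150, n_informative=10,
                           random_state=0)

# With so few subjects, repeated stratified cross-validation replaces a
# held-out test set; every candidate classifier is scored on the same folds.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
candidates = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: accuracy {scores.mean():.2f} +/- {scores.std():.2f}")
```

The mean and spread of the fold scores serve as the figures of merit for choosing between the candidate methods.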

12.

Background  

Mathematical models for revealing the dynamics and interaction properties of biological systems play an important role in computational systems biology. The inference of model parameter values from time-course data can be considered a "reverse engineering" process and is still one of the most challenging tasks. Many parameter estimation methods have been developed, but no single method is effective in all cases or outperforms all other approaches; instead, the various methods have their own advantages and disadvantages. It is therefore worthwhile to develop parameter estimation methods that are robust against noise, efficient in computation and flexible enough to meet different constraints.
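As a minimal sketch of such a "reverse engineering" step (a toy production-degradation model chosen for illustration, not taken from the paper), the code below simulates noisy time-course data and recovers the two rate constants by nonlinear least squares.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import least_squares

# Toy model: species X is produced at rate k1 and degraded at rate k2*X.
def rhs(x, t, k1, k2):
    return k1 - k2 * x

t_obs = np.linspace(0, 10, 21)
k_true = (2.0, 0.5)
rng = np.random.default_rng(3)
x_obs = odeint(rhs, 0.0, t_obs, args=k_true).ravel() + rng.normal(0, 0.1, t_obs.size)

# Classic least-squares fitting: simulate the model for a candidate parameter
# set and minimize the residuals against the observed time course.
def residuals(params):
    sim = odeint(rhs, 0.0, t_obs, args=tuple(params)).ravel()
    return sim - x_obs

fit = least_squares(residuals, x0=[1.0, 1.0], bounds=(0, np.inf))
print("estimated (k1, k2):", fit.x)   # should be close to (2.0, 0.5)
```

Real systems-biology models are larger and the objective surface is far less benign, which is why the robustness and efficiency concerns raised above matter.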

13.
14.
High-throughput (HTP) proteomics studies generate large amounts of data. Interpretation of these data requires effective approaches to distinguish noise from biological signal, particularly as instrument and computational capacity increase and studies become more complex. Resolving this issue requires validated and reproducible methods and models, which in turn requires complex experimental and computational standards. The absence of appropriate standards and data sets for validating experimental and computational workflows hinders the development of HTP proteomics methods. Most protein standards are simple mixtures of proteins or peptides, or undercharacterized reference standards in which the identity and concentration of the constituent proteins is unknown. The Seattle Children's 200 (SC-200) proposed proteomics standard mixture is the next step toward developing realistic, fully characterized HTP proteomics standards. The SC-200 exhibits a unique modular design to extend its functionality, and consists of 200 proteins of known identities and molar concentrations from 6 microbial genomes, distributed into 10 molar concentration tiers spanning a 1,000-fold range. We describe the SC-200's design, potential uses, and initial characterization. We identified 84% of SC-200 proteins with an LTQ-Orbitrap and 65% with an LTQ-Velos (false discovery rate = 1% for both). There were obvious trends in success rate, sequence coverage, and spectral counts with protein concentration; however, protein identification, sequence coverage, and spectral counts vary greatly within concentration levels.

15.
Kabbani N. Proteomics 2008, 8(19):4146-4155
Receptors represent an abundant class of integral membrane proteins that transmit information on various types of signals within the cell. Assemblages of receptors and their interacting proteins (receptor complexes) have emerged as important units of signal transduction for various types of receptors, including G protein-coupled receptors, ligand-gated ion channels, and receptor tyrosine kinases. This review aims to summarize the major approaches and findings of receptor proteomics. Isolation and characterization of receptor complexes from cells has become common using the methods of immunoaffinity-, ligand-, and tag-based chromatography followed by MS for the analysis of enriched receptor preparations. In addition, tools such as stable isotope labeling have contributed to understanding the quantitative properties of, and PTMs on, receptors and their interacting proteins. As data from studies of receptor-protein interactions expand considerably, complementary approaches such as bioinformatics and computational biology will undoubtedly play a significant role in defining cellular and network functions for various types of receptor complexes. Findings from receptor proteomics may also shed light on the mechanism of action of pharmacological drugs and can be of value in understanding molecular pathologies of disease states.

16.
17.
The recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming increasingly complex and convoluted, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics during recent years, and this trend is likely to continue. However, most computational proteomics and metabolomics tools are designed as single-tiered software applications in which the analytics tasks cannot be distributed, limiting the scalability and reproducibility of the data analysis. In this paper the key steps of metabolomics and proteomics data processing, including the main tools and software used to perform the data analysis, are summarized. The combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis is discussed. Finally, a new approach for reproducible and large-scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow, is introduced to the proteomics and metabolomics communities.

18.
Wu H, Xue H, Kumar A. Biometrics 2012, 68(2):344-352
Differential equations are extensively used for modeling the dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of the high computational cost and the high-dimensional parameter space. In this article, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, which is motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First, a penalized-spline approach is employed to estimate the state variables, and the estimated state variables are then plugged into a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider discretization methods of three different orders: Euler's method, the trapezoidal rule, and the Runge-Kutta method. A higher-order numerical algorithm reduces the numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties of the proposed numerical discretization-based estimators are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods with regard to the trade-off between computational cost and estimation accuracy. We apply the proposed methods to an HIV study to further illustrate the usefulness of the proposed approaches.
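To show why the trapezoidal discretization turns parameter estimation into a regression, here is a hedged one-parameter sketch on a toy exponential-decay ODE (not the HIV model from the paper, and with invented noise settings): the states are smoothed with a penalized spline, and the trapezoidal formula is linear in the unknown rate, so ordinary least squares recovers it.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Illustrative one-dimensional decay model dx/dt = -theta * x.
theta_true = 0.3
t = np.linspace(0, 10, 41)
rng = np.random.default_rng(5)
x_noisy = 5.0 * np.exp(-theta_true * t) + rng.normal(0, 0.05, t.size)

# Step 1: estimate the state trajectory with a smoothing (penalized) spline.
x_hat = UnivariateSpline(t, x_noisy, s=len(t) * 0.05**2)(t)

# Step 2: plug the smoothed states into the trapezoidal discretization
#   x_{i+1} - x_i = -(h/2) * theta * (x_i + x_{i+1}),
# which is linear in theta, so theta follows from simple least squares.
h = np.diff(t)
lhs = np.diff(x_hat)                          # x_{i+1} - x_i
design = -0.5 * h * (x_hat[:-1] + x_hat[1:])  # regressor multiplying theta
theta_hat = np.sum(design * lhs) / np.sum(design**2)
print(f"true theta = {theta_true}, estimated theta = {theta_hat:.3f}")
```

For nonlinear right-hand sides the same plug-in step yields estimating equations that are solved numerically rather than by a single linear regression.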

19.
High throughput proteome screening for biomarker detection
Mass spectrometry-based quantitative proteomics has become an important component of biological and clinical research. Current methods, while highly developed and powerful, are falling short of their goal of routinely analyzing whole proteomes, mainly because the wealth of proteomic information accumulated from prior studies is not used for the planning or interpretation of present experiments. The consequence of this situation is that in every proteomic experiment the proteome is rediscovered. In this report we describe an approach for quantitative proteomics that builds on the extensive prior knowledge of proteomes and a platform for the implementation of the method. The method is based on the selection and chemical synthesis of isotopically labeled reference peptides that uniquely identify a particular protein and the addition of a panel of such peptides to the sample mixture consisting of tryptic peptides from the proteome in question. The platform consists of a peptide separation module for the generation of ordered peptide arrays from the combined peptide sample on the sample plate of a MALDI mass spectrometer, a high throughput MALDI-TOF/TOF mass spectrometer, and a suite of software tools for the selective analysis of the targeted peptides and the interpretation of the results. Applying the method to the analysis of the human blood serum proteome, we demonstrate the feasibility of using mass spectrometry-based proteomics as a high throughput screening technology for the detection and quantification of targeted proteins in a complex system.
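The quantification step itself reduces to a ratio against the known amount of spiked, isotopically labeled reference peptide. The numbers in the following sketch are invented and the equal-response assumption is stated explicitly; it is meant only to make the calculation concrete.

```python
# Minimal worked example of quantification against a spiked, isotopically
# labeled reference peptide (all values are invented for illustration).
heavy_spiked_fmol = 50.0          # known amount of labeled reference added
heavy_intensity = 1.8e6           # measured signal of the labeled peptide
light_intensity = 4.5e6           # measured signal of the endogenous peptide

# Assuming equal ionization efficiency for the light/heavy pair, the
# endogenous amount follows from the intensity ratio times the spiked amount.
endogenous_fmol = (light_intensity / heavy_intensity) * heavy_spiked_fmol
print(f"estimated endogenous peptide amount: {endogenous_fmol:.1f} fmol")
```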

20.
Approximately 20 years ago, Avise and colleagues proposed the integration of phylogenetics and population genetics for investigating the connection between micro- and macroevolutionary phenomena. The new field was termed phylogeography. Since the naming of the field, the statistical rigor of phylogeography has increased, in large part due to concurrent advances in coalescent theory, which enabled model-based parameter estimation and hypothesis testing. The next phase will involve phylogeography increasingly becoming the integrative and comparative multi-taxon endeavor that it was originally conceived to be. This exciting convergence will likely involve combining spatially explicit multiple-taxon coalescent models, genomic studies of natural selection, ecological niche modeling, and studies of ecological speciation, community assembly and functional trait evolution. This ambitious synthesis will allow us to determine the causal links between geography, climate change, ecological interactions and the evolution and composition of taxa across whole communities and assemblages. Although such integration presents analytical and computational challenges that will only be intensified by the growth of genomic data in non-model taxa, the rapid development of “likelihood-free” approximate Bayesian methods should permit parameter estimation and hypothesis testing using complex evolutionary demographic models and genomic phylogeographic data. We first review the conceptual beginnings of phylogeography and its accomplishments and then illustrate how it evolved into a statistically rigorous enterprise with the concurrent rise of coalescent theory. Subsequently, we discuss ways in which model-based phylogeography can interface with various subfields to become one of the most integrative fields in all of ecology and evolutionary biology.
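To make the "likelihood-free" idea concrete, here is a hedged rejection-ABC sketch on a deliberately simple toy model (a Poisson mean rather than a coalescent or demographic model, with invented tolerance and prior): draw parameters from the prior, simulate data, and keep the draws whose simulated summary statistic falls close to the observed one.

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy "observed" data: Poisson counts at 50 loci; the parameter of interest is
# the mean.  A phylogeographic application would swap in a coalescent or
# demographic simulator and a richer set of summary statistics.
observed = rng.poisson(4.0, size=50)
observed_summary = observed.mean()

def simulated_summary(theta):
    return rng.poisson(theta, size=50).mean()

# Rejection ABC: sample from the prior, simulate, and accept draws whose
# simulated summary lies within a tolerance of the observed summary.
prior_draws = rng.uniform(0.1, 10.0, size=50_000)
tolerance = 0.2
accepted = np.array([theta for theta in prior_draws
                     if abs(simulated_summary(theta) - observed_summary) <= tolerance])

print(f"accepted {accepted.size} of {prior_draws.size} draws")
print(f"approximate posterior mean: {accepted.mean():.2f}")
```

The accepted draws approximate the posterior without ever evaluating a likelihood, which is what makes the approach attractive for the complex demographic models discussed above.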
