Similar literature
1.
Detecting differentially expressed proteins is a key goal of proteomics. We describe a label-free method, the spectral index, for analyzing relative protein abundance in large-scale data sets derived from biological samples by shotgun proteomics. The spectral index comprises two biochemically plausible features: relative protein abundance (assessed by spectral counts) and the number of samples within a group with detectable peptides. We combined the spectral index with permutation analysis to establish confidence intervals for assessing differential protein expression in bronchoalveolar lavage fluid from cystic fibrosis and control subjects. Significant differences in protein abundance determined by the spectral index agreed well with independent biochemical measurements. When used to analyze simulated data sets, the spectral index outperformed four other statistical tests (Student's t-test, G-test, Bayesian t-test, and Significance Analysis of Microarrays) by correctly identifying the largest number of differentially expressed proteins. Correspondence analysis and functional annotation analysis indicated that the spectral index improves the identification of enriched proteins corresponding to clinical phenotypes. The spectral index is easily implemented and statistically robust, and its results are readily interpreted graphically. Therefore, it should be useful for biomarker discovery and comparisons of protein expression between normal and disease states.
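The abstract names the two ingredients of the spectral index but does not give its formula. The sketch below is therefore an assumption: it multiplies each group's share of the mean spectral count by the fraction of that group's samples with detectable peptides, then takes the difference between groups. The function names (`spectral_index`, `permutation_null`, `permutation_p_value`) and the use of a two-sided permutation p-value rather than confidence intervals are illustrative choices, not the authors' implementation.

```python
import random
from statistics import mean

def spectral_index(group_a, group_b):
    """Assumed form: (abundance share x detection fraction) of A minus that of B."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    total = mean_a + mean_b
    if total == 0:
        return 0.0  # protein not observed in either group
    detect_a = sum(1 for c in group_a if c > 0) / len(group_a)
    detect_b = sum(1 for c in group_b if c > 0) / len(group_b)
    return (mean_a / total) * detect_a - (mean_b / total) * detect_b

def permutation_null(group_a, group_b, n_perm=1000, seed=0):
    """Null distribution obtained by shuffling the group labels."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null.append(spectral_index(pooled[:n_a], pooled[n_a:]))
    return null

def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided p-value with the usual +1 correction."""
    observed = spectral_index(group_a, group_b)
    null = permutation_null(group_a, group_b, n_perm, seed)
    extreme = sum(1 for v in null if abs(v) >= abs(observed))
    return (extreme + 1) / (n_perm + 1)
```

Under this assumed form the index lies between -1 and 1. It reaches 1 when a protein is detected in every sample of the first group and in none of the second.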

2.
3.
There is great interest in reliable ways to obtain absolute protein abundances at a proteome‐wide scale. To this end, label‐free LC‐MS/MS quantification methods have been proposed where all identified proteins are assigned an estimated abundance. Several variants of this quantification approach have been presented, based on either the number of spectral counts per protein or MS1 peak intensities. Equipped with several datasets representing real biological environments, containing a high number of accurately quantified reference proteins, we evaluate five popular low‐cost and easily implemented quantification methods (Absolute Protein Expression, Exponentially Modified Protein Abundance Index, Intensity‐Based Absolute Quantification Index, Top3, and MeanInt). Our results demonstrate considerably improved abundance estimates upon implementing accurately quantified reference proteins; that is, using spiked‐in stable isotope labeled standard peptides or a standard protein mix, to generate a properly calibrated quantification model. We show that only the Top3 method is directly proportional to protein abundance over the full quantification range and is the preferred method in the absence of reference protein measurements. Additionally, we demonstrate that spectral count‐based quantification methods are associated with higher errors than MS1 peak intensity‐based methods. Furthermore, we investigate the impact of miscleaved, modified, and shared peptides as well as protein size and the number of employed reference proteins on quantification accuracy.
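The abstract does not spell out how Top3 is computed. In its usual formulation, a protein's abundance is estimated as the mean MS1 intensity of its three most intense peptides. The sketch below assumes that definition. The function name `top3_abundance`, the input layout, and the fallback for proteins with fewer than three peptides are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

def top3_abundance(peptide_intensities):
    """Return a Top3 score per protein from (protein, MS1 intensity) pairs."""
    by_protein = defaultdict(list)
    for protein, intensity in peptide_intensities:
        by_protein[protein].append(intensity)
    scores = {}
    for protein, values in by_protein.items():
        # Keep the three most intense peptides; assumption: use fewer if only fewer exist.
        top = sorted(values, reverse=True)[:3]
        scores[protein] = mean(top)
    return scores

example = [("P1", 10.0), ("P1", 40.0), ("P1", 30.0), ("P1", 20.0), ("P2", 5.0)]
print(top3_abundance(example))  # P1 -> 30.0, P2 -> 5.0
```

Some pipelines instead discard proteins with fewer than three quantified peptides; that choice changes coverage rather than the scores of the remaining proteins.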

4.
5.
Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescence theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: parameter proposal distribution and maximization of the likelihood function. Using simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some values the two approaches are equal in performance. MOTIVATION: The Markov chain Monte Carlo-based ML framework can fail on sparse data and can deliver non-conservative support intervals. A Bayesian framework with appropriate prior distribution is able to remedy some of these problems. RESULTS: The program MIGRATE was extended to allow not only for maximum likelihood (ML) estimation of population genetics parameters but also for the use of a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.

6.
MOTIVATION: In recent years, advances have been made in the ability of computational methods to discriminate between homologous and non-homologous proteins in the 'twilight zone' of sequence similarity, where the percent sequence identity is a poor indicator of homology. To make these predictions more valuable to the protein modeler, they must be accompanied by accurate alignments. Pairwise sequence alignments are inferences of orthologous relationships between sequence positions. Evolutionary distance is traditionally modeled using global amino acid substitution matrices. But real differences in the likelihood of substitutions may exist for different structural contexts within proteins, since structural context contributes to the selective pressure. RESULTS: HMMSUM (HMMSTR-based substitution matrices) is a new model for structural context-based amino acid substitution probabilities consisting of a set of 281 matrices, each for a different sequence-structure context. HMMSUM does not require the structure of the protein to be known. Instead, predictions of local structure are made using HMMSTR, a hidden Markov model for local structure. Alignments using the HMMSUM matrices compare favorably to alignments carried out using the BLOSUM matrices or structure-based substitution matrices SDM and HSDM when validated against remote homolog alignments from BAliBASE. HMMSUM has been implemented using local Dynamic Programming and with the Bayesian Adaptive alignment method.

7.
MacNab YC. Biometrics. 2003;59(2):305-315
We present Bayesian hierarchical spatial models for spatially correlated small-area health service outcome and utilization rates, with a particular emphasis on the estimation of both measured and unmeasured or unknown covariate effects. This Bayesian hierarchical model framework enables simultaneous modeling of fixed covariate effects and random residual effects. The random effects are modeled via Bayesian prior specifications reflecting spatial heterogeneity globally and relative homogeneity among neighboring areas. The model inference is implemented using Markov chain Monte Carlo methods. Specifically, a hybrid Markov chain Monte Carlo algorithm (Neal, 1995, Bayesian Learning for Neural Networks; Gustafson, MacNab, and Wen, 2003, Statistics and Computing, to appear) is used for posterior sampling of the random effects. To illustrate relevant problems, methods, and techniques, we present an analysis of regional variation in intraventricular hemorrhage incidence rates among neonatal intensive care unit patients across Canada.

8.
Reaction kinetics for complex, highly interconnected kinetic schemes are modeled using analytical solutions to a system of ordinary differential equations. The algorithm employs standard linear algebra methods that are implemented using MATLAB functions in a Visual Basic interface. A graphical user interface for simple entry of reaction schemes facilitates comparison of a variety of reaction schemes. To ensure microscopic balance, graph theory algorithms are used to determine violations of thermodynamic cycle constraints. Analytical solutions based on linear differential equations result in fast comparisons of first-order kinetic rates and amplitudes as a function of changing ligand concentrations. For analysis of higher-order kinetics, we also implemented a solution using numerical integration. To determine rate constants from experimental data, fitting algorithms that adjust rate constants to fit the model to imported data were implemented using the Levenberg-Marquardt algorithm or using Broyden-Fletcher-Goldfarb-Shanno methods. We have included the ability to carry out global fitting of data sets obtained at varying ligand concentrations. These tools are combined in a single package, which we have dubbed VisKin, to guide and analyze kinetic experiments. The software is available online for use on PCs.

9.
Model averaging is gaining popularity among ecologists for making inference and predictions. Methods for combining models include Bayesian model averaging (BMA) and Akaike’s Information Criterion (AIC) model averaging. BMA can be implemented with different prior model weights, including the Kullback–Leibler prior associated with AIC model averaging, but it is unclear how the prior model weight affects model results in a predictive context. Here, we implemented BMA using the Bayesian Information Criterion (BIC) approximation to Bayes factors for building predictive models of bird abundance and occurrence in the Chihuahuan Desert of New Mexico. We examined how model predictive ability differed across four prior model weights, and how averaged coefficient estimates, standard errors and coefficients’ posterior probabilities varied for 16 bird species. We also compared the predictive ability of BMA models to a best single-model approach. Overall, Occam’s prior of parsimony provided the best predictive models. The Kullback–Leibler prior, however, generally favored complex models of lower predictive ability. BMA performed better than a best single-model approach independently of the prior model weight for 6 out of 16 species. For 6 other species, the choice of the prior model weight affected whether BMA was better than the best single-model approach. Our results demonstrate that parsimonious priors may be favorable over priors that favor complexity for making predictions. The approach we present has direct applications in ecology for better predicting patterns of species’ abundance and occurrence.
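The BIC approximation mentioned in the abstract can be illustrated with a small sketch. Under this standard approximation, the posterior weight of model i is proportional to its prior weight times exp(-BIC_i / 2). The function names `bma_weights` and `averaged_prediction` and the uniform default prior are assumptions for illustration; the four prior model weights compared in the study are not specified in the abstract and are not reproduced here.

```python
import math

def bma_weights(bics, priors=None):
    """Posterior model weights from BIC values and optional prior model weights."""
    if priors is None:
        priors = [1.0 / len(bics)] * len(bics)  # assumption: uniform prior
    best = min(bics)  # subtract the minimum for numerical stability
    raw = [p * math.exp(-0.5 * (b - best)) for b, p in zip(bics, priors)]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_prediction(predictions, weights):
    """Weighted average of the per-model predictions for one observation."""
    return sum(w * y for w, y in zip(weights, predictions))

weights = bma_weights([100.0, 102.0, 110.0])
print(weights)
print(averaged_prediction([3.0, 4.0, 8.0], weights))
```

Passing a non-uniform `priors` list is where a parsimony-favoring or complexity-favoring prior would enter; the weights always sum to one after normalization.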

10.
11.

Background

With the growing abundance of microarray data, statistical methods are increasingly needed to integrate results across studies. Two common approaches for meta-analysis of microarrays include either combining gene expression measures across studies or combining summaries such as p-values, probabilities or ranks. Here, we compare two Bayesian meta-analysis models that are analogous to these methods.

Results

Two Bayesian meta-analysis models for microarray data have recently been introduced. The first model combines standardized gene expression measures across studies into an overall mean, accounting for inter-study variability, while the second combines probabilities of differential expression without combining expression values. Both models produce the gene-specific posterior probability of differential expression, which is the basis for inference. Since the standardized expression integration model includes inter-study variability, it may improve accuracy of results versus the probability integration model. However, due to the small number of studies typical in microarray meta-analyses, the variability between studies is challenging to estimate. The probability integration model eliminates the need to model variability between studies, and thus its implementation is more straightforward. We found in simulations of two and five studies that combining probabilities outperformed combining standardized gene expression measures for three comparison values: the percent of true discovered genes in meta-analysis versus individual studies; the percent of true genes omitted in meta-analysis versus separate studies; and the number of true discovered genes for fixed levels of Bayesian false discovery. We identified similar results when pooling two independent studies of Bacillus subtilis. We assumed that each study was produced from the same microarray platform with only two conditions, a treatment and a control, and that the data sets were pre-scaled.

Conclusion

The Bayesian meta-analysis model that combines probabilities across studies does not aggregate gene expression measures, so an inter-study variability parameter is not included in the model. This results in a simpler modeling approach than aggregating expression measures, which accounts for variability across studies. The probability integration model identified more true discovered genes and fewer true omitted genes than combining expression measures, for our data sets.

12.
Bayesian quantitative trait loci mapping for multiple traits
Banerjee S, Yandell BS, Yi N. Genetics. 2008;179(4):2275-2289
Most quantitative trait loci (QTL) mapping experiments collect phenotypic data on multiple correlated complex traits. However, there is a lack of a comprehensive genomewide mapping strategy for correlated traits in the literature. We develop Bayesian multiple-QTL mapping methods for correlated continuous traits using two multivariate models: one that assumes the same genetic model for all traits, the traditional multivariate model, and the other known as the seemingly unrelated regression (SUR) model that allows different genetic models for different traits. We develop computationally efficient Markov chain Monte Carlo (MCMC) algorithms for performing joint analysis. We conduct extensive simulation studies to assess the performance of the proposed methods and to compare with the conventional single-trait model. Our methods have been implemented in the freely available package R/qtlbim (http://www.qtlbim.org), which greatly facilitates the general usage of the Bayesian methodology for unraveling the genetic architecture of complex traits.

13.
Our ability to effectively prevent the transmission of the dengue virus through targeted control of its vector, Aedes aegypti, depends critically on our understanding of the link between mosquito abundance and human disease risk. Mosquito and clinical surveillance data are widely collected, but linking them requires a modeling framework that accounts for the complex non-linear mechanisms involved in transmission. Most critical are the bottleneck in transmission imposed by mosquito lifespan relative to the virus’ extrinsic incubation period, and the dynamics of human immunity. We developed a differential equation model of dengue transmission and embedded it in a Bayesian hierarchical framework that allowed us to estimate latent time series of mosquito demographic rates from mosquito trap counts and dengue case reports from the city of Vitória, Brazil. We used the fitted model to explore how the timing of a pulse of adult mosquito control influences its effect on the human disease burden in the following year. We found that control was generally more effective when implemented in periods of relatively low mosquito mortality (when mosquito abundance was also generally low). In particular, control implemented in early September (week 34 of the year) produced the largest reduction in predicted human case reports over the following year. This highlights the potential long-term utility of broad, off-peak-season mosquito control in addition to existing, locally targeted within-season efforts. Further, uncertainty in the effectiveness of control interventions was driven largely by posterior variation in the average mosquito mortality rate (closely tied to total mosquito abundance) with lower mosquito mortality generating systems more vulnerable to control. Broadly, these correlations suggest that mosquito control is most effective in situations in which transmission is already limited by mosquito abundance.

14.
Ecological diffusion is a theory that can be used to understand and forecast spatio‐temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white‐tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression‐based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.

15.
In high-throughput mass spectrometry proteomics, peptides and proteins are not simply identified as present or not present in a sample; rather, the identifications are associated with differing levels of confidence. The false discovery rate (FDR) has emerged as an accepted means for measuring the confidence associated with identifications. We have developed the Systematic Protein Investigative Research Environment (SPIRE) for the purpose of integrating the best available proteomics methods. Two successful approaches to estimating the FDR for MS protein identifications are the MAYU and our current SPIRE methods. We present here a method to combine these two approaches to estimating the FDR for MS protein identifications into an integrated protein model (IPM). We illustrate the high quality performance of this IPM approach through testing on two large publicly available proteomics datasets. MAYU and SPIRE show remarkable consistency in identifying proteins in these datasets. Still, IPM results in a more robust FDR estimation approach and additional identifications, particularly among low abundance proteins. IPM is now implemented as a part of the SPIRE system.

16.
Wu CH, Drummond AJ. Genetics. 2011;188(1):151-164
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of estimation and also identification of the true mutation model. Finally we apply our method to a red colobus monkey data set as an example.

17.
As systems biology approaches to virology have become more tractable, highly studied viruses such as HIV can now be analyzed in new unbiased ways, including spatial proteomics. We employed here a differential centrifugation protocol to fractionate Jurkat T cells for proteomic analysis by mass spectrometry; these cells contain inducible HIV-1 genomes, enabling us to look for changes in the spatial proteome induced by viral gene expression. Using these proteomics data, we evaluated the merits of several reported machine learning pipelines for classification of the spatial proteome and identification of protein translocations. From these analyses, we found that classifier performance in this system was organelle dependent, with Bayesian t-augmented Gaussian mixture modeling outperforming support vector machine learning for mitochondrial and endoplasmic reticulum proteins but underperforming on cytosolic, nuclear, and plasma membrane proteins by QSep analysis. We also observed a generally higher performance for protein translocation identification using a Bayesian model, Bayesian analysis of differential localization experiments, on row-normalized data. Comparative Bayesian analysis of differential localization experiment analysis of cells induced to express the WT viral genome versus cells induced to express a genome unable to express the accessory protein Nef identified known Nef-dependent interactors such as T-cell receptor signaling components and coatomer complex. Finally, we found that support vector machine classification showed higher consistency and was less sensitive to HIV-dependent noise. These findings illustrate important considerations for studies of the spatial proteome following viral infection or viral gene expression and provide a reference for future studies of HIV-gene-dropout viruses.

18.
Models for Bounded Systems with Continuous Dynamics
Summary. Models for natural nonlinear processes, such as population dynamics, have been given much attention in applied mathematics. For example, species competition has been extensively modeled by differential equations. Often, the scientist has preferred to model the underlying dynamical processes (i.e., theoretical mechanisms) in continuous time. It is of both scientific and mathematical interest to implement such models in a statistical framework to quantify uncertainty associated with the models in the presence of observations. That is, given discrete observations arising from the underlying continuous process, the unobserved process can be formally described while accounting for multiple sources of uncertainty (e.g., measurement error, model choice, and inherent stochasticity of process parameters). In addition to continuity, natural processes are often bounded; specifically, they tend to have nonnegative support. Various techniques have been implemented to accommodate nonnegative processes, but such techniques are often limited or overly compromising. This article offers an alternative to common differential modeling practices by using a bias-corrected truncated normal distribution to model the observations and latent process, both having bounded support. Parameters of an underlying continuous process are characterized in a Bayesian hierarchical context, utilizing a fourth-order Runge–Kutta approximation.
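For readers unfamiliar with the fourth-order Runge–Kutta approximation named in the summary, the sketch below shows the classical scheme applied to a logistic growth equation. The logistic model, its parameter values, and the function names (`rk4_step`, `integrate`, `logistic`, `logistic_exact`) are assumptions chosen for illustration. The summary does not state which differential equations the article uses, and the truncated normal observation model is not covered here.

```python
import math

def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(f, y0, t0, t1, n_steps):
    """Integrate from t0 to t1 with n_steps equal RK4 steps and return y(t1)."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

def logistic(r, k):
    """Right-hand side of dN/dt = r * N * (1 - N / K)."""
    return lambda t, n: r * n * (1 - n / k)

def logistic_exact(n0, r, k, t):
    """Closed-form logistic solution, used only to check the approximation."""
    return k / (1 + (k / n0 - 1) * math.exp(-r * t))

approx = integrate(logistic(0.5, 100.0), 10.0, 0.0, 10.0, 100)
print(approx, logistic_exact(10.0, 0.5, 100.0, 10.0))
```

In a hierarchical model of the kind described, a step function like this would map the latent state at one observation time to the next, with the growth parameters treated as unknowns.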

19.
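Species distribution models are widely used in basic and applied ecology to identify the factors that influence the distribution of organisms and species richness, quantify the relationships between species and abiotic conditions, predict species' responses to land-use and climate change, and identify potential conservation areas. Traditional species distribution models rarely incorporate biotic interactions. Joint species distribution models (JSDMs), a feasible new approach proposed in recent years, can account for environmental factors and biotic interactions simultaneously, and have therefore become a powerful tool for analyzing community structure and interspecific interaction processes. JSDMs build on species distribution models (SDMs): they typically use generalized linear regression models to describe the multivariate response of species to environmental variables, capture associations among species in the form of random effects, incorporate latent variable models (LVMs), and estimate model parameters by maximum likelihood or Bayesian methods based on the Laplace approximation and Markov chain Monte Carlo simulation. This article summarizes the origin and theoretical basis of JSDMs, focuses on the characteristics of different types of JSDMs and their applications in modern ecology, and discusses the application prospects of JSDMs, the problems encountered in their use, and directions for their development. As research on the relationships between environmental factors and multiple species deepens, JSDMs will be a focus of future research on species distribution models.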

20.
Spectral counting has become a commonly used approach for measuring protein abundance in label-free shotgun proteomics. At the same time, the development of data analysis methods has lagged behind. Currently most studies utilizing spectral counts rely on simple data transforms and posthoc corrections of conventional signal-to-noise ratio statistics. However, these adjustments can neither handle the bias toward high abundance proteins nor deal with the drawbacks due to the limited number of replicates. We present a novel statistical framework (QSpec) for the significance analysis of differential expression with extensions to a variety of experimental design factors and adjustments for protein properties. Using synthetic and real experimental data sets, we show that the proposed method outperforms conventional statistical methods that search for differential expression for individual proteins. We illustrate the flexibility of the model by analyzing a data set with a complicated experimental design involving cellular localization and time course.
