首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We investigate the performance of combinatorial pattern discovery to detect remote sequence similarities in terms of both biological accuracy and computational efficiency for a pair of distantly related families, as a case study. The two families represent the cupredoxins and multicopper oxidases, both containing blue copper-binding domains. These families present a challenging case due to low sequence similarity, different local structure, and variable sequence conservation at their copper-binding active sites. In this study, we investigate a new approach for automatically identifying weak sequence similarities that is based on combinatorial pattern discovery. We compare its performance with a traditional, HMM-based scheme and obtain estimates for sensitivity and specificity of the two approaches. Our analysis suggests that pattern discovery methods can be substantially more sensitive in detecting remote protein relationships while at the same time guaranteeing high specificity.  相似文献   

2.
Recent work has uncovered a growing number of bacterial small RNAs (sRNAs), some of which have been shown to regulate critical cellular processes. Computational approaches, in combination with experiments, have played an important role in the discovery of these sRNAs. In this article, we first give an overview of different computational approaches for genome-wide prediction of sRNAs. These approaches have led to the discovery of several novel sRNAs, however the regulatory roles are not yet known for a majority of these sRNAs. By contrast, several recent studies have highlighted the inverse problem where the functional role of the sRNA is already known and the challenge is to identify its genomic location. The focus of this article is on computational tools and strategies for identifying these specific sRNAs which function as key components of known regulatory pathways.  相似文献   

3.
Protein identification using mass spectrometry is an indispensable computational tool in the life sciences. A dramatic increase in the use of proteomic strategies to understand the biology of living systems generates an ongoing need for more effective, efficient, and accurate computational methods for protein identification. A wide range of computational methods, each with various implementations, are available to complement different proteomic approaches. A solid knowledge of the range of algorithms available and, more critically, the accuracy and effectiveness of these techniques is essential to ensure as many of the proteins as possible, within any particular experiment, are correctly identified. Here, we undertake a systematic review of the currently available methods and algorithms for interpreting, managing, and analyzing biological data associated with protein identification. We summarize the advances in computational solutions as they have responded to corresponding advances in mass spectrometry hardware. The evolution of scoring algorithms and metrics for automated protein identification are also discussed with a focus on the relative performance of different techniques. We also consider the relative advantages and limitations of different techniques in particular biological contexts. Finally, we present our perspective on future developments in the area of computational protein identification by considering the most recent literature on new and promising approaches to the problem as well as identifying areas yet to be explored and the potential application of methods from other areas of computational biology.  相似文献   

4.
MOTIVATION: Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS: We introduce two classes of kernel functions that are constructed by combining sequence profiles with new and existing approaches for determining the similarity between pairs of protein sequences. These kernels are constructed directly from these explicit protein similarity measures and employ effective profile-to-profile scoring schemes for measuring the similarity between pairs of proteins. Experiments with remote homology detection and fold recognition problems show that these kernels are capable of producing results that are substantially better than those produced by all of the existing state-of-the-art SVM-based methods. In addition, the experiments show that these kernels, even when used in the absence of profiles, produce results that are better than those produced by existing non-profile-based schemes. AVAILABILITY: The programs for computing the various kernel functions are available on request from the authors.  相似文献   

5.
The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from using combinatorial tools developed in this scope. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segments reconstruction. We show that the method that we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results come eventually very close to cytogenetics studies. It suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well as the disciplinary differences in data aquisition.  相似文献   

6.
Recent advances in high-throughput experimental methods for the identification of protein interactions have resulted in a large amount of diverse data that are somewhat incomplete and contradictory. As valuable as they are, such experimental approaches studying protein interactomes have certain limitations that can be complemented by the computational methods for predicting protein interactions. In this review we describe different approaches to predict protein interaction partners as well as highlight recent achievements in the prediction of specific domains mediating protein-protein interactions. We discuss the applicability of computational methods to different types of prediction problems and point out limitations common to all of them.  相似文献   

7.
Background: The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones.Results: In this paper, we review the theoretical framework and assumptions over which the clonal reconstruction problem is formulated. We formally define the problem and then discuss the complexity and solution space of the problem. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single cell sequencing or from whole genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally we summarize the tools that are developed for either directly solving the clonal reconstruction problem or a related computational problem.Conclusions: In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequeny data, review existing algorithms and catergorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution whereas clonal reconstruction for unicellular genomes are less addressed. We conclude the review by discussing more open problems such as the lack of benchmark datasets and comparison of performance between available tools.  相似文献   

8.
9.
Hydrodynamic stress is an influential physical parameter for various bioprocesses, affecting the performance and viability of the living organisms. However, different approaches are in use in various computational and experimental studies to calculate this parameter (including its normal and shear subcomponents) from velocity fields without a consensus on which one is the most representative of its effect on living cells. In this letter, we investigate these different methods with clear definitions and provide our suggested approach which relies on the principal stress values providing a maximal distinction between the shear and normal components. Furthermore, a numerical comparison is presented using the computational fluid dynamics simulation of a stirred and sparged bioreactor. It is demonstrated that for this specific bioreactor, some of these methods exhibit quite similar patterns throughout the bioreactor—therefore can be considered equivalent—whereas some of them differ significantly.  相似文献   

10.
Wu H  Xue H  Kumar A 《Biometrics》2012,68(2):344-352
Differential equations are extensively used for modeling dynamics of physical processes in many scientific fields such as engineering, physics, and biomedical sciences. Parameter estimation of differential equation models is a challenging problem because of high computational cost and high-dimensional parameter space. In this article, we propose a novel class of methods for estimating parameters in ordinary differential equation (ODE) models, which is motivated by HIV dynamics modeling. The new methods exploit the form of numerical discretization algorithms for an ODE solver to formulate estimating equations. First, a penalized-spline approach is employed to estimate the state variables and the estimated state variables are then plugged in a discretization formula of an ODE solver to obtain the ODE parameter estimates via a regression approach. We consider three different order of discretization methods, Euler's method, trapezoidal rule, and Runge-Kutta method. A higher-order numerical algorithm reduces numerical error in the approximation of the derivative, which produces a more accurate estimate, but its computational cost is higher. To balance the computational cost and estimation accuracy, we demonstrate, via simulation studies, that the trapezoidal discretization-based estimate is the best and is recommended for practical use. The asymptotic properties for the proposed numerical discretization-based estimators are established. Comparisons between the proposed methods and existing methods show a clear benefit of the proposed methods in regards to the trade-off between computational cost and estimation accuracy. We apply the proposed methods t an HIV study to further illustrate the usefulness of the proposed approaches.  相似文献   

11.
The secondary structure of RNA pseudoknots has been extensively inferred and scrutinized by computational approaches. Experimental methods for determining RNA structure are time consuming and tedious; therefore, predictive computational approaches are required. Predicting the most accurate and energy-stable pseudoknot RNA secondary structure has been proven to be an NP-hard problem. In this paper, a new RNA folding approach, termed MSeeker, is presented; it includes KnotSeeker (a heuristic method) and Mfold (a thermodynamic algorithm). The global optimization of this thermodynamic heuristic approach was further enhanced by using a case-based reasoning technique as a local optimization method. MSeeker is a proposed algorithm for predicting RNA pseudoknot structure from individual sequences, especially long ones. This research demonstrates that MSeeker improves the sensitivity and specificity of existing RNA pseudoknot structure predictions. The performance and structural results from this proposed method were evaluated against seven other state-of-the-art pseudoknot prediction methods. The MSeeker method had better sensitivity than the DotKnot, FlexStem, HotKnots, pknotsRG, ILM, NUPACK and pknotsRE methods, with 79% of the predicted pseudoknot base-pairs being correct.  相似文献   

12.
Most of the fascinating phenomena studied in cell biology emerge from interactions among highly organized multimolecular structures embedded into complex and frequently dynamic cellular morphologies. For the exploration of such systems, computer simulation has proved to be an invaluable tool, and many researchers in this field have developed sophisticated computational models for application to specific cell biological questions. However, it is often difficult to reconcile conflicting computational results that use different approaches to describe the same phenomenon. To address this issue systematically, we have defined a series of computational test cases ranging from very simple to moderately complex, varying key features of dimensionality, reaction type, reaction speed, crowding, and cell size. We then quantified how explicit spatial and/or stochastic implementations alter outcomes, even when all methods use the same reaction network, rates, and concentrations. For simple cases, we generally find minor differences in solutions of the same problem. However, we observe increasing discordance as the effects of localization, dimensionality reduction, and irreversible enzymatic reactions are combined. We discuss the strengths and limitations of commonly used computational approaches for exploring cell biological questions and provide a framework for decision making by researchers developing new models. As computational power and speed continue to increase at a remarkable rate, the dream of a fully comprehensive computational model of a living cell may be drawing closer to reality, but our analysis demonstrates that it will be crucial to evaluate the accuracy of such models critically and systematically.  相似文献   

13.
The simulation of microstructures on a scale 1–1000?nm is a typical problem in colloid and polymer science, and this is also the realm of modern computational “soft nanotechnology”. Accordingly, computational methods rely heavily on time-honoured approaches for calculating the thermodynamical stability of complex mixtures. We describe such approaches in the framework of MesoDyn, a general purpose software package for field-based simulations methods, such as the polymer mean-field model for microphase formation and the Poisson–Boltzmann model for electrostatic interactions. The paper concludes with a small review of examples of application: the formation of microscopic structures in block copolymer bulk solutions, block copolymer melt structures on surfaces (thin films) and structure formation in tiny polymer surfactant droplets (polymersomes). The method works quite well in all cases where a mean-field model is appropriate, but it is a challenge to extend the simulations to systems in which specific correlations are important.  相似文献   

14.
MOTIVATION: A large amount of biomolecular network data for multiple species have been generated by high-throughput experimental techniques, including undirected and directed networks such as protein-protein interaction networks, gene regulatory networks and metabolic networks. There are many conserved functionally similar modules and pathways among multiple biomolecular networks in different species; therefore, it is important to analyze the similarity between the biomolecular networks. Network querying approaches aim at efficiently discovering the similar subnetworks among different species. However, many existing methods only partially solve this problem. RESULTS: In this article, a novel approach for network querying problem based on conditional random fields (CRFs) model is presented, which can handle both undirected and directed networks, acyclic and cyclic networks and any number of insertions/deletions. The CRF method is fast and can query pathways in a large network in seconds using a PC. To evaluate the CRF method, extensive computational experiments are conducted on the simulated and real data, and the results are compared with the existing network querying methods. All results show that the CRF method is very useful and efficient to find the conserved functionally similar modules and pathways in multiple biomolecular networks.  相似文献   

15.

Background  

Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance.  相似文献   

16.
The Chemical Master Equation (CME) is a cornerstone of stochastic analysis and simulation of models of biochemical reaction networks. Yet direct solutions of the CME have remained elusive. Although several approaches overcome the infinite dimensional nature of the CME through projections or other means, a common feature of proposed approaches is their susceptibility to the curse of dimensionality, i.e. the exponential growth in memory and computational requirements in the number of problem dimensions. We present a novel approach that has the potential to “lift” this curse of dimensionality. The approach is based on the use of the recently proposed Quantized Tensor Train (QTT) formatted numerical linear algebra for the low parametric, numerical representation of tensors. The QTT decomposition admits both, algorithms for basic tensor arithmetics with complexity scaling linearly in the dimension (number of species) and sub-linearly in the mode size (maximum copy number), and a numerical tensor rounding procedure which is stable and quasi-optimal. We show how the CME can be represented in QTT format, then use the exponentially-converging -discontinuous Galerkin discretization in time to reduce the CME evolution problem to a set of QTT-structured linear equations to be solved at each time step using an algorithm based on Density Matrix Renormalization Group (DMRG) methods from quantum chemistry. Our method automatically adapts the “basis” of the solution at every time step guaranteeing that it is large enough to capture the dynamics of interest but no larger than necessary, as this would increase the computational complexity. Our approach is demonstrated by applying it to three different examples from systems biology: independent birth-death process, an example of enzymatic futile cycle, and a stochastic switch model. The numerical results on these examples demonstrate that the proposed QTT method achieves dramatic speedups and several orders of magnitude storage savings over direct approaches.  相似文献   

17.
Mismatch string kernels for discriminative protein classification   总被引:1,自引:0,他引:1  
MOTIVATION: Classification of proteins sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. RESULTS: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies.  相似文献   

18.

Background  

In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer.  相似文献   

19.
In the past few years, a new generation of fold recognition methods has been developed, in which the classical sequence information is combined with information obtained from secondary structure and, sometimes, accessibility predictions. The results are promising, indicating that this approach may compete with potential-based methods (Rost B et al., 1997, J Mol Biol 270:471-480). Here we present a systematic study of the different factors contributing to the performance of these methods, in particular when applied to the problem of fold recognition of remote homologues. Our results indicate that secondary structure and accessibility prediction methods have reached an accuracy level where they are not the major factor limiting the accuracy of fold recognition. The pattern degeneracy problem is confirmed as the major source of error of these methods. On the basis of these results, we study three different options to overcome these limitations: normalization schemes, mapping of the coil state into the different zones of the Ramachandran plot, and post-threading graphical analysis.  相似文献   

20.
Macromolecules change their shape (conformation) in the process of carrying out their functions. The imaging by cryo-electron microscopy of rapidly-frozen, individual copies of macromolecules (single particles) is a powerful and general approach to understanding the motions and energy landscapes of macromolecules. Widely-used computational methods already allow the recovery of a few distinct conformations from heterogeneous single-particle samples, but the treatment of complex forms of heterogeneity such as the continuum of possible transitory states and flexible regions remains largely an open problem. In recent years there has been a surge of new approaches for treating the more general problem of continuous heterogeneity. This paper surveys the current state of the art in this area.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号