Found 20 similar articles (search time: 15 ms)
1.
Aligning proteins based on their structural similarity is a fundamental problem in molecular biology with applications in many settings, including structure classification, database search, function prediction, and assessment of folding prediction methods. Structural alignment can be done via several methods, including contact map overlap (CMO) maximization that aligns proteins in a way that maximizes the number of common residue contacts. In this paper, we develop a reduction-based exact algorithm for the CMO problem. Our approach solves CMO directly rather than after transformation to other combinatorial optimization problems. We exploit the mathematical structure of the problem in order to develop a number of efficient lower bounding, upper bounding, and reduction schemes. Computational experiments demonstrate that our algorithm runs significantly faster than existing exact algorithms and solves some hard CMO instances that were not solved in the past. In addition, the algorithm produces protein clusters that are in excellent agreement with the SCOP classification. An implementation of our algorithm is accessible as an on-line server at http://eudoxus.scs.uiuc.edu/cmos/cmos.html.
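The quantity that CMO maximizes can be illustrated with a minimal Python sketch. It only scores one given alignment; it is not the reduction-based exact algorithm of the abstract. The names `contact_overlap` and `is_order_preserving`, the residue indices, and the toy contact maps are all hypothetical.

```python
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]  # a pair of residue indices that are in contact


def is_order_preserving(alignment: Dict[int, int]) -> bool:
    """Check that aligned residues keep their sequence order (no crossings)."""
    images = [alignment[i] for i in sorted(alignment)]
    return all(a < b for a, b in zip(images, images[1:]))


def contact_overlap(contacts_a: Set[Contact],
                    contacts_b: Set[Contact],
                    alignment: Dict[int, int]) -> int:
    """Count contacts of protein A whose aligned images are in contact in B.

    Assumes each contact of A is listed once (not as both (i, j) and (j, i)).
    """
    normalized_b = {tuple(sorted(c)) for c in contacts_b}
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            image = tuple(sorted((alignment[i], alignment[j])))
            if image in normalized_b:
                shared += 1
    return shared


# Toy data (made up for illustration only).
contacts_a = {(0, 2), (1, 3), (2, 4)}
contacts_b = {(0, 1), (1, 3), (2, 4)}
identity = {i: i for i in range(5)}
shifted = {0: 0, 2: 1}

print(contact_overlap(contacts_a, contacts_b, identity))
print(contact_overlap(contacts_a, contacts_b, shifted))
```

An exact CMO solver searches over all order-preserving alignments for the one with the largest such count; the sketch only evaluates candidates.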
2.
Comparison of five methods for finding conserved sequences in multiple alignments of gene regulatory regions. (Total citations: 10; self-citations: 0; citations by others: 10)
N Stojanovic L Florea C Riemer D Gumucio J Slightom M Goodman W Miller R Hardison 《Nucleic acids research》1999,27(19):3899-3910
Conserved segments in DNA or protein sequences are strong candidates for functional elements and thus appropriate methods for computing them need to be developed and compared. We describe five methods and computer programs for finding highly conserved blocks within previously computed multiple alignments, primarily for DNA sequences. Two of the methods are already in common use; these are based on good column agreement and high information content. Three additional methods find blocks with minimal evolutionary change, blocks that differ in at most k positions per row from a known center sequence and blocks that differ in at most k positions per row from a center sequence that is unknown a priori. The center sequence in the latter two methods is a way to model potential binding sites for known or unknown proteins in DNA sequences. The efficacy of each method was evaluated by analysis of three extensively analyzed regulatory regions in mammalian beta-globin gene clusters and the control region of bacterial arabinose operons. Although all five methods have quite different theoretical underpinnings, they produce rather similar results on these data sets when their parameters are adjusted to best approximate the experimental data. The optimal parameters for the method based on information content varied little for different regulatory regions of the beta-globin gene cluster and hence may be extrapolated to many other regulatory regions. The programs based on maximum allowed mismatches per row have simple parameters whose values can be chosen a priori and thus they may be more useful than the other methods when calibration against known functional sites is not available.
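The criterion "at most k mismatches per row from a known center sequence" can be sketched as follows. This is an illustrative brute-force scan, not the authors' program; the function names, the toy alignment rows, and the center string are assumptions made up for the example.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def blocks_near_center(rows: List[str], center: str, k: int) -> List[int]:
    """Return start columns of alignment windows in which every row
    differs from the known center sequence in at most k positions."""
    width = len(center)
    n_cols = min(len(row) for row in rows)
    starts = []
    for start in range(n_cols - width + 1):
        window_ok = all(
            hamming(row[start:start + width], center) <= k for row in rows
        )
        if window_ok:
            starts.append(start)
    return starts


# Toy gap-free alignment (hypothetical data).
rows = ["ACGATAAGGT",
        "ACGATTAGGT",
        "TCGATAAGCT"]
center = "GATAAG"

print(blocks_near_center(rows, center, k=1))
print(blocks_near_center(rows, center, k=0))
```

With an unknown center, the candidate center strings would also have to be searched, which is the harder variant the abstract mentions.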
3.
HOMSTRAD: a database of protein structure alignments for homologous families. (Total citations: 25; self-citations: 3; citations by others: 25)
K. Mizuguchi C. M. Deane T. L. Blundell J. P. Overington 《Protein science : a publication of the Protein Society》1998,7(11):2469-2471
We describe a database of protein structure alignments for homologous families. The database HOMSTRAD presently contains 130 protein families and 590 aligned structures, which have been selected on the basis of quality of the X-ray analysis and accuracy of the structure. For each family, the database provides a structure-based alignment derived using COMPARER and annotated with JOY in a special format that represents the local structural environment of each amino acid residue. HOMSTRAD also provides a set of superposed atomic coordinates obtained using MNYFIT, which can be viewed with a graphical user interface or used for comparative modeling studies. The database is freely available on the World Wide Web at http://www-cryst.bioc.cam.ac.uk/-homstrad/, with search facilities and links to other databases.
4.
Interpreting the genomic landscape of speciation: a road map for finding barriers to gene flow
M. Ravinet R. Faria R. K. Butlin J. Galindo N. Bierne M. Rafajlović M. A. F. Noor B. Mehlig A. M. Westram 《Journal of evolutionary biology》2017,30(8):1450-1477
Speciation, the evolution of reproductive isolation among populations, is continuous, complex, and involves multiple, interacting barriers. Until it is complete, the effects of this process vary along the genome and can lead to a heterogeneous genomic landscape with peaks and troughs of differentiation and divergence. When gene flow occurs during speciation, barriers restricting gene flow locally in the genome lead to patterns of heterogeneity. However, genomic heterogeneity can also be produced or modified by variation in factors such as background selection and selective sweeps, recombination and mutation rate variation, and heterogeneous gene density. Extracting the effects of gene flow, divergent selection and reproductive isolation from such modifying factors presents a major challenge to speciation genomics. We argue one of the principal aims of the field is to identify the barrier loci involved in limiting gene flow. We first summarize the expected signatures of selection at barrier loci, at the genomic regions linked to them and across the entire genome. We then discuss the modifying factors that complicate the interpretation of the observed genomic landscape. Finally, we end with a road map for future speciation research: a proposal for how to account for these modifying factors and to progress towards understanding the nature of barrier loci. Despite the difficulties of interpreting empirical data, we argue that the availability of promising technical and analytical methods will shed further light on the important roles that gene flow and divergent selection have in shaping the genomic landscape of speciation.
5.
Dynalign: an algorithm for finding the secondary structure common to two RNA sequences (Total citations: 28; self-citations: 0; citations by others: 28)
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73% accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is reduced to \(O(M^{3}N^{3})\), where N is the length of the shorter sequence. The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1% of known base-pairs in the tRNAs, as compared to 59.7% for free energy minimization alone. For the 5S rRNAs, the average accuracy improves from 47.8% to 86.4%. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.
6.
Songjian Lu Gunasheil Mandava Gaibo Yan Xinghua Lu 《Algorithms for molecular biology : AMB》2016,11(1):11
Background
The mutual exclusivity of somatic genome alterations (SGAs), such as somatic mutations and copy number alterations, is an important observation in tumors and is widely used to search for cancer signaling pathways or SGAs related to tumor development. However, one problem with current methods that use mutual exclusivity is that they are not signal-based; another problem is that they use heuristic algorithms to handle the NP-hard problems, which cannot guarantee finding the optimal solutions of their models.
Method
In this study, we propose a novel signal-based method that utilizes the intrinsic relationship between SGAs on signaling pathways and expression changes of downstream genes regulated by pathways to identify cancer signaling pathways using the mutually exclusive property. We also present a relatively efficient exact algorithm that is guaranteed to obtain the optimal solution of the new computational model.
Results
We have applied our new model and exact algorithm to the breast cancer data. The results reveal that our new approach increases the capability of finding better solutions in the application of cancer research. Our new exact algorithm has a time complexity of \(O^{*}(1.325^{m})\) (note: following recent convention, the star * indicates that the polynomial part of the time complexity is neglected), and it solves the NP-hard problem of our model efficiently.
Conclusion
Our new method and algorithm can discover the true causes behind the phenotypes, such as which SGA events lead to abnormality of the cell cycle or cause cell metastasis to escape control in tumors; thus, it identifies the target candidates for precision (or targeted) therapeutics.
7.
Twenty-three sequences from the family of G-protein coupled receptors have been aligned according to the 'historical alignment' procedure of Feng and Doolittle. Fourier transform analysis of this reveals that parts of five of the seven putative membrane-spanning regions exhibit a periodicity of conserved/nonconserved residues which is compatible with the periodicity of the alpha-helix. This would place the conserved residues on one side of the helix, which may face the inside of the proposed seven membered helical bundle.
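The idea of testing a conservation profile for alpha-helical periodicity can be sketched in Python. An ideal alpha-helix has about 3.6 residues per turn, that is, roughly 100 degrees per residue, so a helix with one conserved face should show a peak in Fourier power near that angle. The function names and the synthetic profile below are assumptions for illustration; the abstract does not specify how conservation was scored.

```python
import cmath
import math
from typing import Iterable, Sequence


def periodicity_power(profile: Sequence[float], angle_deg: float) -> float:
    """Fourier power of a per-residue profile at a given angle per residue."""
    mean = sum(profile) / len(profile)
    omega = math.radians(angle_deg)
    # Subtract the mean so a constant offset does not contribute power.
    total = sum((value - mean) * cmath.exp(1j * omega * position)
                for position, value in enumerate(profile))
    return abs(total) ** 2


def dominant_angle(profile: Sequence[float],
                   angles: Iterable[int] = range(1, 181)) -> int:
    """Angle (in degrees per residue) with the highest Fourier power."""
    return max(angles, key=lambda a: periodicity_power(profile, a))


# Synthetic conservation profile with a 3.6-residue period (made-up data).
profile = [math.cos(2 * math.pi * j / 3.6) for j in range(36)]
peak = dominant_angle(profile)
print(peak)
```

On real alignments the profile would be a per-column conservation score over a putative membrane-spanning segment, and the peak would be broader and noisier than in this synthetic case.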
8.
Dugan EL Doyle TL Humphries B Hasson CJ Newton RU 《Journal of strength and conditioning research / National Strength & Conditioning Association》2004,18(3):668-674
There has been an increasing volume of research focused on the load that elicits maximum power output during jump squats. Because of a lack of standardization for data collection and analysis protocols, results of much of this research are contradictory. The purpose of this paper is to examine why differing methods of data collection and analysis can lead to conflicting results for maximum power and associated optimal load. Six topics relevant to measurement and reporting of maximum power and optimal load are addressed: (a) data collection equipment, (b) inclusion or exclusion of body weight force in calculations of power, (c) free weight versus Smith machine jump squats, (d) reporting of average versus peak power, (e) reporting of load intensity, and (f) instructions given to athletes/participants. Based on this information, a standardized protocol for data collection and reporting of jump squat power and optimal load is presented.
9.
A computer algorithm, CLIX, is presented that is capable of searching a crystallographic database of small molecules for candidates which have both steric and chemical likelihood of binding a protein of known three-dimensional structure. The algorithm is a significant advance over previous strategies which consider solely steric or chemical requirements for binding. The algorithm is shown to be capable of predicting the correct binding geometry of sialic acid to a mutant influenza-virus hemagglutinin and of proposing a number of potential new ligands to this protein.
10.
11.
Variations of the three-dimensional structure of the Escherichia coli ribosome in the range of overlap views. An application of the methods of multicone and local single-cone three-dimensional reconstruction. (Total citations: 1; self-citations: 0; citations by others: 1)
Electron microscopic techniques are among the most important tools for obtaining structural information of biological specimens. However, the three-dimensional (3D) structural analysis of asymmetrical specimens that do not form crystalline sheets has traditionally presented serious methodological obstacles. One of the fundamental questions to be addressed in this type of structural study is in what way, and to what degree, the 3D structural conformation depends on the orientation of the specimen with respect to the electron microscopic support films. As a step in studying this problem, we have analyzed the variations of the 3D structure of the Escherichia coli 70S monosome by performing four different 3D reconstructions of the 70S monosome from subsets of images in the so-called overlap range of views. These subsets were selected according to a multivariate statistical analysis performed on the total population of overlap-range specimen images. A certain amount of structural variability exists among the 3D reconstructions, although many of the main morphological characteristics, such as the relative orientation between the ribosomal subunits, remain unchanged. We have also generalized the random conical reconstruction technique (Radermacher, M., T. Wagenknecht, A. Verschoor, and J. Frank. 1987. J. Microsc. 146: 113-136) to include those cases where the specimen exhibits a rocking behavior with respect to the support. The resulting Multicone Reconstruction Technique has been applied to computer-generated images as well as the E. coli 70S monosome images from part of the overlap range of views.
12.
13.
14.
K Matsumura Y Watanabe H Onoe Y Koyama Y Watanabe 《Physiological research / Academia Scientiarum Bohemoslovaca》1992,41(1):95-97
In rabbits and guinea pigs, hypothalamic sites for prostaglandin E2 (PGE2) action were studied by means of in vitro receptor autoradiography. The density of PGE2 binding sites (probably PGE2 receptors) was the highest in the anterior wall of the third ventricle (A3V). This result is consistent in all mammalian species ever studied, suggesting a fundamental role of the A3V in the hypothalamic action of PGE2, such as fever.
15.
Reconstructing biological networks, such as metabolic and signaling networks, is at the heart of systems biology. Although many approaches exist for reconstructing network structure, few approaches recover the full dynamic behavior of a network. We survey such approaches that originate from computational scientific discovery, a subfield of machine learning. These take as input measured time course data, as well as existing domain knowledge, such as partial knowledge of the network structure. We demonstrate the use of these approaches on illustrative tasks of finding the complete dynamics of biological networks, which include examples of rediscovering known networks and their dynamics, as well as examples of proposing models for unknown networks.
16.
The temporal behaviour of the nonlinear compartmental model we have developed for rat calcium metabolism is discussed with respect to the theoretical properties of the self-oscillating autocatalytic subunit around which the model is constructed. Depending on the approximations made, this subunit is described by a minimal two-variable model, SU2, or by a three-variable one, SU3. The diversity of the theoretical dynamic behaviours possible with SU2 is greatly increased with SU3. But the identification of SU3 parameter values in three different experimental situations reveals that biological constraints efficiently preserve a simple circadian rhythm for bone metabolism. This analysis indicates the significant contribution of the available bone crystal pool to the dynamic organization of this tissue, and hence to extracellular calcium homeostasis.
17.
18.
Predicting the three-dimensional structure of proteins is still one of the most challenging problems in molecular biology. Despite its difficulty, several investigators have started to produce consistently low-resolution predictions for small proteins. However, in most of these cases, the prediction accuracy is still too low to make them useful. In the present article, we address the problem of obtaining better-quality predictions, starting from low-resolution models. To this end, we have devised a new procedure that uses these models, together with structure comparison methods, to identify the structural family of the target protein. This would allow, in a second step not described in the present work, to refine the predictions using conserved features of the identified family. In our approach, the structure database is investigated using predictions, at different accuracy levels, for a given protein. As query structures, we used both low-resolution versions of the native structures, as well as different sets of low accuracy predictions. In general, we found that for predictions with a resolution of ≥5-7 Å, structure comparison methods were able to identify the fold of a protein in the top positions.
19.
Rosenbusch JP 《Journal of structural biology》2001,136(2):144-157
High stability is a prominent characteristic of integral membrane proteins of known atomic structure. But rather than being an intrinsic property, it may be due to a selection exerted by biochemical procedures prior to structure determination, since solubilization results in the transient exposure of membrane proteins to solution conditions. This may cause structural perturbations that interfere with 3D crystallization and hence with X-ray analysis. This problem also affects the preparation of samples for electron crystallography and NMR studies and may account for the fact that high-resolution structures of representatives of whole groups, such as transport proteins and signal transducers, have not been elucidated so far by any method. A knowledge of the proportion of labile proteins among membrane proteins, and of the kinetics of their denaturation, is therefore necessary. Establishing stability profiles, developing methods to maintain lateral pressure, or preventing contact with water (or both) should prove significant in establishing the structures of conformationally flexible proteins.
20.
Predicting secondary structures from a protein sequence is an important step for characterizing the structural properties of a protein. Existing methods for protein secondary structure prediction can be broadly classified into template based or sequence profile based methods. We propose a novel framework that bridges the gap between the two fundamentally different approaches. Our framework integrates the information from the fuzzy k-nearest neighbor algorithm and position-specific scoring matrices using a neural network. It combines the strengths of the two methods and has a better potential to use the information in both the sequence and structure databases than existing methods. We implemented the framework into a software system MUPRED. MUPRED has achieved three-state prediction accuracy (Q3) ranging from 79.2 to 80.14%, depending on which benchmark dataset is used. A higher Q3 can be achieved if a query protein has a significant sequence identity (>25%) to a template in PDB. MUPRED also estimates the prediction accuracy at the individual residue level more quantitatively than existing methods. The MUPRED web server and executables are freely available at http://digbio.missouri.edu/mupred.
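For readers unfamiliar with the Q3 measure quoted above, the following Python sketch shows how a three-state accuracy is commonly computed: the percentage of residues whose predicted state (helix H, strand E, coil C) matches the observed one. The function name `q3_accuracy` and the two label strings are hypothetical; this is not MUPRED code.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up labels for a 15-residue protein (illustration only).
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEEEEC"

print(round(q3_accuracy(predicted, observed), 2))
```

Benchmark Q3 values such as those in the abstract are typically aggregated over all residues of a test set rather than reported for a single chain.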