Found 20 similar articles; search took 0 ms
1.
Michal Brylinski 《PLoS computational biology》2014,10(9)
Detecting similarities between ligand binding sites in the absence of global homology between target proteins has been recognized as one of the critical components of modern drug discovery. Local binding site alignments can be constructed using sequence order-independent techniques; however, to achieve a high accuracy, many current algorithms for binding site comparison require high-quality experimental protein structures, preferably in the bound conformational state. This, in turn, complicates proteome-scale applications, where only structure models of varying quality are available for the majority of gene products. To improve the state of the art, we developed eMatchSite, a new method for constructing sequence order-independent alignments of ligand binding sites in protein models. Large-scale benchmarking calculations using adenine-binding pockets in crystal structures demonstrate that eMatchSite generates accurate alignments for almost three times more protein pairs than SOIPPA. More importantly, eMatchSite offers a high tolerance to structural distortions in ligand binding regions in protein models. For example, the percentage of correctly aligned pairs of adenine-binding sites in weakly homologous protein models is only 4–9% lower than those aligned using crystal structures. This represents a significant improvement over other algorithms, e.g. the performance of eMatchSite in recognizing similar binding sites is 6% and 13% higher than that of SiteEngine using high- and moderate-quality protein models, respectively. Constructing biologically correct alignments using predicted ligand binding sites in protein models opens up the possibility to investigate drug-protein interaction networks for complete proteomes with prospective systems-level applications in polypharmacology and rational drug repositioning. eMatchSite is freely available to the academic community as a web-server and a stand-alone software distribution at http://www.brylinski.org/ematchsite.
This is a PLOS Computational Biology Software Article.
2.
A technique for prediction of protein membrane topology (intra- and extracellular sidedness) has been developed. Membrane-spanning segments are first predicted using an algorithm based upon multiply aligned amino acid sequences. The compositional differences in the protein segments exposed at each side of the membrane are then investigated. The ratios are calculated for Asn, Asp, Gly, Phe, Pro, Trp, Tyr, and Val, mostly found on the extracellular side, and for Ala, Arg, Cys, and Lys, mostly occurring on the intracellular side. The consensus over these 12 residue distributions is used for sidedness prediction. The method was developed with a set of 42 protein families for which all but one were correctly predicted with the new algorithm. This represents an improvement over previous techniques. The new method, applied to a set of 12 membrane protein families different from the test set and with recently determined topologies, performed well, with 11 of 12 sidedness assignments agreeing with experimental results. The method has also been applied to several membrane protein families for which the topology has yet to be determined. An electronic prediction service is available at the E-mail address tmap@embl-heidelberg.de and on WWW via http://www.emblheidelberg.de.
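The consensus idea in this abstract can be illustrated with a minimal sketch. It is not the published algorithm: the function names, the use of simple residue fractions, and the one-vote-per-residue majority rule are all assumptions made for illustration. Only the two residue groups are taken from the abstract.

```python
# Residue groups as listed in the abstract (one-letter codes).
EXTRACELLULAR_PREFERRED = set("NDGFPWYV")  # Asn, Asp, Gly, Phe, Pro, Trp, Tyr, Val
INTRACELLULAR_PREFERRED = set("ARCK")      # Ala, Arg, Cys, Lys


def residue_fraction(segments, residue):
    """Fraction of all positions in `segments` occupied by `residue`."""
    total = sum(len(s) for s in segments)
    if total == 0:
        return 0.0
    return sum(s.count(residue) for s in segments) / total


def predict_extracellular_side(side_a, side_b):
    """Return "A" or "B" for the side predicted to be extracellular, or None on a tie.

    `side_a` and `side_b` are lists of loop sequences exposed on the two
    sides of the membrane. Each of the 12 residues casts one vote
    (hypothetical voting scheme, assumed here for illustration).
    """
    votes_a = 0
    votes_b = 0
    for res in sorted(EXTRACELLULAR_PREFERRED | INTRACELLULAR_PREFERRED):
        frac_a = residue_fraction(side_a, res)
        frac_b = residue_fraction(side_b, res)
        if frac_a == frac_b:
            continue  # this residue carries no information
        a_is_richer = frac_a > frac_b
        # Enrichment in an extracellular-preferred residue votes for that side;
        # enrichment in an intracellular-preferred residue votes for the other.
        a_is_outside = a_is_richer if res in EXTRACELLULAR_PREFERRED else not a_is_richer
        if a_is_outside:
            votes_a += 1
        else:
            votes_b += 1
    if votes_a > votes_b:
        return "A"
    if votes_b > votes_a:
        return "B"
    return None
```

Calling `predict_extracellular_side(loops_on_one_side, loops_on_other_side)` with the loop sequences flanking the predicted membrane-spanning segments yields the side that the majority of residue distributions favour as extracellular.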
3.
Background
Mass spectrometry based peptide mass fingerprints (PMFs) offer a fast, efficient, and robust method for protein identification. A protein is digested (usually by trypsin) and its mass spectrum is compared to simulated spectra for protein sequences in a database. However, existing tools for analyzing PMFs often suffer from missing or heuristic analysis of the significance of search results and insufficient handling of missing and additional peaks.
4.
Molecular Biology - Multiple alignment of amino acid sequences of homologous proteins is a key tool in state-of-the-art bioinformatics and evolutionary analysis. Differences in the spatial...
5.
Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are contacting in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only successfully accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we incorporate a phylogenetic correction, and we develop an informative prior that takes into account that the probability for a pair of residues to contact depends strongly on their primary-sequence distance and the amount of conservation that the corresponding columns in the multiple alignment exhibit. We show that our model including these extensions dramatically improves the accuracy of contact prediction from multiple sequence alignments.
6.
The Inductive Structure Protein Analysis (IPSA) project presents a new method for investigating protein structure. IPSA includes the creation of a new database which was designed specifically for the analysis of protein structure by statistics and machine learning. The Protein Representation Language (PRL) database includes explicit and symbolic representations of geometrical, topological and chemophysical information about secondary structures and the relationships between secondary structures. The IPSA methodology consists of: the use of PRL information to produce a new database of examples of secondary structures which associate together (examples of possible super-secondary structures); then the use of a variety of clustering techniques to produce a consensus clustering of these examples (super-secondary structures); these super-secondary structures are finally examined to uncover any biological features of significance. We have applied this method to find simple super-secondary structures consisting of pairs of alpha-helices. We found four well-defined super-secondary structures, one formed exclusively by long range interactions, and another in association with an additional element of secondary structure (alpha t alpha-motif). Examinations were carried out using homologous pairs and conformational fits which confirm our clustering.
7.
《Cell communication & adhesion》2013,20(5):385-402
The integrins are α/β heterodimeric proteins which mediate cell-matrix and cell-cell interactions. Current data indicate that the N-terminal moiety of the α subunit is involved in ligand binding. This region of the receptor is made up of a seven-fold repeated sequence of unknown structure which contains EF-hand-like putative divalent cation-binding sites. Recent studies have shown that multiple sequence alignments can be analysed to yield secondary structure predictions. Therefore, to obtain a model structure for the integrin α subunit N-terminal domain repeat, a large alignment of the seven repeats from sixteen integrin sequences was generated. Two methods of analysis were used: First, Chou and Fasman and Garnier, Osguthorpe and Robson predictions were carried out for individual sequences and the consensus predictions derived. Consensus hydrophobicity and chain flexibility data were also used to provide additional data. Second, sites of conservation and variation were analysed by a computer program STAMA (STructure After Multiple Alignment) to yield a secondary structure prediction. The two analyses gave essentially the same predicted structure: undefined region, loop, α-helix, β-strand, divalent cation-binding loop, β-strand, putative turn, loop, β-strand. This is the first model structure to be presented for an integrin domain. Its implications for integrin function are discussed.
8.
PASS2: a semi-automated database of Protein Alignments Organised as Structural Superfamilies
PASS2 is a nearly automated version of CAMPASS and contains sequence alignments of proteins grouped at the level of superfamilies. This database has been created to correspond to the SCOP database (1.53 release) and currently consists of 110 multi-member superfamilies and 613 superfamilies corresponding to single members. In multi-member superfamilies, protein chains with no more than 25% sequence identity have been considered for the alignment and hence the database aims to address sequence alignments which represent 26 219 protein domains under the SCOP 1.53 release. Structure-based sequence alignments have been obtained by COMPARER and the initial equivalences are provided automatically from a MALIGN alignment and subsequently augmented using STAMP4.0. The final sequence alignments have been annotated for the structural features using JOY4.0. Several interesting links are provided to other related databases and genome sequence relatives. Availability of reliable sequence alignments of distantly related proteins, despite poor sequence identity and single-member superfamilies, permits better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure–function relationships of individual superfamilies. The database can be queried by keywords and also by sequence search, interfaced by PSI-BLAST methods. Structure-annotated sequence alignments and several structural accessory files can be retrieved for all the superfamilies including the user-input sequence. The database can be accessed from http://www.ncbs.res.in/%7Efaculty/mini/campass/pass.html.
9.
The review considers the original works on the primary structure of biopolymers carried out from 1983 to 2003. Most works were supported by the Russian program Human Genome and earlier similar Russian programs. Little-known publications of 1983–1993 and recent unpublished results are described in detail. In the field of genome comparisons, these concern the OWEN hierarchic algorithm aligning syntenic regions of two genome sequences. The resulting global alignment is obtained as an ordered chain of local similarities. Alignment of megabase sequences takes several minutes. The concept of local similarity conflicts is generalized to multiple comparisons. New algorithms aligning protein sequences are described and compared with the Smith–Waterman algorithm, which is currently the most accurate. The ANCHOR hierarchic algorithm generates alignments of much the same accuracy and is twice as fast as the Smith–Waterman algorithm. The STRSWer algorithm takes into account the secondary structures of proteins under study. With the secondary structures predicted using the PSI-PRED software for pairs of proteins having 10–30% similarity, the average accuracy of alignments generated by STRSWer is 15% higher than that achieved with the Smith–Waterman algorithm.
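For readers unfamiliar with the Smith–Waterman baseline mentioned in this abstract, the following is a minimal sketch of its local-alignment scoring recurrence. The match, mismatch and linear gap values are arbitrary assumptions chosen for illustration; protein alignments in practice normally use a substitution matrix and affine gap penalties, and this sketch does not represent ANCHOR or STRSWer.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences `a` and `b`.

    Uses the standard Smith-Waterman recurrence with a linear gap penalty.
    The scoring parameters are illustrative assumptions, not values from
    the review. Only two rows of the matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                         # start a new local alignment here
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,             # gap in b
                curr[j - 1] + gap,         # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every cell is floored at zero, the score reflects only the best-matching local region, which is what distinguishes this recurrence from global alignment.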
10.
Naomi N. Barak Piotr Neumann Madhumati Sevvana Mike Schutkowski Miroslav Malešević Gunter Fischer David M. Ferrari 《Journal of molecular biology》2009,385(5):1630-10421
The protein disulfide isomerase-related protein ERp29 is a putative chaperone involved in processing and secretion of secretory proteins. Until now, however, both the structure and the exact nature of interacting substrates remained unclear. We provide for the first time a crystal structure of human ERp29, refined to 2.9 Å, and show that the protein has considerable structural homology to its Drosophila homolog Wind. We show that ERp29 binds directly not only to thyroglobulin and thyroglobulin-derived peptides in vitro but also to the Wind client protein Pipe and Pipe-derived peptides, although it fails to process Pipe in vivo. A monomeric mutant of ERp29 and a D domain mutant in which the second peptide binding site is inactivated also bind protein substrates, indicating that the monomeric thioredoxin domain is sufficient for client protein binding. Indeed, the b domains of ERp29 or Wind, expressed alone, are sufficient for binding proteins and peptides. Interacting peptides have in common two or more aromatic residues, with stronger binding for sequences with overall basic character. Thus, the data allow a view of the two putative peptide binding sites of ERp29 and indicate that the apparent, different processing activity of the human and Drosophila proteins in vivo does not stem from differences in peptide binding properties.
11.
12.
Amino acid depth offers important insights into a protein's three-dimensional structure.
Much research has already been done in the mathematical and statistical sciences on the general definitions, properties
and algorithms describing particle depth in spatially extended systems. We constructed a method for calculating amino
acid depths and applied it to a set of 527 protein structures. We propose the introduction of amino acid depth tendency factors
for three-dimensional structures of proteins. The depth tendency factors relate not only to the hydrophobicity indices but
also to the electrostatic charge. We found a relationship between the protein size and the number of residues using the distance
between the deepest residue and surface residues. We made a prediction regarding the number of residues on the surface of
a protein, the deepest amino acid, and the average depth, all of which are fitted well by a linear functional relationship
with the length of the protein. Finally, we have predicted the depths of multiple peptides in a protein's three-dimensional structure.
13.
Robyn L. Overall Teresa P. Dibbayawan Leila M. Blackman 《Journal of Plant Growth Regulation》2001,20(2):162-169
apd: 3 July 2001
14.
Multiple sequence alignments have wide applicability in many areas of computational biology, including comparative genomics, functional annotation of proteins, gene finding, and modeling evolutionary processes. Because of the computational difficulty of multiple sequence alignment and the availability of numerous tools, it is critical to be able to assess the reliability of multiple alignments. We present a tool called StatSigMA to assess whether multiple alignments of nucleotide or amino acid sequences are contaminated with one or more unrelated sequences. There are numerous applications for which StatSigMA can be used. Two such applications are to distinguish homologous sequences from nonhomologous ones and to compare alignments produced by various multiple alignment tools. We present examples of both types of applications.
15.
16.
Laskowski RA 《Molecular biotechnology》2011,48(2):183-198
Web-based protein structure databases come in a wide variety of types and levels of information content. Those having the
most general interest are the various atlases that describe each experimentally determined protein structure and provide useful
links, analyses and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the
databases that classify 3D structures by their folds as these can reveal evolutionary relationships which may be hard to detect
from sequence comparison alone. Related to these are the numerous servers that compare folds—particularly useful for newly
solved structures, and especially those of unknown function. Beyond these there are a vast number of databases for the most
specialized user, dealing with specific families, diseases, structural features and so on.
17.
Marianne Rooman Yves Dehouck Jean Marc Kwasigroch Christophe Biot Dimitri Gilis 《Journal of biomolecular structure & dynamics》2013,31(3):327-329
We would be tempted to state that there has never been a Levinthal paradox. Indeed, Levinthal raised an interesting problem about protein folding, as he realized that proteins have no time to explore exhaustively their conformational space on the way to their native structure. He did not seem to find this paradoxical and immediately proposed a straightforward solution, which has essentially never been refuted. In other words, Levinthal solved his own paradox.
18.
Valeria Itskovich Andrey Gontcharov Yoshiki Masuda Tsutomu Nohno Sergey Belikov Sofia Efremova Martin Meixner Dorte Janussen 《Journal of molecular evolution》2008,67(6):608-620
Freshwater sponges include six extant families which belong to the suborder Spongillina (Porifera). The taxonomy of freshwater
sponges is problematic and their phylogeny and evolution are not well understood. Sequences of the ribosomal internal transcribed
spacers (ITS1 and ITS2) of 11 species from the family Lubomirskiidae, 13 species from the family Spongillidae, and 1 species
from the family Potamolepidae were obtained to study the phylogenetic relationships between endemic and cosmopolitan freshwater
sponges and the evolution of sponges in Lake Baikal. The present study is the first one where ITS1 sequences were successfully
aligned using verified secondary structure models and, in combination with ITS2, used to infer relationships between the freshwater
sponges. Phylogenetic trees inferred using maximum likelihood, neighbor-joining, and parsimony methods and Bayesian inference
revealed that the endemic family Lubomirskiidae was monophyletic. Our results do not support the monophyly of Spongillidae
because Lubomirskiidae formed a robust clade with E. muelleri, and Trochospongilla latouchiana formed a robust clade with the outgroup Echinospongilla brichardi (Potamolepidae). Within the cosmopolitan family Spongillidae the genera Radiospongilla and Eunapius were found to be monophyletic, while Ephydatia
muelleri was basal to the family Lubomirskiidae. The genetic distances between Lubomirskiidae species being much lower than those
between Spongillidae species are indicative of their relatively recent radiation from a common ancestor. These results indicated
that rDNA spacer sequences can be useful in studying the phylogenetic relationships of freshwater sponges
and in identifying their species.
19.
The PsbO protein is a ubiquitous extrinsic subunit of Photosystem II (PS II), the water splitting enzyme of photosynthesis. A recently determined 3D X-ray structure of a cyanobacterial protein bound to PS II has given an opportunity to conduct complete analyses of its sequence and structural characteristics using bioinformatic methods. Multiple sequence alignments for the PsbO family are constructed and correlated with the cyanobacterial structure. We identify the most conserved regions of PsbO and the mapping of their positions within the structure indicates their functional roles especially in relation to interactions of this protein with the lumenal surface of PS II. Homologous models for eukaryotic PsbO were built in order to compare with the prokaryotic protein. We also explore structural homology between PsbO and other proteins for which 3D structures are known and determine its structural classification. These analyses contribute to the understanding of the function and evolutionary origin of the PS II manganese stabilising protein.
20.
STRAP: editor for STRuctural Alignments of Proteins
STRAP is a comfortable and extensible tool for the generation and refinement of multiple alignments of protein sequences. Various sequence-ordered input file formats are supported: SwissProt, GenBank, EMBL, DSSP, PDB, MSF, and plain ASCII text. The special feature of STRAP is the simple visualization of spatial distances between C(alpha) atoms within the alignment. Thus structural information can easily be incorporated into the sequence alignment and can guide the alignment process in cases of low sequence similarity. Further, STRAP is able to manage huge alignments comprising many sequences. The protein viewers and modeling programs INSIGHT, RASMOL and WEBMOL are embedded into STRAP. STRAP is written in Java, and the well-documented source code can be adapted easily to special requirements. STRAP may become the basis for complex alignment tools in the future.