Similar articles
 20 similar articles found (search time: 31 ms)
1.
MOTIVATION: Should reports of molecular mimicry in particular cases, e.g. those responsible for cross-reactivity, be considered accidental or a general principle in protein evolution? To answer this question, two types of similarity have to be considered: similarity between homologues (synonyms) and resemblance between patches from unrelated proteins (homonyms). RESULTS: All interfaces from known protein structures were collected in a comprehensive data bank [Dictionary of Interfaces in Proteins (DIP)]. A fast, sequence-independent, three-dimensional superposition procedure was developed to search automatically for geometrically similar surface areas. Surprisingly, we found a large number of structurally similar interfaces on the surface of unrelated proteins. Even patches from different types of secondary structure were found to resemble each other. The putative functional meaning of homonyms is demonstrated with striking examples.

2.
A new method has been developed to detect functional relationships among proteins independent of a given sequence or fold homology. It is based on the idea that protein function is intimately related to the recognition and subsequent response to the binding of a substrate or an endogenous ligand in a well-characterized binding pocket. Thus, recognition of similar ligands, supposedly linked to similar function, requires conserved recognition features exposed in terms of common physicochemical interaction properties via the functional groups of the residues flanking a particular binding cavity. Following a technique commonly used in the comparison of small-molecule ligands, generic pseudocenters coding for possible interaction properties were assigned for a large sample set of cavities extracted from the entire PDB and stored in the database Cavbase. Using a particular query cavity, a series of related cavities of decreasing similarity is detected with a clique detection algorithm. The detected similarity is ranked according to the property-based surface patches shared by the different clique solutions. The approach either retrieves protein cavities accommodating the same (e.g. co-factors) or closely related ligands, or it extracts proteins exhibiting similar function in terms of a related catalytic mechanism. Finally, the new method has strong potential to suggest alternative molecular skeletons in de novo design. The retrieval of molecular building blocks accommodated in a particular sub-pocket that shares similarity with the pocket in a protein studied by drug design can inspire the discovery of novel ligands.

3.
4.
We experimentally disturbed stones in three streams of different sizes and followed the macroinvertebrate colonization process in terms of abundance, species richness and similarity over 64 days. We hypothesized that colonization in the smallest and in the largest streams would be slower than in the medium-sized stream. The small upstream pool of colonists available in the smallest stream could restrict colonization, while in the largest stream predation by the diverse fish assemblage could restrict drifting colonists. The medium-sized stream did not have these two constraints. We found similar colonization patterns in all three streams, leading to the rejection of the stated hypotheses. Lack of support for the original hypothesis might be due to the weakness of the two hypothesized restrictions on colonization. In addition, colonization by crawling species from undisturbed nearby patches might be of significant importance. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

5.
Landscape similarity search involves finding landscapes from among a large collection that are similar to a query landscape. An example of such a collection is a large land cover map subdivided into a grid of smaller local landscapes; a query is a local landscape of interest, and the task is to find other local landscapes within the map that are perceptually similar to the query. Landscape search, and the related task of pattern-based regionalization, requires a measure of similarity, a function which quantifies the level of likeness between two landscapes. The standard approach is to use the Euclidean distance between vectors of landscape metrics derived from the two landscapes, but no in-depth analysis of this approach has been conducted. In this paper we investigate the performance of different implementations of the standard similarity measure. Five different implementations are tested against each other and against a control similarity measure based on histograms of class co-occurrence features and the Jensen–Shannon divergence. Testing consists of a series of numerical experiments combined with visual assessments on a set of 400 3 km-scale landscapes. Based on the cases where visual assessment provides a definitive answer, we have determined that the standard similarity measure is sensitive to the way landscape metrics are normalized and, additionally, to whether weights aimed at controlling the relative contribution of landscape composition vs. configuration are used. The standard measure achieves the best performance when metrics are normalized using their extreme values extracted from all possible landscapes, not just the landscapes in the given collection, and when weights are assigned so that the combined influence of composition metrics on the similarity value equals the combined influence of configuration metrics. We have also determined that the control similarity measure outperforms all implementations of the standard measure.
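The two kinds of measure compared in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the min-max scaling and the toy inputs are hypothetical and are not taken from the paper, which does not give its formulas in the abstract.

```python
import math

def normalize(metrics, lows, highs):
    """Min-max scale each metric using externally supplied extreme values
    (assumed here to come from all possible landscapes, not the collection)."""
    return [(m - lo) / (hi - lo) for m, lo, hi in zip(metrics, lows, highs)]

def euclidean(u, v):
    """Euclidean distance between two equally long metric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2) between two histograms.
    Inputs are raw counts; they are converted to probabilities first."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With base-2 logarithms the divergence lies between 0 (identical histograms) and 1 (histograms with no shared bins), which makes it easy to turn into a similarity score if one is needed.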

6.
MOTIVATION: Ideally, only proteins that exhibit highly similar domain architectures should be compared with one another as homologues or be classified into a single family. By combining three different indices, the Jaccard index, the Goodman-Kruskal gamma function and the domain duplicate index, into a single similarity measure, we propose a method for comparing proteins based on their domain architectures. RESULTS: Evaluation of the method using the eukaryotic orthologous groups of proteins (KOGs) database indicated that it allows the automatic and efficient comparison of multiple-domain proteins, which are usually refractory to classic approaches based on sequence similarity measures. As a case study, the PDZ and LRR_1 domains are used to demonstrate how proteins containing promiscuous domains can be clearly compared using our method. For the convenience of users, a web server was set up where three different query interfaces were implemented to compare different domain architectures or proteins with domain(s), and to identify the relationships among domain architectures within a given KOG from the Clusters of Orthologous Groups of Proteins database. CONCLUSION: The approach we propose is suitable for estimating the similarity of domain architectures of proteins, especially those of multidomain proteins. AVAILABILITY: http://cmb.bnu.edu.cn/pdart/.
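Of the three indices named in this abstract, the Jaccard index is the simplest to illustrate. The sketch below shows only that one component, applied to the sets of domains found in two proteins; how the authors combine it with the other two indices is not stated in the abstract, and the function name and the domain label "DomX" are hypothetical.

```python
def jaccard(domains_a, domains_b):
    """Jaccard index of two domain collections: |A ∩ B| / |A ∪ B|.
    Duplicated domains are collapsed, since sets ignore multiplicity."""
    a, b = set(domains_a), set(domains_b)
    if not a and not b:
        # Convention chosen for this sketch: two empty architectures are identical.
        return 1.0
    return len(a & b) / len(a | b)

# Two hypothetical architectures sharing PDZ and LRR_1.
score = jaccard(["PDZ", "LRR_1", "DomX"], ["PDZ", "LRR_1"])
```

Because the index ignores domain order and copy number, it makes sense that a full architecture measure would add order-sensitive and duplication-sensitive terms, as the abstract describes.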

7.
We consider the problem of similarity queries in biological network databases. Given a database of networks, a similarity query returns all the database networks whose similarity (i.e. alignment score) to a given query network is at least a specified similarity cutoff value. Alignment of two networks is a very costly operation, which makes exhaustive comparison of all the database networks with a query impractical. To tackle this problem, we develop a novel indexing method, named RINQ (Reference-based Indexing for Biological Network Queries). Our method uses a set of reference networks to eliminate a large portion of the database quickly for each query. A reference network is a small biological network. We precompute and store the alignments of all the references with all the database networks. When our database is queried, we align the query network with all the reference networks. Using these alignments, we calculate a lower bound and an approximate upper bound to the alignment score of each database network with the query network. With the help of the upper and lower bounds, we eliminate the majority of the database networks without aligning them to the query network. We also quickly identify a small portion of the database networks as guaranteed to be similar to the query. We perform pairwise alignment only for the remaining networks. We also propose a supervised method to pick references that have a large chance of filtering out the unpromising database networks. Extensive experimental evaluation suggests that (i) our method reduced the running time of a single query on a database of around 300 networks from over 2 days to only 8 h; (ii) our method outperformed the state-of-the-art methods Closure Tree and SAGA by a factor of three or more; and (iii) our method successfully identified statistically and biologically significant relationships across networks and organisms.

8.
It is observed that during divergent evolution of two proteins with a common phylogenetic origin, the structural similarity of their backbones is often preserved even when the sequence similarity between them decreases to a virtually undetectable level. Here we analyzed whether the conservation of structure during evolution also involves the local atomic structures in the interfaces between secondary structural elements. As a case study we used one protein family, the proteasomal subunits, for which 17 crystal structures are known. These include 14 different subunits of Saccharomyces cerevisiae, 2 subunits of Thermoplasma acidophilum and one subunit of Escherichia coli. The structural core of the 17 proteasomal subunits has 23 secondary structural elements. Any two adjacent secondary structural elements form a molecular interface consisting of two molecular patches. We found 61 interfaces that occurred in all 17 subunits. The 3D shapes of equivalent molecular patches from different proteasomal subunits were compared by superposition. Our results demonstrate that pairs of equivalent molecular patches show an RMSD which is lower than that of randomly chosen patches from unrelated proteins. This is true even when patch comparisons with identical residues were excluded from the analysis. Furthermore, it is known that the sequential dissimilarity is correlated with the RMSD between the backbones of the members of protein families. The question arises whether this is also true for local atomic structures. The results show that the correlation of individual patch RMSD values and local sequence dissimilarities is low and has a wide range, from 0 to 0.41. Surprisingly, however, there is a good correlation between the average RMSD of all corresponding patches and the global sequence dissimilarity. This average patch RMSD correlates slightly more strongly than the C(alpha)-trace RMSD with the global sequence dissimilarity.

9.
The diversity and abundance of arboreal and flying arthropods in three mangrove patches along the south coast of New South Wales, Australia, were investigated to determine the degree of spatial variability in the assemblages among patches. Intercept traps and restricted canopy fogging were used to sample the communities at Minnamurra, Bonnievale and Kurnell. Twelve orders of arthropods were detected, incorporating 252 morphospecies. Abundance, species richness and species composition were very similar across all patches, the variation being much smaller than expected. These findings suggest that the composition of the arboreal and flying fauna associated with mangrove patches is very similar among patches, but preliminary results also showed that species composition could be highly variable within a patch. Variation between the trapping methods was large, as expected. Intercept trapping and restricted canopy fogging techniques were found to sample different suites of species and therefore complement each other well in sampling programs. Cumulative species curves differed between time periods but generally were flatter for intercept traps than for restricted canopy fogging. Results suggested that, for a given level of effort, intercept traps caught a more representative sample of the species composition available to them.

10.
MOTIVATION: Given that association and dissociation of protein molecules is crucial in most biological processes, several in silico methods have recently been developed to predict protein-protein interactions. Structural evidence has shown that interacting pairs of close homologs (interologs) usually interact physically in the same way. Moreover, conservation of an interaction depends on the conservation of the interface between interacting partners. In this article we make use of both structural similarities among domains of known interacting proteins found in the Database of Interacting Proteins (DIP) and the conservation of pairs of sequence patches involved in protein-protein interfaces to predict putative protein interaction pairs. RESULTS: We obtained a large number of putative protein-protein interactions (approximately 130,000). The list is independent of other techniques, both experimental and theoretical. We separated the list of predictions into three sets according to their relationship with known interacting proteins found in DIP. For each set, only a small fraction of the predicted protein pairs could be independently validated by cross-checking with the Human Protein Reference Database (HPRD). The fraction of validated protein pairs was always larger than that expected for random protein pairs. Furthermore, a correlation map of interacting protein pairs was calculated with respect to molecular function, as defined in the Gene Ontology database. It shows good consistency of the predicted interactions with data in the HPRD database. The intersection between the lists of interactions of other methods and ours produces a network of potentially high-confidence interactions.

11.
Quantifying similarity between motifs   Total citations: 2, self-citations: 0, citations by others: 2
A common question within the context of de novo motif discovery is whether a newly discovered, putative motif resembles any previously discovered motif in an existing database. To answer this question, we define a statistical measure of motif-motif similarity, and we describe an algorithm, called Tomtom, for searching a database of motifs with a given query motif. Experimental simulations demonstrate the accuracy of Tomtom's E values and its effectiveness in finding similar motifs.

12.

Background

Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate.

Results

We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/.

Conclusions

Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.
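The correlation coefficient, sensitivity and specificity quoted in the Results can be computed from per-residue predictions as sketched below. This is an assumption-laden illustration: it uses the conventional definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)) and takes CC to mean the Matthews correlation coefficient. The abstract does not state which definitions the authors used, and their definitions may differ (for example, some interface-prediction papers use "specificity" for TP/(TP+FP)). The function name and the toy labels are hypothetical.

```python
import math

def interface_metrics(predicted, actual):
    """Return (sensitivity, specificity, cc) for per-residue binary labels,
    where 1 marks an interface residue and 0 a non-interface residue."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # Matthews correlation coefficient (assumed meaning of "CC").
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, cc

# Toy example: six residues of a hypothetical query protein.
sens, spec, cc = interface_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```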

13.
Determining whether the composition of ecological communities (species presence and abundance) can be predicted from species demographic traits, rather than being a result of neutral drift, is a key ecological question. Here we compare the similarity of community composition from different community assembly models run under identical environmental conditions, where interspecific competition is assumed to be either neutral or niche-based. In both cases, species colonize a focal patch from a network of neighbouring patches in a metacommunity. We highlight the circumstances (rate and spatial scale of dispersal, and the relative importance of ecological drift) where commonly used community similarity metrics or species rank–abundance relationships are likely to give similar results, regardless of the underlying processes (neutral or non-neutral) driving species' dynamics. As drift becomes more important in driving species abundances, deterministic niche structure has a smaller influence. Our ability to discriminate between different underlying processes driving community organization depends on the relative importance of different drift processes that operate on different spatial scales.

14.
We propose new methods for finding similarities in protein structure databases. These methods extract feature vectors on triplets of SSEs (Secondary Structure Elements) of proteins. The feature vectors are then indexed using a multidimensional index structure. Our first technique considers the problem of finding proteins similar to a given query protein in a protein dataset. It quickly finds promising proteins using the index structure. These proteins are then aligned to the query protein using a popular pairwise alignment tool such as VAST. We also develop a novel statistical model to estimate the goodness of a match using the SSEs. Our second technique considers the problem of joining two protein datasets to find an all-to-all similarity. Experimental results show that our techniques improve the pruning time of VAST 3 to 3.5 times, while keeping the sensitivity similar. Our technique can also be incorporated with DALI and CE to improve their running times by a factor of 2 and 2.7 respectively. The software is available online at http://bioserver.cs.ucsb.edu/.

15.
Within a rewarding floral patch, eusocial bee foragers frequently switch sites, going from one flower to another. However, site switching between patches tends to occur with low frequency while a given patch is still rewarding, thus reducing pollen dispersal and gene flow between patches. In principle, forager switching and gene flow between patches could be higher when close patches offer similar rewards. We investigated site switching during food recruitment in the stingless bee Scaptotrigona mexicana. We trained three groups of foragers to three feeders in different locations, one group per location. These groups did not interact with each other during the training phase. Next, interaction among trained foragers was allowed. We found that roughly half of the foragers switched sites, the other half remaining faithful to their training feeder. Switching is influenced by the presence of recruitment information. In the absence of recruitment information (bees visiting and recruiting for feeders), employed foragers were site specific. Foragers only switched among feeders that were being visited and recruited to. Switching was not caused by learned aversion to experimental handling. Switching in response to recruitment could provide a fitness benefit to the colony by facilitating rapid switching among exploited patches, and could also benefit plants by increasing gene flow between patches.

16.
Ohlson T  Wallner B  Elofsson A 《Proteins》2004,57(1):188-197
To improve the detection of related proteins, it is often useful to include evolutionary information for both the query and target proteins. One method to include this information is by the use of profile-profile alignments, where a profile from the query protein is compared with the profiles from the target proteins. Profile-profile alignments can be implemented in several fundamentally different ways. The similarity between two positions can be calculated using a dot-product, a probabilistic model, or an information theoretical measure. Here, we present a large-scale comparison of different profile-profile alignment methods. We show that the profile-profile methods perform at least 30% better than standard sequence-profile methods both in their ability to recognize superfamily-related proteins and in the quality of the obtained alignments. Although the performance of all methods is quite similar, profile-profile methods that use a probabilistic scoring function have an advantage as they can create good alignments and show a good fold recognition capacity using the same gap-penalties, while the other methods need to use different parameters to obtain comparable performances.

17.
Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results that are significantly similar to the query sequence and typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided by introducing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases that produce non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results extracted from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences that is returned as a result of a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods are able to achieve more diverse yet significant result sets compared to static non-redundancy approaches. In both sequence-based and functional diversity evaluation, the proposed diversification methods significantly outperform original BLAST results and other baselines. A web-based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST

18.
Path matching and graph matching in biological networks.   Total citations: 2, self-citations: 0, citations by others: 2
We develop algorithms for the following path matching and graph matching problems: (i) given a query path p and a graph G, find a path p' that is most similar to p in G; (ii) given a query graph G0 and a graph G, find a graph G0' that is most similar to G0 in G. In these problems, p and G0 represent a given substructure of interest to a biologist, and G represents a large network in which the biologist desires to find a related substructure. These algorithms allow the study of common substructures in biological networks in order to understand how these networks evolve both within and between organisms. We reduce the path matching problem to finding a longest weighted path in a directed acyclic graph and show that the problem of finding the top k suboptimal paths can be solved in polynomial time. This is in contrast with most previous approaches, which used exponential-time algorithms to find simple paths and are practical only when the paths are short. We reduce the graph matching problem to finding highest-scoring subgraphs in a graph and give an exact algorithm to solve the problem when the query graph G0 is of moderate size. This eliminates the need for less accurate heuristic or randomized algorithms. We show that our algorithms are able to extract biologically meaningful pathways from protein interaction networks in the DIP database and metabolic networks in the KEGG database. Software programs implementing these techniques (PathMatch and GraphMatch) are available at http://faculty.cs.tamu.edu/shsze/pathmatch and http://faculty.cs.tamu.edu/shsze/graphmatch.
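The core subproblem named in this abstract, finding a longest weighted path in a directed acyclic graph, can be solved by dynamic programming over a topological order. The sketch below shows only that generic step; it is not the PathMatch implementation, the construction of the DAG from the query path and the network is not described in the abstract and is omitted, and the function name and toy edges are hypothetical.

```python
from collections import defaultdict, deque

def longest_weighted_path(edges):
    """edges: iterable of (u, v, weight) tuples forming a DAG.
    Returns (best_score, path). A path may start at any node."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    if not nodes:
        return 0.0, []

    # Kahn's algorithm: topological order, with cycle detection.
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # best[v] = score of the best path ending at v; prev[v] = its predecessor.
    best = {n: 0.0 for n in nodes}
    prev = {n: None for n in nodes}
    for u in order:
        for v, w in succ[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    # Trace back from the best-scoring end node.
    node = max(nodes, key=best.get)
    score, path = best[node], []
    while node is not None:
        path.append(node)
        node = prev[node]
    return score, path[::-1]

score, path = longest_weighted_path([
    ("a", "b", 1.0), ("b", "d", 1.0),
    ("a", "c", 2.5), ("c", "d", 2.0), ("d", "e", 0.5),
])
```

Each edge is relaxed once, so the running time is linear in the size of the DAG, which is consistent with the polynomial-time claim for the path matching reduction.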

19.
Twenty vancomycin-resistant E. faecium strains (VRE) isolated from patients of three different hospital wards in 2005-2008 were examined. The strains originated from patients of intensive therapy, urological and internal medicine wards. The chosen wards differ significantly in their specificity. In all cases, the vanA gene was present and the vanB, vanD, vanE and vanG genes were absent. Strains were compared using RFLP-PFGE, the reference method for molecular typing of VRE. One group of fourteen strains showing similarity higher than 79.5% was distinguished. This group was divided into subgroups. The greatest similarity was found among strains from patients of the intensive therapy ward. Two subgroups of four strains each, showing similarity of more than 93.3%, were identified. The similarity between these two subgroups was 79.5%. Most strains from the other two wards showed less than 79.5% similarity and could be regarded as unrelated. Only one strain from the internal medicine ward and two strains from the urological ward showed 82.1-86.4% similarity to one of the subgroups of strains originating from intensive therapy.

20.
Grid maps are used as a basic vegetation data base in Japan; they are simplified from vector-based vegetation maps. We estimated the frequency error or loss of information corresponding to reduced resolution and examined the reliable limits of this data base. From a vegetation map at scale 1:25 000 we produced 10 grid maps at five different scales from 50 m to 1000 m using two different methods, one using the whole cell (W-method) and one using only the central circle (C-method). We found that patches larger than the area of a cell on a vector-based map were almost certainly retained on any map, but many patches smaller than the cell size were lost. The number of missing patches with the C-method is smaller at every scale than with the W-method. The value of Morisita's Cλ (p) index showed that the similarity with the original map was high from the 50-m to the 200-m resolution, but it was increasingly lower on the 400-m and 1000-m grid maps. The values of the Shannon index on the original map and on the 50-m and 100-m grid maps were not different, but they decreased from the 200-m to the 1000-m grid maps. Because the vegetation data base of the Japanese Environment Agency used a 1000-m C-method grid map, we found that much information on patches smaller than 100 ha had disappeared. Information about dominant vegetation or large patches is almost accurate in this data base.
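The drop in the Shannon index at coarser resolutions can be illustrated with a short calculation. This is a minimal sketch assuming the usual definition H = -Σ p_i ln p_i over the area share p_i of each vegetation class; the function name and the example areas are hypothetical and are not data from the study.

```python
import math

def shannon_index(areas):
    """Shannon diversity H = -sum(p_i * ln p_i), where p_i is the share of
    total area (or cell count) occupied by vegetation class i."""
    total = sum(areas)
    return -sum((a / total) * math.log(a / total) for a in areas if a > 0)

# Hypothetical fine-resolution map with three classes ...
fine = shannon_index([50, 30, 20])
# ... and a coarser map on which the smallest class has been absorbed.
coarse = shannon_index([60, 40])
```

When small patches vanish from a coarse grid, rare classes lose area or disappear altogether, so the index falls, in line with the decrease reported from the 200-m to the 1000-m grid maps.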
