Similar documents
20 similar documents found.
1.
Definition and identification of homology domains   Total citations: 3; self-citations: 0; citations by others: 3
A method is described for identifying and evaluating regions of significant similarity between two sequences. The notion of a ‘homology domain’ is employed, which defines the boundaries of a region of sequence homology containing no insertions or deletions. The relative significance of different potential homology domains is evaluated using a non-linear similarity score related to the probability of finding the observed level of similarity in the region by chance. The sensitivity of the method is demonstrated by simulating the evolution of homology domains and applying the method to their detection. Several examples of the use of homology domain identification are given. Received on July 29, 1987; accepted on November 15, 1987
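To make the gap-free "homology domain" idea concrete, the sketch below scans every diagonal of two sequences and returns the highest-scoring segment that contains no insertions or deletions. It assumes a simple +1/−1 match/mismatch score and a Kadane-style running sum; it is an illustration of ungapped segment search, not the non-linear, probability-based score described in the abstract. The function name is hypothetical.

```python
def best_ungapped_segment(a: str, b: str, match: int = 1, mismatch: int = -1):
    """Return (score, start_a, start_b, length) of the best gap-free segment.

    Each diagonal (a fixed offset between a and b) is scanned with a running
    sum that restarts whenever it drops to zero, so a reported segment never
    contains insertions or deletions. Returns (0, 0, 0, 0) if nothing scores > 0.
    """
    best = (0, 0, 0, 0)
    for offset in range(-(len(b) - 1), len(a)):
        # The diagonal starts at (i, j) with i - j == offset.
        i, j = max(offset, 0), max(-offset, 0)
        run, run_start, k = 0, 0, 0
        while i + k < len(a) and j + k < len(b):
            run += match if a[i + k] == b[j + k] else mismatch
            if run <= 0:
                # Discard the segment so far; restart after this position.
                run, run_start = 0, k + 1
            elif run > best[0]:
                best = (run, i + run_start, j + run_start, k - run_start + 1)
            k += 1
    return best
```

For example, `best_ungapped_segment("AAGCTT", "GCT")` locates the shared `GCT` starting at position 2 of the first sequence.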

2.
Mishra P  Pandey PN 《Bioinformation》2011,6(10):372-374
The number of amino acid sequences is increasing very rapidly in protein databases such as Swiss-Prot, UniProt, PIR and others, but the structures of only some of these sequences are found in the Protein Data Bank. Thus, an important problem in genomics is automatically clustering homologous protein sequences when only sequence information is available. Here, we use graph-theoretic techniques for clustering amino acid sequences. A similarity graph is defined, and clusters in that graph correspond to connected subgraphs. Cluster analysis seeks to group amino acid sequences into subsets based on the distance or similarity score between pairs of sequences. Our goal is to find disjoint subsets, called clusters, such that two criteria are satisfied: homogeneity (sequences in the same cluster are highly similar to each other) and separation (sequences in different clusters have low similarity to each other). We tested our method on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. The results show that, for a given set of proteins, the number of clusters we obtained is close to the number of superfamilies in that set; there are fewer singletons; and the method correctly groups most remote homologs.
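A minimal sketch of clustering as connected components of a similarity graph: an edge joins two sequences when their pairwise score reaches a threshold, and each connected component becomes one cluster. The threshold, the union-find implementation and the toy scores are assumptions for illustration; the paper's exact graph construction is not given in the abstract.

```python
from itertools import combinations


def cluster_by_similarity(names, similarity, threshold):
    """Connected components of the graph with an edge wherever
    similarity(a, b) >= threshold, found with union-find."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical pairwise scores; unlisted pairs are treated as 0.
TOY = {("p1", "p2"): 0.9, ("p2", "p3"): 0.8, ("p1", "p3"): 0.2}


def toy_similarity(a, b):
    return TOY.get((a, b), TOY.get((b, a), 0.0))
```

With a threshold of 0.5, `p1`, `p2` and `p3` end up in one cluster through the chain p1–p2–p3 even though p1 and p3 are dissimilar, which is exactly the single-linkage behaviour of connected components; `p4` stays a singleton.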

3.
Arthur M. Lesk 《Proteins》1998,33(3):320-328
In analysis, comparison and classification of conformations of proteins, a common computational task involves extracting similar substructures. Structural comparisons are usually based on either of two measures of similarity: the root-mean-square (r.m.s.) deviation upon optimal superposition, or the maximal element of the difference distance matrix. The analysis presented here clarifies the relationships between different measures of structural similarity, and can provide a basis for developing algorithms and software to extract all maximal common well-fitting substructures from proteins. Given atomic coordinates of two proteins, many methods have been described for extracting some substantial (if not provably maximal) common substructure with low r.m.s. deviation. This is a relatively easy task compared with the problem addressed here, i.e., that of finding all common substructures with r.m.s. deviation less than a prespecified threshold. The combinatorial problems associated with similar subset extraction are more tractable if expressed in terms of the maximal element of the difference distance matrix than in terms of the r.m.s. deviation. However, it has been difficult to correlate these alternative measures of structural similarity. The purpose of this article is to make this connection. We first introduce a third measure of structural similarity: the maximum distance between corresponding pairs of points after superposition to minimize this value. This corresponds to fitting in the Chebyshev norm. Properties of Chebyshev superposition are derived. We describe relationships between the r.m.s. and minimax (Chebyshev) deviations upon optimal superposition, and between the Chebyshev deviation and the maximal element of the difference distance matrix. Combining these produces a relationship between the r.m.s. deviation upon optimal superposition and the maximal element of the difference distance matrix. 
Based on these results, we can apply algorithms and software for finding subsets of the difference distance matrix for which all elements are less than a specified bound, either to select only subsets for which the r.m.s. deviation is less than or equal to a specified threshold, or to select subsets that include all subsets for which the r.m.s. deviation is less than or equal to a threshold. Proteins 33:320–328, 1998. © 1998 Wiley-Liss, Inc.
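The maximal element of the difference distance matrix can be computed without any superposition, because intramolecular distances are unchanged by rotation and translation. The sketch below shows only that quantity for two equal-length lists of corresponding points; the r.m.s. and Chebyshev superpositions discussed in the abstract are not implemented here, and the function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between points given as (x, y, z) tuples."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def max_difference_distance(coords_a, coords_b):
    """Largest element of |D_A - D_B|, where D_A and D_B are the
    intramolecular distance matrices of corresponding points."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of corresponding points")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    return max((abs(da[i][j] - db[i][j]) for i in range(n) for j in range(n)), default=0.0)
```

A rigidly rotated and translated copy of a structure gives a value of (numerically) zero, while stretching one inter-point distance by 1 Å gives 1 Å, which is why this measure is convenient for enumerating well-fitting subsets before any fitting is done.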

4.
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This recognition, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.

5.
We describe software for aligning protein or nucleic acid sequences based on the concept of match density. This method is especially useful for locating regions of short similarity between two longer sequences which may be largely dissimilar (e.g. locating active site regions in distantly related proteins). Our software is able to identify biologically interesting similarities between two sub-regions because it allows the user to control the matching parameters and the manner in which local alignments are selected for display. Furthermore, the collection and ranking of alignments for display uses a novel, highly efficient algorithm. We illustrate these features with several examples. In addition, we show that this tool can be used to find a new conserved sequence in several viral DNA polymerases, which, we suggest, occurs at a functionally important enzymatic site. Received on August 17, 1987; accepted on November 17, 1987
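One simple way to picture "match density" is a windowed dot plot: along each diagonal, count identical residues in a fixed-width window and report windows whose fraction of matches reaches a threshold. The sketch below assumes that interpretation, with a hypothetical window of 8 and density of 0.75 as user-controlled matching parameters; it does not reproduce the paper's efficient collection and ranking algorithm.

```python
def dense_match_windows(a: str, b: str, window: int = 8, min_density: float = 0.75):
    """Return (start_a, start_b, matches) for every diagonal window of the given
    width whose fraction of identical positions is at least min_density."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches / window >= min_density:
                hits.append((i, j, matches))
    # Densest windows first; ties broken by position for a stable report.
    hits.sort(key=lambda h: (-h[2], h[0], h[1]))
    return hits
```

This brute-force scan costs O(len(a) · len(b) · window), which is fine for short examples but is exactly the kind of cost a dedicated algorithm would aim to avoid.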

6.
7.
8.
We describe the changes in the floral assemblage in a salt marsh after reconnection to estuarine tidal inundation. The Elk River marsh in Grays Harbor, Washington, was opened to tidal flushing in 1987 after being diked for approximately 70 years. The freshwater pasture assemblage dominated by Phalaris arundinacea (reed canary grass) converted to low salt marsh vegetation within 5 years, with the major flux in species occurring between years 1 and 4. The system continued to develop through the 11‐year post‐breach monitoring period, although change after year 6 was slower than in previous years. The assemblage resembles a low salt marsh community dominated by Distichlis spicata (salt grass) and Salicornia virginica (pickleweed). Because of subsidence of the system during the period of breaching, the restored system remains substantially different from the Deschampsia cespitosa (tufted hairgrass)‐dominated reference marsh. Using a similarity index to compare years, and also to compare the reference and restored marshes in the same year, revealed that similarity in floral composition between year 0 and subsequent years decreased with time. However, there was a period of dramatic dissimilarity during years 1 to 3, when the system was rapidly changing from a freshwater to an estuarine condition. Similarity values between the reference and restored systems generally increased with time. Somewhat surprisingly, the reference marsh showed considerable between‐year variation in similarity, indicating substantial year‐to‐year variability in species composition. Based on accretion rate data from previous studies, we predict that full recovery of the system would take between 75 and 150 years.
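The abstract does not name its similarity index. As an assumption, the sketch below uses the Sørensen index on presence/absence species lists, a common choice for this kind of year-to-year comparison, and compares every survey year against the year-0 baseline. The data in the usage note are toy values, not the study's.

```python
def sorensen(a: set, b: set) -> float:
    """Sørensen similarity 2|A ∩ B| / (|A| + |B|) on presence/absence species lists."""
    if not a and not b:
        return 1.0  # two empty surveys are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def similarity_to_baseline(surveys: dict) -> dict:
    """Similarity of every survey year (int key -> set of species) to year 0."""
    base = surveys[0]
    return {year: sorensen(base, species) for year, species in sorted(surveys.items())}
```

For toy surveys `{0: {"a", "b"}, 1: {"a"}, 2: {"c"}}` the similarity to year 0 falls from 1.0 to 2/3 to 0.0, the same kind of declining trend the abstract reports. The same `sorensen` call can compare the reference and restored marshes within one year.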

9.
Environment, dispersal and patterns of species similarity   Total citations: 2; self-citations: 0; citations by others: 2
Aim The aim of this paper is to evaluate the combined effects of geographical distance and environmental distance on patterns of species similarity (similarity in species composition between sites), and to identify factors affecting the rate of decay in species similarity with each type of distance. Location Israel. Methods Data on species composition of land snails and land birds were recorded in 27 sites of 1 × 1 km scattered across a rainfall gradient in Israel. Matrices of similarity in species composition between all pairs of sites were computed and analysed with respect to corresponding matrices of geographical distance and rainfall distance (defined as the difference in mean annual rainfall between sites, and used as a measure of environmental distance). Mantel tests were applied to determine the correlation between species similarity and each type of distance. Factors affecting the decay in species similarity were investigated by comparing different subsets of the data using randomization tests. Results Both rainfall distance and geographical distance had negative effects on species similarity. The effect of rainfall distance was statistically significant even after controlling for differences in geographical distance, and vice versa. The per‐unit effect of rainfall distance on species similarity decreased with increasing geographical distance, indicating that the two types of distances interacted in determining the similarity in species composition. Snails showed a higher rate of decay in species similarity with geographical distance than birds, and large snails showed a higher rate of decay than small snails, which are better passive dispersers. The per‐unit effects of both rainfall distance and geographical distance on species similarity were higher in the desert region than in the Mediterranean region. 
Analyses focusing on a grain size of 10 × 10 m showed a lower similarity in species composition and a lower rate of decay in species similarity with rainfall distance than analyses carried out at a grain size of 1 × 1 km. Main conclusions Patterns of similarity in species composition are influenced by the combined effects of environmental variation, the position of the area along environmental gradients, the dispersal properties of the component species, and the scale (both spatial extent and grain size) at which the patterns are examined.
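The Mantel test mentioned in the Methods correlates two site-by-site matrices (for example species similarity and geographical distance) and assesses significance by permuting site labels, moving rows and columns together. The minimal sketch below assumes Pearson correlation on the upper triangle and a two-sided p-value; the partial Mantel test needed to control for the other distance type is not shown, and the function names are hypothetical.

```python
import random


def upper_triangle(m):
    """Flatten the entries above the diagonal (i < j) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a, b, permutations=999, seed=0):
    """Return (r, two-sided permutation p-value) for two square site matrices.
    Rows and columns of b are permuted together, preserving its structure."""
    rng = random.Random(seed)
    n = len(a)
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(x, upper_triangle(permuted))) >= abs(r_obs) - 1e-12:
            extreme += 1
    return r_obs, extreme / (permutations + 1)
```

Because a similarity matrix decreases as distance grows, a decay of species similarity with distance shows up as a negative `r`.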

10.
The ATP:D-fructose-6-phosphate 1-phosphotransferase (EC 2.7.1.11) isoenzymes from cucumber seeds were separated and purified. The calculated molecular weights of the two isoenzymes (approximately 180,000) are similar, and the isoenzymes are probably hetero-tetramers. The purified isoenzymes contained three polypeptides of 53.3, 41.5 and 39.0 kDa for the plastid isoenzyme and 47.2, 42.4 and 40.4 kDa for the cytosolic isoenzyme, respectively. The purified phosphofructokinase isoenzymes were used as antigens for the production of polyclonal antibodies in rabbits. The antisera obtained clearly indicated that there is no immunological similarity between the two isoenzymes. The results also show that the phosphofructokinase isoenzymes in cucumber are not merely different stages of association of the same protein. (Received June 29, 1987; Accepted October 21, 1987)

11.
An algorithm for comparing multiple RNA secondary structures   Total citations: 1; self-citations: 0; citations by others: 1
A new distributed computational procedure is presented for rapidly determining the similarity of multiple conformations of RNA secondary structures. A data abstraction scheme is utilized to reduce the quantity of data that must be handled to determine the degree of similarity among multiple structures. The method has been used to compare 200 structures with easy visualization of both those structures and substructures that are similar and those that are vastly different. It has the capability of processing many more conformations as a function of research requirements. The algorithm is described as well as some suggestions for future uses and extensions. Received on October 29, 1987; accepted on May 4, 1988
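The abstract does not describe its data abstraction scheme. As an assumption, the sketch below reduces each secondary structure to its set of base pairs (parsed from dot-bracket notation) and compares many structures at once with the base-pair distance, the size of the symmetric difference of two pair sets. This is a common measure chosen for illustration, not necessarily the authors' method.

```python
def base_pairs(dot_bracket: str) -> set:
    """Parse dot-bracket notation into a set of 0-based (i, j) base pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def bp_distance_matrix(structures: list) -> list:
    """Base-pair distance (|P xor Q|) between every pair of structures."""
    sets = [base_pairs(s) for s in structures]
    return [[len(p ^ q) for q in sets] for p in sets]
```

Storing only the pair sets instead of full structures keeps the all-against-all comparison small, and small entries in the resulting matrix flag groups of conformations that are nearly identical.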

12.
We have compared the sequence of the capsid polypeptide of the Saccharomyces cerevisiae double-stranded RNA virus, ScV, with those of the picornaviruses. A central region of 245 amino acids in the ScV capsid polypeptide of 680 amino acids has significant similarity to the picornavirus VP3. This similarity is more extensive than that already noted for the alphavirus capsid polypeptide and the picornavirus VP3 (Fuller, S.D. and Argos, P., EMBO J. 6, 1099, 1987). Together with the similarity between the ScV RNA polymerase and the picornavirus RNA polymerases, this result implies an evolutionary relationship between a simple double-stranded RNA virus of fungi and the small plus strand RNA animal viruses.

13.
Given a set of related proteins, two important problems in biology are the inference of protein subsets such that members of one subset share a common function and the identification of protein regions that possess functional significance. The former is typically approached by hierarchical bottom-up clustering based on pairwise sequence similarity and various linkage rules. The latter is typically approached in a supervised manner, based on global multiple sequence alignment. However, the two problems are inextricably linked, since functional subsets are usually characterized by distinctive functional regions. This paper introduces CASTOR, an automatic and unsupervised system that addresses both problems simultaneously and efficiently. It identifies protein regions that are likely to have functional significance by discovering and refining statistically significant motifs. It infers likely functional protein subsets and their relationships based on the presence of the discovered motifs in a top-down and recursive manner, allowing the identification of both hierarchical and nonhierarchical subset relationships. This is, to our knowledge, the first system that approaches both problems simultaneously in a top-down, systematic manner. CASTOR's performance is evaluated against the G-protein coupled receptor superfamily. The identified protein regions lead to a taxonomical organization of this superfamily that is in remarkable agreement with a biologically motivated one and which outperforms those produced by bottom-up clustering methods. We also find that conventional hierarchical representations may fail to accurately describe the complexity of evolutionary development responsible for the final organization of a complex protein family. In particular, many functional relationships governing distant subfamilies of such a protein family may not be represented hierarchically.

14.
There is a strong correlation between marriage system and wealth inheritance pattern across societies (Hartung 1982); as the degree of polygyny increases, so too does the degree of male bias in inheritance. In this paper, we reevaluate this pattern using a new technique in cross-cultural analyses that effectively controls for the nonindependence of cultures (Galton's problem) through the identification of independent instances of cultural change (Mace and Pagel 1994). First, we produce cultural phylogenetic trees for the societies under study, from phylogenies previously constructed on the basis of linguistic similarity (Ruhlen 1987). Then, following standard methods for the analysis of discrete characters on phylogenetic trees, we use parsimony to determine the ancestral condition of both marriage and inheritance, and subsequently tally the number of independent instances of cultural change in each trait. The results show that transitions to polygyny are much more commonly associated with male-biased inheritance than are transitions to monogamy across human societies in our sample. They illustrate how the degree of change in the evolution of these traits differs considerably between divergent cultural groups. The advantages of this technique are discussed.
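Parsimony reconstruction of a discrete character on a tree is commonly done with Fitch's algorithm, sketched below for a rooted binary tree written as nested tuples. The abstract does not say which parsimony variant or software was used, so this is an illustrative stand-in; the tree and the society labels in the usage note are hypothetical. The bottom-up pass shown gives the minimum number of state changes and the candidate root states; assigning states to every internal node, and so tallying the direction of each transition, needs a second top-down pass that is not shown.

```python
def fitch(tree, states):
    """Fitch parsimony, bottom-up pass.

    tree   -- a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states -- dict mapping leaf names to a discrete state (e.g. "polygyny")
    Returns (set of candidate states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children can agree: no extra change is needed at this node.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one state change.
    return left_set | right_set, left_cost + right_cost + 1
```

For four hypothetical societies where A and B are polygynous and C and D monogamous, the tree `(("A", "B"), ("C", "D"))` needs one change, whereas `(("A", "C"), ("B", "D"))` needs two, so the same trait data imply different numbers of independent transitions depending on the phylogeny.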

15.
The paper deals with inconsistencies of composite sustainability indicators and their different subsets (economic, environmental, social, and corporate governance). Corporate sustainability performance is usually highly nonlinear, vague, partially inconsistent and multidimensional. The resulting models are often oversimplified. The key reason is an information shortage, which rules out straightforward application of classical statistical methods. Numbers are accurate and information intensive. Verbal quantifications are less accurate and therefore less information intensive. Fuzzy sets and fuzzy reasoning are used to make verbal quantifiers suitable for computer applications. A fuzzy similarity graph is defined. A team of experts identified 17 relevant variables (e.g. Environmental costs, Occupational diseases, Number of complaints received from stakeholders), and data sets for 12 companies are available. Each company is presented as a fuzzy conditional statement. A set of fuzzy pairwise similarities is generated and used to evaluate five similarity graphs: a Total Graph (based on all 17 variables) and graphs based on relevant specific subsets of variables, namely the Economic, Environmental, Social and Corporate Governance graphs. The topologies of these graphs are significantly different. No prior knowledge of fuzzy reasoning is required.

16.
Profile analysis measures the similarity between a target sequence and a group of aligned sequences (the probe). The probe sequences are used to produce a position-specific scoring table (the profile) that can be aligned with any sequence (the target) using standard dynamic programming methods. We are developing a library of profiles, each describing a different structural motif. This allows any target sequence to be rapidly scanned for the presence of structural motifs. Levels of significance for the comparison of target sequences with the profile are determined in advance, permitting an objective decision to be made as to whether a protein is likely to possess a structural motif. Received on July 17, 1987; accepted on January 4, 1988
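The two steps described above, building a position-specific scoring table from the probe and aligning it to a target with dynamic programming, can be sketched as follows. This is a simplified stand-in: it assumes log-odds scores from column counts with a pseudocount and a uniform background, a linear gap penalty of −2, local (Smith-Waterman style) alignment, and a fixed −1.0 score for non-standard residues. Classical profile methods instead average a substitution matrix over each column, so treat all parameters and names here as hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(probe, pseudocount=1.0):
    """Log-odds position-specific scoring table from equal-length aligned
    sequences, assuming a uniform background; gap characters are skipped."""
    profile = []
    for column in zip(*probe):
        residues = [c for c in column if c in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log((residues.count(aa) + pseudocount) / total * len(AMINO_ACIDS))
            for aa in AMINO_ACIDS
        })
    return profile


def profile_local_score(profile, target, gap=-2.0):
    """Best local alignment score of target against profile (linear gaps)."""
    prev = [0.0] * (len(target) + 1)
    best = 0.0
    for row in profile:
        cur = [0.0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diagonal = prev[j - 1] + row.get(target[j - 1], -1.0)
            cur[j] = max(0.0, diagonal, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Because the alignment is local, a target that embeds the motif inside unrelated residues (for example `"XXACDEXX"` against a profile built from `["ACDE", "ACDE", "ACDF"]`) scores the same as the bare motif, while an unrelated target such as `"WWWW"` scores 0. Significance levels for such scores would still have to be calibrated in advance, as the abstract describes.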

17.
Sikic K  Carugo O 《Bioinformation》2010,5(6):234-239
Non-redundant protein datasets are of utmost importance in bioinformatics. Constructing such datasets means removing protein sequences whose similarity to other sequences exceeds a given threshold. Several programs, such as 'Decrease redundancy', 'cd-hit', 'Pisces', 'BlastClust' and 'SkipRedundant', are available. The issue we focus on here is the extent to which the non-redundant datasets produced by different programs are similar to each other. A systematic comparison of the features and outputs of these programs, using subsets of the UniProt database, was performed and is described here. The results show a high level of overlap between non-redundant datasets obtained with the same program fed with the same initial dataset but different percentage identity thresholds, and moderate levels of similarity between results obtained with different programs fed with the same initial dataset and the same percentage identity threshold. Users should be aware that such differences may arise, and using more than one program is advisable.
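A minimal sketch of greedy redundancy removal and of measuring overlap between two resulting datasets. It assumes a greedy incremental scheme (keep sequences longest first, drop any sequence too similar to one already kept) and uses Python's `difflib.SequenceMatcher.ratio()` as a crude identity proxy; the tools named above compute alignment-based identity, so the numbers will differ from theirs. Function names and toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude identity proxy: difflib's ratio (2 * matched characters / total length)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_nonredundant(seqs: dict, threshold: float) -> set:
    """Keep sequences longest first; drop any whose identity to an already
    kept sequence reaches the threshold. Returns the kept names."""
    kept = []
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if all(identity(seqs[name], seqs[k]) < threshold for k in kept):
            kept.append(name)
    return set(kept)


def overlap(x: set, y: set) -> float:
    """Jaccard overlap between two non-redundant datasets (sets of names)."""
    return len(x & y) / len(x | y) if x | y else 1.0
```

Running the same input at two thresholds, or through two different selection procedures, and comparing the kept sets with `overlap` mirrors the kind of comparison the abstract describes: for toy sequences `{"s1": "MKTAYIAKQR", "s2": "MKTAYIAKQK", "s3": "GGGGGGGGGG"}`, a threshold of 0.8 drops `s2` while 0.95 keeps all three, giving an overlap of 2/3.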

18.
T-cell subsets were studied by fluorescence-activated cell sorter analysis in 57 feline immunodeficiency virus (FIV)-seropositive cats with naturally acquired FIV infection, to see whether CD4+/CD8+ alterations were comparable to those observed in human immunodeficiency virus-infected patients. CD4+ values were decreased and CD8+ values were increased. The CD4+/CD8+ ratio was reduced to 1.6, compared with 3.3 in 33 FIV-seronegative control cats. Analysis of variance showed a significant influence of FIV seropositivity, sex, and spaying of female cats on CD4+ values. CD8+ values were significantly influenced by FIV seropositivity, age, and breed. These findings indicate a similarity between FIV and human immunodeficiency virus infections as far as alterations of T-cell subsets are concerned.

19.
A method of interfacing sequence similarity search software with the fast sequence retrieval system ACNUC is described. The method is written in FORTRAN 77 and is straightforward to implement because no text-processing code is required; a minimum of 12 extra lines of FORTRAN provided the interface for most applications. The method is also efficient, since sequences are located by simple indexing techniques, with no linear searches of large database files necessary. Received on November 20, 1986; accepted on January 8, 1987

20.
A rapid method of protein structure alignment   Total citations: 5; self-citations: 0; citations by others: 5
A reduction in the time required to compare two protein structures has been achieved for a previously developed structure alignment method, by reducing the number of residue pair comparisons which must be performed between the two structures. Subsets of residue pairs are selected by an iterative procedure. Initially, selection is based on similarities in solvent accessible surface areas or torsional angles or a combination of both properties, giving subsets containing approximately 2% of the total number of residue pairs. Using these subsets, a rough comparison of the two structures is generated by the structural alignment program. The information returned from this can be used to identify more accurately topologically equivalent residues in the two proteins, thus enabling a new and much smaller subset (less than 0.2% of the total number of residue pairs) to be selected. The process of iterative refinement of the residue pair subsets is repeated once more, when in 95% of the structure comparisons tested, the correct alignment of the proteins was obtained. Times required to compare the structures using the refined subsets are insignificant compared to the initial comparison, so that considerable increases in speed are possible. The method was tested on two groups of proteins, a set of remotely related alpha/beta nucleotide proteins and the variable and constant domains of the immunoglobulins. Increases in speed ranging from 50-fold to greater than 150-fold were obtained depending on the degree of similarity of the two structures. In some comparisons the alignment was improved due to the reduction in noise obtained by comparing mainly equivalent residues.
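The initial selection step, keeping only residue pairs whose solvent accessibility and torsion angles are similar, can be sketched as a simple filter over all pairs. This assumes the "combination of both properties" variant, uses only the phi angle, and picks hypothetical tolerances (10 accessibility units, 30°); the abstract does not give the actual criteria. Angle differences are taken around the circle so that 170° and −170° count as 20° apart.

```python
def angle_diff(x: float, y: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    return abs((x - y + 180.0) % 360.0 - 180.0)


def preselect_pairs(acc_a, acc_b, phi_a, phi_b, acc_tol=10.0, angle_tol=30.0):
    """Residue pairs (i, j) whose accessibilities differ by at most acc_tol
    and whose phi angles differ by at most angle_tol degrees."""
    return [
        (i, j)
        for i in range(len(acc_a))
        for j in range(len(acc_b))
        if abs(acc_a[i] - acc_b[j]) <= acc_tol
        and angle_diff(phi_a[i], phi_b[j]) <= angle_tol
    ]
```

The structure alignment program would then compare only the returned pairs. Dividing `len(pairs)` by `len(acc_a) * len(acc_b)` gives the retained fraction, the quantity the abstract reports as roughly 2% at this stage.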


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417