首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 958 毫秒
1.
A pair of proteins is defined to be related by a circular permutation if the N-terminal region of one protein has significant sequence similarity to the C-terminal of the other and vice versa. To detect pairs of proteins that might be related by circular permutation, we implemented a procedure based on a combination of a fast screening algorithm that we had designed and manual verification of candidate pairs. The screening algorithm is a variation of a dynamic programming string matching algorithm, in which one of the sequences is doubled. This algorithm, although not guaranteed to identify all cases of circular permutation, is a good first indicator of protein pairs related by permutation events. The candidate pairs were further validated first by application of an exhaustive string matching algorithm and then by manual inspection using the dotplot visual tool. Screening the whole Swissprot database, a total of 25 independent protein pairs were identified. These cases are presented here, divided into three categories depending on the level of functional similarity of the related proteins. To validate our approach and to confirm further the small number of circularly permuted protein pairs, a systematic search for cases of circular permutation was carried out in the Pfam database of protein domains. Even with this more inclusive definition of a circular permutation, only seven additional candidates were found. None of these fitted our original definition of circular permutations. The small number of cases of circular permutation suggests that there is no mechanism of local genetic manipulation that can induce circular permutations; most examples observed seem to result from fusion of functional units.  相似文献   

2.
An algorithm to enumerate sorting reversals for signed permutations.   总被引:1,自引:0,他引:1  
The rearrangement distance between single-chromosome genomes can be estimated as the minimum number of inversions required to transform the gene ordering observed in one into that observed in the other. This measure, known as "inversion distance," can be computed as the reversal distance between signed permutations. During the past decade, much progress has been made both on the problem of computing reversal distance and on the related problem of finding a minimum-length sequence of reversals, which is known as "sorting by reversals." For most problem instances, however, many minimum-length sequences of reversals exist, and in the absence of auxiliary information, no one is of greater value than the others. The problem of finding all minimum-length sequences of reversals is thus a natural generalization of sorting by reversals, yet it has received little attention. This problem reduces easily to the problem of finding all "sorting reversals" of one permutation with respect to another - that is, all reversals rho such that, if rho is applied to one permutation, then the reversal distance of that permutation from the other is decreased. In this paper, an efficient algorithm is derived to solve the problem of finding all sorting reversals, and experimental results are presented indicating that, while the new algorithm does not represent a significant improvement in asymptotic terms (it takes O(n(3)) time, for permutations of size n; the problem can now be solved by brute force in Theta(n(3)) time), it performs dramatically better in practice than the best known alternative. An implementation of the algorithm is available at www.cse.ucsc.edu/~acs.  相似文献   

3.
Most molecular analyses, including phylogenetic inference, are based on sequence alignments. We present an algorithm that estimates relatedness between biomolecules without the requirement of sequence alignment by using a protein frequency matrix that is reduced by singular value decomposition (SVD), in a latent semantic index information retrieval system. Two databases were used: one with 832 proteins from 13 mitochondrial gene families and another composed of 1000 sequences from nine types of proteins retrieved from GenBank. Firstly, 208 sequences from the first database and 200 from the second were randomly selected and compared using edit distance between each pair of sequences and respective cosines and Euclidean distances from SVD. Correlation between cosine and edit distance was -0.32 (P < 0.01) and between Euclidean distance and edit distance was +0.70 (P < 0.01). In order to check the ability of SVD in classifying sequences according to their categories, we used a sample of 202 sequences from the 13 gene families as queries (test set), and the other proteins (630) were used to generate the frequency matrix (training set). The classification algorithm applies a voting scheme based on the five most similar sequences with each query. With a 3-peptide frequency matrix, all 202 queries were correctly classified (accuracy = 100%). This algorithm is very attractive, because sequence alignments are neither generated nor required. In order to achieve results similar to those obtained with edit distance analysis, we recommend that Euclidean distance be used as a similarity measure for protein sequences in latent semantic indexing methods.  相似文献   

4.
The rearrangement or permutation of protein substructures is an important mode of divergence. Recent work explored one possible underlying mechanism called permutation-by-duplication, which produces special forms of motif rearrangements called circular permutations. Permutation-by-duplication, involving gene duplication, fusion and truncation, can produce fully functional intermediate proteins and thus represents a feasible mechanism of protein evolution. In spite of this, circular permutations are relatively rare and we discuss possible reasons for their existence.  相似文献   

5.
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks is also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space are limited to simple model proteins and led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel potentially stable clusters for medium-size or large-size single-domain proteins, in particular, > 100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed. Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.  相似文献   

6.
The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not just the result of their nonrandom dinucleotide and codon usage.   相似文献   

7.
The problem of sorting by transpositions asks for a sequence of adjacent interval exchanges that sorts a permutation and is of the shortest possible length. The distance of the permutation is defined as the length of such a sequence. Despite the apparently intuitive nature of this problem, introduced in 1995 by Bafna and Pevzner, the complexity of both finding an optimal sequence and computing the distance remains open today. In this paper, we establish connections between two different graph representations of permutations, which allows us to compute the distance of a few nontrivial classes of permutations in linear time and space, bypassing the use of any graph structure. By showing that every permutation can be obtained from one of these classes, we prove a new tight upper bound on the transposition distance. Finally, we give improved bounds on some other families of permutations and prove formulas for computing the exact distance of other classes of permutations, again in polynomial time  相似文献   

8.
There have been many studies about the effect of circular permutation on the transition state/folding nucleus of proteins, with sometimes conflicting conclusions from different proteins and permutations. To clarify this important issue, we have studied two circular permutations of a lattice protein model with side-chains. Both permuted sequences have essentially the same native state as the original (wild-type) sequence. Circular permutant 1 cuts at the folding nucleus of the wild-type sequence. As a result, the permutant has a drastically different nucleus and folds more slowly than wild-type. In contrast, circular permutant 2 involves an incision at a site unstructured in the wild-type transition state, and the wild-type nucleus is largely retained in the permutant. In addition, permutant 2 displays both two-state and multi-state folding, with a native-like intermediate state occasionally populated. Neither the wild-type nor permutant 1 has a similar intermediate, and both fold in an apparently two-state manner. Surprisingly, permutant 2 folds at a rate identical with that of the wild-type. The intermediate in permutant 2 is stabilised by native and non-native interactions, and cannot be classified simply as on or off-pathway. So we advise caution in attributing experimental data to on or off-pathway intermediates. Finally, our work illuminates the results on alpha-spectrin SH3, chymotrypsin inhibitor 2 and beta-lactoglobulin, and supports a key assumption in the experimental efforts to locate potential nucleation sites of real proteins via circular permutations.  相似文献   

9.
The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith-Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed-accuracy tradeoff in a number of popular protein structure alignment methods.  相似文献   

10.

Background  

Identifying structurally similar proteins with different chain topologies can aid studies in homology modeling, protein folding, protein design, and protein evolution. These include circular permuted protein structures, and the more general cases of non-cyclic permutations between similar structures, which are related by non-topological rearrangement beyond circular permutation. We present a method based on an approximation algorithm that finds sequence-order independent structural alignments that are close to optimal. We formulate the structural alignment problem as a special case of the maximum-weight independent set problem, and solve this computationally intensive problem approximately by iteratively solving relaxations of a corresponding integer programming problem. The resulting structural alignment is sequence order independent. Our method is also insensitive to insertions, deletions, and gaps.  相似文献   

11.
Circular permutations of genes during molecular evolution often are regarded as elusive, although a simple model can explain these rearrangements. The model assumes that first a gene duplication of the precursor gene occurs in such a way that both genes become fused in frame, leading to a tandem protein. After generation of a new start codon within the 5′ part of the tandem gene and a stop at an equivalent position in the 3′ part of the gene, a protein is encoded that represents a perfect circular permutation of the precursor gene product. The model is illustrated here by the molecular evolution of adenine-N6 DNA methyltransferases. β- and γ-type enzymes of this family can be interconverted by a single circular permutation event. Interestingly, tandem proteins, proposed as evolutionary intermediates during circular permutation, can be directly observed in the case of adenine methyltransferases, because some enzymes belonging to type IIS, like the FokI methyltransferase, are built up by two fused enzymes, both of which are active independently of each other. The mechanism for circular permutation illustrated here is very easy and applicable to every protein. Thus, circular permutation can be regarded as a normal process in molecular evolution and a changed order of conserved amino acid motifs should not be interpreted to argue against divergent evolution. Received: 17 November 1998 / Accepted: 19 February 1999  相似文献   

12.

Background

One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed permutations of the form \(\pi =(\pi _1 \pi _2 \ldots \pi _n)\), and in that case we can consider without loss of generality that one of them is the identity permutation \(\iota _n =(1 2 \ldots n)\), and that we just need to sort the other (i.e., transform it into \(\iota _n\)). The most studied genome rearrangement events are reversals, where a segment of the genome is reversed and reincorporated at the same location; and transpositions, where two consecutive segments are exchanged. Many variants, e.g., combining different types of (possibly constrained) rearrangements, have been proposed in the literature. One of them considers that the number of genes involved, in a reversal or a transposition, is never greater than two, which is known as the problem of sorting by super short operations (or SSOs).

Results and conclusions

All problems considering SSOs in permutations have been shown to be in \(\mathsf {P}\), except for one, namely sorting signed circular permutations by super short reversals and super short transpositions. Here we fill this gap by introducing a new graph structure called cyclic permutation graph and providing a series of intermediate results, which allows us to design a polynomial algorithm for sorting signed circular permutations by super short reversals and super short transpositions.
  相似文献   

13.
Methods for rapid and reliable design and structure prediction of linker loops would facilitate a variety of protein engineering applications. Circular permutation, in which the existing termini of a protein are linked by the polypeptide chain and new termini are created, is one such application that has been employed for decreasing proteolytic susceptibility and other functional purposes. The length and sequence of the linker can impact the expression level, solubility, structure and function of the permuted variants. Hence it is desirable to achieve atomic‐level accuracy in linker design. Here, we describe the use of RosettaRemodel for design and structure prediction of circular permutation linkers on a model protein. A crystal structure of one of the permuted variants confirmed the accuracy of the computational prediction, where the all‐atom rmsd of the linker region was 0.89 Å between the model and the crystal structure. This result suggests that RosettaRemodel may be generally useful for the design and structure prediction of protein loop regions for circular permutations or other structure‐function manipulations.  相似文献   

14.
In comparative genomics, gene order data is often modeled as signed permutations. A classical problem for genome comparison is to detect common intervals in permutations, that is, genes that are colocalized in several species, indicating that they remained grouped during evolution. A second largely studied problem related to gene order is to compute a minimum scenario of reversals that transforms a signed permutation into another. Several studies began to mix the two problems and it was observed that their results are not always compatible: Often, parsimonious scenarios of reversals break common intervals. If a scenario does not break any common interval, it is called perfect. In two recent studies, Berard et al. defined a class of permutations for which building a perfect scenario of reversals sorting a permutation was achieved in polynomial time and stated as an open question whether it is possible to decide, given a permutation, if there exists a minimum scenario of reversals that is perfect. In this paper, we give a solution to this problem and prove that this widens the class of permutations addressed by the aforementioned studies. We implemented and tested this algorithm on gene order data of chromosomes from several mammal species and we compared it to other methods. The algorithm helps to choose among several possible scenarios of reversals and indicates that the minimum scenario of reversals is not always the most plausible  相似文献   

15.
Generating diverse protein libraries that contain improved variants at a sufficiently high frequency is critical for improving the properties of proteins using directed evolution. Many studies have illustrated how random mutagenesis, cassette mutagenesis, DNA shuffling and similar approaches are effective diversity generating methods for directed evolution. Very few studies have explored random circular permutation, the intramolecular relocation of the N- and C-termini of a protein, as a diversity-generating step for directed evolution. We subjected a library of random circular permutations of TEM-1 β-lactamase to selections on increasing concentrations of a variety of β-lactam antibiotics including cefotaxime. We identified two circularly permuted variants that conferred elevated resistance to cefotaxime but decreased resistance to other antibiotics. These variants were circularly permuted in the Ω-loop proximal to the active site. Remarkably, one variant was circularly permuted such that the key catalytic residue Glu166 was located at the N-terminus of the mature protein.  相似文献   

16.
Hannenhalli and Pevzner gave the first polynomial-time algorithm for computing the inversion distance between two signed permutations, as part of the larger task of determining the shortest sequence of inversions needed to transform one permutation into the other. Their algorithm (restricted to distance calculation) proceeds in two stages: in the first stage, the overlap graph induced by the permutation is decomposed into connected components; then, in the second stage, certain graph structures (hurdles and others) are identified. Berman and Hannenhalli avoided the explicit computation of the overlap graph and gave an O(nalpha(n)) algorithm, based on a Union-Find structure, to find its connected components, where alpha is the inverse Ackerman function. Since for all practical purposes alpha(n) is a constant no larger than four, this algorithm has been the fastest practical algorithm to date. In this paper, we present a new linear-time algorithm for computing the connected components, which is more efficient than that of Berman and Hannenhalli in both theory and practice. Our algorithm uses only a stack and is very easy to implement. We give the results of computational experiments over a large range of permutation pairs produced through simulated evolution; our experiments show a speed-up by a factor of 2 to 5 in the computation of the connected components and by a factor of 1.3 to 2 in the overall distance computation.  相似文献   

17.
18.
In the study of genome rearrangement, the block-interchanges have been proposed recently as a new kind of global rearrangement events affecting a genome by swapping two nonintersecting segments of any length. The so-called block-interchange distance problem, which is equivalent to the sorting-by-block-interchange problem, is to find a minimum series of block-interchanges for transforming one chromosome into another. In this paper, we study this problem by considering the circular chromosomes and propose a Omicron(deltan) time algorithm for solving it by making use of permutation groups in algebra, where n is the length of the circular chromosome and delta is the minimum number of block-interchanges required for the transformation, which can be calculated in Omicron(n) time in advance. Moreover, we obtain analogous results by extending our algorithm to linear chromosomes. Finally, we have implemented our algorithm and applied it to the circular genomic sequences of three human vibrio pathogens for predicting their evolutionary relationships. Consequently, our experimental results coincide with the previous ones obtained by others using a different comparative genomics approach, which implies that the block-interchange events seem to play a significant role in the evolution of vibrio species.  相似文献   

19.
The kinetics of disulfide-coupled folding and unfolding of four circularly permuted forms of bovine pancreatic trypsin inhibitor (BPTI) were studied and compared with previously published results for both wild-type BPTI and a cyclized form. Each of the permuted proteins was found to be less stable than either the wild-type or circular proteins, by 3-8 kcal/mole. These stability differences were used to estimate effective concentrations of the chain termini in the native proteins, which were 1 mM for the wild-type protein and 2.5 to 4000 M for the permuted forms. The circular permutations increased the rates of unfolding and caused a variety of effects on the kinetics of refolding. For two of the proteins, the rates of a direct disulfide-formation pathway were dramatically increased, making this process as fast or faster than the competing disulfide rearrangement mechanism that predominates in the folding of the wild-type protein. These two permutations break the covalent connectivity among the beta-strands of the native protein, and removal of these constraints appears to facilitate direct formation and reduction of nearby disulfides that are buried in the folded structure. The effects on folding kinetics and mechanism do not appear to be correlated with relative contact order, a measure of overall topological complexity. These observations are consistent with the results of other recent experimental and computational studies suggesting that circular permutation may generally influence folding mechanisms by favoring or disfavoring specific interactions that promote alternative pathways, rather than through effects on the overall topology of the native protein.  相似文献   

20.
A short swap is an operation on a permutation that switches two elements that have at most one element between them. This paper investigates the problem of finding a minimum-length sorting sequence of short swaps for a given permutation. A polynomial-time 2-approximation algorithm for this problem is presented, and bounds for the short-swap diameter (the length of the longest minimum sorting sequence among all permutations of a given length) are also obtained.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号