Similar literature
1.
Bittker JA, Le BV, Liu DR. Nature Biotechnology. 2002;20(10):1024-1029
We have developed a simple method for exploring nucleic acid sequence space by nonhomologous random recombination (NRR) that enables DNA fragments to randomly recombine in a length-controlled manner without the need for sequence homology. We compared the results of using NRR and error-prone PCR to evolve DNA aptamers that bind streptavidin. Starting with two parental sequences of modest streptavidin affinity, evolution using NRR resulted in aptamers with 15- to 20-fold higher affinity than the highest-affinity aptamers evolved using error-prone PCR, and 27- or 46-fold higher affinities than parental sequences derived using systematic evolution of ligands by exponential enrichment (SELEX). NRR also facilitates the identification of functional regions within evolved sequences. Inspection of a small number of NRR-evolved clones identified a 40-base DNA sequence, present in multiple copies in each clone, that binds streptavidin. Our findings suggest that NRR may enhance the effectiveness of nucleic acid evolution and the ease of identifying structure-activity relationships among evolved sequences.

2.
Directed evolution experiments rely on the cyclical application of mutagenesis, screening and amplification in a test tube. They have led to the creation of novel proteins for a wide range of applications. However, directed evolution currently requires an uncertain, typically large, number of labor-intensive and expensive experimental cycles before proteins with improved function are identified. This paper introduces predictive models that quantify the outcome of these experiments and aid in setting up directed evolution to maximize the chances of obtaining DNA sequences encoding enzymes with improved activities. Two methods of DNA manipulation are analysed: error-prone PCR and DNA recombination. Error-prone PCR is a DNA replication process that intentionally introduces copying errors by imposing mutagenic reaction conditions. The proposed model calculates the probability of producing a specific nucleotide sequence after a number of PCR cycles. DNA recombination methods rely on the mixing and concatenation of genetic material from a number of parent sequences. This paper focuses on modeling a specific DNA recombination protocol, DNA shuffling. Three aspects of the DNA shuffling procedure are modeled: the fragment size distribution after random fragmentation by DNase I, the assembly of DNA fragments, and the probability of assembling specific sequences or combinations of mutations. Results obtained with the proposed models compare favorably with experimental data.

3.
The fraction of proteins that retain wild-type function after mutation has long been observed to decline exponentially as the average number of mutations per gene increases. Recently, several groups have used error-prone polymerase chain reactions (PCR) to generate libraries with 15 to 30 mutations per gene, on average, and have reported that orders of magnitude more proteins retain function than would be expected from the low-mutation-rate trend. Proteins with improved or novel function were isolated disproportionately from these high-error-rate libraries, leading to claims that high mutation rates unlock regions of sequence space that are enriched in positively coupled mutations. Here, we show experimentally that error-prone PCR produces a broader non-Poisson distribution of mutations consistent with a detailed model of PCR. As error rates increase, this distribution leads directly to the observed excesses in functional clones. We then show that while very low mutation rates result in many functional sequences, only a small number are unique. By contrast, very high mutation rates produce mostly unique sequences, but few retain function. Thus an optimal mutation rate exists that balances uniqueness and retention of function. Overall, high-error-rate mutagenesis libraries are enriched in improved sequences because they contain more unique, functional clones. Our findings demonstrate how optimal error-prone PCR mutation rates may be calculated, and indicate that "optimal" rates depend on both the protein and the mutagenesis protocol.
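
The argument in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes that each strand is copied with a fixed probability per cycle, that mutations accumulate as a Poisson process per copy, and that each mutation independently preserves function with probability 0.7. All function names and parameter values are hypothetical. Under those assumptions the mutation count is a mixture of Poisson distributions, which is broader than a single Poisson with the same mean and leaves more functional clones.

```python
import math


def replication_weights(n_cycles, efficiency):
    """Probability that a final strand was copied k times, k = 0..n_cycles.

    Toy assumption: weights are proportional to C(n, k) * efficiency**k.
    """
    raw = [math.comb(n_cycles, k) * efficiency ** k for k in range(n_cycles + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def poisson_pmf(m, mean):
    """Poisson probability of m events, computed in log space for stability."""
    if mean == 0:
        return 1.0 if m == 0 else 0.0
    return math.exp(m * math.log(mean) - mean - math.lgamma(m + 1))


def mixture_pmf(n_cycles, efficiency, rate_per_copy, max_m=300):
    """Distribution of mutations per gene as a mixture over copy numbers."""
    weights = replication_weights(n_cycles, efficiency)
    return [
        sum(w * poisson_pmf(m, k * rate_per_copy) for k, w in enumerate(weights))
        for m in range(max_m + 1)
    ]


def mean_and_variance(pmf):
    mean = sum(m * p for m, p in enumerate(pmf))
    var = sum((m - mean) ** 2 * p for m, p in enumerate(pmf))
    return mean, var


def fraction_functional(pmf, retain_prob):
    """Expected fraction of functional clones if each mutation is tolerated
    independently with probability retain_prob (exponential decline)."""
    return sum(p * retain_prob ** m for m, p in enumerate(pmf))


# Hypothetical parameters chosen only to give a high-error-rate library.
mixture = mixture_pmf(n_cycles=30, efficiency=0.6, rate_per_copy=2.0)
mean, var = mean_and_variance(mixture)
poisson = [poisson_pmf(m, mean) for m in range(len(mixture))]

print("mean mutations per gene:", round(mean, 2))
print("variance (a Poisson would equal the mean):", round(var, 2))
print("functional fraction, mixture:", fraction_functional(mixture, 0.7))
print("functional fraction, Poisson:", fraction_functional(poisson, 0.7))
```

The comparison holds the mean fixed, so any difference in the functional fraction comes only from the shape of the distribution.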

4.
Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods (i.e., measures of similarity between query and target sequences) provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called ProtEmbed, which learns an embedding of protein sequences into a low-dimensional "semantic space." Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that ProtEmbed achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RankProp algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the ProtEmbed embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space.

5.
A census of protein repeats.
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences containing small and relatively water-soluble residues are favored. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.

6.
The persistence of life requires populations to adapt at a rate commensurate with the dynamics of their environment. Successful populations that inhabit highly variable environments have evolved mechanisms to increase the likelihood of successful adaptation. We introduce a 64 × 64 matrix to quantify base-specific mutation potential, analyzing four different replicative systems: error-prone PCR, mouse antibodies, a nematode, and Drosophila. Mutational tendencies are correlated with the structural evolution of proteins. In systems under strong selective pressure, mutational biases are shown to favor the adaptive search of space, either by base mutation or by recombination. Such adaptability is discussed within the context of the genetic code at the levels of replication and codon usage. Supplementary material to this paper is available in electronic form. Reviewing Editor: Dr. Edward Trifonov

7.
The number of naturally occurring primary sequences of proteins is an infinitesimally small subset of the possible number of primary sequences that can be synthesized using 20 amino acids. Prevailing views ascribe this to slow and incremental mutational/selection evolutionary mechanisms. However, considering the large number of avenues available, in the form of the diversity of emerging, evolving and disappearing living systems, for exploring the primary sequence space over the evolutionary time scale of ~3.5 billion years, this remains a conjecture. Therefore, to investigate primary sequence space limitations, we carried out a systematic study to find primary sequences absent in nature. We report the discovery of the smallest peptide sequence, “Cysteine-Glutamine-Tryptophan-Tryptophan”, that is not found in over half a million curated protein sequences in the UniProt (Swiss-Prot) database. Additionally, we report a library of 83,605 pentapeptides that are not found in any of the known protein sequences. Compositional analyses of these absent primary sequences yield a remarkably strong power relationship between the percentage occurrence of individual amino acids in all known protein sequences and their respective frequency of occurrence in the absent peptides, regardless of their specific position in the sequences. If random evolutionary mechanisms were responsible for limitations to the primary sequence space, then one would not expect any relationship between compositions of available and absent primary sequences. Thus, we conclusively show that stoichiometric constraints on amino acids limit the primary sequence space of proteins in nature. We discuss the possibly profound implications of our findings in both evolutionary and synthetic biology.

Communicated by Ramaswamy H. Sarma
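
The search for absent peptides described above amounts to enumerating every possible k-mer and removing those observed in a sequence collection. The following is a minimal sketch of that idea on a toy input. The function names and the two example sequences are hypothetical, and this is not the authors' pipeline, which ran against the curated database rather than an in-memory list.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def observed_kmers(sequences, k):
    """Collect every length-k substring that occurs in any sequence."""
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return seen


def absent_kmers(sequences, k):
    """List all possible k-mers over the standard alphabet that never occur."""
    seen = observed_kmers(sequences, k)
    candidates = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [kmer for kmer in candidates if kmer not in seen]


# Toy stand-in for a protein database.
toy_database = ["MKVLAAGIW", "ACDEFGHIKLMNPQRSTVWY"]
missing = absent_kmers(toy_database, 2)
print(len(missing), "of", len(AMINO_ACIDS) ** 2, "dipeptides are absent")
```

Memory grows as 20^k, so this direct enumeration is practical only for short peptides such as the tetra- and pentapeptides discussed in the abstract.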

8.
We are interested in how intragenic recombination contributes to the evolution of proteins and how this mechanism complements and enhances the diversity generated by random mutation. Experiments have revealed that proteins are highly tolerant to recombination with homologous sequences (mutation by recombination is conservative); more surprisingly, they have also shown that homologous sequence fragments make largely additive contributions to biophysical properties such as stability. Here, we develop a random field model to describe the statistical features of the subset of protein space accessible by recombination, which we refer to as the recombinational landscape. This model shows quantitative agreement with experimental results compiled from eight libraries of proteins that were generated by recombining gene fragments from homologous proteins. The model reveals a recombinational landscape that is highly enriched in functional sequences, with properties dominated by a large-scale additive structure. It also quantifies the relative contributions of parent sequence identity, crossover locations, and protein fold to the tolerance of proteins to recombination. Intragenic recombination explores a unique subset of sequence space that promotes rapid molecular diversification and functional adaptation.

9.
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.

10.
In nature, similar protein folds accommodate distant sequences and support diverse functions. This observation, coupled with the recognition that proteins can tolerate many homologous substitutions, inspires protein engineers to use recombination to search for new functions within sequences encoding structurally related molecules. These searches have led to proteins with novel activities, diversified specificities and greater stabilities. Computational methods that exploit structural and evolutionary information are being used to design highly mutated yet still natively folded chimeric proteins and protein libraries.

11.
Research progress in the directed evolution of proteins
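Directed evolution is an effective new strategy for engineering protein molecules. It mimics the process of natural evolution in the laboratory: the gene encoding a protein is randomly mutagenized by methods such as error-prone PCR and mutator strains, the mutated genes are recombined in vitro by methods such as DNA shuffling, random-priming recombination and the staggered extension process, and high-throughput screening methods are designed to select the desired mutants. It not only rapidly generates new enzymes of industrial value but is also of great importance for studying the relationship between protein structure and function.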

12.
Traditional approaches to the directed evolution of genes of interest (GOIs) place constraints on the scale of experimentation and depth of evolutionary search reasonably achieved. Engineered genetic systems that dramatically elevate the mutation of target GOIs in vivo relieve these constraints by enabling continuous evolution, affording new strategies in the exploration of sequence space and fitness landscapes for GOIs. We describe various in vivo hypermutation systems for continuous evolution, discuss how different architectures for in vivo hypermutation facilitate evolutionary search scale and depth in their application to problems in protein evolution and engineering, and outline future opportunities for the field.

13.
Messenger RNA sequences often have to preserve functional secondary structure elements in addition to coding for proteins. We present a statistical analysis of retroviral mRNA which supports the hypothesis that the natural genetic code is adapted to such complementary coding. These sequences are still able to explore efficiently the space of possible proteins by point mutations. This is borne out by the observation that, in stem regions of retroviral mRNA foldings, silent mutations on one strand are preferentially accompanied by conservative mutations on the other. Distances between amino acids based on physicochemical properties are used to quantify the conservation of protein function under the constraint of maintained RNA secondary structure. We find that preservation of RNA secondary structure by compensatory mutations is evolutionarily compatible with the efficient search for new variants on the protein level. Received: 4 June 1999 / Accepted: 12 October 1999

14.
High throughput sequencing (HTSeq) of small ribosomal subunit amplicons has the potential for a comprehensive characterization of microbial community compositions, down to rare species. However, the error-prone nature of the multi-step experimental process requires that the resulting raw sequences are subjected to quality control procedures. These procedures often involve an abundance cutoff for rare sequences or clustering of sequences, both of which limit genetic resolution. Here we propose a simple experimental protocol that retains the high genetic resolution granted by HTSeq methods while effectively removing many low abundance sequences that are likely due to PCR and sequencing errors. According to this protocol, we split samples and submit both halves to independent PCR and sequencing runs. The resulting sequence data is graphically and quantitatively characterized by the discordance between the two experimental branches, allowing for a quick identification of problematic samples. Further, we discard sequences that are not found in both branches (“AmpliconDuo filter”). We show that the majority of sequences removed in this way, mostly low abundance but also some higher abundance sequences, show features expected from random modifications of true sequences as introduced by PCR and sequencing errors. On the other hand, the filter retains many low abundance sequences observed in both branches and thus provides a more reliable census of the rare biosphere. We find that the AmpliconDuo filter increases biological resolution as it increases apparent community similarity between biologically similar communities, while it does not affect apparent community similarities between biologically dissimilar communities. The filter does not distort overall apparent community compositions. Finally, we quantitatively explain the effect of the AmpliconDuo filter by a simple mathematical model.
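
The filtering rule stated in this abstract, discarding sequences that are not found in both branches, can be sketched in a few lines. The example below assumes each branch has already been reduced to a mapping from sequence to read count. The function name `duo_filter` and the toy counts are hypothetical, and this is not the interface of the published AmpliconDuo software.

```python
def duo_filter(branch_a, branch_b):
    """Keep only sequences seen in both branches.

    Each argument maps a sequence string to its read count in one branch.
    Returns a dict mapping each retained sequence to (count_a, count_b).
    """
    shared = branch_a.keys() & branch_b.keys()
    return {seq: (branch_a[seq], branch_b[seq]) for seq in shared}


# Toy read counts from two independent PCR and sequencing runs of one sample.
branch_a = {"ACGT": 120, "ACGA": 1, "TTGC": 3}
branch_b = {"ACGT": 98, "TTGC": 2, "GGGA": 1}

kept = duo_filter(branch_a, branch_b)
for seq in sorted(kept):
    print(seq, kept[seq])
```

Note that the rare sequence `TTGC` survives because both branches contain it, whereas an abundance cutoff would have removed it together with the singletons.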

15.
Profile-based sequence search procedures are commonly employed to detect remote relationships between proteins. We provide an assessment of a Cascade PSI-BLAST protocol that rigorously employs intermediate sequences in detecting remote relationships between proteins. In this approach, we use PSI-BLAST, which itself involves multiple rounds of iteration, to detect an initial set of homologues for a protein in a 'first generation' search of a database. We then propagate a 'second generation' search of the database, running PSI-BLAST with each of the homologues identified in the previous generation as a query, to recognize homologues not detected earlier. This non-directed search process can be viewed as an iteration of iterations that is continued to detect further homologues until no new hits are detectable. We present an assessment of the coverage of this 'cascaded' intermediate sequence search on diverse folds and find that searches for up to three generations detect most known homologues of a query. Our assessments show that this approach appears to perform better than the traditional use of PSI-BLAST by detecting 15% more relationships within a family and 35% more relationships within a superfamily. We show that such searches can be performed on generalized sequence databases and non-trivial relationships between proteins can be detected effectively. Such a propagation of searches maximizes the chances of detecting distant homologies by effectively scanning protein "fold space".
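
The cascade described above is a generation-by-generation expansion in which each newly found homologue becomes a query. A minimal sketch of that control flow follows. The `search` argument is a placeholder for one PSI-BLAST run and is replaced here by a lookup in a toy dictionary; the function name, the generation cap and the toy hit table are all hypothetical and do not come from the paper.

```python
def cascade_search(query, search, max_generations=3):
    """Expand a homologue set generation by generation.

    `search` is any callable that takes a sequence identifier and returns an
    iterable of hit identifiers (a stand-in for a single PSI-BLAST run).
    The cascade stops when a generation yields no new hits or when
    max_generations is reached.
    """
    found = {query}
    frontier = [query]
    for _generation in range(max_generations):
        new_hits = set()
        for current in frontier:
            new_hits.update(hit for hit in search(current) if hit not in found)
        if not new_hits:
            break
        found |= new_hits
        # Hits from this generation become the queries of the next one.
        frontier = sorted(new_hits)
    found.discard(query)
    return found


# Toy hit table standing in for database searches.
toy_hits = {"Q": ["A", "B"], "A": ["C"], "B": ["A"], "C": ["D"], "D": ["E"]}


def toy_search(identifier):
    return toy_hits.get(identifier, [])


print(sorted(cascade_search("Q", toy_search, max_generations=3)))
```

Tracking the `found` set prevents a homologue from being queried twice, which keeps the number of searches bounded by the number of distinct hits.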

16.
GARD: a genetic algorithm for recombination detection
MOTIVATION: Phylogenetic and evolutionary inference can be severely misled if recombination is not accounted for; hence, screening for it should be an essential component of nearly every comparative study. The evolution of recombinant sequences cannot be properly explained by a single phylogenetic tree, but several phylogenies may be used to correctly model the evolution of non-recombinant fragments. RESULTS: We developed a likelihood-based model selection procedure that uses a genetic algorithm to search multiple sequence alignments for evidence of recombination breakpoints and identify putative recombinant sequences. GARD is an extensible and intuitive method that can be run efficiently in parallel. Extensive simulation studies show that the method nearly always outperforms other available tools, both in terms of power and accuracy, and that the use of GARD to screen sequences for recombination ensures good statistical properties for methods aimed at detecting positive selection. AVAILABILITY: Freely available at http://www.datamonkey.org/GARD/

17.
Reconstructing the evolutionary history of protein sequences will provide a better understanding of divergence mechanisms of protein superfamilies and their functions. Long-term protein evolution often includes dynamic changes such as insertion, deletion, and domain shuffling. Such dynamic changes make reconstructing protein sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple alignments and phylogenetic methods. Unfortunately, currently available simulation methods are not sufficiently flexible and do not allow biologically realistic dynamic protein sequence evolution. We introduce a new method, indel-Seq-Gen (iSG), that can simulate realistic evolutionary processes of protein sequences with insertions and deletions (indels). Unlike other simulation methods, iSG allows the user to simulate multiple subsequences according to different evolutionary parameters, which is necessary for generating realistic protein families with multiple domains. iSG tracks all evolutionary events including indels and outputs the "true" multiple alignment of the simulated sequences. iSG can also generate a larger sequence space by allowing the use of multiple related root sequences. With all these functions, iSG can be used to test the accuracy of, for example, multiple alignment methods, phylogenetic methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein family classification methods. We empirically evaluated the performance of iSG against currently available methods by simulating the evolution of the G protein-coupled receptor and lipocalin protein families. We examined their true multiple alignments, reconstruction of the transmembrane regions and beta-strands, and the results of similarity search against a protein database using the simulated sequences. We also presented an example of using iSG for examining how phylogenetic reconstruction is affected by high indel rates.

18.
From an evolutionary point of view, the complementarity-determining regions of antibodies are distinct from other proteins, including the framework regions of antibodies. A search for exact matches between eighty-four stretches of 15 consecutive bp from the complementarity-determining regions of human antibody heavy chains and other known sequences yielded four matches: two sequential 15-bp matches, or one 16-bp match, with the coding region of a sea-urchin testis histone H2b-2, one 15-bp match with the promoter region of a cauliflower mosaic virus inclusion body protein, and a 15-bp match with an intron between exons 1 and 2 of human factor IX. As a control, an identical search with eighty-four stretches of 15 consecutive bp from the framework regions of human antibody heavy chains yielded no matches with other sequences except those from other antibody framework regions. Since the currently available nucleotide sequence database used in the search consisted of about 1 × 10^7 bp, finding such matches in the complementarity-determining regions might not be random.

19.
Three-dimensional structures of membrane proteins from genomic sequencing
Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS. Cell. 2012;149(7):1607-1621
We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.

20.
DNA sequences at immunoglobulin switch region recombination sites.
The immunoglobulin heavy chain switch from synthesis of IgM to IgG, IgA or IgE is mediated by a DNA recombination event. Recombination occurs within switch regions, 2-10 kb segments of DNA that lie upstream of heavy chain constant region genes. A compilation of DNA sequences at more than 150 recombination sites within heavy chain switch regions is presented. Switch recombination does not appear to occur by homologous recombination. An extensive search for a recognition motif failed to find such a sequence, implying that switch recombination is not a site-specific event. A model for switch recombination that involves illegitimate priming of one switch region on another, followed by error-prone DNA synthesis, is proposed.
