Similar literature
1.
A frameshift error detection algorithm for DNA sequencing projects.
During the determination of DNA sequences, frameshift errors are not the most frequent but they are the most bothersome as they corrupt the amino acid sequence over several residues. Detection of such errors by sequence alignment is only possible when related sequences are found in the databases. To avoid this limitation, we have developed a new tool based on the distribution of non-overlapping 3-tuples or 6-tuples in the three frames of an ORF. The method relies upon the result of a correspondence analysis. It has been extensively tested on Bacillus subtilis and Saccharomyces cerevisiae sequences and has also been examined with human sequences. The results indicate that it can detect frameshift errors affecting as few as 20 bp with a low rate of false positives (no more than 1.0/1000 bp scanned). The proposed algorithm can be used to scan a large collection of data, but it is mainly intended for laboratory practice as a tool for checking the quality of the sequences produced during a sequencing project.
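
The abstract above builds on counts of non-overlapping 3-tuples (or 6-tuples) in the three frames of an ORF. The sketch below shows only that counting step, in Python; the function name `frame_tuple_counts` and its parameters are hypothetical, and the correspondence analysis that the published method applies to these counts is not reproduced here.

```python
from collections import Counter


def frame_tuple_counts(orf: str, k: int = 3) -> list[Counter]:
    """Count non-overlapping k-tuples in each of the three frames of a sequence.

    Hypothetical helper: frame 0 starts at the first base, frame 1 at the
    second, frame 2 at the third. Trailing bases that do not fill a whole
    k-tuple are ignored.
    """
    seq = orf.upper()
    counts = []
    for frame in range(3):
        # Step by k so that tuples within one frame never overlap.
        tuples = (seq[i:i + k] for i in range(frame, len(seq) - k + 1, k))
        counts.append(Counter(tuples))
    return counts
```

A frameshift inside an ORF would be expected to move the coding-like tuple distribution from one frame to another downstream of the error; how that shift is scored is left to the original method.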

2.
The fidelity of DNA synthesis by an exonuclease-proficient DNA polymerase results from the selectivity of the polymerization reaction and from exonucleolytic proofreading. We have examined the contribution of these two steps to the fidelity of DNA synthesis catalyzed by the large Klenow fragment of Escherichia coli DNA polymerase I, using enzymes engineered by site-directed mutagenesis to inactivate the proofreading exonuclease. Measurements with two mutant Klenow polymerases lacking exonuclease activity but retaining normal polymerase activity and protein structure demonstrate that the base substitution fidelity of polymerization averages one error for each 10,000 to 40,000 bases polymerized, and can vary more than 30-fold depending on the mispair and its position. Steady-state enzyme kinetic measurements of selectivity at the initial insertion step by the exonuclease-deficient polymerase demonstrate differences in both the Km and the Vmax for incorrect versus correct nucleotides. Exonucleolytic proofreading by the wild-type enzyme improves the average base substitution fidelity by 4- to 7-fold, reflecting efficient proofreading of some mispairs and less efficient proofreading of others. The wild-type polymerase is highly accurate for -1 base frameshift errors, with an error rate of ≤10^-6. The exonuclease-deficient polymerase is less accurate, suggesting that proofreading also enhances frameshift fidelity. Even without a proofreading exonuclease, Klenow polymerase has high frameshift fidelity relative to several other DNA polymerases, including eucaryotic DNA polymerase-alpha, an exonuclease-deficient, 4-subunit complex whose catalytic subunit is almost three times larger. The Klenow polymerase has a large (46 kDa) domain containing the polymerase active site and a smaller (22 kDa) domain containing the active site for the 3'→5' exonuclease. Upon removal of the small domain, the large polymerase domain has altered base substitution error specificity when compared to the two-domain but exonuclease-deficient enzyme. It is also less accurate for -1 base errors at reiterated template nucleotides and for a 276-nucleotide deletion error. Thus, removal of a protein domain of a DNA polymerase can affect its fidelity.

3.
Ribosomes can be programmed to shift from one reading frame to another during translation. Hepatitis C virus (HCV) uses such a mechanism to produce F protein from the -2/+1 reading frame. We now report that the HCV frameshift signal can mediate the synthesis of the core protein of the zero frame, the F protein of the -2/+1 frame, and a 1.5-kDa protein of the -1/+2 frame. This triple decoding function does not require sequences flanking the frameshift signal and is apparently independent of membranes and the synthesis of the HCV polyprotein. Two consensus -1 frameshift sequences in the HCV type 1 frameshift signal facilitate ribosomal frameshifts into both overlapping reading frames. A sequence which is located immediately downstream of the frameshift signal and has the potential to form a double stem-loop structure can significantly enhance translational frameshifting in the presence of the peptidyl-transferase inhibitor puromycin. Based on these results, a model is proposed to explain the triple decoding activities of the HCV ribosomal frameshift signal.

4.
The ribosome is a molecular machine that converts genetic information in the form of RNA into protein. Recent structural studies reveal a complex set of interactions between the ribosome and its ligands, mRNA and tRNA, that indicate ways in which the ribosome could avoid costly translational errors. Ribosomes must decode each successive codon accurately, and structural data provide a clear indication of how ribosomes limit recruitment of the wrong tRNA (sense errors). In a triplet-based genetic code there are three potential forward reading frames, only one of which encodes the correct protein. Errors in which the ribosome reads a codon out of the normal reading frame (frameshift errors) occur less frequently than sense errors, although it is not clear from structural data how these errors are avoided. Some mRNA sequences, termed programmed-frameshift sites, cause the ribosome to change reading frame. Based on recent work on these sites, this article proposes that the ribosome uses the structure of the codon-anticodon complex formed by the peptidyl-tRNA, especially its wobble interaction, to constrain the incoming aminoacyl-tRNA to the correct reading frame.

5.
6.
The fidelity of DNA synthesis catalyzed by the 180-kDa catalytic subunit (p180) of DNA polymerase alpha from Saccharomyces cerevisiae has been determined. Despite the presence of a 3'→5' exonuclease activity (Brooke et al., 1991, J. Biol. Chem., 266, 3005-3015), its accuracy is similar to several exonuclease-deficient DNA polymerases and much lower than other DNA polymerases that have associated exonucleolytic proofreading activity. Average error rates are 1/9900 and 1/12,000, respectively, for single base-substitution and minus-one nucleotide frameshift errors; the polymerase generates deletions as well. Similar error rates are observed with reactions containing the 180-kDa subunit plus an 86-kDa subunit (p86), or with these two polypeptides plus two additional subunits (p58 and p49) comprising the DNA primase activity required for DNA replication. Finally, addition of yeast replication factor-A (RF-A), a protein preparation that stimulates DNA synthesis and has single-stranded DNA-binding activity, yields a polymerization reaction with 7 polypeptides required for replication, yet fidelity remains low relative to error rates for semiconservative replication. The data suggest that neither exonucleolytic proofreading activity, the beta subunit, the DNA primase subunits nor RF-A contributes substantially to base substitution or frameshift error discrimination by the DNA polymerase alpha catalytic subunit.

7.
Amino acid similarity often needs to be considered in DNA sequence comparison to elucidate gene functions. We propose a Smith-Waterman-like algorithm which considers amino acid similarity and insertions/deletions in sequences at the DNA level and at the protein level in a hybrid manner. The algorithm is applied to cDNA sequences of Oryza sativa and those of Arabidopsis thaliana. The results are compared with the results of application of NCBI's tblastx program (which compares the sequences in the BLAST manner after translation). It is shown that the present algorithm is very helpful in discovering nucleotide insertions/deletions originating from experimental errors as well as amino acid insertions/deletions due to evolutionary reasons.
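
For orientation, the plain Smith-Waterman recurrence that the abstract above extends can be sketched as follows. This is the textbook local-alignment score with a linear gap penalty, not the authors' hybrid DNA/protein algorithm; the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Textbook Smith-Waterman with a linear gap penalty; only two rows of the
    dynamic-programming table are kept because no traceback is performed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

A hybrid scheme like the one described would additionally score translated codons with an amino acid similarity matrix and allow single-nucleotide gaps that change the frame; those parts are not shown.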

8.
Frameshift mutagenesis by eucaryotic DNA polymerases in vitro
The frequency and specificity of frameshift errors produced during a single round of in vitro DNA synthesis by DNA polymerases-alpha, -beta, and -gamma (pol-alpha, -beta, and -gamma, respectively) have been determined. DNA polymerase-beta is the least accurate enzyme, producing frameshift errors at an average frequency of one error for each 1,000-3,000 nucleotides polymerized, a frequency similar to its average base substitution accuracy. DNA polymerase-alpha is approximately 10-fold more accurate, producing frameshifts at an average frequency of one error for every 10,000-30,000 nucleotides polymerized, a frequency which is about 2- to 6-fold lower than the average pol-alpha base substitution accuracy. DNA polymerase-gamma is highly accurate, producing on the average less than one frameshift error for every 200,000-400,000 nucleotides polymerized. This represents a more than 10-fold higher fidelity than for base substitutions. Among the collection of sequenced frameshifts produced by DNA polymerases-alpha and beta, both common features and distinct specificities are apparent. These specificities suggest a major role for eucaryotic DNA polymerases in modulating frameshift fidelity. Possible mechanisms for production of frameshifts are discussed in relation to the observed biases. One of these models has been experimentally supported using site-directed mutagenesis to change the primary DNA sequence of the template. Alteration of a pol-beta frameshift hotspot sequence TTTT to CTCT reduced the frequency of pol-beta-dependent minus-one-base errors at this site by more than 30-fold, suggesting that more than 97% of the errors at the TTTT run involve a slippage mechanism.
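
The link between the 30-fold reduction and the 97% figure in the abstract above can be written out as a short calculation. The symbols $f_{\mathrm{TTTT}}$ and $f_{\mathrm{CTCT}}$ are introduced here for illustration, and the step assumes that the errors remaining at the CTCT site represent the slippage-independent background.

```latex
\[
\frac{f_{\mathrm{TTTT}}}{f_{\mathrm{CTCT}}} > 30
\quad\Longrightarrow\quad
\frac{f_{\mathrm{TTTT}} - f_{\mathrm{CTCT}}}{f_{\mathrm{TTTT}}}
= 1 - \frac{f_{\mathrm{CTCT}}}{f_{\mathrm{TTTT}}}
> 1 - \frac{1}{30} \approx 0.967
\]
```

Under that assumption, roughly 97% or more of the minus-one-base errors at the TTTT run depend on the repeated sequence, which is the basis for attributing them to slippage.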

9.
The human immunodeficiency virus of type 1 (HIV-1) uses a programmed -1 ribosomal frameshift to produce the precursor of its enzymes, and changes in frameshift efficiency reduce replicative fitness of the virus. We used a fluorescent two-reporter system to screen for peptides that reduce HIV-1 frameshift in bacteria, knowing that the frameshift can be reproduced in Escherichia coli. Expression of one reporter, the green fluorescent protein (GFP), requires the HIV-1 frameshift, whereas the second reporter, the red fluorescent protein (RFP), is used to assess normal translation. A peptide library biased for RNA binding was inserted into the sequence of the protein thioredoxin and expressed in reporter-containing bacteria, which were then screened by fluorescence-activated cell sorting (FACS). We identified peptide sequences that reduce frameshift efficiency by over 50% without altering normal translation. The identified sequences are also active against different frameshift stimulatory signals, suggesting that they bind a target important for frameshifting in general, probably the ribosome. Successful transfer of active sequences to a different scaffold in a eukaryotic test system demonstrates that the anti-frameshift activity of the peptides is neither due to scaffold-dependent conformation nor effects of the scaffold protein itself on frameshifting. The method we describe identifies peptides that will provide useful tools to further study the mechanism of frameshift and may permit the development of lead compounds of therapeutic interest.

10.
A mutational analysis of the eukaryotic elongation factor EF-1 alpha indicates that this protein functions to limit the frequency of errors during genetic code translation. We found that both amino acid misincorporation and reading frame errors are controlled by EF-1 alpha. In order to examine the function of this protein, the TEF2 gene, which encodes EF-1 alpha in Saccharomyces cerevisiae, was mutagenized in vitro with hydroxylamine. Sixteen independent TEF2 alleles were isolated by their ability to suppress frameshift mutations. DNA sequence analysis identified eight different sites in the EF-1 alpha protein that elevate the frequency of mistranslation when mutated. These sites are located in two different regions of the protein. Amino acid substitutions located in or near the GTP-binding and hydrolysis domain of the protein cause suppression of frameshift and nonsense mutations. These mutations may effect mistranslation by altering the binding or hydrolysis of GTP. Amino acid substitutions located adjacent to a putative aminoacyl-tRNA binding region also suppress frameshift and nonsense mutations. These mutations may alter the binding of aminoacyl-tRNA by EF-1 alpha. The identification of frameshift and nonsense suppressor mutations in EF-1 alpha indicates a role for this protein in limiting amino acid misincorporation and reading frame errors. We suggest that these types of errors are controlled by a common mechanism or closely related mechanisms.

11.
Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities, but it remains unclear whether different NGS platforms recover the same diversity from a sample and whether their assembled sequences are of comparable quality. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R² > 0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

12.
The main features of translation are similar in all organisms on this planet, and one important feature is the way the ribosome maintains the reading frame. We have earlier characterized several bacterial mutants defective in tRNA maturation and found that some of them correct a +1 frameshift mutation; i.e. such mutants possess an error in reading frame maintenance. Based on the analysis of the frameshifting phenotype of such mutants we proposed a pivotal role of the ribosomal grip of the peptidyl-tRNA to maintain the correct reading frame. To test the model in an unbiased way we first isolated many (467) independent mutants able to correct a +1 frameshift mutation and thereafter tested whether or not their frameshifting phenotypes were consistent with the model. These 467 +1 frameshift suppressor mutants had alterations in 16 different loci, of which 15 induced a defective tRNA by hypo- or hypermodifications or by altering its primary sequence. All these alterations of tRNAs induce a frameshift error in the P-site to correct a +1 frameshift mutation, consistent with the proposed model. Modifications next to and 3′ of the anticodon (position 37), like 1-methylguanosine, are important for proper reading frame maintenance due to their interactions with components of the ribosomal P-site. Interestingly, two mutants had a defect in a locus (rpsI), which encodes ribosomal protein S9. The C-terminus of this protein contacts positions 32–34 of the peptidyl-tRNA and is thus part of the P-site environment. The two rpsI mutants had a C-terminally truncated ribosomal protein S9, which destroys its interaction with the peptidyl-tRNA, resulting in a +1 shift in the reading frame. The isolation and characterization of the S9 mutants gave strong support to our model that the ribosomal grip of the peptidyl-tRNA is pivotal for reading frame maintenance.

13.
We estimate DNA sequence error rates in GenBank records containing protein-coding and non-coding DNA sequences by comparing sequences of the inbred mouse strain C57BL/6J, sequenced as part of the mouse genome project and independently by other laboratories. C57BL/6J was produced by more than 100 generations of brother-sister mating, and can be assumed to be virtually free of residual polymorphism and mutational variation, so differences between independent sequences can be attributed to error. The estimated single nucleotide error rate for coding DNA is 0.10% (SE 0.012%), which is substantially lower than previous estimates for error rates in GenBank accessions. The estimated single nucleotide error rate for intronic DNA sequences (0.22%; SE 0.051%) is significantly higher than the rate for coding DNA. Since error rates for the mouse genome sequence are very low, the vast majority of the errors we detected are likely to be in individual GenBank accessions. The frequency of insertion-deletion (indel) errors in non-coding DNA approaches that of single nucleotide errors in non-coding DNA, whereas indel errors are uncommon in coding sequences.

14.
Two-dimensional graphic analysis of DNA sequence homologies.
We describe a computer program designed to facilitate the pattern matching analysis of homologies between DNA sequences. It takes advantage of a two-dimensional plot in order to simplify the evaluation of significant structures inherent in the sequences. The program can be divided into three parts: i) an algorithm to search for homologies, ii) a two-dimensional graphic display of the result, iii) further graphic treatment to enhance significant structures. The power of the graphic display is illustrated by the following application of the program. We conducted a search for direct repeats in the mouse immunoglobulin kappa-chain genes. Both the five J DNA sequences and other shorter repeats were found. We also found a longer stretch of homology that could indicate the presence of duplicated DNA in the J4, J5 region.
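
A two-dimensional homology plot of the kind described above is commonly called a dot plot. The following Python sketch illustrates the general idea with exact matching over a fixed window; the function names, the `window` parameter, and the text rendering are assumptions made for illustration, not a description of the program in the abstract.

```python
def dot_plot(a: str, b: str, window: int = 3) -> list[list[bool]]:
    """Mark cell (i, j) when a[i:i+window] exactly equals b[j:j+window]."""
    rows = max(len(a) - window + 1, 0)
    cols = max(len(b) - window + 1, 0)
    return [[a[i:i + window] == b[j:j + window] for j in range(cols)]
            for i in range(rows)]


def render(matrix: list[list[bool]]) -> str:
    """Draw the matrix as text: '*' for a matching window, '.' otherwise."""
    return "\n".join("".join("*" if hit else "." for hit in row) for row in matrix)
```

Comparing a sequence with itself gives a full main diagonal, and a direct repeat shows up as an additional run of hits parallel to that diagonal, which is the kind of structure a search for direct repeats looks for.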

15.
Database search tools identify peptides by matching tandem mass spectra against a protein database. We study an alternative approach when all plausible de novo interpretations of a spectrum (spectral dictionary) are generated and then quickly matched against the database. We present a new MS-Dictionary algorithm for efficiently generating spectral dictionaries and demonstrate that MS-Dictionary can identify spectra that are missed in the database search. We argue that MS-Dictionary enables proteogenomics searches in six-frame translation of genomic sequences that may be prohibitively time-consuming for existing database search approaches. We show that such searches allow one to correct sequencing errors and find programmed frameshifts.

16.
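Deoxyribonucleic acid (DNA) is a natural information storage medium with high storage density, long retention time, and a low loss rate. As conventional storage methods can no longer keep up with the growth of information, DNA data storage has gradually become a research focus. DNA encoding aims to store data without errors using as few bases as possible, and comprises three parts: compression (occupying as little space as possible), error correction (error-free storage), and conversion (turning digital information into base sequences). DNA encoding is a key technology in DNA storage, and its results directly affect storage performance and the integrity of data reads and writes. This paper first reviews the history of DNA storage, then introduces the framework of DNA storage with an emphasis on DNA encoding techniques, and finally discusses future directions for encoding and decoding techniques in DNA storage.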
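
Of the three parts of DNA encoding named above, the conversion step is the easiest to illustrate. The Python sketch below uses a simple fixed mapping of two bits per base (00→A, 01→C, 10→G, 11→T); the mapping and the function names are assumptions for illustration, and compression and error correction are deliberately left out.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_bases(data: bytes) -> str:
    """Convert each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_bases; the sequence length must be a multiple of four."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError for any character outside ACGT.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

A direct mapping like this can produce long homopolymer runs (for example, a zero byte becomes AAAA), which is one reason practical schemes add constraints and error-correcting codes on top of the conversion step.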

17.
For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then used to match, accounting for amino acid mutations, the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.

18.
19.
20.
Low-cost, high-throughput gene synthesis and precise control of protein expression are of critical importance to synthetic biology and biotechnology. Here we describe the development of an on-chip gene synthesis technology, which integrates on a single microchip the synthesis of DNA oligonucleotides using inkjet printing, isothermal oligonucleotide amplification and parallel gene assembly. Use of a mismatch-specific endonuclease for error correction results in an error rate of ~0.19 errors per kb. We applied this approach to synthesize pools of thousands of codon-usage variants of lacZα and 74 challenging Drosophila protein antigens, which were then screened for expression in Escherichia coli. In one round of synthesis and screening, we obtained DNA sequences that were expressed at a wide range of levels, from zero to almost 60% of the total cell protein mass. This technology may facilitate systematic investigation of the molecular mechanisms of protein translation and the design, construction and evolution of macromolecular machines, metabolic networks and synthetic cells.
