Similar articles
20 similar articles found
1.
Alignments of DNA and protein sequences containing frameshift errors   Cited by: 1 (self-citations: 0; by others: 1)
Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artefactual insertions and deletions of bases (indels) which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data is dependent on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only subfragments of the correct translation are available in any given frame. We present a new algorithm which can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The computational complexity of the algorithm is O(nm), where n and m are the sizes of the two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
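
The idea of aligning DNA to protein while tolerating frameshifts can be illustrated with a small dynamic-programming sketch. This is not the authors' algorithm: it is a generic Smith-Waterman-style recurrence in which a residue may be paired with 3 nucleotides (a normal codon) or with 2 or 4 nucleotides (a frameshift, charged a flat penalty). The function name `frameshift_align_score` and all score values (`match`, `mismatch`, `gap`, `shift`) are assumptions chosen for illustration. The table has (n+1) x (m+1) cells with constant work per cell, so the running time is O(nm).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def frameshift_align_score(dna, protein, match=2, mismatch=-1, gap=-3, shift=-4):
    """Best local alignment score of a DNA sequence against a protein.

    H[i][j] is the best score of an alignment that ends after consuming
    i nucleotides and j amino acids. All scoring values are illustrative.
    """
    n, m = len(dna), len(protein)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [0]  # local alignment: an alignment may start anywhere
            if i >= 3:
                # Normal step: one codon (3 nt) paired with one residue.
                codon_aa = CODON_TABLE.get(dna[i - 3:i], "X")
                step = match if codon_aa == protein[j - 1] else mismatch
                candidates.append(H[i - 3][j - 1] + step)
                # Codon with no matching residue (gap in the protein).
                candidates.append(H[i - 3][j] + gap)
            if i >= 2:
                # Frameshift: a base was lost, so the residue spans 2 nt.
                candidates.append(H[i - 2][j - 1] + shift)
            if i >= 4:
                # Frameshift: a base was inserted, so the residue spans 4 nt.
                candidates.append(H[i - 4][j - 1] + shift)
            # Residue with no matching codon (gap in the DNA).
            candidates.append(H[i][j - 1] + gap)
            H[i][j] = max(candidates)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    protein = "MKVLAAGIW"
    dna = "ATGAAAGTTCTGGCTGCCGGTATTTGG"   # encodes the protein exactly
    damaged = dna[:13] + dna[14:]         # one base deleted mid-sequence
    print(frameshift_align_score(dna, protein))
    print(frameshift_align_score(damaged, protein))
    # A prohibitive shift penalty mimics a search restricted to fixed frames.
    print(frameshift_align_score(damaged, protein, shift=-100))
```

With the deletion, a fixed-frame search can only recover the part of the protein on one side of the error, whereas the frameshift-tolerant recurrence pays one penalty and continues through it.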

2.
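With advances in large-scale sequencing technology, the number of sequences deposited in databases is growing rapidly; most of them are ESTs (expressed sequence tags) of unknown function. Hints about the function of an EST are generally obtained through protein-EST sequence alignment. Because ESTs contain roughly 5% errors, of which frameshift errors are especially serious, the usual approach of translating an EST in six frames into protein sequences and then aligning them handles frameshift errors poorly. By considering the various possible errors in EST sequences and back-translating the amino acid sequence into a nucleotide sequence, at the nucle…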

3.
Spontaneous frameshift mutations are an important source of genetic variation in all species and cause a large number of genetic disorders in humans. To enhance our understanding of the molecular mechanisms of frameshift mutagenesis, 583 spontaneous Trp+ revertants of two trpA frameshift alleles in Escherichia coli were isolated and DNA sequenced. In order to measure the contribution of methyl-directed mismatch repair to frameshift production, mutational spectra were constructed for both mismatch repair-proficient and repair-defective strains. The molecular origins of practically all of the frameshifts analyzed could be explained by one of six simple models based upon misalignment of the template or nascent DNA strands with or without misincorporation of primer nucleotides during DNA replication. Most frameshifts occurred within mononucleotide runs as has been shown often in previous studies but the location of the 76 frameshift sites was usually outside of runs. Mismatch repair generally was most effective in preventing the occurrence of frameshifts within runs but there was much variation from site to site. Most frameshift sites outside of runs appear to be refractory to mismatch repair although the small number of occurrences at most of these sites make firm conclusions impossible. There was a dense pattern of reversion sites within the trpA DNA region where reversion events could occur, suggesting that, in general, most DNA sequences are capable of undergoing spontaneous mutational events during replication that can lead to small deletions and insertions. Many of these errors are likely to occur at low frequencies and be tolerated as events too costly to prevent or repair. These studies also revealed an unpredicted flexibility in the primary amino acid sequence of the trpA product, the alpha subunit of tryptophan synthase.

4.
H Jörnvall 《FEBS letters》1999,456(1):85-88
Motifer is a software tool able to find directly in nucleotide databases very distant homologues to an amino acid query sequence. It focuses searches on a specific amino acid pattern, scoring the matching and intervening residues as specified by the user. The program has been developed for searching databases of expressed sequence tags (ESTs), but it is also well suited to search genomic sequences. The query sequence can be a variable pattern with alternative amino acids or gaps and the sequences searched can contain introns or sequencing errors with accompanying frame shifts. Other features include options to generate a searchable output, set the maximal sequencing error frequency, limit searches to given species, or exclude already known matches. Motifer can find sequence homologues that other search algorithms would deem unrelated or would not find because of sequencing errors or a too large number of other homologues. The ability of Motifer to find relatives to a given sequence is exemplified by searches for members of the transforming growth factor-beta family and for proteins containing a WW-domain. The functions aimed at enhancing EST searches are illustrated by the 'in silico' cloning of a novel cytochrome P450 enzyme.

5.
Slipped-strand mispairing (SSM) may play a major role in repetitive DNA sequence evolution by generating large numbers of short frameshift mutations within simple tandem repeats. Here we examine the frequency and size spectrum of frameshifts generated within poly-CA/TG sequences inserted into bacteriophage M13 in Escherichia coli hosts. The frequency of detectable frameshifts within a 40 bp tract of poly-CA/TG is greater than one percent and increases more than linearly with length, being lower by a factor of four in a 22 bp target sequence. The frequency increases more than 13-fold in mutL and mutS host cells, suggesting that a high proportion of frameshift events are normally repaired by methyl-directed mismatch repair. Of the 87 sequenced frameshifts in this study, 96% result from deletion or insertion of only one or two 2 bp repeat units. The most frequent events are 2 bp deletions, 2 bp insertions, and 4 bp deletions, the relative frequencies of these events being about 18:6:1.

6.
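Application of ITS sequences of nuclear rDNA in phylogenetic studies of seed plants   Cited by: 18 (self-citations: 0; by others: 18)
Nuclear rDNA in seed plants consists of highly repeated tandem sequences; owing to concerted evolution, these repeat units have become homogenized, or nearly so, in most species. The 5.8S rDNA divides the internal transcribed spacer of nuclear rDNA into two parts, ITS1 and ITS2. In angiosperms ITS1 is 165-298 bp long and ITS2 is 177-266 bp long, whereas in gymnosperms the ITS region is longer, and its length variation is due mainly to variation in the length of ITS1. PCR products of these two regions can be sequenced directly or after cloning. Because ITS sequences evolve relatively quickly and provide abundant variable and informative sites, they have become an important molecular marker for phylogenetic and taxonomic studies at lower taxonomic levels in angiosperms, and they provide important systematic information for exploring reticulate evolution in polyploid complexes and the origin of allopolyploids; they are, however, generally unsuitable for systematic studies above the family level. In gymnosperms the ITS region is longer, the degree of homogenization among repeats varies, and sequencing is more difficult, so its use in gymnosperm phylogeny and taxonomy has been somewhat limited, although progress has been made in recent years.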

7.
C. G. Cupples, M. Cabrera, C. Cruz, J. H. Miller 《Genetics》1990,125(2):275-280
We have used site-directed mutagenesis to alter bases in lacZ near the region encoding essential residues in the active site of beta-galactosidase. The altered sequences generate runs of six or seven identical base pairs which create a frameshift, resulting in a Lac- phenotype. Reversion to Lac+ in each strain can occur only by a specific frameshift at these sequences. Monotonous runs of A's (or of T's on the opposite strand) and G's (or C's) have been constructed, as has an alternating -C-G- sequence. These specific frameshift indicator strains complement a set of six previously described strains which detect each of the base substitutions. We have examined a variety of mutagens and mutators for their ability to cause reversion to Lac+. Surprisingly, frameshifts are well stimulated at many of these runs by ethyl methanesulfonate, N-methyl-N'-nitro-N-nitrosoguanidine and 2-aminopurine, mutagens not widely known to induce frameshifts. A comparison of ethyl methanesulfonate, N-methyl-N'-nitro-N-nitrosoguanidine and 2-aminopurine frameshift specificity with that found with a mutH strain suggests that these mutagens partially or fully saturate or inactivate the methylation-directed mismatch repair system and allow replication errors leading to frameshifts to escape repair. This results in a form of indirect mutagenesis, which can be detected at certain sites.

8.
Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen. Many other individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavourable for DNA preservation, success in sequence recovery has been uncertain. This study addresses this challenge by employing next-generation sequencing (NGS) to recover sequences for the barcode region of the cytochrome c oxidase 1 gene from small amounts of template DNA. DNA quality was first screened in more than 1800 century-old type specimens of Lepidoptera by attempting to recover 164-bp and 94-bp reads via Sanger sequencing. This analysis permitted the assignment of each specimen to one of three DNA quality categories: high (164-bp sequence), medium (94-bp sequence) or low (no sequence). Ten specimens from each category were subsequently analysed via a PCR-based NGS protocol requiring very little template DNA. It recovered sequence information from all specimens with average read lengths ranging from 458 bp to 610 bp for the three DNA categories. By sequencing ten specimens in each NGS run, costs were similar to Sanger analysis. Future increases in the number of specimens processed in each run promise substantial reductions in cost, making it possible to anticipate a future where barcode sequences are available from most type specimens.

9.
Molecular markers are used to provide the link between genotype and phenotype, for the production of molecular genetic maps and to assess genetic diversity within and between related species. Single nucleotide polymorphisms (SNPs) are the most abundant molecular genetic marker. SNPs can be identified in silico, but care must be taken to ensure that the identified SNPs reflect true genetic variation and are not a result of errors associated with DNA sequencing. The SNP detection method autoSNP has been developed to identify SNPs from sequence data for any species. Confidence in the predicted SNPs is based on sequence redundancy, and haplotype co-segregation scores are calculated for a further independent measure of confidence. We have extended the autoSNP method to produce autoSNPdb, which integrates SNP and gene annotation information with a graphical viewer. We have applied this software to public barley expressed sequences, and the resulting database is available over the Internet. SNPs can be viewed and searched by sequence, functional annotation or predicted synteny with a reference genome, in this case rice. The correlation between SNPs and barley cultivar, expressed tissue type and development stage has been collated for ease of exploration. An average of one SNP per 240 bp was identified, with SNPs more prevalent in the 5' regions and simple sequence repeat (SSR) flanking sequences. Overall, autoSNPdb can provide a wealth of genetic polymorphism information for any species for which sequence data are available.
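
The redundancy principle mentioned in this abstract can be sketched in a few lines. This is not the autoSNP implementation: it is a minimal illustration that assumes the reads are already aligned to equal length, and the function name `call_snps` and the threshold `min_allele_count` are hypothetical. A column is reported only when at least two different bases are each seen in several reads, so that a variant seen once (more likely a sequencing error) is ignored.

```python
from collections import Counter


def call_snps(aligned_reads, min_allele_count=2):
    """Report alignment columns where two or more alleles are well supported.

    aligned_reads: equal-length strings over A/C/G/T, with '-' or 'N'
    allowed for gaps and unknown bases (these are skipped when counting).
    Returns {column_index: {base: count}} for supported alleles only.
    """
    if not aligned_reads:
        return {}
    snps = {}
    for col in range(len(aligned_reads[0])):
        counts = Counter(r[col] for r in aligned_reads if r[col] in "ACGT")
        # Keep only alleles observed in enough independent reads.
        supported = {b: c for b, c in counts.items() if c >= min_allele_count}
        if len(supported) >= 2:
            snps[col] = supported
    return snps


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACTTAC", "ACTTAC", "ACGTAT"]
    # Column 2 has G and T each in at least two reads; the single T in
    # column 5 is treated as a probable error and not reported.
    print(call_snps(reads))
```

Raising `min_allele_count` trades sensitivity for confidence, which is the same trade-off the redundancy-based score is meant to expose.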

10.
11.
Comparative and phylogenetic analysis of developmental sequences   Cited by: 3 (self-citations: 0; by others: 3)
Event pairing has been proposed for the optimization of developmental sequences (event sequences) on a given phylogenetic hypothesis (cladogram) to determine instances of sequence heterochrony. Here, we show that event pairing is faulty, leading to the optimization of impossible hypothetical ancestors, the underestimation of the lengths of the developmental sequences on the tree, and the proposition of synapomorphies that are not supported by the data. When used for phylogenetic analysis, event pairing can even produce cladograms that are inconsistent with the data. These errors are caused by the fact that event pairing treats dependent features as if they were independent. We present a new method for comparative and phylogenetic analysis of developmental sequences that does not exhibit these errors. Our method applies search-based character optimization and treats the entire developmental sequence as a single character that is then analyzed by using an edit cost function, which specifies the transformation cost between pairs of observed and unobserved character states, and dynamic programming. In other words, the developmental sequence is directly optimized on the tree. We used event pairing as an edit cost function, but others are possible.

12.
A computer-aided homology search of databases found that the nucleotide sequences flanking ATLN44, a non-LTR retrotransposon (LINE) from Arabidopsis thaliana, are repeated in the A. thaliana genome. These sequences are homologous to flanking sequences of 664 bp with terminal inverted repeat sequences of about 70 bp. The 664-bp sequence and most of the 14 homologues identified were flanked by direct repeat sequences of 9 bp. These findings indicate that the repeated sequence, named Tnat1, is a transposable element that duplicates a 9-bp sequence at the target site on transposition and that ATLN44 is inserted in one Tnat1 member. Interestingly, all of the Tnat1 members had tandem repeats comprised of several units of a 60-bp sequence, the number of repeats differing among Tnat1 members. Of the Tnat1 members identified, one was inserted into another sequence repeated in the A. thaliana genome: that sequence is about 770 bp long and has terminal inverted repeat sequences of about 110 bp. The sequence is flanked by direct repeats of a 9-bp sequence, indicating that it is another transposable element, named Tnat2, from A. thaliana. Moreover, Tnat2 members had a tandem repeat about 240 bp long. Tnat1 and Tnat2 with tandem repeats in their internal regions show no homology to each other or to any of the elements identified previously; therefore they appear to be novel transposable elements.

13.
Oudot-Le Secq MP, Green BR 《Gene》2011,476(1-2):20-26
The mitochondrial genome of the raphid pennate diatom Phaeodactylum tricornutum has several novel features compared with the mitochondrial genomes of the centric diatom Thalassiosira pseudonana and the araphid pennate diatom Synedra acus. It is almost double the size (77,356 bp) due to a 35,454 bp sequence block consisting of an elaborate combination of direct repeats, making it the largest stramenopile (heterokont) mitochondrial genome known. In addition, the cox1 gene has a +1 translational frameshift involving Pro codons CCC and CCT, the first translational frameshift to be detected in an algal mitochondrial genome. The nad9 and rps14 genes are fused by the insertion of an in-frame sequence and cotranscribed. The nad11 gene is split into two parts corresponding to the FeS and molybdate-binding domains, but both parts are still on the mitochondrial genome, in contrast to the brown algae where the second domain appears to have been transferred to the nucleus. In contrast to P. tricornutum, the repeat region of T. pseudonana consists of a much smaller 4790 bp string of almost identical double-hairpin elements, evidence of slipped-strand mispairing and active gene conversion. The diatom mitochondrial genomes have undergone considerable gene rearrangement since the three lineages of diatoms diverged, but all three have kept their repeat regions segregated from their relatively compact coding regions.

14.
NMR methods were used to investigate a series of mutants of the pseudoknot within the gene 32 messenger RNA of bacteriophage T2, for the purpose of investigating the range of sequences, stem and loop lengths that can form a similar pseudoknot structure. This information is of particular relevance since the T2 pseudoknot has been considered a representative of a large family of RNA pseudoknots related by a common structural motif, previously referred to as 'common pseudoknot motif 1' or CPK1. In the work presented here, a mutated sequence with the potential to form a pseudoknot with a 6 bp stem2 was shown to adopt a pseudoknot structure similar to that of the wild-type sequence. This result is significant in that it demonstrates that pseudoknots with 6 bp in stem2 and a single nucleotide in loop1 are indeed feasible. Mutated sequences with the potential to form pseudoknots with either 5 or 8 bp in stem2 yielded NMR spectra that could not confirm the formation of a pseudoknot structure. Replacing the adenosine nucleotide in loop1 of the wild-type pseudoknot with any one of G, C or U did not significantly alter the pseudoknot structure. Taken together, the results of this study provide support for the existence of a family of similarly structured pseudoknots with two coaxially stacked stems, either 6 or 7 bp in stem2, and a single nucleotide in loop1. This family includes many of the pseudoknots predicted to occur downstream of the frameshift or readthrough sites in a significant number of viral RNAs.

15.
A repetitive element (IS986), previously isolated from Mycobacterium tuberculosis and shown to detect multiple restriction fragment-length polymorphisms (RFLPs), has been sequenced. It consists of a potential insertion sequence of 1358 bp, with 30-bp inverted repeat ends. IS986 has four potentially significant open reading frames (ORFs): ORFa1, ORFa2 and ORFb on one strand and ORFc on the complementary strand. The sequences of the potential translated products identify IS986 as a member of the IS3 family, with an apparent frameshift between ORFa1 and ORFa2. IS986 has potential as a highly specific probe for detection and typing of M. tuberculosis, as well as for transposon mutagenesis of mycobacteria. The sequence of IS986 is virtually identical to that of another recently described element, IS6110 (Thierry et al., 1990).

16.
Sumiyama K, Kim CB, Ruddle FH 《Genomics》2001,71(2):260-262
The discovery of cis-element control motifs in noncoding DNA poses a difficult problem in genome analysis. Functional analysis by means of reporter constructs expressed in transgenic organisms is the most reliable method, but is by itself time-consuming and expensive. Searching noncoding DNA for known control motifs by sequence analysis is problematic, since protein binding motifs are short, in the range of 8-10 bp, and occur frequently by chance. Heretofore, the most reliable sequence analysis method has been the comparison of homologous sequence domains in related but moderately evolutionarily divergent species such as, for example, mouse and human. In such pairwise combinations, control regions are conserved because they serve a vital function and can be identified by their similar sequences. Single pairwise comparisons, however, allow the discovery of conserved sequence strings only at low resolution and without specific identity. We have investigated the possibility of using multiple sequence comparisons to correct these shortcomings. We applied this method to the Hoxc8 early enhancer region that has been previously analyzed in depth by functional methods and through its application successfully identified known protein binding cis-element motifs. Candidate protein binding sites could also be identified. This method, based on evolutionarily related sequence comparisons, should be quite useful as a prescreening step prior to functional analysis with corresponding savings in time and resources.
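
The remark that 8-10 bp motifs "occur frequently by chance" can be made concrete with a back-of-the-envelope estimate. The sketch below is not taken from the paper; it assumes a random sequence in which the four bases are independent and equally likely, which real genomes only approximate, and the function name `expected_chance_hits` is hypothetical. Under that assumption a fixed motif of length k matches at any given position with probability 4^-k.

```python
def expected_chance_hits(seq_len, motif_len, both_strands=True):
    """Expected number of exact chance matches of one fixed motif.

    Assumes independent, equiprobable bases (an idealization). Scanning
    both strands doubles the count for a non-palindromic motif.
    """
    positions = max(seq_len - motif_len + 1, 0)  # possible start positions
    per_position = 0.25 ** motif_len             # chance of a match at one start
    strands = 2 if both_strands else 1
    return positions * per_position * strands


if __name__ == "__main__":
    # Compare the motif lengths quoted in the abstract over 1 Mb of sequence.
    for k in (8, 10):
        print(k, expected_chance_hits(1_000_000, k))
```

Because the expectation grows linearly with sequence length and real binding sites tolerate mismatches, single-sequence motif scans produce many spurious hits, which is the motivation for filtering by cross-species conservation.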

17.
MOTIVATION: High accuracy of data always governs the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. RESULTS: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. AVAILABILITY: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html

18.
DNA sequence is an important determinant of the positioning, stability, and activity of nucleosomes, yet the molecular basis of these effects remains elusive. A "consensus DNA sequence" for nucleosome positioning has not been reported and, while certain DNA sequence preferences or motifs for nucleosome positioning have been discovered, how they function is not known. Here, we report that an unexpected observation concerning the reassembly of nucleosomes during salt gradient dialysis has allowed a breakthrough in our efforts to identify the nucleosomal locations of the DNA sequence motifs that dominate histone-DNA interactions and nucleosome positioning. We conclude that a previous selection experiment for high-affinity, nucleosome-forming DNA sequences exerted selective pressure chiefly on the central stretch of the nucleosomal DNA. This observation implies that algorithms for aligning the selected DNA sequences should seek to optimize the alignment over much less than the full 147 bp of nucleosomal DNA. A new alignment calculation implemented these ideas and successfully aligned 19 of the 41 sequences in a non-redundant database of selected high-affinity, nucleosome-positioning sequences. The resulting alignment reveals strong conservation of several stretches within a central 71 bp of the nucleosomal DNA. The alignment further reveals an inherent palindromic symmetry in the selected DNAs; it makes testable predictions of nucleosome positioning on the aligned sequences and for the creation of new positioning sequences, both of which are upheld experimentally; and it suggests new signals that may be important in translational nucleosome positioning.

19.
The mutagenic potency of the simple reversible intercalators isopropyl-OPC (iPr-OPC) and 9-aminoacridine (9-AA) is assessed in E. coli using reversion assays based on plasmids derived from pBR322 carrying various frameshift mutations within the tetracycline resistance gene in repetitive sequences: +/- 2 frameshift mutations within alternating GC sequences; +/- 1 frameshift mutation at runs of guanines. The results obtained show that iPr-OPC and 9-AA have a sequence specificity for mutagenesis: they revert +1 and -1 frameshift mutations within runs of monotonous G:C base pairs. The precise determination of the size of a small restriction fragment which contains the mutation allowed us to demonstrate that reversion occurred by -1 deletions for the +1 frameshift mutations and by +1 additions for the -1 frameshift mutations. The possible relations of this specific reversion with the base sequence specificity of the mutagenesis are briefly discussed.

20.
The structure of transposable yeast mating type loci   Cited by: 133 (self-citations: 0; by others: 133)
K A Nasmyth, K Tatchell 《Cell》1980,19(3):753-764
A recombinant plasmid containing a MAT alpha mating type locus of Saccharomyces cerevisiae has been isolated by its ability to complement a sterile mat alpha mutation. The plasmid hybridizes to restriction fragments containing both active mating type loci (MATa and MAT alpha) and both silent mating type loci (HMRa and HML alpha). All loci therefore have common sequences. Recombinant lambda clones of the loci have been isolated by plaque hybridization and their structures have been compared by a heteroduplex analysis. At its center, each locus contains one of two apparently nonhomologous sequences. Loci concerned with the alpha phenotype (MAT alpha and HML alpha) contain an 850 bp alpha-specific sequence, whereas loci concerned with the a phenotype (MATa and HMRa) contain a 700 bp a-specific sequence. The a- or alpha-specific sequences are surrounded by DNA sequences that are common to all loci. These homologous sequences extend for 230 bp on the left and 700 bp on the right. They appear to be unrelated to each other. Surprisingly, HML alpha and HMRa differ in their extent of homology to MATa and MAT alpha outside the above regions. HMRa lacks an extensive (700 bp) DNA sequence to the right of the large right-hand homologous region, and possibly also a small (90 bp) sequence to the left of the small left-hand homologous region, both of which are present at HML alpha, MATa and MAT alpha. Hybridization studies have shown that the 700 bp sequence is present at HMLa but absent at HMR alpha alleles. It is therefore characteristic of HML, irrespective of whether it contains a- or alpha-specific sequences. The results imply that mating type interconversion is effected by transposition of DNA sequences from HML or HMR to MAT, as predicted by the controlling element model of Oshima and Takano (1971) and the Cassette model of Hicks, Strathern and Herskowitz (1977).
