Similar literature
 20 similar articles found; search time 31 ms
1.
P Bucher  G Yagil 《DNA sequence》1991,1(3):157-172
A program to analyse the length and frequency distribution of specific base tracts in genomic sequences is described. The frequency of oligopurine.oligopyrimidine tracts (R.Y. tracts) in a data base of 163 transcribed genes is analysed and compared. The complete genomes of SV40 virus, N. tabacum chloroplast, yeast 2 micron plasmid, bacteriophage lambda, plasmid pBR322 and the E. coli lac operon are also analysed. A highly significant overrepresentation of oligopurine and oligopyrimidine tracts is observed in all eukaryotic genes examined, as well as in the chloroplast genome. The overrepresentation is evident in all gene subregions of the chloroplast, in the following order: intergenic regions, 3' downstream and 5' upstream (promoter), 5' and 3' untranslated, introns and coding regions. In genes coding for basic proteins, oligopurine rather than oligopyrimidine tracts are found on the coding strand. In prokaryotic genes only the longest R.Y. tracts (greater than or equal to 12) are found in excess, and are concentrated near regulatory regions. While a structural role for R.Y. tracts is most likely in intergenic regions, a functional role, as initiation sites for strand separation, is proposed for regulatory gene regions.

2.
The effects of base sequence, specifically of different pyrimidines flanking a bulky DNA adduct, on translesional synthesis in vitro catalyzed by the Klenow fragment of Escherichia coli Pol I (exo(-)) were investigated. The bulky lesion was derived from the binding of a benzo[a]pyrene diol epoxide isomer [(+)-anti-BPDE] to N(2)-guanine (G*). Four different 43-base long oligonucleotide templates were constructed with G* at a site 19 bases from the 5'-end. All bases were identical, except for the pyrimidines, X or Y, flanking G* (sequence context 5'-...XG*Y..., with X, Y = C and/or T). In all cases, the adduct G* slows primer extension beyond G* more than it slows the insertion of a dNTP opposite G* (A and G were predominantly inserted opposite G*, with A > G). Depending on X or Y, full lesion bypass differed by factors of approximately 1.5-5 (approximately 0.6-3.0% bypass efficiencies). A downstream T flanking G* on the 5'-side instead of C favors full lesion bypass, while an upstream C flanking G* is more favorable than a T. Various deletion products resulting from misaligned template-primer intermediates are particularly dominant (approximately 5.0-6.0% efficiencies) with an upstream flanking C, while a 3'-flanking T lowers the levels of deletion products (approximately 0.5-2.5% efficiencies). The kinetics of (1) single dNTP insertion opposite G* and (2) extension of the primer beyond G* by a single dNTP, or in the presence of all four dNTPs, with different 3'-terminal primer bases (Z) opposite G* were investigated. Unusually high primer extension efficiencies beyond the adduct (approaching 90%) were found with Z = T in the case of sequences with 3'-flanking upstream C rather than T. These effects are traced to misaligned slipped frameshift intermediates arising from the pairing of pairs of downstream template base sequences (up to 4-6 bases from G*) with the 3'-terminal primer base and its 5'-flanking base. The latter depend on the base Y and on the base preferentially inserted opposite the adduct. Thus, downstream template sequences as well as the bases flanking G* influence DNA translesion synthesis.

3.
Yeramian E 《Gene》2000,255(2):151-168
A gene identification procedure is formulated, based on large-scale structural analyses of genomic sequences. The structural property is the physical (thermal) stability of the DNA double helix, as described by the classical helix-coil model. The analyses are detailed for the Plasmodium falciparum genome, which represents one of the most difficult cases for the gene identification problem (notably because of the extreme AT-richness of the genome). In this genome, the coding domains (either uninterrupted genes or exons in split genes) are accurately identified as regions of high thermal stability. The conclusion is based on the study of the available cloned genes, of which 17 examples are described in detail. These examples demonstrate that the physical criterion is valid for the detection of coding regions whose lengths extend from a few base pairs up to several thousand base pairs. Accordingly, the structural analyses can provide a powerful and convenient tool for the identification of complex genes in the P. falciparum genome. The limits of such a scheme are discussed. The gene identification procedure is applied to the completely sequenced chromosomes (2 and 3), and the results are compared with the database annotations. The structural analyses suggest more or less extensive revision to the annotations, and also allow new putative genes to be identified in the chromosome sequences. Several examples of such new genes are described in detail.

4.
Coman D  Russu IM 《Biochemistry》2002,41(13):4407-4414
Recognition of specific sites in double-helical DNA by triplex-forming oligonucleotides has been limited until recently to sites containing homopurine-homopyrimidine sequences. G*TA and T*CG triads, in which TA and CG base pairs are specifically recognized by guanine or by thymine, have now extended this recognition code to DNA target sites of mixed base sequences. In the present work, we have obtained a characterization of the stabilities of G*TA and T*CG triads, and of the effects of these triads upon canonical triads, in triple-helical DNA. The three DNA triplexes investigated are formed by the folding of the 31-mers d(GAAXAGGT(5)CCTYTTCT(5)CTTZTCC) with X = G, T, or C, Y = C, A, or G, and Z = C, G, or T. We have measured the exchange rates of imino protons in each triad of the three triplexes using nuclear magnetic resonance spectroscopy. The exchange rates are used to map the local free energy of structural stabilization in each triplex. The results indicate that the stability of Watson-Crick base pairs in the G*TA and T*CG triads is comparable to that of Watson-Crick base pairs in canonical triads. The presence of G*TA and T*CG triads, however, destabilizes neighboring canonical triads, two or three positions removed from the G*TA/T*CG site. Moreover, the long-range destabilizing effects induced by the T*CG triad are larger than those induced by the G*TA triad. These findings reveal the molecular basis for the lower overall stability of G*TA- and T*CG-containing triplexes.

5.
The effects of bases flanking single bulky lesions derived from the binding of a benzo[a]pyrene 7,8-diol 9,10-epoxide derivative ((+)-7R,8S,9S,10R stereoisomer) to N(2)-guanine (G*) on translesion bypass catalyzed by the Y-family polymerase pol kappa (hDinB1) were examined in vitro. The lesions were positioned near the middle of six different 43-mer 5'-...XG*Y... sequences (X, Y = C, T, or G, with all other bases remaining fixed). The complementary dCTP is preferentially inserted opposite G* in all of the sequences; however, the proportions of other dNTPs inserted vary as a function of X and Y. The dCTP insertion efficiencies, f(ins) = (V(max)/K(m))(ins), are smaller in the XG*Y than in XGY sequences by factors of approximately 50-90 (GG*T and GG*C) or 5000-25000 (TG*G and CG*G). Remarkably, in XG*Y sequences, f(ins) varies by as much as 3 orders of magnitude, being smallest with G flanking the lesions on the 3'-side and highest with G flanking the adducts on the 5'-side. One-step primer extension efficiencies just beyond the lesions (f(ext)) are generally smaller than f(ins) and also depend on base sequence. However, reasonably efficient translesion bypass of the (+)-trans-[BP]-N(2)-dG adducts is observed in all sequences in running-start experiments, with full, or nearly full, primer extension being observed under conditions of [dNTP] > K(m). The key features here are the relatively robust values of the kinetic parameters V(max), which are either diminished to a moderate extent or even enhanced in the presence of the (+)-trans-[BP]-N(2)-dG adducts. In contrast to the small effects of the lesions on V(max), the apparent K(m) values are orders of magnitude greater in XG*Y than in the unmodified XGY sequences. Thus the bypass of (+)-trans-[BP]-N(2)-dG adducts under conditions when [dNTP] < K(m) is quite inefficient. These considerations may be of importance in vivo where [dNTP]

6.
Methylated DNA-binding protein (MDBP) from mammalian cells binds specifically to six pBR322 and M13mp8 DNA sequences but only when they are methylated at their CpG dinucleotide pairs. We cloned three high-affinity MDBP recognition sites from the human genome on the basis of their binding to MDBP. These showed much homology to the previously characterized prokaryotic sites. However, the human sites exhibited methylation-independent binding apparently because of the replacement of m5C residues with T residues. We also identified three other MDBP sites in the herpes simplex virus type 1 genome, two of which require in vitro CpG methylation for binding and are in the upstream regions of viral genes. A comparison of MDBP sites leads to the following partially symmetrical consensus sequence for MDBP recognition sites: 5'-R T m5Y R Y Y A m5Y R G m5Y R A Y-3'; m5Y (m5C or T), R (A or G), Y (C or T). This consensus sequence displays an unusually high degree of degeneracy. Also, interesting deviations from this consensus sequence, including a one base-pair deletion in the middle, are sometimes observed in high-affinity MDBP sites.

7.
8.
Recently, we proposed a new model of DNA sequence evolution (Arquès and Michel. 1990b. Bull. math. Biol. 52, 741–772) according to which actual genes on the purine/pyrimidine (R/Y) alphabet (R = purine = adenine or guanine, Y = pyrimidine = cytosine or thymine) are the result of two successive evolutionary genetic processes: (i) a mixing (independent) process of non-random oligonucleotides (words of base length less than 10: YRY(N)6, YRYRYR and YRYYRY are so far identified; N = R or Y) leading to primitive genes (words of several hundreds of base length) and followed by (ii) a random mutation process, i.e. transformations of a base R (respectively Y) into the base Y (respectively R) at random sites in these primitive genes. Following this model, the problem investigated here is the study of the variation of the 8 R/Y codon probabilities RRR, ..., YYY under random mutations. Two analytical expressions solved here allow analysis of this variation in the classical evolutionary sense (from the past to the present, i.e. after random mutations), but also in the inverted evolutionary sense (from the present to the past, i.e. before random mutations). Different properties are also derived from these formulae. Finally, a few applications of these formulae are presented. They prove the proposition in Arquès and Michel (1990b. Bull. math. Biol. 52, 741–772), Section 3.3.2, with the existence of a maximal mean number of random mutations per base of the order of 0.3 in the protein coding genes. They also confirm the mixing process of oligonucleotides by excluding the purine/pyrimidine contiguous and alternating tracts from the formation process of primitive genes.

9.
Gal M  Katz T  Ovadia A  Yagil G 《Nucleic acids research》2003,31(13):3682-3685
A program to map the locations and frequencies of DNA tracts composed of only two bases ('Binary DNA') is described. The program, TRACTS (URL http://bioportal.weizmann.ac.il/tracts/tracts.html and/or http://bip.weizmann.ac.il/miwbin/servers/tracts), is of interest because long tracts composed of only two bases are highly over-represented in most genomes. In eukaryotes, oligopurine.oligopyrimidine tracts ('R.Y tracts') are found in the highest excess. In prokaryotes, W tracts predominate (A,T 'rich'). A pre-program, ANEX, parses database annotation files of GenBank and EMBL, to produce a convenient one-line list of every gene (exon, intron) in a genome. The main unit lists and analyzes tracts of the three possible binary pairs (R.Y, K.M and S.W). As an example, the results of R.Y tract mapping of mammalian gene p53 are described.
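The kind of tract mapping described in entry 9 can be illustrated with a minimal Python sketch. This is not the TRACTS program: the function name `find_binary_tracts`, the `min_len` default, and the single-strand treatment are assumptions made here for illustration. It reports maximal runs drawn from one two-base alphabet, covering both halves of each binary pairing (R and Y, K and M, S and W).

```python
import re

# Two-base alphabets (IUPAC codes), read on a single strand.
# Hypothetical helper table, not taken from TRACTS itself.
BINARY_CLASSES = {
    "R": "AG",  # purines
    "Y": "CT",  # pyrimidines
    "K": "GT",  # keto
    "M": "AC",  # amino
    "S": "CG",  # strong
    "W": "AT",  # weak
}

def find_binary_tracts(seq, min_len=12):
    """Return (class, start, length) for each maximal run of at least
    min_len bases drawn from a single two-base alphabet."""
    seq = seq.upper()
    tracts = []
    for name, letters in BINARY_CLASSES.items():
        # e.g. "[AG]{12,}" matches a purine run of 12 or more bases
        pattern = re.compile(f"[{letters}]{{{min_len},}}")
        for m in pattern.finditer(seq):
            tracts.append((name, m.start(), m.end() - m.start()))
    return sorted(tracts, key=lambda t: t[1])
```

For example, `find_binary_tracts("C" + "AGGAGAAGGGAG" + "C")` reports the 12-base purine run as a single R tract starting at position 1. A run on one strand implies the complementary class on the other strand, which is why the abstract speaks of R.Y, K.M and S.W pairs.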

10.
Summary: The tobacco (Nicotiana tabacum) nuclear genome contains long tracts of DNA (i.e. in excess of 18 kb) with high sequence homology to the tobacco plastid genome. Five lambda clones containing these nuclear DNA sequences encompass more than one-third of the tobacco plastid genome. The absolute size of these five integrants is unknown but potentially includes uninterrupted sequences that are as large as the plastid genome itself. An additional sequence was cloned consisting of both nuclear and plastid-derived DNA sequences. The nuclear component of the clone is part of a family of repeats, which are present in about 400 locations in the nuclear genome. The homologous sequences present in chromosomal DNA were very similar to those of the corresponding sequences in the plastid genome. However, significant sequence divergence, including base substitutions, insertions and deletions of up to 41 bp, was observed between these nuclear sequences and the plastid genome. Associated with the larger deletions were sequence motifs suggesting that processes such as DNA replication slippage and excision of hairpin loops may have been involved in deletion formation.

11.
12.
The genetic architecture of resistance   Cited by: 13 (self-citations: 0; citations by others: 13)
Plant resistance genes (R genes), especially the nucleotide binding site leucine-rich repeat (NBS-LRR) family of sequences, have been extensively studied in terms of structural organization, sequence evolution and genome distribution. These studies indicate that NBS-LRR sequences can be split into two related groups that have distinct amino-acid motif organizations, evolutionary histories and signal transduction pathways. One NBS-LRR group, characterized by the presence of a Toll/interleukin receptor domain at the amino-terminal end, seems to be absent from the Poaceae. Phylogenetic analysis suggests that a small number of NBS-LRR sequences existed among ancient Angiosperms and that these ancestral sequences diversified after the separation into distinct taxonomic families. There are probably hundreds, perhaps thousands, of NBS-LRR sequences and other types of R gene-like sequences within a typical plant genome. These sequences frequently reside in 'mega-clusters' consisting of smaller clusters with several members each, all localized within a few million base pairs of one another. The organization of R-gene clusters highlights a tension between diversifying and conservative selection that may be relevant to gene families that are unrelated to disease resistance.

13.
DNA fragments with the sequences d(gcGX[Y]n Agc) (n=1, X=A, and Y=A, T, or G) form base-intercalated duplexes, which are basic units for the formation of multiplexes such as the octaplex and hexaplex. To examine the stability of multiplexes, a DNA with X=Y=G and n=1 was crystallized under conditions different from those of the previously determined sequences, and its crystal structure has been determined. The two strands are coupled in an anti-parallel fashion to form a base-intercalated duplex, in which the first and second residues form Watson-Crick type G:C pairs and the third and sixth residues form sheared G:A pairs at both ends of the duplex. The G4 and G5 bases are stacked alternately on those of the counter strand to form a long G column of G3-G4-G5*-G5-G4*-G3*, the central four Gs being protruded. In addition, three duplexes are associated to form a hexaplex around a mixture of calcium and sodium cations on the crystallographic threefold axis. These structural features are similar to those of the previous crystals, though slightly different in detail. The present study indicates that mutation at the 4th position can occur in a base-intercalated duplex for multiplex formation, suggesting that DNA fragments with any sequence sandwiched between the two triplets gcG and Agc can form a multiplex.

14.
We isolated DNA clones of intracisternal A-particle (IAP) genes from the genome of an Asian wild mouse, Mus caroli. A typical M. caroli IAP gene was 6.5 kilobase pairs in length and had long terminal repeat (LTR) sequences at both ends. The size of the LTR was 345 base pairs in clone L20, and two LTRs at both ends of this clone were linked to directly repeating cellular sequences of 6 base pairs. Each LTR possessed most of the structural features commonly associated with the retrovirus LTR. The restriction map of the M. caroli IAP gene resembled that of Mus musculus, although the M. caroli IAP gene was 0.4 kilobase pairs shorter than the M. musculus IAP gene in two regions. Sequence homology between the M. caroli and M. musculus IAP LTRs was calculated as about 80%, whereas the LTR sequence of the Syrian hamster IAP gene was about 60% homologous to the M. caroli LTR. The reiteration frequency of the M. caroli IAP genes was estimated as 200 to 400 copies per haploid genome, which is at least 10 times the reported value. These results suggest that the IAP genes observed in the genus Mus are present in multiple copies with structures closely resembling the integrated retrovirus gene.

15.
Background: Rhizoctonia solani is a pathogenic fungus that causes serious diseases in many crops, including rice, wheat, and soybeans. In crop production, it is very important to understand the pathogenicity of this fungus, which is still elusive. It might be helpful to comprehensively understand its genomic information using different genome annotation strategies. Methods: Aiming to improve the genome annotation of R. solani, we performed a proteogenomic study based on the existing data. Based on our study, a total of 1060 newly identified genes, 36 revised genes, 139 single amino acid variants (SAAVs), 8 alternative splicing genes, and diverse post-translational modification (PTM) events were identified in R. solani AG3. Further functional annotation on these 1060 newly identified genes was performed through homology analysis with its 5 closest relative fungi. Results: Based on this, 2 novel candidate pathogenic genes, which might be associated with pathogen-host interaction, were discovered. In addition, in order to increase the reliability and novelty of the newly identified genes in R. solani AG3, 1060 newly identified genes were compared with the newly published available R. solani genome sequences of AG1, AG2, AG4, AG5, AG6, and AG8. There are 490 homologous sequences. We combined the proteogenomic results with the genome alignment results and finally identified 570 novel genes in R. solani. Conclusion: These findings extended R. solani genome annotation and provided a wealth of resources for research on R. solani.

16.
RecA independent recombination of poly[d(GT)-d(CA)] in pBR322.   Cited by: 6 (self-citations: 2; citations by others: 4)
Short sequence tracts composed of alternating guanosine and thymidine nucleotide residues poly[d(GT)-d(CA)] carried in a derivative of pBR322 were recombinogenic in a recA host. Recombination brought about by poly[d(GT)-d(CA)] tracts displayed two interesting properties: (i) the reaction was quasi-sequence-specific in that while recombination usually occurred between two poly[d(GT)-d(CA)] tracts, recombination also occurred between sequences bordering the dinucleotide repeats. (ii) recombination was enhanced when two poly[d(GT)-d(CA)] tracts were clustered within 250 base pairs of each other, but not when the repeats were separated by 3 kilobase pairs. The mechanism by which poly[d(GT)-d(CA)] stimulated recombination remains to be determined, but the behavior of these sequences is consistent with the idea that general recombination in E. coli may involve formation of Z-DNA.

17.
18.
A recombinant phage, SpC3, containing a 17 kb genomic DNA insert representing approximately 60% of the 3' portion of the sheep collagen alpha 2 gene, was evaluated by electron microscopic R loop analysis. A minimum of 17 intervening sequences (introns) and 18 alpha 2 coding sequences (exons) were mapped. With the exception of the 850 base pair exon located at the extreme 3' end of the insert, all exons contained 250 base pairs or less. The total length of all the exons in SpC3 was 3,014 base pairs. The length distribution of the 17 introns ranged from 300 to 1600 base pairs; together, all of the introns comprised 14,070 base pairs of SpC3 DNA. Thus, the DNA region required for coding the interspersed 3 kb of alpha 2 collagen genetic information was 5.6-fold longer than the corresponding alpha 2 mRNA coding sequences.
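The fold figure quoted in entry 18 can be checked from the exon and intron totals given in the abstract. The short Python calculation below uses only those two numbers; the variable names are chosen here for illustration.

```python
exon_bp = 3014     # total exon length reported for SpC3
intron_bp = 14070  # total intron length reported for SpC3

# DNA spanned by exons plus introns; about 17 kb, matching the insert size
genomic_bp = exon_bp + intron_bp

# Genomic length relative to the coding (exon) length
fold = genomic_bp / exon_bp
```

The sum is 17,084 base pairs and the ratio is about 5.67, which is in line with the roughly 5.6-fold figure stated in the abstract.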

19.
Trace sequences from the 2X alpaca genome sequencing effort were examined to identify simple sequence repeats (microsatellites) for genetic studies. A total of 6,685 repeat-containing sequences were downloaded from GenBank, processed, and assembled into contigs representing an estimated 4,278 distinct sequences. This sequence set contained 2,290 sequences of length > 100 nucleotides that contained microsatellites of length ≥ 14 dinucleotide or 10 trinucleotide repeats with purity equal to 100%. An additional 13 sequences containing a GC microsatellite of length ≥ 12 repeats (purity = 100%) were also obtained. Primer pairs for amplification of 1,516 putative loci are presented. Amplification of genomic DNA from alpaca and llama by PCR was demonstrated for 14 primer sets including one from each of the microsatellite repeat types. Comparative chromosomal location for the alpaca markers was predicted in the bovine genome by BLAT searches against assembly 4.0 of the bovine whole genome sequence. A total of 634 markers (41.8%) returned BLAT hits with score > 100 and identity > 85%, with the majority assignable to unique locations. We show that microsatellites are abundant and easily identified within the alpaca genome sequence. These markers will provide a valuable resource for further genetic studies of the alpaca and related species.
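The repeat criteria in entry 19 (at least 14 dinucleotide or 10 trinucleotide units at 100% purity) can be sketched with a regular expression. This is an illustrative Python sketch, not the pipeline used in the study; the function name `find_perfect_repeats` and its parameters are assumptions.

```python
import re

def find_perfect_repeats(seq, unit_len, min_units):
    """Return (start, unit, n_units) for perfect tandem repeats of a
    unit_len-base motif occurring at least min_units times in a row."""
    # One motif of unit_len bases, then at least min_units - 1 exact copies
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_units - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAA...
        hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits
```

Calling `find_perfect_repeats(seq, 2, 14)` and `find_perfect_repeats(seq, 3, 10)` would apply the two thresholds from the abstract. Because matching is exact, any interruption ends the repeat, which corresponds to the 100% purity requirement.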

20.
Polymerase chain reaction (PCR) amplification of DNA-based unique markers, the signature sequences, is ideal for rapid detection and identification of pathogens. We described the discovery of twenty-eight signature genes of Yersinia pestis by DNA microarray-based comparative genome hybridization in conjunction with PCR validation. Three pairs of Y. pestis-specific primers designed from signature genes were demonstrated to have the expected specificity to this target bacterium, without cross-reaction with the closely related Y. pseudotuberculosis or a large collection of genomic DNAs from other organisms.
