Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
ABSTRACT. A fragment from the genome of rat-derived Pneumocystis carinii was found to contain two MSG genes arranged as a direct repeat. The sequences from one gene (MSG B), the region between the two genes, and part of the second gene (MSG A) were determined. The two MSG genes were not identical in sequence. The open reading frames of MSG A and MSG B encode non-identical proteins, both of which are similar to that encoded by a previously published cDNA. The MSG B gene sequence showed no evidence of introns. The 5' and 3' untranslated regions of the MSG gene pair were highly conserved, but the regions immediately upstream of the open reading frames of MSG A and B were different from the region upstream of a previously characterized MSG cDNA. Primers designed to extend upstream of the 5' end of MSG and downstream of the 3' end of MSG were used in a polymerase chain reaction with total genomic P. carinii DNA as template. Presumptive intergenic amplification products from this reaction were cloned and sequenced. The sequences of these regions were similar but distinct, indicating that tandem arrangement of MSG genes is a common organizational motif.

2.
Thalassiosira weissflogii (Grun.) Fryxell et Hasle is one of the more commonly studied centric diatoms, and yet molecular studies of this organism are still in their infancy. The ability to identify open reading frames and thus distinguish between introns and exons, coding and noncoding sequence is essential to move from nuclear DNA sequences to predicted amino acid sequences. To facilitate the identification of open reading frames in T. weissflogii, two newly identified nuclear genes encoding β-tubulin and t-complex polypeptide (TCP)-γ, along with six previously published nuclear DNA sequences, were examined for general structural features. The coding region of the nuclear open reading frames had a G + C content of about 49% and could readily be distinguished from noncoding sequence due to a significant difference in G + C content. The introns were uniformly small, about 100 base pairs in size. Furthermore, the 5' and 3' splice sites of introns displayed the canonical GT/AG sequence, further facilitating recognition of noncoding regions. Six of the nuclear open reading frames displayed relatively little bias in the use of synonymous codons, as exemplified by the cDNAs encoding β-tubulin and TCP-γ. Two open reading frames displayed strong bias in the use of particular codons (although the codons used were different), as exemplified by the cDNA encoding fucoxanthin chlorophyll a/c binding protein. Knowledge of codon bias should facilitate, for example, design of degenerate PCR primers and potential heterologous reporter gene constructs.
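
The two structural cues named in this abstract, G + C content and the canonical GT/AG intron boundaries, can be illustrated with a minimal Python sketch. The function names and the toy sequences are assumptions made for illustration; they are not taken from the paper and are not T. weissflogii data.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_canonical_splice_sites(intron: str) -> bool:
    """Return True if the intron starts with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    exon = "ATGGCCGACTTCGGA"
    intron = "GTAAATTTTATTTAAATAG"
    print("exon G+C:", round(gc_content(exon), 2))
    print("intron G+C:", round(gc_content(intron), 2))
    print("canonical GT/AG:", has_canonical_splice_sites(intron))
```

A real analysis would compute G + C content over sliding windows and compare coding and noncoding regions statistically; this sketch only shows the per-sequence calculation.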

3.
The aquatic larvae of the genus Chironomus (Diptera, Insecta) contain at least 12 different hemoglobin (Hb) variants in their hemolymph. In the present study we have analysed the structure and part of the nucleotide sequence of a Hb gene cluster cloned from the genomic DNA of Chironomus thummi piger. The cluster contains probably 6 different genes, separated by intergenic regions of various lengths. The nucleotide sequence of three putative Hb genes including the intergenic regions is presented. The inferred amino-acid sequences show clearly that two of these putative genes code for subvariants of the Hb variant VIIB. The third gene codes for a so far unknown Hb protein. As known already for other chironomid Hb genes, there are no intron sequences present in the coding regions.

4.
Small repeat sequences in bacterial genomes, which represent non-autonomous mobile elements, have close similarities to archaeon and eukaryotic miniature inverted repeat transposable elements. These repeat elements are found in both intergenic and intragenic chromosomal regions, and contain an array of diverse motifs. These can include DNA sequences containing an integration host factor binding site and a proposed DNA methyltransferase recognition site, transcribed RNA secondary structural motifs, which are involved in mRNA regulation, and translated open reading frames found fused to other open reading frames. Some bacterial mobile element fusions are in evolutionarily conserved protein and RNA genes. Others might represent or lead to creation of new protein genes. Here we review the remarkable properties of these small bacterial mobile elements in the context of possible beneficial roles resulting from random insertions into the genome.

5.
Abstract. The lactate dehydrogenase gene, ldh, of Alcaligenes eutrophus H16 was identified on a 14-kbp EcoRI restriction fragment of a genomic library in the cosmid pHC79 by hybridization with a 50-mer synthetic oligonucleotide which was derived from the N-terminal amino acid sequence of the purified enzyme. Recombinant strains of Escherichia coli JM83, which harboured a 2.0-kbp PstI subfragment in pUC9-1, expressed LDH at a high level, if ldh was downstream from and colinear to the E. coli lac promoter. The nucleotide sequence of a region of 4245 bp revealed several open reading frames which might represent coding regions. One represented the ldh gene. The amino acid sequence deduced from ldh exhibited 29% and 36% identity to the L-malate dehydrogenase of Methanothermus fervidus and to the putative translation product of an E. coli sequence of unknown function, respectively. The ldh was separated by short intergenic regions from two other open reading frames: ORF5 was located downstream of and colinear to ldh, and its putative translational product revealed 38 to 56% amino acid identity to penicillin-binding proteins. ORF3 was located upstream of and colinear to ldh, and its putative gene translational product represented a hydrophobic protein. A sequence, which resembled the A. eutrophus alcohol dehydrogenase promoter, was detected upstream of ORF3, which most probably represents the first transcribed gene of an operon consisting of ORF3, ldh and ORF5.

6.
Genome annotation in differently evolved organisms presents challenges because the lack of sequence-based homology limits the ability to determine the function of putative coding regions. To provide an alternative to annotation by sequence homology, we developed a method that takes advantage of unusual trypanosomatid biology and skews in nucleotide composition between coding regions and upstream regions to rank putative open reading frames based on the likelihood of coding. The method is 93% accurate when tested on known genes. We have applied our method to the full complement of open reading frames on Chromosome I of Trypanosoma brucei, and we can predict with high confidence that 226 putative coding regions are likely to be functional. Methods such as the one described here for discriminating true coding regions are critical for genome annotation when other sources of evidence for function are limited.
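
The abstract does not say which compositional skew the authors used or how they scored it, so the following Python sketch is only a stand-in for the general idea of ranking candidate open reading frames by how much their composition differs from that of the upstream region. The score (G + C fraction of the ORF minus G + C fraction of the upstream sequence), the function names, and the toy data are all assumptions, not the published method.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def rank_by_composition_skew(candidates):
    """Rank candidate ORFs by a hypothetical skew score, highest first.

    candidates: iterable of (name, upstream_seq, orf_seq) tuples.
    The score is GC(orf) - GC(upstream); this is an illustrative
    assumption, not the scoring function of the cited paper.
    """
    scored = [
        (gc_content(orf) - gc_content(upstream), name)
        for name, upstream, orf in candidates
    ]
    return [name for _score, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Toy candidates invented for illustration only.
    toy = [
        ("orf_a", "ATATATAT", "GCGCGCGC"),
        ("orf_b", "GCGCGCGC", "ATATATAT"),
        ("orf_c", "ATGCATGC", "ATGCATGC"),
    ]
    print(rank_by_composition_skew(toy))
```

Any real ranking would need to be calibrated against known genes, as the abstract describes for its own method.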

7.
The flanking regions and the end of the chloroplast ribosomal unit of Chlamydomonas reinhardii have been sequenced. The upstream region of the ribosomal unit contains three open reading frames coding for 111, 117 and 124 amino acids, respectively. The latter polypeptide is partially related to the ribosomal protein L16 of E. coli. Two of the open reading frames overlap each other and are oriented in opposite directions. The region between these open reading frames and the 5' end of the 16S rRNA gene contains numerous short direct and inverted repeats which can be folded into large stem-loop structures. Sequence elements that resemble prokaryotic promoters are found in the same region. Several of the repeated elements are distributed throughout the non-coding regions of the chloroplast inverted repeat. Sequence comparison between the 5S rRNA and its gene does not reveal any significant sequence heterogeneity between the chloroplast 5S rRNA genes.

8.
9.
10.
11.
F Rodier, J Sallantin. Biochimie, 1985, 67(5): 533-539
Learning processes are applied to the recognition of protein coding regions in prokaryotes. Non-contradictory, statistical and logical rules are deduced from a set of known examples of coding sequences. These rules make it possible to build characteristic patterns on the mRNA upstream of the initiating codon. These rules are applied with success to recognize more than 180 coding sequences and to detect and/or eliminate hypothetical reading frames or unknown genes.

12.
13.
14.
MOTIVATION: The whole genomes submitted to GenBank contain valuable information about the function of genes as well as the upstream sequences and whole cell expression provides valuable information on gene regulation. To utilize these large amounts of data for a biological understanding of the regulation of gene expression, new automatic methods for pattern finding are needed. RESULTS: Two word-analysis algorithms for automatic discovery of regulatory sequence elements have been developed. We show that sequence patterns correlated to whole cell expression data can be found using Kolmogorov-Smirnov tests on the raw data, thereby eliminating the need for clustering co-regulated genes. Regulatory elements have also been identified by systematic calculations of the significance of correlations between words found in the functional annotation of genes and DNA words occurring in their promoter regions. Application of these algorithms to the Saccharomyces cerevisiae genome and publicly available DNA array data sets revealed a highly conserved 9-mer occurring in the upstream regions of genes coding for proteasomal subunits. Several other putative and known regulatory elements were also found. AVAILABILITY: Upon request.
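
As a rough illustration of the Kolmogorov-Smirnov idea mentioned in this abstract, the Python sketch below splits genes by whether a DNA word occurs in their upstream region and compares the two sets of expression values with a two-sample KS statistic. The authors' actual procedure is not described in the abstract, so the grouping scheme, the function names, and the toy data are assumptions; only the statistic itself (the largest gap between two empirical distribution functions) is standard.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, so ties are handled.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def word_expression_ks(word, upstream, expression):
    """Compare expression of genes whose upstream region contains `word`
    with expression of genes whose upstream region does not.

    upstream: dict gene -> upstream DNA sequence (hypothetical input format)
    expression: dict gene -> expression value (hypothetical input format)
    """
    with_word = [expression[g] for g, s in upstream.items() if word in s]
    without = [expression[g] for g, s in upstream.items() if word not in s]
    return ks_statistic(with_word, without)


if __name__ == "__main__":
    # Toy data invented for illustration only.
    upstream = {"g1": "AAGGTGGCAAA", "g2": "TTGGTGGCAAT", "g3": "ACACACAC", "g4": "TTTTTTTT"}
    expression = {"g1": 2.1, "g2": 1.8, "g3": -0.3, "g4": 0.1}
    print(word_expression_ks("GGTGGCAAA", upstream, expression))
```

The sketch returns only the statistic; a significance level would still have to be derived from it, for example with scipy.stats.ks_2samp if SciPy is available.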

15.
16.
The genes coding for the lactose permease and beta-galactosidase, two proteins involved in the metabolism of lactose by Lactobacillus bulgaricus, have been cloned, expressed, and found functional in Escherichia coli. The nucleotide sequences of these genes and their flanking regions have been determined, showing the presence of two contiguous open reading frames (ORFs). One of these ORFs codes for the lactose permease gene, and the other codes for the beta-galactosidase gene. The lactose permease gene is located in front of the beta-galactosidase gene, with 3 bp in the intergenic region. The two genes are probably transcribed as one operon. Primer extension studies have mapped a promoter upstream from the lactose permease gene but not the beta-galactosidase gene. This promoter is similar to those found in E. coli with general characteristics of GC-rich organisms. In addition, the sequences around the promoter contain a significantly higher number of AT base pairs (80%) than does the overall L. bulgaricus genome, which is rich in GC (GC content of 54%). The amino acid sequences obtained from translation of the ORFs are found to be highly homologous (similarity of 75%) to those from Streptococcus thermophilus. The first 460 amino acids of the lactose permease show homology to the melibiose transport protein of E. coli. Little homology was found between the lactose permease of L. bulgaricus and E. coli, but the residues which are involved in the binding and the transport of lactose are conserved. The carboxy terminus is similar to that of the enzyme III of several phosphoenolpyruvate-dependent phosphotransferase systems.

17.
Sequence organization of the mitochondrial genome of yeast--a review   Cited by: 3 (self-citations: 0, citations by others: 3)
M de Zamaroczy, G Bernardi. Gene, 1985, 37(1-3): 1-17
We have compiled the available primary structural data for the mitochondrial genome of Saccharomyces cerevisiae and have estimated the size of the remaining gaps, which represent 12-13% of the genome. The lengths of sequenced regions and of gaps lead to a new assessment of genome sizes; these range (in round figures) from 85 000 bp for the long genomes, to 78 000 bp for the short genomes, to 74 000 bp for the supershort genome of Saccharomyces carlsbergensis. These values are 8-11% higher than those previously estimated from restriction fragments. Interstrain differences concern not only facultative intervening sequences (introns) and mini-inserts, but also insertions/deletions in intergenic sequences. The primary structure appears to be extremely conserved in genes and ori sequences, and highly conserved in intergenic sequences. Since coding sequences represent at most 33-35% of the genome, at least two thirds of the genome are formed by noncoding and yet highly conserved sequences. The G + C level of genes or exons is 25%, and that of intronic open reading frames (ORFs) 22%; increasingly lower values are shown by intronic closed reading frames (CRFs), 20%, ori sequences, 19%, intergenic ORFs, 17.5% and intergenic sequences, 15%.

18.
19.
The vast majority of bacteria in the environment have yet to be cultured. Consequently, a major proportion of both genetic diversity within known gene families and an unknown number of novel gene families reside in these uncultured organisms. Isolation of these genes is limited by lack of sequence information. Where such sequence data exist, PCR directed at conserved sequence motifs recovers only partial genes. Here we outline a strategy for recovering complete open reading frames from environmental DNA samples. PCR assays were designed to target the 59-base element family of recombination sites that flank gene cassettes associated with integrons. Using such assays, diverse gene cassettes could be amplified from the vast majority of environmental DNA samples tested. These gene cassettes contained complete open reading frames, the majority of which were associated with ribosome binding sites. Novel genes with clear homologies to phosphotransferase, DNA glycosylase, methyl transferase, and thiotransferase genes were identified. However, the majority of amplified gene cassettes contained open reading frames with no identifiable homologues in databases. Accumulation analysis of the gene cassettes amplified from soil samples showed no signs of saturation, and soil samples taken at 1-m intervals along transects demonstrated different amplification profiles. Taken together, the genetic novelty, steep accumulation curves, and spatial heterogeneity of genes recovered show that this method taps into a vast pool of unexploited genetic diversity. The success of this approach indicates that mobile gene cassettes and, by inference, integrons are widespread in natural environments and are likely to contribute significantly to bacterial diversity.

20.
A sequence of 10,621 base-pairs from the alpha-like globin gene cluster of rabbit has been determined. It includes the sequence of gene zeta 1 (a pseudogene for the rabbit embryonic zeta-globin), the functional rabbit alpha-globin gene, and the theta 1 pseudogene, along with the sequences of eight C repeats (short interspersed repeats in rabbit) and a J sequence implicated in recombination. The region is quite G + C-rich (62%) and contains two CpG islands. As expected for a very G + C-rich region, it has an abundance of open reading frames, but few of the long open reading frames are associated with the coding regions of genes. Alignments between the sequences of the rabbit and human alpha-like globin gene clusters reveal matches primarily in the immediate vicinity of genes and CpG islands, while the intergenic regions of these gene clusters have many fewer matches than are seen between the beta-like globin gene clusters of these two species. Furthermore, the non-coding sequences in this portion of the rabbit alpha-like globin gene cluster are shorter than in human, indicating a strong tendency either for sequence contraction in the rabbit gene cluster or for expansion in the human gene cluster. Thus, the intergenic regions of the alpha-like globin gene clusters have evolved in a relatively fast mode since the mammalian radiation, but not exclusively by nucleotide substitution. Despite this rapid mode of evolution, some strong matches are found 5' to the start sites of the human and rabbit alpha genes, perhaps indicating conservation of a regulatory element. The rabbit J sequence is over 1000 base-pairs long; it contains a C repeat at its 5' end and an internal region of homology to the 3'-untranslated region of the alpha-globin gene. Part of the rabbit J sequence matches with sequences within the X homology block in human. 
Both of these regions have been implicated as hot-spots for recombination, hence the matching sequences are good candidates for such a function. All the interspersed repeats within both gene clusters are retroposon SINEs that appear to have inserted independently in the rabbit and human lineages.
