Similar literature
20 similar articles found (search time: 234 ms)
1.
Homma K, Fukuchi S, Kawabata T, Ota M, Nishikawa K. Gene, 2002, 294(1-2): 25-33
Pseudogenes are open reading frames (ORFs) encoding dysfunctional proteins with high homology to known protein-coding genes. Although pseudogenes were reported to exist in the genomes of many eukaryotes and bacteria, no systematic search for pseudogenes in the Escherichia coli genome has been carried out. Genome comparisons of E. coli strains K-12 and O157 revealed that many protein-coding sequences have prematurely terminated orthologs encoding unstable proteins. To systematically screen for pseudogenes, we selected ORFs generated by premature termination of the orthologous protein-coding genes and subsequently excluded those possibly arising from sequence errors. Lastly, we eliminated those with close homologs in this and other species, as these shortened ORFs may actually have functions. The process produced 95 and 101 pseudogene candidates in K-12 and O157, respectively. The assigned three-dimensional structures suggest that most of the encoded proteins cannot fold properly and thus are dysfunctional, indicating that they are probably pseudogenes. Therefore, the existence of a significant number of probable pseudogenes in E. coli is predicted, awaiting experimental verification. Most of them were found to be genes with paralogs or horizontally transferred genes or both. We suggest that pseudogenes constitute a small fraction of the genomes of free-living bacteria in general, reflecting the faster elimination than production of pseudogenes.

2.
The published sequence of the Vibrio cholerae genome indicates that, in addition to the genes that encode proteins of known and unknown function, there are 1577 ORFs identified as conserved hypothetical or hypothetical gene candidates. Because the annotation is not 100% accurate, it is not known which of the 1577 ORFs are true protein-coding genes. In this paper, an algorithm based on the Z curve method, with sensitivity, specificity and accuracy greater than 98%, is used to solve this problem. Twenty-fold cross-validation tests show that the accuracy of the algorithm is 98.8%. A detailed discussion of the mechanism of the algorithm is also presented. It was found that 172 of the 1577 ORFs are unlikely to be protein-coding genes. The number of protein-coding genes in the V. cholerae genome was re-estimated and found to be approximately 3716. This result should be of use in microarray analysis of gene expression in the genome, because the cost of preparing chips may be somewhat decreased. A computer program was written to calculate a coding score called VCZ for gene identification in the genome. An ORF is classified as coding if VCZ > 0 and as noncoding if VCZ < 0. The program is freely available on request for academic use.
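
The Z curve representation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the VCZ formula or the features the classifier uses, so the code below only computes the standard cumulative Z curve coordinates of a DNA sequence; the function name `z_curve` and the example sequence are assumptions made for illustration, not the authors' program.

```python
def z_curve(seq):
    """Return cumulative Z curve coordinates (x, y, z) for each prefix of seq."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            continue  # skip ambiguous symbols such as N
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        x = (a + g) - (c + t)  # purines versus pyrimidines
        y = (a + c) - (g + t)  # amino versus keto bases
        z = (a + t) - (g + c)  # weak versus strong hydrogen bonding
        points.append((x, y, z))
    return points


# Hypothetical four-base example sequence.
print(z_curve("ATGC"))
```

A gene-finding score would typically be built from features of this kind plus a trained decision rule; that step is not specified in the abstract and is left out here.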

3.
Naoyuki Iwabe, Takashi Miyata. Gene, 2001, 280(1-2): 163-167
The parasitic protist Giardia lamblia lacks mitochondria and peroxisomes, as well as many typical membrane-bound organelles characteristic of higher eukaryotic cells, and shows extremely economized usage of DNA sequence, as demonstrated by the lack of introns. We describe here the presence of overlapping genes in G. lamblia, in which a part of the protein-coding sequence of one mRNA exists in a region corresponding to the 3′-noncoding region of another mRNA transcribed from a gene on the opposite strand. Recently we isolated 13 kinesin-related cDNAs from G. lamblia. Nine of these cDNAs contain long 3′-noncoding sequences in which long open reading frames (ORFs) exist (in the remaining four cDNAs, the 3′-noncoding sequences are very short). The predicted amino acid sequences of these ORFs were subjected to a search for homologies with sequences in databases. The amino acid sequences of six of the ORFs exhibited significant sequence similarities with known sequences. These lines of evidence suggest the frequent occurrence of gene overlap in the Giardia genome.

4.
The complete genomic sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain 7, which grows optimally at 80 °C, at low pH, and under aerobic conditions, has been determined by the whole genome shotgun method with slight modifications. The genome was 2,694,756 bp long and the G + C content was 32.8%. The following RNA-coding genes were identified: a single 16S-23S rRNA cluster, one 5S rRNA gene and 46 tRNA genes (including 24 intron-containing tRNA genes). The repetitive sequences identified were SR-type repetitive sequences, long dispersed-type repetitive sequences and Tn-like repetitive elements. The genome contained 2826 potential protein-coding regions (open reading frames, ORFs). By similarity search against public databases, 911 (32.2%) ORFs were related to functionally assigned genes, 921 (32.6%) were related to conserved ORFs of unknown function, 145 (5.1%) contained some motifs, and the remaining 849 (30.0%) did not show any significant similarity to the registered sequences. The ORFs with functional assignments included the candidate genes involved in sulfide metabolism, the TCA cycle and the respiratory chain. Sequence comparison provided evidence suggesting the integration of plasmid, rearrangement of genomic structure, and duplication of genomic regions that may be responsible for the larger genomic size of the S. tokodaii strain 7 genome. The genome contained eukaryote-type genes which were not identified in other archaea and lacked the CCA sequence in the tRNA genes. The results suggest that this strain is closer to eukaryotes than the other archaeal strains sequenced so far. The data presented in this paper are also available on the internet homepage (http://www.bio.nite.go.jp/E-home/genome_list-e.html/).

5.
In this study, the full mitochondrial genome of a basidiomycete fungus, Pleurotus ostreatus, was sequenced and analyzed. It is a circular DNA molecule of 73 242 bp and contains 44 known genes: 18 protein-coding genes and 26 RNA genes. The protein-coding genes include 14 common mitochondrial genes, one ribosomal small subunit protein 3 gene, one RNA polymerase gene and two DNA polymerase genes. In addition, one RNA polymerase gene and one DNA polymerase gene were identified in a mitochondrial plasmid. These two genes show relatively low similarities to their homologs in the mitochondrial genome but they are nearly identical to the known mitochondrial plasmid genes from another Pleurotus ostreatus strain. This suggests that the plasmid may mediate the horizontal gene transfer (HGT) of the DNA and RNA polymerase genes into the mitochondrial genome, and such a transfer may be an ancient event. Phylogenetic analysis based on the cox1 ORFs verified the traditional classification of Pleurotus ostreatus among fungi. However, discordances were observed in the phylogenetic trees based on the six cox1 intronic ORFs of Pleurotus ostreatus and their homologs in other species, suggesting that these intronic ORFs are foreign DNA sequences obtained through HGT. In summary, this analysis provides valuable information towards the understanding of the evolution of fungal mtDNA.

6.
7.
To explore the mitochondrial genes of the Cruciferae family, the mitochondrial genome of Raphanus sativus (sat) was sequenced and annotated. The circular mitochondrial genome of sat is 239,723 bp and includes 33 protein-coding genes, three rRNA genes and 17 tRNA genes. The mitochondrial genome also contains a pair of large repeat sequences 5.9 kb in length, which may mediate genome reorganization into two sub-genomic circles, with predicted sizes of 124.8 kb and 115.0 kb, respectively. Furthermore, gene evolution of mitochondrial genomes within the Cruciferae family was analyzed using the sat mitochondrial type (mitotype), together with six other reported mitotypes. The cruciferous mitochondrial genomes have maintained almost the same set of functional genes. Compared with Cycas taitungensis (a representative gymnosperm), the mitochondrial genomes of the Cruciferae have lost nine protein-coding genes and seven mitochondrial-like tRNA genes, but acquired six chloroplast-like tRNAs. Among the Cruciferae, to maintain the same set of genes that are necessary for mitochondrial function, the exons of the genes have changed at the lowest rates, as indicated by the numbers of single nucleotide polymorphisms. The open reading frames (ORFs) of unknown function in the cruciferous genomes are not conserved. Evolutionary events, such as mutations, genome reorganizations and sequence insertions or deletions (indels), have resulted in the non-conserved ORFs in the cruciferous mitochondrial genomes, which are becoming significantly different among mitotypes. This work represents the first phylogenic explanation of the evolution of genes of known function in the Cruciferae family. It revealed significant variation in ORFs and the causes of such variation.

8.
9.
10.
11.
The complete sequence of the mitochondrial DNA (mtDNA) of the true slime mold Physarum polycephalum has been determined. The mtDNA is a circular 62,862-bp molecule with an A+T content of 74.1%. A search with the program BLASTX identified the protein-coding regions. The mitochondrial genome of P. polycephalum was predicted to contain genes coding for 12 known proteins [three cytochrome c oxidase subunits, apocytochrome b, two F1Fo-ATPase subunits, five NADH dehydrogenase (nad) subunits, and one ribosomal protein], two rRNA genes, and five tRNA genes. However, the predicted ORFs are not all in the same frame, because mitochondrial RNA in P. polycephalum undergoes RNA editing to produce functional RNAs. The nucleotide sequence of an nad7 cDNA showed that 51 nucleotides were inserted at 46 sites in the mRNA. No guide RNA-like sequences were observed in the mtDNA of P. polycephalum. Comparison with reported Physarum mtDNA sequences suggested that sites of RNA editing vary among strains. In the Physarum mtDNA, 20 ORFs of over 300 nucleotides were found, and ORFs 14-19 are transcribed.
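
The kind of scan behind a statement such as "20 ORFs of over 300 nucleotides were found" can be sketched as follows. This is only an illustration under stated assumptions: ATG as the sole start codon, the standard stop codons TAA, TAG and TGA, ORF length counted from start codon through stop codon, and coordinates reported on the scanned strand. The abstract does not describe the authors' procedure, mitochondrial genomes often use a different genetic code, and RNA editing in Physarum would further complicate a real search. The names `find_orfs` and `reverse_complement` are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Return (strand, frame, start, end) for ATG-to-stop ORFs longer than min_len nt.

    Coordinates are 0-based, end-exclusive, and relative to the scanned strand.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOPS:
                    end = i + 3  # include the stop codon in the ORF length
                    if end - start > min_len:
                        orfs.append((strand, frame, start, end))
                    start = None
    return orfs


# Hypothetical 306-nt test sequence: ATG, 100 GCT codons, then a stop codon.
example = "ATG" + "GCT" * 100 + "TAA"
print(find_orfs(example))
```

Changing `min_len` or the start and stop codon sets adapts the sketch to other length thresholds or genetic codes.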

12.
13.
Of 30 baculovirus genomes that have been sequenced to date, the only nonlepidopteran baculoviruses are the dipteran Culex nigripalpus nucleopolyhedrovirus and two hymenopteran nucleopolyhedroviruses that infect the sawflies Neodiprion lecontei (NeleNPV) and Neodiprion sertifer (NeseNPV). This study provides a complete sequence and genome analysis of the nucleopolyhedrovirus that infects the balsam fir sawfly Neodiprion abietis (Hymenoptera, Symphyta, Diprionidae). The N. abietis nucleopolyhedrovirus (NeabNPV) is 84,264 bp in size, with a G+C content of 33.5%, and contains 93 predicted open reading frames (ORFs). Eleven predicted ORFs are unique to this baculovirus, 10 ORFs have a putative sequence homologue in the NeleNPV genome but not the NeseNPV genome, and 1 ORF (neab53) has a putative sequence homologue in the NeseNPV genome but not the NeleNPV genome. Specific repeat sequences are coincident with major genome rearrangements that distinguish NeabNPV and NeleNPV. Genes associated with these repeat regions encode a common amino acid motif, suggesting that they are a family of repeated contiguous gene clusters. Lepidopteran baculoviruses, similarly, have a family of repeated genes called the bro gene family. However, there is no significant sequence similarity between the NeabNPV and bro genes. Homologues of early-expressed genes such as ie-1 and lef-3 were absent in NeabNPV, as they are in the previously sequenced hymenopteran baculoviruses. Analyses of ORF upstream sequences identified potential temporally distinct genes on the basis of putative promoter elements.

14.
Handa H. Nucleic Acids Research, 2003, 31(20): 5907-5916
The entire mitochondrial genome of rapeseed (Brassica napus L.) was sequenced and compared with that of Arabidopsis thaliana. The 221 853 bp genome contains 34 protein-coding genes, three rRNA genes and 17 tRNA genes. This gene content is almost identical to that of Arabidopsis. However, the rps14 gene, which is a pseudogene in Arabidopsis, is intact in rapeseed. On the other hand, five tRNA genes are missing in rapeseed compared to Arabidopsis, although the set of mitochondrially encoded tRNA species is identical in the two Cruciferae. RNA editing events were systematically investigated on the basis of the sequence of the rapeseed mitochondrial genome. A total of 427 C to U conversions were identified in ORFs, which is nearly identical to the number in Arabidopsis (441 sites). The gene sequences and intron structures are mostly conserved (more than 99% similarity for protein-coding regions); however, only 358 editing sites (83% of total editings) are shared by rapeseed and Arabidopsis. Non-coding regions are mostly divergent between the two plants. One-third (about 78.7 kb) and two-thirds (about 223.8 kb) of the rapeseed and Arabidopsis mitochondrial genomes, respectively, cannot be aligned with each other and most of these regions do not show any homology to sequences registered in the DNA databases. The results of the comparative analysis between the rapeseed and Arabidopsis mitochondrial genomes suggest that higher plant mitochondria are extremely conservative with respect to coding sequences and somewhat conservative with respect to RNA editing, but that non-coding parts of plant mitochondrial DNA are extraordinarily dynamic with respect to structural changes, sequence acquisition and/or sequence loss.

15.
The distribution patterns of bases of DNA fragments in different regions of the P. aeruginosa genome are analyzed in this paper. It is shown that 5565 protein-coding genes, 17315 non-coding ORFs, and 1104 intergenic sequences are grouped into seven clusters based on their base frequencies. Almost all the protein-coding genes are contained in one of the seven clusters. The significant difference in base frequencies among the three codon positions in this high-GC genome, which gives rise to the division between the base distribution patterns of the six reading frames of protein-coding genes, is responsible for the clustering phenomenon. In the light of the clustering phenomenon, the author proposes that the antisense strand ORFs, particularly those corresponding to Frame 2' and Frame 3', may not code for proteins in the P. aeruginosa genome.
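
The base frequencies at the three codon positions, on which the clustering described in this abstract is based, can be computed with a short sketch like the one below. The abstract does not specify the exact feature vector or clustering method, so this only shows the per-position frequency calculation; the function name `codon_position_frequencies` and the example sequence are assumptions for illustration.

```python
from collections import Counter


def codon_position_frequencies(seq):
    """Return three dicts with the A, C, G, T frequencies at codon positions 1, 2 and 3."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    freqs = []
    for pos in range(3):
        counts = Counter(seq[pos:usable:3])
        total = sum(counts[b] for b in "ACGT")  # ambiguous symbols are not counted
        freqs.append({b: (counts[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


# Hypothetical three-codon example: ATG GCG TAA.
for position, table in enumerate(codon_position_frequencies("ATGGCGTAA"), start=1):
    print(position, table)
```

Frequencies for the other reading frames can be obtained by dropping the first one or two bases, or by applying the same function to the reverse complement of the sequence.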

16.
17.
18.
Shao R, Barker SC. Gene, 2011, 473(1): 36-43
The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse.

19.
While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation, the process of identifying genes and assigning function to each gene in a genome sequence, provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify protein-coding genes in genome sequences. Because approaches used to validate the presence of predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations and suggest that it could be efficiently applied during the gene calling process so that the improvements are propagated through the subsequent functional annotation process.

20.
Intergenic sequences represent 63% of the mitochondrial 'long' (85 kb) genome of Saccharomyces cerevisiae. They comprise 170-200 AT spacers that correspond to 47% of the genome and are separated from each other by GC clusters, ORFs, ori sequences, as well as by protein-coding genes. Intergenic AT spacers have an average size of 190 bp and a GC level of 5%; they are formed by short (20-30 nt on average) A/T stretches separated by C/G mono- to trinucleotides. An analysis of the primary structures of all intergenic AT spacers already sequenced (32 kb; 80% of the total) has shown that they are characterized by an extremely high level of short sequence repetitiveness and by a characteristic sequence pattern; the frequencies of A/T isostichs conspicuously deviate from statistical expectations, and exponentially decrease when their (AT + TA)/(AA + TT) ratio, R, decreases. A basically identical situation was found in the AT spacers of the mitochondrial genome (19 kb) of Torulopsis glabrata. The sequence features of the AT spacers indicate that they were built in evolution by an expansion process mainly involving rounds of duplication, inversion and translocation events which affected an initial oligodeoxynucleotide (endowed with a particular R ratio) and the sequences derived from it. In turn, the initial oligodeoxynucleotide appears to have arisen from an ancestral promoter-replicator sequence which was at the origin of the nonanucleotide promoters present in the mitochondrial genomes of several yeasts. Common sequence patterns indicate that the AT spacers so formed gave rise to the var1 gene (by linking and phasing of short ORFs), to the DNA stretches corresponding to the untranslated mRNA sequences and to the central stretches of ori sequences from S. cerevisiae.
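
The ratio R = (AT + TA)/(AA + TT) named in this abstract can be made concrete with a small sketch. The abstract does not say how the dinucleotides are counted, so the code assumes overlapping dinucleotide counts over a single A/T stretch; the function name `at_ratio` and the example stretches are hypothetical.

```python
def at_ratio(seq):
    """Return R = (AT + TA) / (AA + TT) from overlapping dinucleotide counts.

    Returns None when the sequence contains no AA or TT, since R is then undefined.
    """
    seq = seq.upper()
    counts = {"AT": 0, "TA": 0, "AA": 0, "TT": 0}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    denominator = counts["AA"] + counts["TT"]
    if denominator == 0:
        return None
    return (counts["AT"] + counts["TA"]) / denominator


# Hypothetical A/T stretches: one mixed, one strictly alternating.
print(at_ratio("AATTAT"))
print(at_ratio("ATATAT"))
```

A strictly alternating stretch has no AA or TT dinucleotides, which is why the sketch returns `None` for it instead of dividing by zero.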
