20 similar documents found; search time: 0 ms
1.
A new statistical method that associates each trinucleotide with a frame is developed for identifying circular codes. Its sensitivity allows the detection of several circular codes in the (protein-coding) genes of archaeal genomes. Several properties of these circular codes are described, in particular the lengths of the minimal windows needed to retrieve the construction frames, a newly defined parameter for measuring certain probabilities of words generated by the circular codes, and the types of nucleotides at the trinucleotide sites. Some biological consequences are presented in the Discussion.
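The frame-association idea can be illustrated with a minimal Python sketch. It assumes the simplest possible statistic: count each trinucleotide in frames 0, 1 and 2 of a set of genes and assign it to the frame where it occurs most often. The function names `frame_counts` and `assign_frames` are hypothetical, and the authors' actual statistic and thresholds may differ.

```python
from collections import Counter
from itertools import product

def frame_counts(genes):
    """Count each trinucleotide in frames 0, 1 and 2 of the given genes.

    Frame 0 is the reading frame; frames 1 and 2 start one and two
    nucleotides later (an assumed convention for this sketch).
    """
    counts = [Counter(), Counter(), Counter()]
    for seq in genes:
        seq = seq.upper()
        for frame in range(3):
            for i in range(frame, len(seq) - 2, 3):
                counts[frame][seq[i:i + 3]] += 1
    return counts

def assign_frames(genes):
    """Associate each observed trinucleotide with its most frequent frame.

    Ties go to the lowest-numbered frame; unseen trinucleotides are omitted.
    """
    counts = frame_counts(genes)
    assignment = {}
    for trinuc in map("".join, product("ACGT", repeat=3)):
        per_frame = [counts[f][trinuc] for f in range(3)]
        if max(per_frame) == 0:
            continue
        assignment[trinuc] = per_frame.index(max(per_frame))
    return assignment
```

For example, `assign_frames(["GAAGAAGAA"])` associates GAA with frame 0, AAG with frame 1 and AGA with frame 2.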
2.
Complete archaeal genomes were probed for the presence of long (≥ 25 bp) oligonucleotide repeats (words). We detected many words distributed in tandem with narrow ranges of periodicity (i.e., spacer length between repeats). Similar words were not identified in the genomes of non-archaeal species, namely Escherichia coli, Bacillus subtilis, Haemophilus influenzae, Mycoplasma genitalium and Mycoplasma pneumoniae. BLAST similarity searches against the GenBank nucleotide sequence database revealed that these words were archaeal species-specific, indicating that they are signature sequences. Sequence analysis and genome viewing tools showed these repeats to be restricted to non-coding regions. Thus, archaea appear to possess a non-coding genomic signature that is absent in bacterial species. The identification of a species-specific genomic signature would be of great value to archaeal genome mapping, evolutionary studies and analyses of genome complexity.
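A minimal sketch of the kind of scan described in this abstract, assuming exact word matches only: index every 25-mer, keep those occurring at least twice, and report the spacer lengths between consecutive copies. The function `repeated_words` and its parameters are hypothetical, and storing every k-mer as a string is only practical for small sequences.

```python
from collections import defaultdict

def repeated_words(genome, k=25, min_copies=2):
    """Map each k-mer seen at least min_copies times to (positions, spacers).

    A spacer is the gap between the end of one copy and the start of the
    next; a negative value means two copies overlap. A narrow range of
    spacer lengths would indicate a tandem, periodic arrangement.
    """
    genome = genome.upper()
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    repeats = {}
    for word, pos in positions.items():
        if len(pos) >= min_copies:
            spacers = [b - a - k for a, b in zip(pos, pos[1:])]
            repeats[word] = (pos, spacers)
    return repeats
```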
3.
The genomes of Methanococcus jannaschii, Mycoplasma genitalium, Haemophilus influenzae, Archaeoglobus fulgidus, Helicobacter pylori, Treponema pallidum, Borrelia burgdorferi, Rickettsia prowazekii, Mycobacterium tuberculosis, Methanobacterium thermoautotrophicum, Synechocystis sp. PCC6803, Bacillus subtilis, Chlamydia trachomatis, Pyrococcus horikoshii, Aquifex aeolicus, Mycoplasma pneumoniae and Escherichia coli have been analysed for the presence of polypurine·polypyrimidine tracts, in order to understand their distribution in these genomes. We observed a variation in the abundance of such sequences in these bacteria, with the archaeal genomes forming a high-abundance group and the canonical eubacteria forming a low-abundance group. The genomes of M. tuberculosis and A. aeolicus are unique among the organisms analysed here in their abnormal underrepresentation and overrepresentation of polypurine·polypyrimidine tracts, respectively. We also observe a strand bias, i.e., a preferential occurrence of polypurines in coding strands. It varies widely among these genomes, from the very high bias in M. jannaschii to the slightly inverse bias in the parasitic genomes of T. pallidum and C. trachomatis. The extent of strand bias, however, cannot be explained on the basis of the GC content of the genome, the use of all-purine codons or an excess of the amino acids encoded by such codons. The probable causes and effects of this phenomenon are discussed.
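Because a run of purines on one strand pairs with a run of pyrimidines on the other, each polypurine·polypyrimidine tract can be located by scanning a single strand for maximal runs of A/G or C/T. The sketch below assumes a minimum tract length of 10 nt (the study's cutoff is not given in the abstract) and measures strand bias as the fraction of tracts in which the given coding strand carries the purines; `pu_py_tracts` and `strand_bias` are hypothetical names.

```python
import re

PURINE_RUN = re.compile(r"[AG]+")
PYRIMIDINE_RUN = re.compile(r"[CT]+")

def pu_py_tracts(strand, min_len=10):
    """Return (purine_tracts, pyrimidine_tracts) as half-open (start, end) spans.

    A maximal A/G run on this strand is the purine half of a tract; a
    maximal C/T run means the purines sit on the complementary strand.
    """
    strand = strand.upper()
    purines = [m.span() for m in PURINE_RUN.finditer(strand)
               if m.end() - m.start() >= min_len]
    pyrimidines = [m.span() for m in PYRIMIDINE_RUN.finditer(strand)
                   if m.end() - m.start() >= min_len]
    return purines, pyrimidines

def strand_bias(coding_strand, min_len=10):
    """Fraction of tracts whose purine strand is the coding strand (NaN if none)."""
    purines, pyrimidines = pu_py_tracts(coding_strand, min_len)
    total = len(purines) + len(pyrimidines)
    return len(purines) / total if total else float("nan")
```

A value above 0.5 indicates the preferential occurrence of polypurines in coding strands that the abstract describes; a value below 0.5 corresponds to an inverse bias.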
4.
5.
6.
7.
8.
A phylogeny of the extant Phocidae inferred from complete mitochondrial DNA coding regions (Total citations: 2; self-citations: 0; citations by others: 2)
Davis CS, Delisle I, Stirling I, Siniff DB, Strobeck C. Molecular Phylogenetics and Evolution, 2004, 33(2): 363-377
Despite extensive interest in the systematics of Pinnipedia, questions remain concerning phylogenetic relationships within the Phocidae or "true" seals. Relationships within the phocids and their placement relative to the remaining pinnipeds and major lineages of arctoid carnivores were examined using a large molecular data set consisting of 12 mitochondrial protein-coding genes. Phylogenetic analysis including 15 extant species of the Phocidae, and representatives of the Otariidae, Odobenidae, Ursidae, Mustelidae, Canidae, and Felidae, confirmed the monophyletic origin of the Pinnipedia within the Arctoidea. Slightly more support was found for an ursid affinity of the pinnipeds; however, this relationship remains contentious. The Phocidae were placed as the sister group to a common odobenid-otariid clade. Within the family Phocidae, strong support was found for the traditionally accepted subfamilies Phocinae (northern seals) and Monachinae (southern seals plus monk seals). In contrast to recent suggestions, a monophyletic Monachus was strongly supported and was placed in a deep branching position within the Monachinae. Sequence divergence under a maximum likelihood model indicated that the rarely used tribal distinctions within the Monachinae are comparable, in terms of evolutionary distance, to the accepted tribal distinctions within the Phocinae. In addition, the results suggest that Pagophilus should be accepted as a genus within the Phocini. Sequence divergence between Phoca, Pusa, and Halichoerus is minimal, supporting a taxonomic reclassification of the three genera into an emended genus Phoca, without subgeneric distinctions.
9.
The ORFs of microbial genomes in annotation files are usually classified into two groups: the first corresponds to known genes, whereas the second includes 'putative', 'probable', 'conserved hypothetical', 'hypothetical', 'unknown' and 'predicted' ORFs, etc. Since annotation is not 100% accurate, it is essential to confirm which ORFs in the latter group are coding and which are not. Starting from the known genes in the former group, this paper describes an improved Z curve method to recognize genes in the latter. Ten-fold cross-validation tests show that the average accuracy of the algorithm is greater than 99% for recognizing the known genes in 57 bacterial and archaeal genomes. The method is then applied to recognize genes in the latter group. The likely non-coding ORFs in each of the 57 bacterial and archaeal genomes studied here are recognized and listed at the website http://tubic.tju.edu.cn/ZCURVE_C_html/noncoding.html. The working mechanism of the algorithm is discussed in detail. A computer program, called ZCURVE_C, was written to calculate a coding score, called the Z-curve score, for ORFs in the above 57 bacterial and archaeal genomes. An ORF is classified as coding if its Z-curve score is > 0 and as non-coding if its score is < 0. A website has been set up to provide Z-curve score calculation as a service. A user may submit the DNA sequence of an ORF to the server at http://tubic.tju.edu.cn/ZCURVE_C/Default.cgi, and the Z-curve score of the ORF is calculated and returned to the user immediately.
10.
Alongside the well-studied membrane-spanning helices, alpha-helical transmembrane (TM) proteins contain several functionally and structurally important types of substructures. Here, existing 3D structures of transmembrane proteins have been used to define and study the concept of reentrant regions, i.e. membrane-penetrating regions that enter and exit the membrane on the same side. We find that these regions can be divided into three distinct categories based on secondary structure motifs, namely long regions with a helix-coil-helix motif, regions of medium length with the structure helix-coil or coil-helix, and regions of short to medium length consisting entirely of irregular secondary structure. The residues in reentrant regions are on average significantly smaller than those in other regions, and reentrant regions can be detected in the inter-transmembrane loops with an accuracy of approximately 70% based on their amino acid composition. Using TOP-MOD, a novel method for predicting reentrant regions, we have scanned the genomes of Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The results suggest that more than 10% of transmembrane proteins contain reentrant regions and that the occurrence of reentrant regions increases linearly with the number of transmembrane regions. Reentrant regions seem to be most common in channel proteins and least common in signal receptors.
11.
Background
Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret, especially within overlapping genes; and viruses often employ non-canonical translational mechanisms (e.g. frameshifting, stop codon read-through, leaky scanning and internal ribosome entry sites), which can conceal potentially coding open reading frames (ORFs).
12.
Vertebrate genomes are mosaics of isochores. On the assumption that marked differences exist in isochore structure between warm-blooded and cold-blooded animals, variations among vertebrates were previously attributed to adaptation to homeothermy. However, based on coding-region data from representatives of extant vertebrates, including a turtle, a crocodile (Archosauromorpha) and a few kinds of snakes (Lepidosauromorpha), it was recently hypothesized that the common ancestors of mammals, birds and extant reptiles already had the "warm-blooded" isochore structure. To test this hypothesis, the nucleotide sequences of alpha-globin genes, including non-coding regions (introns), from two snakes, N. kaouthia and E. climacophora, were determined (accession numbers: AB104824, AB104825). The correlation between the GC contents of the introns and exons of alpha-globin genes from snakes and those from other vertebrates supports the above hypothesis. A similar analysis using exon and intron data for other genes obtained from GenBank (Release 131) also supports the hypothesis.
13.
14.
Compound microsatellites, consisting of two or more repeats in close proximity, have been found in eukaryotic genomes. So far, such compound microsatellites have not been investigated in any prokaryotic genome. We therefore examined compound microsatellites in 22 complete genomes of Escherichia coli, an ideal model organism for analyzing the nature and evolution of prokaryotic compound microsatellites. Our results indicated that about 1.75-2.85% of all microsatellites could be classified as compound microsatellites with very low complexity, and that most compound microsatellites were composed of very different motifs. Compound microsatellites were significantly overrepresented in all surveyed genomes. These results differ dramatically from those in eukaryotes. We discuss possible reasons for the observed divergence.
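Compound microsatellites are usually defined by a maximum distance between neighbouring repeats. The sketch below assumes the microsatellites have already been detected as half-open `(start, end)` intervals and groups those separated by at most `dmax` bp. Both `dmax=10` and the fraction definition (repeats belonging to a compound group divided by all repeats) are assumptions for illustration, not values taken from the study; the function names are hypothetical.

```python
def compound_microsatellites(intervals, dmax=10):
    """Group microsatellites separated by at most dmax bp into compound loci.

    intervals: half-open (start, end) coordinates, in any order.
    Returns only groups with two or more members.
    """
    groups, current, current_end = [], [], 0
    for start, end in sorted(intervals):
        if current and start - current_end <= dmax:
            current.append((start, end))
            current_end = max(current_end, end)  # handles nested/overlapping repeats
        else:
            if len(current) >= 2:
                groups.append(current)
            current, current_end = [(start, end)], end
    if len(current) >= 2:
        groups.append(current)
    return groups

def compound_fraction(intervals, dmax=10):
    """Share of all microsatellites that belong to some compound group."""
    if not intervals:
        return 0.0
    members = sum(len(g) for g in compound_microsatellites(intervals, dmax))
    return members / len(intervals)
```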
15.
Rapid genome evolution revealed by comparative sequence analysis of orthologous regions from four Triticeae genomes (Total citations: 23; self-citations: 0; citations by others: 23)
Bread wheat (Triticum aestivum) is an allohexaploid species, consisting of three subgenomes (A, B, and D). To study the molecular evolution of these closely related genomes, we compared the sequence of a 307-kb physical contig covering the high molecular weight (HMW)-glutenin locus from the A genome of durum wheat (Triticum turgidum, AABB) with the orthologous regions from the B genome of the same wheat and the D genome of the diploid wheat Aegilops tauschii (Anderson et al., 2003; Kong et al., 2004). Although gene colinearity appears to be retained, four out of six genes including the two paralogous HMW-glutenin genes are disrupted in the orthologous region of the A genome. Mechanisms involved in gene disruption in the A genome include retroelement insertions, sequence deletions, and mutations causing in-frame stop codons in the coding sequences. Comparative sequence analysis also revealed that sequences in the colinear intergenic regions of these different genomes were generally not conserved. The rapid genome evolution in these regions is attributable mainly to the large number of retrotransposon insertions that occurred after the divergence of the three wheat genomes. Our comparative studies indicate that the B genome diverged prior to the separation of the A and D genomes. Furthermore, sequence comparison of two distinct types of allelic variations at the HMW-glutenin loci in the A genomes of different hexaploid wheat cultivars with the A genome locus of durum wheat indicates that hexaploid wheat may have more than one tetraploid ancestor.
16.
REPuter: fast computation of maximal repeats in complete genomes. (Total citations: 12; self-citations: 0; citations by others: 12)
SUMMARY: A software tool was implemented that computes exact repeats and palindromes in entire genomes very efficiently. AVAILABILITY: Via the Bielefeld Bioinformatics Server (http://bibiserv.techfak.uni-bielefeld.de/reputer/).
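To make "maximal repeat" concrete: a repeated pair is maximal when the two copies cannot be extended one base to the left or right without a mismatch. This naive, quadratic-time sketch is for short sequences only and enumerates forward exact repeats. REPuter itself uses a much faster suffix-tree based algorithm and also reports palindromic (reverse-complement) repeats, which this sketch omits. The function name is hypothetical.

```python
def maximal_repeats(seq, min_len=10):
    """Naive search for left- and right-maximal exact repeated pairs.

    Returns (i, j, length) with i < j such that seq[i:i+length] equals
    seq[j:j+length], neither end can be extended, and length >= min_len.
    Overlapping copies are allowed.
    """
    n = len(seq)
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            # left-maximal: skip if both copies extend one base to the left
            if i > 0 and seq[i - 1] == seq[j - 1]:
                continue
            # right-maximal: extend until a mismatch or the end of the sequence
            length = 0
            while j + length < n and seq[i + length] == seq[j + length]:
                length += 1
            if length >= min_len:
                hits.append((i, j, length))
    return hits
```

For example, `maximal_repeats("GATTACAGATTACA", min_len=5)` reports the single pair `(0, 7, 7)`; the shorter matches starting at positions 1 and 8 are suppressed because they extend to the left.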
17.
Organelle genomics has become an increasingly important research field, with applications in molecular modeling, phylogeny, taxonomy, population genetics and biodiversity. Typically, research projects involve the determination and comparative analysis of complete mitochondrial and plastid genome sequences, either from closely related species or from a taxonomically broad range of organisms. Here, we describe two alternative organelle genome sequencing protocols. The "random genome sequencing" protocol is suited for the large majority of organelle genomes irrespective of their size. It involves DNA fragmentation by shearing (nebulization) and blunt-end cloning of the resulting fragments into pUC or BlueScript-type vectors. This protocol excels in randomness of clone libraries as well as in time and cost-effectiveness. The "long-PCR-based genome sequencing" protocol is specifically adapted for DNAs of low purity and quantity, and is particularly effective for small organelle genomes. Library construction by either protocol can be completed within 1 week.
18.
Chattopadhyay S, Sahoo S, Kanner WA, Chakrabarti J. Comparative and Functional Genomics, 2003, 4(1): 56-65
Our studies on the bases of codons from 11 completely sequenced archaeal genomes show that, moving from species with GC-rich protein-coding genes to species with AT-rich protein-coding genes, the differences between G and C and between A and T, the purine load (AG content), and the overall persistence (i.e. the tendency of a base to be followed by the same base) within codons all increase almost simultaneously, although the extent of the increase differs across the three codon positions. These findings suggest that the deviations from the second parity rule (through the increasing differences between complementary base contents) and the increasing purine load reduce the chance of forming intra-strand Watson-Crick base-paired secondary structures in mRNAs (synonymous with the protein-coding genes we dealt with), thereby increasing translational efficiency. We hypothesize that archaeal species with AT-rich protein-coding genes might have better translational efficiency than their GC-rich counterparts.
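The per-position statistics in this abstract can be computed directly from in-frame codons. A minimal sketch, assuming that the G-C and A-T differences and the purine load are expressed as fractions of the bases at each codon position, and that persistence is the fraction of adjacent base pairs within a codon (positions 1-2 and 2-3) that are identical. The authors' exact normalizations may differ, and `codon_position_stats` is a hypothetical name.

```python
from collections import Counter

def codon_position_stats(cds_list):
    """Per-codon-position skews and purine load, plus within-codon persistence.

    Returns ({position: {"G-C", "A-T", "purine_load"}}, persistence), all as
    fractions; positions are numbered 1, 2, 3.
    """
    counts = [Counter(), Counter(), Counter()]  # base counts at positions 1, 2, 3
    same = pairs = 0
    for cds in cds_list:
        cds = cds.upper()
        # complete in-frame codons only; a trailing partial codon is ignored
        for k in range(0, len(cds) - 2, 3):
            codon = cds[k:k + 3]
            for pos, base in enumerate(codon):
                counts[pos][base] += 1
            # persistence: is a base followed by the same base inside the codon?
            for a, b in zip(codon, codon[1:]):
                pairs += 1
                same += a == b
    stats = {}
    for pos, c in enumerate(counts, start=1):
        total = sum(c[b] for b in "ACGT")
        if total == 0:
            continue
        stats[pos] = {
            "G-C": (c["G"] - c["C"]) / total,
            "A-T": (c["A"] - c["T"]) / total,
            "purine_load": (c["A"] + c["G"]) / total,
        }
    persistence = same / pairs if pairs else float("nan")
    return stats, persistence
```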
19.
Shabalina SA, Ogurtsov AY, Kondrashov VA, Kondrashov AS. Trends in Genetics (TIG), 2001, 17(7): 373-376
We aligned and analyzed 100 pairs of complete, orthologous intergenic regions from the human and mouse genomes (average length approximately 12,000 nucleotides). The alignments alternate between highly similar segments and dissimilar segments, indicating a wide variation in selective constraint. The average number of selectively constrained nucleotides within a mammalian intergenic region is at least 2000. This is threefold higher than within a nematode intergenic region and at least twofold higher than the number of selectively constrained nucleotides coding for an average protein. Because mammals possess only two- to threefold more proteins than Caenorhabditis elegans, the higher complexity of mammals might be due primarily to the functioning of intergenic DNA.
20.
The Z-curve is a three-dimensional curve that constitutes a unique representation of a DNA sequence, i.e., the Z-curve and the DNA sequence can each be uniquely reconstructed from the other. We employed Z-curve analysis to identify one replication origin in the Methanocaldococcus jannaschii genome, two replication origins in the Halobacterium species NRC-1 genome and one replication origin in the Methanosarcina mazei genome. One of the predicted replication origins of Halobacterium species NRC-1 is the same as a replication origin later identified by in vivo experiments. Z-curve analysis of the Sulfolobus solfataricus P2 genome suggested the existence of three replication origins, which is also consistent with later experimental results. This review summarizes applications of the Z-curve in identifying replication origins of archaeal genomes and provides clues about the locations of as yet unidentified replication origins in the Aeropyrum pernix K1, Methanococcus maripaludis S2, Picrophilus torridus DSM 9790 and Pyrobaculum aerophilum str. IM2 genomes.
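The Z-curve's construction and its one-to-one property can be shown in a few lines. Each base moves the curve by a distinct unit step (purine vs pyrimidine on x, amino vs keto on y, weak vs strong hydrogen bonding on z), so the step vectors recover the sequence exactly. This is a minimal sketch: the function names are hypothetical, only A/C/G/T are accepted, and the replication-origin analyses summarized in this abstract are not reproduced.

```python
# Unit step added to (x, y, z) for each base:
#   x: purine (A, G) +1 vs pyrimidine (C, T) -1
#   y: amino (A, C) +1 vs keto (G, T) -1
#   z: weak H-bond (A, T) +1 vs strong H-bond (G, C) -1
STEPS = {"A": (1, 1, 1), "G": (1, -1, -1), "C": (-1, 1, -1), "T": (-1, -1, 1)}

def z_curve(seq):
    """Return the points (x_n, y_n, z_n), n = 0..len(seq), of the Z-curve.

    Equivalently, with A_n, C_n, G_n, T_n the base counts in the first n bases:
    x_n = (A_n+G_n)-(C_n+T_n), y_n = (A_n+C_n)-(G_n+T_n), z_n = (A_n+T_n)-(G_n+C_n).
    """
    x = y = z = 0
    points = [(0, 0, 0)]
    for base in seq.upper():
        dx, dy, dz = STEPS[base]  # KeyError for anything other than A/C/G/T
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points

def z_curve_to_seq(points):
    """Invert the Z-curve: each step vector identifies exactly one base."""
    base_of = {step: base for base, step in STEPS.items()}
    return "".join(
        base_of[(x1 - x0, y1 - y0, z1 - z0)]
        for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:])
    )
```

For example, `z_curve("ACGT")` visits `(0, 0, 0), (1, 1, 1), (0, 2, 0), (1, 1, -1), (0, 0, 0)`, and `z_curve_to_seq` maps those points back to `"ACGT"`.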