Similar literature
1.

Background

DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a “genomic signature.” The genomic signatures can be used to phylogenetically classify organisms from arbitrarily sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th order Markov model) as well as genomic signatures normalized by smaller DNA words (1st and 2nd order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors.

Principal Findings

Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.

Conclusions

Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.
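The abstract describes OUV as a variance between observed tetranucleotide frequencies and frequencies approximated by a Markov chain, without giving the exact formula. The following is a minimal Python sketch of that idea using a 0th order model (expected word frequency as the product of base frequencies). The function names and the mean-squared-difference definition are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-mers made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    return {w: counts[w] / total for w in words}


def zero_order_expected(seq, k=4):
    """Expected k-mer frequencies under a 0th order Markov model."""
    base = kmer_freqs(seq, 1)
    expected = {}
    for w in product("ACGT", repeat=k):
        p = 1.0
        for b in w:
            p *= base[b]
        expected["".join(w)] = p
    return expected


def usage_deviation(seq, k=4):
    """Assumed OUV-like measure: mean squared observed-minus-expected frequency."""
    obs = kmer_freqs(seq, k)
    exp = zero_order_expected(seq, k)
    return sum((obs[w] - exp[w]) ** 2 for w in obs) / len(obs)
```

Under this assumed definition, a sequence whose word usage is fully explained by its base composition scores zero, and larger values indicate less random oligonucleotide usage.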

2.
The global feature of the completely sequenced Alcanivorax borkumensis SK2 type strain chromosome is its symmetry and homogeneity. The origin and terminus of replication are located opposite to each other in the chromosome and are discerned with high signal to noise ratios by maximal oligonucleotide usage biases on the leading and lagging strand. Genomic DNA structure is rather uniform throughout the chromosome with respect to intrinsic curvature, position preference or base stacking energy. The orthologs and paralogs of A. borkumensis genes with the highest sequence homology were found in most cases among γ-Proteobacteria, with Acinetobacter and P. aeruginosa as closest relatives. A. borkumensis shares a similar oligonucleotide usage and promoter structure with the Pseudomonadales. A comparatively low number of only 18 genome islands with atypical oligonucleotide usage was detected in the A. borkumensis chromosome. The gene clusters that confer the assimilation of aliphatic hydrocarbons are localized in two genome islands which were probably acquired from an ancestor of the Yersinia lineage, whereas the alk genes of Pseudomonas putida still exhibit the typical Alcanivorax oligonucleotide signature, indicating a complex evolution of this major hydrocarbonoclastic trait.

3.
MOTIVATION: Some genomic islands contain horizontally transferred genes, which play critical roles in altering the genotypes and phenotypes of organisms, and horizontal gene transfer has been recognized as a universal event throughout bacterial evolution. A windowless method to display the distribution of genomic GC content, the cumulative GC profile, is proposed to identify genomic islands in genomes whose complete genome sequences are available. Two new indices are proposed to assess the codon usage bias and amino acid usage bias in genomic islands. RESULTS: A 211 kb genomic island (CGGI-1) has been identified in the genome of Corynebacterium glutamicum, and three genomic islands VVGI-1, VVGI-2 and VVGI-3, with lengths 167, 40 and 33 kb, respectively, have been identified in the genome of Vibrio vulnificus CMCP6 chromosome I. The CGGI-1 is flanked by two approximately 500 bp direct repeats, and utilizes a Val-tRNA as the integration site. For the VVGI-1 and VVGI-2, each has an integrase gene at 5' junction. All the identified genomic islands show unusual GC content, codon usage and amino acid usage, compared with the rest of the genomes. In addition, it is found that genomic islands are fairly homogenous in terms of GC content variation. An index, h, to quantify the homogeneity of GC content for genomic islands is proposed, and it is shown that h is less than 0.1 for all the genomic islands analyzed. The cumulative GC profile, as well as various indices to assess the codon usage bias, amino acid usage bias and homogeneity of the genomic islands, will be useful in the analysis of other genomes. AVAILABILITY: Programs used in this work and numerical results are available upon request.

4.
A gene in a genome is defined as putative alien (pA) if its codon usage difference from the average gene exceeds a high threshold and codon usage differences from ribosomal protein genes, chaperone genes and protein-synthesis-processing factors are also high. pA gene clusters in bacterial genomes are relevant for detecting genomic islands (GIs), including pathogenicity islands (PAIs). Four other analyses appropriate to this task are G+C genome variation (the standard method); genomic signature divergences (dinucleotide bias); extremes of codon bias; and anomalies of amino acid usage. For example, the cagA domain of Helicobacter pylori is highly deviant in its genome signature and codon bias from the rest of the genome. Using these methods we can detect two potential PAIs in the Neisseria meningitidis genome, which contain hemagglutinin and/or hemolysin-related genes. Additionally, G+C variation and genome signature differences of the Mycobacterium tuberculosis genome indicate two pA gene clusters.
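The "genomic signature divergence (dinucleotide bias)" mentioned above is commonly computed from dinucleotide relative abundances, the ratio of each dinucleotide frequency to the product of its two base frequencies, counted over both strands. The Python sketch below illustrates that general approach. The abstract gives no formula, so the function names and the mean-absolute-difference distance are assumptions for illustration.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dinucleotide_signature(seq):
    """Relative abundance f_XY / (f_X * f_Y), counted on both strands."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    mono = {b: 0 for b in "ACGT"}
    di = {"".join(p): 0 for p in product("ACGT", repeat=2)}
    for strand in (seq, reverse_complement):
        for b in strand:
            if b in mono:
                mono[b] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:  # skips pairs containing ambiguous bases
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence too short or contains no A/C/G/T")
    signature = {}
    for pair, count in di.items():
        fx, fy = mono[pair[0]] / n_mono, mono[pair[1]] / n_mono
        signature[pair] = (count / n_di) / (fx * fy) if fx * fy > 0 else 0.0
    return signature


def signature_distance(seq_a, seq_b):
    """Mean absolute difference between two dinucleotide signatures."""
    sig_a = dinucleotide_signature(seq_a)
    sig_b = dinucleotide_signature(seq_b)
    return sum(abs(sig_a[p] - sig_b[p]) for p in sig_a) / len(sig_a)
```

A candidate island could then be compared with the rest of its genome by calling `signature_distance(island, rest_of_genome)`, where a large value suggests a deviant signature. Any threshold for "large" would have to be chosen empirically.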

5.
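As an important compositional feature of DNA sequences, genomic oligonucleotide usage patterns and their biases have been widely applied to the analysis of prokaryotic genomes. However, whether the bias in oligonucleotide usage is population-specific and reflects the function of the population has not been clarified. Based on a first-order Markov chain model, we propose a new index for measuring oligonucleotide usage bias: the genomic trinucleotide (tri-) transition probability bias (TPB) feature vector, also called the distribution of maximal trinucleotide transition probability bias. We analyzed and compared the tri-TPB feature vectors of 727 representative prokaryotic genome sequences. The results show that the genomic tri-TPB feature vector is species-specific: the more closely related two species are, the more similar their tri-TPB feature vectors. Different strains of the same species have almost identical tri-TPB feature vectors, independent of genomic GC content. In addition, the similarity of genomic tri-TPB feature vectors correlates with the pathogenicity characteristics of the strains. These results offer a new approach to the evolutionary analysis of species and their pathogenicity based on genome-wide oligonucleotide composition and distribution.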

6.
The nucleotide composition of genomes undergoes dramatic variations among all three kingdoms of life. GC content, an important characteristic for a genome, is related to many important functions, and therefore GC content and its distribution are routinely reported for sequenced genomes. Traditionally, GC content distribution is assessed by computing GC contents in windows that slide along the genome. Disadvantages of this routinely used window-based method include low resolution and low sensitivity. Additionally, different window sizes result in different GC content distribution patterns within the same genome. We proposed a windowless method, the GC profile, for displaying GC content variations across the genome. Compared to the window-based method, the GC profile has the following advantages: 1) higher sensitivity, because of variation-amplifying procedures; 2) higher resolution, because boundaries between domains can be determined at one single base pair; 3) uniqueness, because the GC profile is unique for a given genome and 4) the capacity to show both global and regional GC content distributions. These characteristics are useful in identifying horizontally-transferred genomic islands and homogenous GC-content domains. Here, we review the applications of the GC profile in identifying genomic islands and genome segmentation points, and in serving as a platform to integrate with other algorithms for genome analysis. A web server generating GC profiles and implementing relevant genome segmentation algorithms is available at: www.zcurve.net.
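To make the contrast between the two approaches concrete, the Python sketch below computes a sliding-window GC content and a windowless cumulative profile. The profile is assumed here to be the running count of (A+T) minus (G+C) with a least-squares line through the origin subtracted. That detrending step and the function names are assumptions for illustration and may differ from the method implemented on the web server.

```python
def sliding_gc(seq, window, step):
    """GC fraction in windows of fixed size; the result depends on window and step."""
    seq = seq.upper()
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


def cumulative_gc_profile(seq):
    """Windowless profile: cumulative (A+T) - (G+C), minus a fitted line k*n."""
    z, running = [], 0
    for base in seq.upper():
        if base in ("A", "T"):
            running += 1
        elif base in ("G", "C"):
            running -= 1
        z.append(running)  # ambiguous bases leave the running value unchanged
    if not z:
        return []
    positions = range(1, len(z) + 1)
    # Least-squares slope of a line through the origin: k = sum(n*z) / sum(n^2).
    k = sum(n * zn for n, zn in zip(positions, z)) / sum(n * n for n in positions)
    return [zn - k * n for n, zn in zip(positions, z)]
```

In this sketch a rising stretch of the profile marks a region that is more AT-rich than the sequence average and a falling stretch marks a more GC-rich one, so a turning point locates a compositional boundary at single-base resolution. For example, in `"A" * 50 + "G" * 50` the profile peaks exactly at the 50th base.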

7.
R Liang  H Liu  F Tao  Y Liu  C Ma  X Liu  J Liu 《Journal of bacteriology》2012,194(17):4781-4782
Pseudomonas putida strain SJTE-1 can utilize 17β-estradiol and other environmental estrogens/toxicants, such as estrone, and naphthalene as sole carbon sources. We report the draft genome sequence of strain SJTE-1 (5,551,505 bp, with a GC content of 62.25%) and major findings from its annotation, which could provide insights into its biodegradation mechanisms.

8.
Different statistical measures of bias of oligonucleotide sequences in DNA sequences were compared, both by theoretical analysis and according to their abilities to predict the relative abundances of oligonucleotides in the genome of Escherichia coli. The expected frequency of an oligonucleotide calculated from a maximal order Markov model was shown to be a degenerate case of the expected frequency calculated from biases of all subwords arising when noncontiguous subwords exhibit no bias. Since (at least in E. coli) noncontiguous sequences exhibit significant bias, the total compositional bias approach is expected to represent biases in genomic sequences more faithfully than Markov approaches. In fact, statistics based on Markov analysis, even at the highest order, were inferior in predicting actual frequencies of oligonucleotides to methods that factored out biases of internal subwords with gaps. Using total compositional bias as a measure of relative abundance, tetranucleotide and hexanucleotide palindromes were found to be distributed differently from nonpalindromic sequences, with their means shifted somewhat towards underrepresentation. A subpopulation of palindromic hexanucleotides, however, was highly underrepresented, and this group consisted almost entirely of targets for Type II restriction enzymes found within strains of E. coli. Sites recognized by Type I endonucleases from related strains were not markedly biased, and with pentanucleotides, palindromic and nonpalindromic sequences had nearly identical distributions. The loss of restriction sites may be explained by the free transfer of plasmids encoding restriction enzymes and episodic selection for the presence of the enzymes.
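For reference, the maximal order Markov estimate that this abstract compares against is usually written as E(w1…wk) = N(w1…wk−1) · N(w2…wk) / N(w2…wk−1), where N is an observed count. The Python sketch below implements only that baseline, not the total compositional bias method, whose details the abstract does not give. The function names are illustrative.

```python
from collections import Counter


def count_kmers(seq, k):
    """Counts of all overlapping k-mers in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_max_order(seq, word):
    """Expected count of `word` from its two (k-1)-mers and its central (k-2)-mer."""
    word = word.upper()
    k = len(word)
    if k < 3:
        raise ValueError("word must have at least 3 bases")
    sub_counts = count_kmers(seq, k - 1)
    core_count = count_kmers(seq, k - 2)[word[1:-1]]
    if core_count == 0:
        return 0.0
    return sub_counts[word[:-1]] * sub_counts[word[1:]] / core_count


def observed_expected_ratio(seq, word):
    """Ratio below 1 suggests underrepresentation, above 1 overrepresentation."""
    expected = expected_max_order(seq, word)
    observed = count_kmers(seq, len(word))[word.upper()]
    return observed / expected if expected else float("nan")
```

Applied to a palindromic restriction site such as `GAATTC`, a ratio well below 1 would correspond to the kind of underrepresentation described above. As the abstract notes, this estimator ignores biases in noncontiguous subwords.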

9.
Next-generation sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among the challenges, GC bias in NGS data is known to aggravate genome assembly. However, it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis on the effects of GC bias on genome assembly. Our analyses reveal that GC bias only lowers assembly completeness when the degree of GC bias is above a threshold. At a strong GC bias, the assembly fragmentation due to GC bias can be explained by the low coverage of reads in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation because of GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These pieces of information provide guidance toward a better de novo genome assembly in the presence of GC bias.

10.
Microsatellite polymorphisms are invaluable for mapping vertebrate genomes. In order to estimate the occurrence of microsatellites in the rabbit genome and to assess their feasibility as markers in rabbit genetics, a survey on the presence of all types of mononucleotide, dinucleotide, trinucleotide and tetranucleotide repeats, with a length of about 20 bp or more, was conducted by searching the published rabbit DNA sequences in the EMBL nucleotide database (version 32). A total of 181 rabbit microsatellites could be extracted from the present database. The estimated frequency of microsatellites in the rabbit genome was one microsatellite for every 2–3 kb of DNA. Dinucleotide repeats constituted the prevailing class of microsatellites, followed by trinucleotide, mononucleotide and tetranucleotide repeats, respectively. The average length of the microsatellites, as found in the database, was 26, 23, 23 and 22 bp for mono-, di-, tri- and tetranucleotide repeats, respectively. The most common repeat motif was AG, followed by A, AC, AGG and CCG. This group comprised about 70% of all extracted rabbit microsatellites. About 61% of the microsatellites were found in non-coding regions of genes, whereas 15% resided in (protein) coding regions. A significant fraction of rabbit microsatellites (about 22%) was found within interspersed repetitive DNA sequences.
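A survey of this kind can be approximated with a regular-expression scan for perfect tandem repeats. The Python sketch below reports mono- to tetranucleotide repeats of at least 20 bp. The 20 bp default and the 1 to 4 bp unit range follow the abstract, while the function name, the restriction to perfect repeats and the way motifs that are multiples of a shorter unit are handled are assumptions for illustration.

```python
import re
from math import ceil


def find_microsatellites(seq, min_len=20, max_unit=4):
    """Return (start, unit, copies) for perfect tandem repeats of at least min_len bp."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        min_copies = ceil(min_len / k)
        # One captured unit followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are repeats of a shorter unit (e.g. "AGAG" is "AG" x 2),
            # so each repeat is reported once, under its shortest motif.
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(hits)
```

Dividing the total sequence length by `len(find_microsatellites(seq))` gives a density estimate comparable in spirit to the "one microsatellite for every 2–3 kb" figure, although imperfect repeats, which this sketch misses, would change the count.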

11.
《Genomics》2020,112(3):2349-2360
Aroideae is the largest and most diverse subfamily of the plant family Araceae. Despite its agricultural and horticultural importance, the genomic resources are sparse for this subfamily. Here, we report de novo assembled and fully annotated chloroplast genomes of 13 Aroideae species. The quadripartite chloroplast genomes (size range of 158,177–170,037 bp) are comprised of a large single copy (LSC; 75,594–94,702 bp), a small single copy (SSC; 12,903–23,981 bp) and a pair of inverted repeats (IRs; 25,266–34,840 bp). Notable gene rearrangements and IR contractions/expansions were found for Anchomanes hookeri and Zantedeschia aethiopica. Codon usage, amino acid frequencies, oligonucleotide repeats, GC contents, and gene features revealed similarities among the 13 species. The number of oligonucleotide repeats was uncorrelated with genome size or phylogenetic position of the species. Phylogenetic analyses corroborated the monophyly of Aroideae but were unable to resolve the positions of Calla and Schismatoglottis.

12.
Yeast mitochondrial DNA molecules have long, AT-rich intergenic spacers punctuated by short GC clusters. GC-rich elements have previously been characterized by others as preferred sites for intramolecular recombination leading to the formation of subgenomic petite molecules. In the present study we show that GC clusters are favored sites for intermolecular recombination between a petite and the wild-type grande genome. The petite studied retains 6.5 kb of mitochondrial DNA reiterated tandemly to form molecules consisting of repeated units. Genetic selection for integration of tandem 6.5 kb repeats of the petite into the grande genome yielded a novel recombination event. One of two crossovers in a double exchange event occurred as expected in the 6.5 kb of matching sequence between the genomes, whereas the second exchange involved a 44 bp GC cluster in the petite and another 44 bp GC cluster in the grande genome 700 bp proximal to the region of homology. Creation of a mitochondrial DNA molecule with a repetitive region led to secondary recombination events that generated a family of molecules with zero to several petite units. The finding that 44 bp GC clusters are preferred as sites for intermolecular exchange adds to the data on petite excision implicating these elements as recombinational hotspots in the yeast mitochondrial genome.

13.
The genome of Pseudomonas putida KT2440 encodes an unexpected capacity to tolerate heavy metals and metalloids. The availability of the complete chromosomal sequence allowed the categorization of 61 open reading frames likely to be involved in metal tolerance or homeostasis, plus seven more possibly involved in metal resistance mechanisms. Some systems appeared to be duplicated. These might perform redundant functions or be involved in tolerance to different metals. In total, P. putida was found to bear two systems for arsenic (arsRBCH), one for chromate (chrA), four to six systems for divalent cations (two cadA and two to four czc chemiosmotic antiporters), two systems for monovalent cations: pacS, cusCBA (plus one cryptic silP gene containing a frameshift mutation), two operons for Cu chelation (copAB), one metallothionein for metal(loid) binding, one system for Te/Se methylation (tpmT) and four ABC transporters for the uptake of essential Zn, Mn, Mo and Ni (one nikABCDE, two znuACB and one mobABC). Some of the metal-related clusters are located in gene islands with atypical genome signatures. The predicted capacity of P. putida to endure exposure to heavy metals is discussed from an evolutionary perspective.

14.
Complete archaeal genomes were probed for the presence of long (≥ 25 bp) oligonucleotide repeats (words). We detected the presence of many words distributed in tandem with narrow ranges of periodicity (i.e., spacer length between repeats). Similar words were not identified in genomes of non-archaeal species, namely Escherichia coli, Bacillus subtilis, Haemophilus influenzae, Mycoplasma genitalium and Mycoplasma pneumoniae. BLAST similarity searches against the GenBank nucleotide sequence database revealed that these words were archaeal species-specific, indicating that they are of a signature character. Sequence analysis and genome viewing tools showed these repeats to be restricted to non-coding regions. Thus, archaea appear to possess a non-coding genomic signature that is absent in bacterial species. The identification of a species-specific genomic signature would be of great value to archaeal genome mapping, evolutionary studies and analyses of genome complexity.

15.
We mapped and analyzed the microsatellites throughout 284,295,605 base pairs of the unambiguously assembled sequence scaffolds along 19 chromosomes of the haploid poplar genome. In total, we found 150,985 SSRs with repeat unit lengths between 2 and 5 bp. The established microsatellite physical map demonstrated that SSRs were distributed relatively evenly across the genome of Populus. On average, these SSRs occurred every 1,883 bp within the poplar genome and the SSR densities in intergenic regions, introns, exons and UTRs were 85.4%, 10.7%, 2.7% and 1.2%, respectively. We took di-, tri-, tetra- and pentamers as the four classes of repeat units and found that the density of each class of SSRs decreased with the repeat unit lengths except for the tetranucleotide repeats. It was noteworthy that the length diversification of microsatellite sequences was negatively correlated with their repeat unit length and the SSRs with shorter repeat units gained repeats faster than the SSRs with longer repeat units. We also found that the GC content of poplar sequence significantly correlated with densities of SSRs with uneven repeat unit lengths (tri- and penta-), but had no significant correlation with densities of SSRs with even repeat unit lengths (di- and tetra-). In the poplar genome, there was evidence that the occurrence of different microsatellites was under selection and the GC content in SSR sequences was found to significantly relate to the functional importance of microsatellites.

16.
Environmental Sciences Division, Oak Ridge National Laboratory, TN, USA
We mapped and analyzed the microsatellites throughout 284,295,605 base pairs of the unambiguously assembled sequence scaffolds along 19 chromosomes of the haploid poplar genome. In total, we found 150,985 SSRs with repeat unit lengths between 2 and 5 bp. The established microsatellite physical map demonstrated that SSRs were distributed relatively evenly across the genome of Populus. On average, these SSRs occurred every 1,883 bp within the poplar genome and the SSR densities in intergenic regions, introns, exons and UTRs were 85.4%, 10.7%, 2.7% and 1.2%, respectively. We took di-, tri-, tetra- and pentamers as the four classes of repeat units and found that the density of each class of SSRs decreased with the repeat unit lengths except for the tetranucleotide repeats. It was noteworthy that the length diversification of microsatellite sequences was negatively correlated with their repeat unit length and the SSRs with shorter repeat units gained repeats faster than the SSRs with longer repeat units. We also found that the GC content of poplar sequence significantly correlated with densities of SSRs with uneven repeat unit lengths (tri- and penta-), but had no significant correlation with densities of SSRs with even repeat unit lengths (di- and tetra-). In the poplar genome, there was evidence that the occurrence of different microsatellites was under selection and the GC content in SSR sequences was found to significantly relate to the functional importance of microsatellites.

17.
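Based on experimentally observed mechanisms of DNA looping and bending, and taking 140 bp as the dividing point, we explore possible short-range and long-range transcriptional synergy between the upstream regions and introns of highly transcribed genes. By comparison with random sequences, we extracted oligonucleotide pairs whose nearest distance is below 140 bp, as well as pairs whose nearest distance is above 140 bp. A careful analysis of the positional features and base composition of the candidate synergistic oligonucleotide pairs at the two distance ranges shows that the average nearest distance of the short-range pairs is always below 110 bp, with CCAA in the upstream region as a prominent feature. The average nearest distance of the long-range pairs is concentrated at 250 to 400 bp, and in most of these pairs the oligonucleotide located in the upstream region is a GC-rich positive regulatory element.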

18.
Gao F  Zhang CT 《The FEBS journal》2006,273(8):1637-1648
The availability of the complete chicken genome sequence provides an unprecedented opportunity to study the global genome organization at the sequence level. Delineating compositionally homogeneous G + C domains in DNA sequences can provide much insight into the understanding of the organization and biological functions of the chicken genome. A new segmentation algorithm, which is simple and fast, has been proposed to partition a given genome or DNA sequence into compositionally distinct domains. By applying the new segmentation algorithm to the draft chicken genome sequence, the mosaic organization of the chicken genome can be confirmed at the sequence level. It is shown herein that the chicken genome is also characterized by a mosaic structure of isochores, long DNA segments that are fairly homogeneous in the G + C content. Consequently, 25 isochores longer than 2 Mb (megabases) have been identified in the chicken genome. These isochores have a fairly homogeneous G + C content and often correspond to meaningful biological units. With the aid of the technique of cumulative GC profile, we proposed an intuitive picture to display the distribution of segmentation points. The relationships between G + C content and the distributions of genes (CpG islands, and other genomic elements) were analyzed in a perceivable manner. The cumulative GC profile, equipped with the new segmentation algorithm, would be an appropriate starting point for analyzing the isochore structures of higher eukaryotic genomes.

19.
Recent studies have shown the non-random distribution of microsatellite motifs between genomic regions within a particular species. This study investigates such microsatellite distributions in the genome of the economically important abalone Haliotis midae, via a bioinformatic survey. In particular, the association of specific repeat motifs to coding regions and transposable elements is investigated. An understanding of microsatellite genomic distribution will facilitate more efficient use and development of this popular molecular marker. A bias toward di- and tetranucleotide repeats was found in the H. midae genome. CA microsatellite units were the most abundant repeat motif, but were notably underrepresented in genic regions where GAGT repeats predominate. Approximately 17.5% and 21% of the microsatellites showed gene and/or transposable element associations, respectively. This could explain the high genomic frequencies of particular motifs across the genome and may allude to a possible functional role. The data presented in this study are the first to demonstrate such non-random dispersal of microsatellites in abalone and support previous findings arguing in favor of non-random distribution of repeat motifs.

20.
The known genomic islands of Pseudomonas aeruginosa clone C strains are integrated into tRNA(Lys) (pKLC102) or tRNA(Gly) (PAGI-2 and PAGI-3) genes and differ from their core genomes by distinctive tetranucleotide usage patterns. pKLC102 and the related island PAPI-1 from P. aeruginosa PA14 were spontaneously mobilized from their host chromosomes at frequencies of 10% and 0.3%, making pKLC102 the most mobile genomic island known with a copy number of 30 episomal circular pKLC102 molecules per cell. The incidence of islands of the pKLC102/PAGI-2 type was investigated in 71 unrelated P. aeruginosa strains from diverse habitats and geographic origins. pKLC102- and PAGI-2-like islands were identified in 50 and 31 strains, respectively, and 15 and 10 subtypes were differentiated by hybridization on pKLC102 and PAGI-2 macroarrays. The diversity of PAGI-2-type islands was mainly caused by one large block of strain-specific genes, whereas the diversity of pKLC102-type islands was primarily generated by subtype-specific combination of gene cassettes. Chromosomal loss of PAGI-2 could be documented in sequential P. aeruginosa isolates from individuals with cystic fibrosis. PAGI-2 was present in most tested Cupriavidus metallidurans and Cupriavidus campinensis isolates from polluted environments, demonstrating the spread of PAGI-2 across habitats and species barriers. The pKLC102/PAGI-2 family is prevalent in numerous beta- and gammaproteobacteria and is characterized by high asymmetry of the cDNA strands. This evolutionarily ancient family of genomic islands retained its oligonucleotide signature during horizontal spread within and among taxa.
