Similar literature
 Found 20 similar articles; search time: 15 ms
1.
Gao F  Zhang CT 《The FEBS journal》2006,273(8):1637-1648
The availability of the complete chicken genome sequence provides an unprecedented opportunity to study the global genome organization at the sequence level. Delineating compositionally homogeneous G + C domains in DNA sequences can provide much insight into the understanding of the organization and biological functions of the chicken genome. A new segmentation algorithm, which is simple and fast, has been proposed to partition a given genome or DNA sequence into compositionally distinct domains. By applying the new segmentation algorithm to the draft chicken genome sequence, the mosaic organization of the chicken genome can be confirmed at the sequence level. It is shown herein that the chicken genome is also characterized by a mosaic structure of isochores, long DNA segments that are fairly homogeneous in the G + C content. Consequently, 25 isochores longer than 2 Mb (megabases) have been identified in the chicken genome. These isochores have a fairly homogeneous G + C content and often correspond to meaningful biological units. With the aid of the technique of cumulative GC profile, we proposed an intuitive picture to display the distribution of segmentation points. The relationships between G + C content and the distributions of genes (CpG islands, and other genomic elements) were analyzed in a perceivable manner. The cumulative GC profile, equipped with the new segmentation algorithm, would be an appropriate starting point for analyzing the isochore structures of higher eukaryotic genomes.  相似文献   

2.
MOTIVATION: Some genomic islands contain horizontally transferred genes, which play critical roles in altering the genotypes and phenotypes of organisms, and horizontal gene transfer has been recognized as a universal event throughout bacterial evolution. A windowless method to display the distribution of genomic GC content, the cumulative GC profile, is proposed to identify genomic islands in genomes whose complete genome sequences are available. Two new indices are proposed to assess the codon usage bias and amino acid usage bias in genomic islands. RESULTS: A 211 kb genomic island (CGGI-1) has been identified in the genome of Corynebacterium glutamicum, and three genomic islands VVGI-1, VVGI-2 and VVGI-3, with lengths 167, 40 and 33 kb, respectively, have been identified in the genome of Vibrio vulnificus CMCP6 chromosome I. The CGGI-1 is flanked by two approximately 500 bp direct repeats, and utilizes a Val-tRNA as the integration site. For the VVGI-1 and VVGI-2, each has an integrase gene at 5' junction. All the identified genomic islands show unusual GC content, codon usage and amino acid usage, compared with the rest of the genomes. In addition, it is found that genomic islands are fairly homogenous in terms of GC content variation. An index, h, to quantify the homogeneity of GC content for genomic islands is proposed, and it is shown that h is less than 0.1 for all the genomic islands analyzed. The cumulative GC profile, as well as various indices to assess the codon usage bias, amino acid usage bias and homogeneity of the genomic islands, will be useful in the analysis of other genomes. AVAILABILITY: Programs used in this work and numerical results are available upon request.  相似文献   
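
The windowless cumulative GC profile mentioned above can be sketched roughly as follows. This is a minimal illustration, not the authors' program: it assumes one common formulation in which a running (A+T) minus (G+C) count is detrended by a least-squares line through the origin and then sign-flipped, so that the curve rises over GC-rich stretches and falls over AT-rich ones. The function name `cumulative_gc_profile` is hypothetical.

```python
def cumulative_gc_profile(seq):
    """Return a detrended, sign-flipped cumulative (A+T)-(G+C) walk.

    Assumed formulation: z_n counts A/T as +1 and G/C as -1, the slope k
    comes from a least-squares fit of z = k * n, and the value reported
    at position n is -(z_n - k * n).
    """
    z = []
    total = 0
    for base in seq.upper():
        if base in "AT":
            total += 1
        elif base in "GC":
            total -= 1
        # Any other symbol (for example N) leaves the walk unchanged.
        z.append(total)
    n = len(z)
    if n == 0:
        return []
    # Least-squares slope of a straight line through the origin.
    num = sum((i + 1) * z[i] for i in range(n))
    den = sum((i + 1) ** 2 for i in range(n))
    k = num / den
    return [-(z[i] - k * (i + 1)) for i in range(n)]


profile = cumulative_gc_profile("AAAAGGGG")
# The lowest point of the profile marks the AT-rich to GC-rich transition.
turning_point = profile.index(min(profile))
```

Under this convention, a sharp change in the slope of the profile suggests a compositional boundary, which is how candidate genomic islands would be read off the curve.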

3.
We adopted the method of Zhang and Zhang (the Z-Island method) to identify genomic islands in seven human pathogens, analyzing their chromosomal DNA sequences. The Z-Island method is a theoretical method for predicting genomic islands in bacterial genomes; it consists of determination of the cumulative GC profile and computation of codon usage bias. Thirty-one genomic islands were found in seven pathogens using this method. Further analysis demonstrated that most have the known conserved features; this increases the probability that they are real genomic islands. Eleven genomic islands were found to code for products involved in causing disease (virulence factors) or in resistance to antibiotics (resistance factors). This finding could be useful for research on the pathogenicity of these bacteria and helpful in the treatment of the diseases that they cause. In a comparison of the distribution of mobility elements in genomic islands predicted by different methods, the Z-Island method gave lower false-positive rates. The Z-Island method was found to detect more known genomic islands than the two methods that we compared it with, SIGI-HMM and IslandPick. Furthermore, it maintained a better balance between specificity and sensitivity. The only inconvenience is that the steps for finding genomic islands by the Z-Island method are semi-automatic.  相似文献   

4.
Zhang SH  Wang L 《Genomics》2011,97(5):330-331
It has been reported that a majority triplet profile exists among genomes, which was considered a reflection of general mechanisms of genome evolution (Albrecht-Buehler, 2007). However, our further analysis shows that, at least among prokaryotic genomes, there are actually two common triplet profiles: one from low-GC-content genomes and the other from high-GC-content genomes. Both common profiles would be direct reflections of GC content variation and strand symmetry of genomic sequences.

5.
Combined with the Z curve method, the technique of wavelet multiresolution (also known as multiscale) analysis has been proposed to identify the boundaries of isochores in the human genome. The human MHC sequence and the longest contigs of human chromosomes 21 and 22 are used as examples. The boundary between the isochores of Class III and Class II in the MHC sequence has been detected and found to be situated at position 2,490,368 bp. This result is in good agreement with the experimental evidence. An isochore with a length of about 7 Mb in chromosome 21 has been identified and found to be gene- and Alu-poor. We have also found that the G+C content of chromosome 21 is more homogeneous than that of chromosome 22. Compared with window-based methods, the present method has the highest resolution for identifying the boundaries of isochores, even at the scale of a single base. Compared with the entropic segmentation method, the present method is more intuitive and requires fewer calculations. The important conclusion drawn in this study is that segmentation points, at which the G+C content undergoes relatively dramatic changes, do exist in the human genome. These 'singularity' points may be considered candidate isochore boundaries in the human genome. The method presented is a general one and can be used to analyze any other genome.

6.
Eukaryotic genomes display segmental patterns of variation in various properties, including GC content and degree of evolutionary conservation. DNA segmentation algorithms are aimed at identifying statistically significant boundaries between such segments. Such algorithms may provide a means of discovering new classes of functional elements in eukaryotic genomes. This paper presents a model and an algorithm for Bayesian DNA segmentation and considers the feasibility of using it to segment whole eukaryotic genomes. The algorithm is tested on a range of simulated and real DNA sequences, and the following conclusions are drawn. Firstly, the algorithm correctly identifies non-segmented sequence, and can thus be used to reject the null hypothesis of uniformity in the property of interest. Secondly, estimates of the number and locations of change-points produced by the algorithm are robust to variations in algorithm parameters and initial starting conditions and correspond to real features in the data. Thirdly, the algorithm is successfully used to segment human chromosome 1 according to GC content, thus demonstrating the feasibility of Bayesian segmentation of eukaryotic genomes. The software described in this paper is available from the author's website (www.uq.edu.au/~uqjkeith/) or upon request to the author.

7.
Sequencing microbial genomes is important because microbes carry antibiotic and pathogenetic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still ongoing. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI-predicting module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, which returns functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for the analysis of ongoing genome projects. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.

8.
Comparative genomics has revealed that variations in bacterial and archaeal genome DNA sequences cannot be explained by only neutral mutations. Virus resistance and plasmid distribution systems have resulted in changes in bacterial and archaeal genome sequences during evolution. The restriction-modification system, a virus resistance system, leads to avoidance of palindromic DNA sequences in genomes. Clustered, regularly interspaced, short palindromic repeats (CRISPRs) found in genomes represent yet another virus resistance system. Comparative genomics has shown that bacteria and archaea have failed to gain any DNA with GC content higher than the GC content of their chromosomes. Thus, horizontally transferred DNA regions have lower GC content than the host chromosomal DNA does. Some nucleoid-associated proteins bind DNA regions with low GC content and inhibit the expression of genes contained in those regions. This form of gene repression is another type of virus resistance system. On the other hand, bacteria and archaea have used plasmids to gain additional genes. Virus resistance systems influence plasmid distribution. Interestingly, the restriction-modification system and nucleoid-associated protein genes have been distributed via plasmids. Thus, GC content and genomic signatures do not reflect bacterial and archaeal evolutionary relationships.  相似文献   

9.
Isolation of CpG islands from large genomic clones   Cited by: 4 (self-citations: 0; citations by others: 4)

10.
It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen–Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, DJS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas DJS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remaining is a mixture of many short compositionally homogeneous domains and relatively few long ones.  相似文献   
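
The core step of a Jensen–Shannon-based recursive segmentation, as referred to above, can be sketched as below. This is an illustration under simplifying assumptions, not IsoPlotter or DJS: the sequence is reduced to a two-symbol alphabet (G/C versus everything else), a single best cut is found, and no halting criterion or homogeneity test is applied. The names `entropy` and `best_split` are hypothetical.

```python
import math


def entropy(gc, n):
    """Shannon entropy (bits) of a two-symbol composition with gc of n sites."""
    if n == 0:
        return 0.0
    h = 0.0
    for count in (gc, n - gc):
        if count:
            p = count / n
            h -= p * math.log2(p)
    return h


def best_split(seq):
    """Return (cut, divergence) for the cut maximizing JS divergence.

    The divergence of a cut is H(whole) minus the length-weighted
    entropies of the left and right parts. Any symbol other than G or C
    is counted as AT in this sketch.
    """
    flags = [1 if b in "GC" else 0 for b in seq.upper()]
    n = len(flags)
    total_gc = sum(flags)
    h_all = entropy(total_gc, n)
    best_cut, best_d = None, 0.0
    left_gc = 0
    for cut in range(1, n):
        left_gc += flags[cut - 1]
        right_gc = total_gc - left_gc
        d = (h_all
             - (cut / n) * entropy(left_gc, cut)
             - ((n - cut) / n) * entropy(right_gc, n - cut))
        if d > best_d:
            best_cut, best_d = cut, d
    return best_cut, best_d


cut, divergence = best_split("AAAAAAGGGGGG")
```

A recursive segmenter would apply `best_split` again to each side of an accepted cut; deciding when a cut is accepted is exactly the halting problem the abstract discusses, and it is deliberately left out here.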

11.
Zavala A  Naya H  Romero H  Sabbia V  Piovani R  Musto H 《Gene》2005,357(2):137-143
GC level is a key feature in prokaryotic genomes. Widely employed in evolutionary studies, new insights appear however limited because of the relatively low number of characterized genomes. Since public databases mainly comprise several hundreds of prokaryotes with a low number of sequences per genome, a reliable prediction method based on available sequences may be useful for studies that need a trustworthy estimation of whole genomic GC. As the analysis of completely sequenced genomes shows a great variability in distributional shapes, it is of interest to compare different estimators. Our analysis shows that the mean of GC values of a random sample of genes is a reasonable estimator, based on simplicity of the calculation and overall performance. However, usually sequences come from a process that cannot be considered as random sampling. When we analyzed two introduced sources of bias (gene length and protein functional categories) we were able to detect an additional bias in the estimation for some cases, although the precision was not affected. We conclude that the mean genic GC level of a sample of 10 genes is a reliable estimator of genomic GC content, showing comparable accuracy with many widely employed experimental methods.  相似文献   
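
The estimator described above, the mean GC level of a small sample of genes, is simple enough to sketch directly. The code below is only an illustration: the function names and the `seed` parameter are assumptions, and uniform random sampling is used even though, as the abstract notes, real collections of sequences are rarely a random sample.

```python
import random


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases of one sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def estimate_genomic_gc(genes, sample_size=10, seed=None):
    """Mean genic GC of a random sample of genes (hypothetical helper)."""
    if not genes:
        raise ValueError("at least one gene sequence is required")
    rng = random.Random(seed)
    sample = rng.sample(genes, min(sample_size, len(genes)))
    return sum(gc_fraction(g) for g in sample) / len(sample)


# Toy input: twenty identical "genes" with 50% GC.
estimate = estimate_genomic_gc(["ATGC"] * 20, sample_size=10, seed=0)
```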

12.
It has been hypothesized that the length of an exon tends to increase with the GC content because stop codons are AT-rich and should occur less frequently in GC-rich exons. This prediction assumes that mutation pressure plays a significant role in the occurrence and distribution of stop codons. However, the prediction is applicable not to all exons, but only to the last coding exon of a gene and to single-exon CDS sequences. We classified exons in multiexon genes in eight eukaryotic species into three groups (the first exon, internal exons, and the last exon) and computed the Spearman correlation between the exon length and the percentage GC (%GC) for each of the three groups. In only five of the species studied is the correlation for the last coding exon greater than that for the first or internal exons. For the single-exon CDS sequences, the correlation between CDS length and %GC is mostly negative. Thus, eukaryotic genomes do not support the predicted relationship between exon length and %GC. In prokaryotic genomes, CDS length and %GC are positively correlated in each of the 68 completely sequenced prokaryotic genomes in GenBank with genomic GC contents varying from 25 to 68%, except for the wall-less Mycoplasma genitalium and the syphilis pathogen Treponema pallidum. Moreover, the average CDS length and the genomic GC content are also positively correlated. After correcting for genome size, the partial correlation between the average CDS length and the genomic GC content is 0.3217 (p < 0.025).
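
The Spearman correlation between exon length and %GC used above can be computed as the Pearson correlation of rank-transformed values. The sketch below uses only the standard library and average ranks for ties; the function names and the toy numbers are illustrative assumptions, not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation of two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy exon lengths (bp) and %GC values, invented for illustration only.
rho = spearman([100, 200, 300, 400], [40, 45, 50, 60])
```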

13.
Comparison of the human and mouse genomes has revealed that significant variations in evolutionary rates exist among genomic regions and that a large part of this variation is interchromosomal. We confirm in this work, using a large collection of introns, that human chromosome 19 is the one that shows the highest divergence with respect to mouse. To search for other differences among chromosomes, we examine the distribution of gene functions in human and mouse chromosomes using the Gene Ontology definitions. We found by correspondence analysis that among the strongest clusterings of gene functions in human chromosomes is a group of genes coding for DNA binding proteins in chromosome 19. Interestingly, chromosome 19 also has a very high GC content, a feature that has been proposed to promote an opening of the chromatin, thereby facilitating binding of proteins to the DNA helix. In the mouse genome, however, a similar aggregation of genes coding for DNA binding proteins and high GC content cannot be found. This suggests that the distribution of genes coding for DNA binding proteins and the variations of the chromatin accessibility to these proteins are different in the human and mouse genomes. It is likely that the overall high synonymous and intron rates in chromosome 19 are a by-product of the high GC content of this chromosome.
Department of Physiology and Molecular Biodiversity, Institut de Biologia Molecular de Barcelona, CSIC, Jordi Girona 18, 08034 Barcelona, Spain

14.
The species in the family Planctomycetaceae are ideal for investigating the origin of eukaryotes. Their cells are divided by a lipidic intracytoplasmic membrane and they share a number of eukaryote-like molecular characteristics. However, their genomic structures, potential abilities, and evolutionary status are still unknown. In this study, we searched for common protein families and a core genome/pan genome based on 11 sequenced species in the family Planctomycetaceae. Then, we constructed a phylogenetic tree based on their 832 common protein families. We also annotated the 11 genomes using the Clusters of Orthologous Groups database. Moreover, we predicted and reconstructed their core/pan metabolic pathways using the KEGG (Kyoto Encyclopedia of Genes and Genomes) orthology system. Subsequently, we identified genomic islands (GIs) and structural variations (SVs) among the five complete genomes and we specifically investigated the integration of two Planctomycetaceae plasmids in all 11 genomes. The results indicate that Planctomycetaceae species share diverse genomic variations and unique genomic characteristics, and have considerable potential for human applications.

15.
Microbial genes that are “novel” (no detectable homologs in other species) have become of increasing interest as environmental sampling suggests that there are many more such novel genes in yet-to-be-cultured microorganisms. By analyzing known microbial genomic islands and prophages, we developed criteria for systematic identification of putative genomic islands (clusters of genes of probable horizontal origin in a prokaryotic genome) in 63 prokaryotic genomes, and then characterized the distribution of novel genes and other features. All but a few of the genomes examined contained significantly higher proportions of novel genes in their predicted genomic islands compared with the rest of their genome (Paired t test = 4.43E-14 to 1.27E-18, depending on method). Moreover, the reverse observation (i.e., higher proportions of novel genes outside of islands) never reached statistical significance in any organism examined. We show that this higher proportion of novel genes in predicted genomic islands is not due to less accurate gene prediction in genomic island regions, but likely reflects a genuine increase in novel genes in these regions for both bacteria and archaea. This represents the first comprehensive analysis of novel genes in prokaryotic genomic islands and provides clues regarding the origin of novel genes. Our collective results imply that there are different gene pools associated with recently horizontally transmitted genomic regions versus regions that are primarily vertically inherited. Moreover, there are more novel genes within the gene pool associated with genomic islands. Since genomic islands are frequently associated with a particular microbial adaptation, such as antibiotic resistance, pathogen virulence, or metal resistance, this suggests that microbes may have access to a larger “arsenal” of novel genes for adaptation than previously thought.  相似文献   

16.
Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.  相似文献   

17.

Background

An increasing number of microbial genomes are being sequenced and deposited in public databases. In addition, several closely related strains are also being sequenced in order to understand the genetic basis of diversity and mechanisms that lead to the acquisition of new genetic traits. These exercises have necessitated the requirement for visualizing microbial genomes and performing genome comparisons on a finer scale. We have developed GenomeViz to enable rapid visualization and subsequent comparisons of several microbial genomes in an interactive environment.

Results

Here we describe a program that allows visualization of both qualitative and quantitative information from complete and partially sequenced microbial genomes. Using GenomeViz, data deriving from studies on genomic islands, gene/protein classifications, GC content, GC skew, whole genome alignments, microarrays and proteomics may be plotted. Several genomes can be visualized interactively at the same time from a comparative genomic perspective and publication quality circular genome plots can be created.

Conclusions

GenomeViz should allow researchers to perform visualization and comparative analysis of up to eight different microbial genomes simultaneously.

18.
Lightfield J  Fram NR  Ely B 《PloS one》2011,6(3):e17677
The GC content of bacterial genomes ranges from 16% to 75% and wide ranges of genomic GC content are observed within many bacterial phyla, including both gram negative and gram positive phyla. Thus, divergent genomic GC content has evolved repeatedly in widely separated bacterial taxa. Since genomic GC content influences codon usage, we examined codon usage patterns and predicted protein amino acid content as a function of genomic GC content within eight different phyla or classes of bacteria. We found that similar patterns of codon usage and protein amino acid content have evolved independently in all eight groups of bacteria. For example, in each group, use of amino acids encoded by GC-rich codons increased by approximately 1% for each 10% increase in genomic GC content, while the use of amino acids encoded by AT-rich codons decreased by a similar amount. This consistency within every phylum and class studied led us to conclude that GC content appears to be the primary determinant of the codon and amino acid usage patterns observed in bacterial genomes. These results also indicate that selection for translational efficiency of highly expressed genes is constrained by the genomic parameters associated with the GC content of the host genome.  相似文献   

19.
Corynebacterium efficiens is a gram-positive nonpathogenic bacterium which can grow and produce glutamate at 40 °C or above. By using the cumulative GC profile method, we have identified four genomic islands in the C. efficiens genome that share many genomic island-specific features. The presence of the gene encoding an aspartate kinase in a genomic island helps explain the unexpectedly low thermal stability of this enzyme; i.e., adaptive mutations have not occurred extensively because the horizontal gene transfer was recent.

20.
While the recognition of genomic islands can be a powerful mechanism for identifying genes that distinguish related bacteria, few methods have been developed to identify them specifically. Rather, identification of islands often begins with cataloging individual genes likely to have been recently introduced into the genome; regions with many putative alien genes are then examined for other features suggestive of recent acquisition of a large genomic region. When few phylogenetic relatives are available, the identification of alien genes relies on their atypical features relative to the bulk of the genes in the genome. The weakness of these ‘bottom–up’ approaches lies in the difficulty in identifying robustly those genes which are atypical, or phylogenetically restricted, due to recent foreign ancestry. Herein, we apply an alternative ‘top–down’ approach where bacterial genomes are recursively divided into progressively smaller regions, each with uniform composition. In this way, large chromosomal regions with atypical features are identified with high confidence due to the simultaneous analysis of multiple genes. This approach is based on a generalized divergence measure to quantify the compositional difference between segments in a hypothesis-testing framework. We tested the proposed genome island prediction algorithm on both artificial chimeric genomes and genuine bacterial genomes.  相似文献   
