Similar literature
 20 similar articles found
1.
Jabbari K  Bernardi G 《Gene》2000,247(1-2):287-292
In the present work we show that in the Drosophila genome (which covers a 37-51% GC range at a DNA size of approx. 50 kb) a linear correlation holds between the GC (or GC3) levels of coding sequences and the GC levels of the long (approx. 50 kb) genomic sequences embedding them. This correlation allows us to position the two compositional distributions, (a) that of coding sequences and (b) that of long DNA segments, relative to each other and to calculate gene concentration across the compositional range of the Drosophila genome. Using this approach, we show that gene concentration increases with increasing GC of the regions embedding the genes, reaching a 7-fold higher level in the GC-richest regions compared with the GC-poorest regions. The gene distribution of the Drosophila genome is, therefore, similar to (although less striking than) that of the human genome, whereas it is very different from that of the Arabidopsis genome, which has about the same size as the Drosophila genome.

2.
The compositional properties of human genes   Cited by: 8 (self-citations: 0; citations by others: 8)
Summary The present work represents the first attempt to study in greater detail previously proposed compositional correlations in genomes, based on a body of additional data relating to gene localizations as well as to extended flanking sequences extracted from gene banks. We have investigated the correlations that exist between (1) the GC levels of exons of human genes, and (2) the GC levels of either intergenic sequences or introns associated with the genes under consideration. In both cases, linear relationships with slopes close to unity were found. The similarity of the linear relationships indicates similar GC levels in intergenic sequences and introns located in the same isochores. Moreover, both intergenic sequences and introns showed GC levels 5–10% lower than the corresponding exons. The above findings considerably strengthen the previously drawn conclusion that coding and noncoding sequences (both inter- and intragenic) from the same isochores of the human genome are compositionally correlated. In addition, we find linear correlations between the GC levels of codon positions and of the intergenic sequences or introns associated with the corresponding genes, as well as among the GC levels of codon positions of genes.
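Several abstracts in this list refer to the GC levels of the first, second, and third codon positions (GC1, GC2, GC3). As a minimal sketch of how these quantities can be computed, and not the method of any of the cited papers, the hypothetical helper `gc_by_codon_position` below assumes an in-frame coding sequence containing only the letters A, C, G, and T, and ignores any trailing incomplete codon.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence.

    Assumption: `cds` starts at codon position 1 and contains only A/C/G/T.
    Bases after the last complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # length covered by complete codons
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for offset in range(3):
        # Every third base, starting at codon position offset + 1.
        bases = cds[offset:usable:3]
        gc_count = sum(base in "GC" for base in bases)
        fractions.append(gc_count / len(bases))
    return tuple(fractions)


if __name__ == "__main__":
    gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAG")  # codons ATG, GCC, AAG
    print(f"GC1={gc1:.2f} GC2={gc2:.2f} GC3={gc3:.2f}")
```

Plotting GC3 against the GC level of the surrounding introns or flanking sequence for many genes would then give the kind of scatter on which the linear relationships described in these abstracts are fitted.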

3.
Over the past decade, there has been an explosion of interest in searching for novel regulatory elements in the intergenic regions between protein-coding regions. Microbial genomes are the most exploited in terms of intergenic (noncoding) regions owing to their lower complexity. We think that the increasing pace of genome sequencing calls for a tool for extracting intergenic regions. IntergenicS (Intergenic Sequence) is a tool that can extract the intergenic regions of microbial genomes at NCBI. All the unannotated regions between annotated protein-coding genes and noncoding RNA genes can be extracted. It also calculates the GC base composition of the intergenic regions. This will be a useful tool for the analysis of noncoding regions of both bacterial and archaeal genomes.
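The extraction step described in this abstract can be illustrated with a short sketch. This is not the IntergenicS implementation, whose internals are not given here; the function names `intergenic_regions` and `gc_content` are hypothetical, gene annotations are assumed to be 0-based half-open `(start, end)` intervals, and the genome is treated as linear (a circular chromosome would also need the wrap-around region to be merged).

```python
def intergenic_regions(genome_length, genes):
    """Return unannotated (start, end) intervals between annotated genes.

    Assumptions: intervals are 0-based and half-open; strand is ignored;
    overlapping or nested genes are merged implicitly by tracking the
    furthest annotated end seen so far.
    """
    regions = []
    cursor = 0  # first position not yet covered by any gene
    for start, end in sorted(genes):
        if start > cursor:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        regions.append((cursor, genome_length))
    return regions


def gc_content(seq):
    """Fraction of G and C bases in `seq` (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    genome = "ATGCGCGCATATATGGGCCC"
    genes = [(0, 4), (2, 6), (10, 14)]  # second gene overlaps the first
    for start, end in intergenic_regions(len(genome), genes):
        print(start, end, genome[start:end], gc_content(genome[start:end]))
```

Sorting the intervals and keeping a running cursor handles overlapping annotations in a single pass, which is why no explicit merge step appears.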

4.
Summary The compositional distributions of coding sequences and DNA molecules (in the 50-100-kb range) are remarkably narrower in murids (rat and mouse) compared to humans (as well as to all other mammals explored so far). In murids, both distributions begin at higher and end at lower GC values. A comparison of homologous coding sequences from murids and humans revealed that their different compositional distributions are due to differences in GC levels in all three codon positions, particularly of genes located at both ends of the distribution. In turn, these differences are responsible for differences in both codon usage and amino acids. When GC levels at first+second codon positions and third codon positions, respectively, of murid genes are plotted against corresponding GC levels of homologous human genes, linear relationships (with very high correlation coefficients and slopes of about 0.78 and 0.60, respectively) are found. This indicates a conservation of the order of GC levels in homologous genes from humans and murids. (The same comparison for mouse and rat genes indicates a conservation of GC levels of homologous genes.) A similar linear relationship was observed when plotting GC levels of corresponding DNA fractions (as obtained by density gradient centrifugation in the presence of a sequence-specific ligand) from mouse and human. These findings indicate that orderly compositional changes affecting not only coding sequences but also noncoding sequences took place since the divergence of murids. Such directional fixations of mutations point to the existence of selective pressures affecting the genome as a whole.

5.
The isochore organization of the mammalian genome comprises a general pattern and some special patterns, the former being characterized by a wider compositional distribution of the DNA fragments. The large majority of the mammalian genomes belong to the former, and only some groups, such as the Myomorpha sub-order of Rodentia, belong to the latter. Here we describe the compositional organization of the pig (Sus scrofa) genome, which belongs to the general mammalian pattern. We investigated (i) the compositional distribution of the genes by analysis of their GC3 levels (the GC levels at the third codon positions), and (ii) the correlation between the GC3 values of orthologous genes from pig and other vertebrates (human, calf, mouse, chicken, and Xenopus). As expected, the highest gene concentration corresponded to the H3 isochore family, and the highest GC3 correlations were observed in the pig/human and pig/calf comparisons. Then we identified, by in situ hybridization of the GC-richest H3 isochores, the pig chromosomal regions endowed with the highest gene density, which largely corresponded to the telomeric chromosomal bands. Moreover, we observed that these gene-rich bands are syntenic with the previously identified GC-richest/gene-richest H3+ bands of the human chromosomes. At the cell nucleus level, we observed that the gene-dense region corresponded to the more internal compartment, as previously found in human and avian cell nuclei.

6.
This paper analyses the compositional correlations that hold in the chicken genome. Significant linear correlations were found among the regions studied, namely coding sequences (and their first, second, and third codon positions), flanking regions (5′ and 3′), and introns, as is the case in the human genome. We found that these compositional correlations are not limited to global GC levels but even extend to individual bases. Furthermore, an analysis of 1037 coding sequences has confirmed a correlation among GC3, GC2, and GC1. The implications of these results are discussed. Received: 9 December 1998 / Accepted: 18 April 1999

7.
The nucleosome formation potential of introns, intergenic spacers and exons of human genes is shown here to negatively correlate with among-tissues breadth of gene expression. The nucleosome formation potential is also found to negatively correlate with the GC content of genomic sequences; the slope of regression line is steeper in exons compared with noncoding DNA (introns and intergenic spacers). The correlation with GC content is independent of sequence length; in turn, the nucleosome formation potential of introns and intergenic spacers positively (albeit weakly) correlates with sequence length independently of GC content. These findings help explain the functional significance of the isochores (regions differing in GC content) in the human genome as a result of optimization of genomic structure for epigenetic complexity and support the notion that noncoding DNA is important for orderly chromatin condensation and chromatin-mediated suppression of tissue-specific genes.

8.
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

9.
Summary We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distributions of GC levels of the third codon positions of genes from cold-blooded vertebrates are distinctly different from those of warm-blooded vertebrates in that they do not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first+second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.

10.
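Correlation analysis of local GC levels in human protein-coding genes   Cited by: 2 (self-citations: 0; citations by others: 2)
Chen Xianggui  Hu Jun  Yang Xiao 《Yi Chuan (Hereditas, Beijing)》2008,30(9):1169-1174
GC content is an important feature of the base composition of genomic DNA sequences and carries information about gene structure, function, and evolution. In this study, the DNA sequences of 7,992 non-redundant human protein-coding genes were extracted from public databases, and the local GC content of different gene regions and the correlations among them were analyzed. The results showed that local GC content within genes is heterogeneous: the 5′ untranslated region had the highest GC level (62.56%), whereas the 3′ untranslated region had the lowest (43.97%). The GC content of the 3′ flanking sequence was a good representative of the GC level of the long DNA segment in which the gene resides. Although the GC content of open reading frames was higher than that of introns, 3′ untranslated regions, and 3′ flanking sequences, the GC contents of these four regions were highly correlated with one another. The mean GC content at the third codon position (GC3) was 58.09%, significantly higher than that at the first and second codon positions, and it was highly correlated with the GC level of the open reading frame, with a correlation coefficient of 0.91. GC3 was also well correlated with the GC levels of introns, 3′ untranslated regions, and 3′ flanking sequences; the slope of the linear regression of GC3 on the GC content of the 3′ flanking sequence was 1.25. GC3 can therefore serve as a sensitive indicator of variation in the GC level of the region in which a gene resides. By contrast, the GC levels of the first and second codon positions, the 5′ flanking sequence, and the 5′ untranslated region were only weakly correlated with the GC levels of the other gene regions. These results suggest that bases at the third codon position of protein-coding regions, in introns, in 3′ untranslated regions, and in 3′ flanking sequences may have undergone similar evolutionary processes, whereas the first and second codon positions of protein-coding regions, the 5′ flanking sequences, and the 5′ untranslated regions have undergone different mutation and selection because of functional requirements.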

11.
Ouyang Q  Zhao X  Feng H  Tian Y  Li D  Li M  Tan Z 《Gene》2012,499(1):37-40
The presence, locations and composition of simple sequence repeats (SSRs) in the Herpes simplex virus type 1 (HSV-1) genome were extracted and analyzed using the software Imperfect Microsatellite Extractor (IMEx). There were 663 mono-, 502 di-, 184 tri-, 20 tetra-, 4 penta- and 4 hexanucleotide SSRs, which were distributed differently between coding and noncoding regions in the HSV-1 genome. G/C, GC/CG, and (GGC)(n) were predominant in mononucleotide, dinucleotide, and trinucleotide repeats, respectively. Indeed, the results showed that GC content in simple sequence repeats was notably higher than that in the entire HSV-1 genome. Our data might be helpful for studying the pathogenesis, genome structure and evolution of HSV-1.
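To make the notion of mono-, di-, and trinucleotide SSRs concrete, the sketch below scans a sequence for perfect tandem repeats using a regular expression. It is a deliberately simplified stand-in and not the IMEx algorithm: IMEx also detects imperfect repeats, which this sketch does not attempt. The function name `find_perfect_ssrs` and the repeat thresholds in the example are illustrative assumptions.

```python
import re


def find_perfect_ssrs(seq, unit_len, min_repeats):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    Simplification: only exact repeats of an A/C/G/T unit are reported,
    and matches are non-overlapping, scanned left to right.
    """
    seq = seq.upper()
    # A unit of `unit_len` bases followed by at least min_repeats - 1 copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip units that are themselves repeats of a shorter motif,
        # e.g. "AA" reported as a dinucleotide repeat.
        is_degenerate = any(
            unit == unit[:k] * (unit_len // k)
            for k in range(1, unit_len)
            if unit_len % k == 0
        )
        if is_degenerate:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    example = "TTGCGCGCGCAAGGCGGCGGCT"
    print(find_perfect_ssrs(example, unit_len=2, min_repeats=3))
    print(find_perfect_ssrs(example, unit_len=3, min_repeats=3))
```

Counting the hits separately inside and outside annotated coding intervals would give the kind of coding versus noncoding comparison the abstract reports.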

12.
Summary We have analyzed the correlation that exists between the GC levels of third and first or second codon position for about 1400 human coding sequences. The linear relationship that was found indicates that the large differences in GC level of third codon positions of human genes are paralleled by smaller differences in GC levels of first and second codon positions. Whereas third codon position differences correspond to very large differences in codon usage within the human genome, the first and second codon position differences correspond to smaller, yet very remarkable, differences in the amino acid composition of encoded proteins. Because GC levels of codon positions are linearly correlated with the GC levels of the isochores harboring the corresponding genes, both codon usage and amino acid composition are different for proteins encoded by genes located in isochores of different GC levels. Furthermore, we have also shown that a linear relationship with a unity slope and a correlation coefficient of 0.77 exists between GC levels of introns and exons from the 238 human genes currently available for this analysis. Introns are, however, about 5% lower in GC, on average, than exons from the same genes.

13.
Phytophthora is a genus composed entirely of destructive plant pathogens. It belongs to the Stramenopila, a unique branch of eukaryotes, phylogenetically distinct from plants, animals, or fungi. Phytophthora genes show a strong preference for usage of codons ending with G or C (high GC3). The presence of high GC3 in genes can be utilized to differentiate coding regions from noncoding regions in the genome. We found that both selective pressure and mutation bias drive codon bias in Phytophthora. Indicative of selection pressure is the higher GC3 value of highly expressed genes in different Phytophthora species. Lineage-specific GC increase of noncoding regions is reminiscent of whole-genome mutation bias, whereas the elevated Phytophthora GC3 is primarily a result of translation efficiency-driven selection. Heterogeneous retrotransposons exist in Phytophthora genomes and many of them vary in their GC content. Interestingly, the most widespread groups of retroelements in Phytophthora show high GC3 and a codon bias that is similar to host genes. Apparently, selection pressure has been exerted on the retroelement’s codon usage, and such mimicry of host codon bias might be beneficial for the propagation of retrotransposons. Reviewing Editor: Dr. Yves van de Peer

14.
The genomes of homeothermic (warm-blooded) vertebrates are mosaic interspersions of homogeneously GC-rich and GC-poor regions (isochores). Evolution of genome compartmentalization and GC-rich isochores is hypothesized to reflect either selective advantages of an elevated GC content or chromosome location and mutational pressure associated with the timing of DNA replication in germ cells. To address the present controversy regarding the origins and maintenance of isochores in homeothermic vertebrates, newly obtained as well as published nucleotide sequences of the insulin and insulin-like growth factor (IGF) genes, members of a well-characterized gene family believed to have evolved by repeated duplication and divergence, were utilized to examine the evolution of base composition in nonconstrained (flanking) and weakly constrained (introns and fourfold degenerate sites) regions. A phylogeny derived from amino acid sequences supports a common evolutionary history for the insulin/IGF family genes. In cold-blooded vertebrates, insulin and the IGFs were similar in base composition. In contrast, insulin and IGF-II demonstrate dramatic increases in GC richness in mammals, but no such trend occurred in IGF-I. Base composition of the coding portions of the insulin and IGF genes across vertebrates correlated (r = 0.90) with that of the introns and flanking regions. The GC content of homologous introns differed dramatically between insulin/IGF-II and IGF-I genes in mammals but was similar to the GC level of noncoding regions in neighboring genes. Our findings suggest that the base composition of introns and flanking regions is determined by chromosomal location and the mutational pressure of the isochore in which the sequences are embedded. An elevated GC content at codon third positions in the insulin and the IGF genes may reflect selective constraints on the usage of synonymous codons.

15.
Summary Doublet preference analysis was carried out on coding and noncoding regions of Escherichia coli, Saccharomyces cerevisiae, and human mitochondrial and nuclear DNA. The preference pattern in 1–2 and 2–3 doublets in E. coli and S. cerevisiae correlated with that in noncoding regions. The 3-1 doublet preference in E. coli genes with low optimal codon frequency and in S. cerevisiae genes also showed a correlation with each of their noncoding doublet preferences. A mechanism to explain these correlations in doublet preference is presented: mutational biases, the origin of the noncoding region doublet preference, evolved so as to maintain the 1–2 and 2–3 doublet preference, which is determined by codon usage. These biases then acted on the 3-1 doublet, which was almost free of coding constraints, resulting in a similar preference in this doublet.

16.
The aim of this study was to analyze patterns of nucleotide composition and codon usage in the pea aphid genome (Acyrthosiphon pisum). A collection of 60,000 expressed sequence tags (ESTs) in the pea aphid has been used to automatically reconstruct 5809 coding sequences (CDSs), based on similarity with known proteins and on coding style recognition. Reconstructions were manually checked for ribosomal proteins, leading to a tentative reconstruction of the near-complete set of this category. Pea aphid coding sequences showed a shift toward AT (especially at the third codon position) compared to drosophila homologues. Genes with a putative high level of expression (ribosomal and other genes with high EST support) remained more GC3-rich and had a distinct codon usage from bulk sequences: they exhibited a preference for C-ending codons and CGT (for arginine), which thus appeared optimal for translation. However, the discrimination was not as strong as in drosophila, suggesting a reduced degree of translational selection. The space of variation in codon usage for A. pisum appeared to be larger than in drosophila, with a substantial fraction of genes that remained GC3-rich. Some of those (in particular some structural proteins) also showed high levels of codon bias and a very strong preference for C-ending codons, which could be explained either by strong translational selection or by other mechanisms. Finally, genomic traces were analyzed to build 206 fragments containing a full CDS, which allowed studying the correlations between GC contents of coding and those of noncoding (flanking and introns) sequences.

17.
18.
The contribution of slippage-like processes to genome evolution   Cited by: 19 (self-citations: 0; citations by others: 19)
Simple sequences present in long (>30 kb) sequences representative of the single-copy genome of five species (Homo sapiens, Caenorhabditis elegans, Saccharomyces cerevisiae, E. coli, and Mycobacterium leprae) have been analyzed. A close relationship was observed between genome size and the overall level of sequence repetition. This suggested that the incorporation of simple sequences had accompanied increases of genome size during evolution. Densities of simple sequence motifs were higher in noncoding regions than in coding regions in eukaryotes but not in eubacteria. All five genomes showed very biased frequency distributions of simple sequence motifs, particularly in eukaryotes where AAA and TTT predominated. Interspecific comparisons showed that noncoding sequences in eukaryotes showed highly significantly similar frequency distributions of simple sequence motifs but this was not true of coding sequences. ANOVA of the frequency distributions of simple sequence motifs indicated strong contributions from motif base composition and repeat unit length, but much of the variation remained unexplained by these parameters. The sequence composition of simple sequences therefore appears to reflect both underlying sequence biases in slippage-like processes and the action of selection. Frequency distributions of simple sequence motifs in coding sequences correlated weakly or not at all with those in noncoding sequences. Selection on coding sequences to eliminate undesirable sequences may therefore have been strong, particularly in the human lineage.

19.
The genomes of eukaryotes are mosaics of isochores. These are long DNA stretches that are fairly homogeneous in base composition and that belong to a small number of families characterized by different ratios of GC to AT and different short-sequence patterns (i.e., different DNA structures that interact with different proteins). This genome organization led to two discoveries: (1) the genomic code, which refers to two correlations, that of the composition of coding and contiguous noncoding sequences, and that of coding sequences and the structural properties of the encoded proteins; and (2) the genome phenotypes, which correspond to the patterns of isochore families in the genomes. These patterns indicate that genome evolution may proceed either according to a conservative mode or to a transitional (isochore shifting) mode, apparently depending upon whether the environment is constant or shifting. According to the neoselectionist theory, natural selection is responsible for both modes.

20.
Fortes GG  Bouza C  Martínez P  Sánchez L 《Genetica》2007,129(3):281-289
To review the general view of the different compositional structure of the genomes of warm- and cold-blooded vertebrates, we made use of the increasing number of genetic sequences, including coding (exon) and non-coding (intron) regions, that have been deposited in the databases in recent years. The nucleotide distributions of the third codon positions (GC3) have been analyzed in 1510 coding sequences (CDS) of fish, 1414 CDS of amphibians and 320 CDS of reptiles. Also, the relationship between the GC content of 74, 56 and 25 CDS of fish, amphibians and reptiles, respectively, and that of their corresponding introns (GCI) has been considered. In accordance with recent data, sequence analysis showed the presence of very GC3-rich CDS in these poikilotherm vertebrates. However, very high diversity in compositional patterns among different orders of fish, amphibians and reptiles was found. Significant positive correlations between GC3 and GCI were also confirmed for the genes analyzed. Nevertheless, introns proved to be poorer in GC than their corresponding CDS, this difference being larger than in the human genome. Because of the limited number of available sequences including both exons and introns, we must be cautious about the results derived from them. However, the indication of higher GC richness of coding sequences than of their corresponding introns could help to explain the discrepancy between sequence analysis and the ultracentrifugation studies in cold-blooded vertebrates, which did not predict the existence of GC-rich isochores.
