Similar literature
 20 similar articles found; search time: 453 ms
1.
《Gene》1996,174(1):95-102
Linear correlations exist between the GC levels of third codon positions (GC3) of individual human genes and the GC levels of long genomic sequences and DNA molecules (50–100 kb in size) embedding the genes. These linear relationships allow the positioning of the GC3 histogram of cDNA sequences from the databases relative to the CsCl profile of human DNA. In turn, this allows an estimate of the relative concentrations of genes in genomic regions of different GC content. An estimate obtained by using current sequence data and Gaussian decompositions of the GC3 histogram and of the CsCl profile indicates that the GC-richest (non-ribosomal) component of the human genome is at least 17 times as gene-rich as the GC-poor regions. Moreover, our results suggest that the most recent physical maps of the human genome consisting of overlapping YACs cover less than 50% of the genes.

2.
Shi X  Wang X  Li Z  Zhu Q  Tang W  Ge S  Luo J 《Gene》2006,376(2):199-206
Understanding the correlation between synonymous substitution rate and GC content is essential for deciphering gene evolution. However, their relationship has been controversial. We analyzed the GC content and synonymous substitution rate in 1092 paralogues produced by two large-scale duplication events in the rice genome. According to the GC content at the third codon sites (GC3), the paralogues were classified into GC3-rich and GC3-poor genes. By referring to their outgroup sequences, we inferred the last common ancestor of sister paralogues and, consequently, calculated the average synonymous substitution rate for the two gene classes. The results suggest that the average synonymous substitution rate is lower in GC3-rich genes than in GC3-poor genes, indicating that the synonymous substitution rate is negatively correlated with GC content in the rice genome. By characterizing the synonymous nucleotide substitution pattern, we found a strong synonymous nucleotide substitution frequency bias from AT to GC in GC3-rich genes. This indicates possible limitations of commonly used methods developed to estimate the synonymous substitution rate. Their estimates might produce misleading results on the correlation between the synonymous substitution rate and GC content.

3.
Cereal genes are classified into two distinct classes according to the guanine-cytosine (GC) content at the third codon sites (GC3). Natural selection and mutation bias have been proposed to affect the GC content. However, there has been controversy about the cause of GC variation. Here, we characterized the GC content of 1 092 paralogs and other single-copy genes in the duplicated chromosomal regions of the rice genome (ssp. indica) and classified the paralogs into GC3-rich and GC3-poor groups. By referring to out-group sequences from Arabidopsis and maize, we confirmed that the average synonymous substitution rate of the GC3-rich genes is significantly lower than that of the GC3-poor genes. Furthermore, we explored the other possible factors corresponding to the GC variation including the length of coding sequences, the number of exons in each gene, the number of genes in each family, the location of genes on chromosomes and the protein functions. Consequently, we propose that natural selection rather than mutation bias was the primary cause of the GC variation.
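
The GC3 measure used in the abstracts above is the fraction of G or C bases at third codon positions. A minimal Python sketch of that calculation follows, assuming each gene is supplied as an in-frame coding sequence string. The function names and the 0.5 cutoff are hypothetical choices for illustration; the cited studies do not state their classification threshold here.

```python
def gc_by_codon_position(cds: str) -> tuple:
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every base at this codon position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


def classify_by_gc3(cds: str, threshold: float = 0.5) -> str:
    """Label a gene GC3-rich or GC3-poor.

    The threshold is an assumed, illustrative cutoff, not a value
    taken from the studies listed on this page.
    """
    gc3 = gc_by_codon_position(cds)[2]
    return "GC3-rich" if gc3 >= threshold else "GC3-poor"


if __name__ == "__main__":
    example = "ATGGCCGCG"  # codons: ATG GCC GCG
    print(gc_by_codon_position(example))
    print(classify_by_gc3(example))
```

In practice the cutoff would be chosen from the observed GC3 distribution (for example, the valley between the two modes of a bimodal histogram) rather than fixed in advance.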

4.
Cereal genes are classified into two distinct classes according to the guanine-cytosine (GC) content at the third codon sites (GC3). Natural selection and mutation bias have been proposed to affect the GC content. However, there has been controversy about the cause of GC variation. Here, we characterized the GC content of 1092 paralogs and other single-copy genes in the duplicated chromosomal regions of the rice genome (ssp. indica) and classified the paralogs into GC3-rich and GC3-poor groups. By referring to out-group sequences from Arabidopsis and maize, we confirmed that the average synonymous substitution rate of the GC3-rich genes is significantly lower than that of the GC3-poor genes. Furthermore, we explored the other possible factors corresponding to the GC variation including the length of coding sequences, the number of exons in each gene, the number of genes in each family, the location of genes on chromosomes and the protein functions. Consequently, we propose that natural selection rather than mutation bias was the primary cause of the GC variation.

5.
Synonymous codon choices vary considerably among Schistosoma mansoni genes. Principal components analysis detects a single major trend among genes, which highly correlates with GC content in third codon positions and exons, but does not discriminate among putatively highly and lowly expressed genes. The effective number of codons used in each gene, and its distribution when plotted against GC3, suggests that codon usage is shaped mainly by mutational biases. The GC content of exons, GC3, 5′, 3′, and flanking (5′+ 3′+ introns) regions are all correlated among them, suggesting that variations in GC content may exist among different regions of the S. mansoni genome. We propose that this genome structure might be among the most important factors shaping codon usage in this species, although the action of selection on certain sequences cannot be excluded. Received: 10 March 1997 / Accepted: 27 June 1997

6.
The aim of this study was to analyze patterns of nucleotide composition and codon usage in the pea aphid genome (Acyrthosiphon pisum). A collection of 60,000 expressed sequence tags (ESTs) from the pea aphid was used to automatically reconstruct 5809 coding sequences (CDSs), based on similarity with known proteins and on coding style recognition. Reconstructions were manually checked for ribosomal proteins, leading to a tentative reconstruction of the near-complete set of this category. Pea aphid coding sequences showed a shift toward AT (especially at the third codon position) compared to Drosophila homologues. Genes with a putative high level of expression (ribosomal and other genes with high EST support) remained more GC3-rich and had a distinct codon usage from bulk sequences: they exhibited a preference for C-ending codons and CGT (for arginine), which thus appeared optimal for translation. However, the discrimination was not as strong as in Drosophila, suggesting a reduced degree of translational selection. The space of variation in codon usage for A. pisum appeared to be larger than in Drosophila, with a substantial fraction of genes that remained GC3-rich. Some of those (in particular some structural proteins) also showed high levels of codon bias and a very strong preference for C-ending codons, which could be explained either by strong translational selection or by other mechanisms. Finally, genomic traces were analyzed to build 206 fragments containing a full CDS, which allowed the correlations between the GC contents of coding sequences and those of noncoding (flanking and intron) sequences to be studied.

7.
8.
This paper analyses the compositional correlations that hold in the chicken genome. Significant linear correlations were found among the regions studied, namely coding sequences (and their first, second, and third codon positions), flanking regions (5′ and 3′), and introns, as is the case in the human genome. We found that these compositional correlations are not limited to global GC levels but even extend to individual bases. Furthermore, an analysis of 1037 coding sequences has confirmed a correlation among GC3, GC2, and GC1. The implications of these results are discussed. Received: 9 December 1998 / Accepted: 18 April 1999

9.
Synonymous codon usage of 53 protein-coding genes in the chloroplast genome of Coffea arabica was analyzed for the first time to identify the possible factors contributing to codon bias. All preferred synonymous codons were found to be A/T-ending, as chloroplast genomes are rich in AT. No difference in codon preference was observed between the two strands, viz., the leading and lagging strands. Complex correlations between total base compositions (A, T, G, C, GC) and silent base contents (A3, T3, G3, C3, GC3) revealed that compositional constraints played a crucial role in shaping the codon usage pattern of the C. arabica chloroplast genome. The ENC vs. GC3 plot placed the majority of the analyzed genes on or just below the left side of the expected GC3 curve, indicating the influence of base compositional constraints on codon usage. However, some genes lay well below the continuous curve, confirming the influence of other factors on codon usage in those genes. The influence of compositional constraints was further confirmed by correspondence analysis, as axes 1 and 3 had significant correlations with silent base contents. Correlations of ENC with axes 1 and 4 and of CAI with axes 1 and 2 indicated a minor influence of selection, but a clear separation of highly and lowly expressed genes could not be seen. From the present study, we concluded that mutational pressure combined with weak selection influenced the pattern of synonymous codon usage across the genes in the chloroplast genome of C. arabica.

10.
The relationship between synonymous codon usage and different protein secondary structural classes was investigated using 401 Homo sapiens proteins extracted from the Protein Data Bank (PDB). A simple chi-square test was used to assess the significance of the deviation between the observed and expected frequencies of 59 codons at the level of individual synonymous families in the four different protein secondary structural classes. It was observed that synonymous codon families show non-randomness in codon usage in the four different secondary structural classes. However, when the genes were classified according to their GC3 levels, there was an increase in non-randomness in the high-GC3 group of genes. The non-randomness in codon usage was further tested among the same protein secondary structures belonging to four different protein folding classes of the high-GC3 group of genes. The results show that in each protein secondary structural unit there exist some synonymous families that show class-specific codon usage patterns. Moreover, there is increased non-random behaviour of synonymous codons in the sheet structure of all secondary structural classes in the high-GC3 group of genes. The biological implications of these results are discussed.

11.
12.
Employing a set of 43 orthologous mouse and rat genes, Hughes and Yeager (J. Mol. Evol. 45:125–130, 1997) reported (1) no correlation between synonymous and nonsynonymous rates of nucleotide substitution, (2) a positive correlation between intronic GC contents (GCi) and intronic substitution rates (Ki), (3) that the average Ki value was very similar to the average Ks value, and (4) that the compositional correlation between the rat and the mouse genes is stronger at the third codon position (GC3) than at the first and second codon positions (GC12). We have examined the robustness of these results to alterations in substitution rate estimation protocol, alignment protocol, and statistical procedure. We find that a significant correlation between Ka and Ks is observed either if a rank correlation statistic is used instead of regression analysis, if one outlier is excluded from the analysis, or if a regression weighted by gene size is employed. The correlation between Ki and GCi we find to be sensitive to changes in alignment protocol, and it disappears on the use of weighted means. The finding that Ks and Ki are approximately the same is dependent on the method for estimating Ks values. Finally, the variance around the regression line of rat GC3 versus mouse GC3 we find to be significantly higher than that in GC12. The source of the discrepancy between this and Hughes and Yeager's result is unclear. The variance around the line for GC4 is higher still, as might be expected. Using a methodology that may be considered preferable to that of Hughes and Yeager, we find that all four of their results are contradicted. More importantly, this analysis reinforces the need for caution in assembling and analyzing data sets, as the degree of sensitivity to what many might consider minor methodological alterations is unexpected. Received: 2 February 1998 / Accepted: 23 March 1998

13.
14.
In this work, we have investigated the relationships between synonymous and nonsynonymous rates and base composition in coding sequences from Gramineae to analyze the factors underlying the variation in substitutional rates. We have shown that in these genes the rates of nucleotide divergence, both synonymous and nonsynonymous, are, to some extent, dependent on each other and on the base composition. In the first place, the variation in nonsynonymous rate is related to the GC level at the second codon position (the higher the GC2 level, the higher the amino acid replacement rate). The correlation is especially strong with T2, the coefficients being significant in the three data sets analyzed. This correlation between nonsynonymous rate and base composition at the second codon position is also detectable at the intragenic level, which implies that the factors that tend to increase the intergenic variance in nonsynonymous rates also affect the intragenic variance. On the other hand, we have shown that the synonymous rate is strongly correlated with the GC3 level. This correlation is observed both across genes and at the intragenic level. Similarly, the nonsynonymous rate is also affected at the intragenic level by GC3 level, like the silent rate. In fact, synonymous and nonsynonymous rates exhibit a parallel behavior in relation to GC3 level, indicating that the intragenic patterns of both silent and amino acid divergence rates are influenced in a similar way by the intragenic variation of GC3. This result, taken together with the fact that the number of genes displaying intragenic correlation coefficients between synonymous and nonsynonymous rates is not very high, but higher than random expectation (in the three data sets analyzed), strongly suggests that the processes of silent and amino acid replacement divergence are, at least in part, driven by common evolutionary forces in genes from Gramineae. Received: 2 July 1998 / Accepted: 18 April 1999

15.
Gene paralogs are copies of an ancestral gene that appear after gene or full genome duplication. When two sister gene copies are maintained in the genome, redundancy may release certain evolutionary pressures, allowing one of them to access novel functions. Here, we focused our study of gene paralogs on the evolutionary history of the three polypyrimidine tract binding protein genes (PTBP) and the concurrent evolution of their differential codon usage preferences (CUPrefs) in vertebrate species. PTBP1-3 show high identity at the amino acid level (up to 80%) but display strongly different nucleotide composition, divergent CUPrefs and, in humans and in many other vertebrates, distinct tissue-specific expression levels. Our phylogenetic inference results show that the duplication events leading to the three extant PTBP1-3 lineages predate the basal diversification within vertebrates, and genomic context analysis illustrates that local synteny has been well preserved over time for the three paralogs. We identify a distinct evolutionary pattern towards GC3-enriching substitutions in PTBP1, concurrent with enrichment in frequently used codons and with a tissue-wide expression. In contrast, PTBP2s are enriched in AT-ending, rare codons, and display tissue-restricted expression. As a result of this substitution trend, CUPrefs sharply differ between mammalian PTBP1s and the rest of PTBPs. Genomic context analysis suggests that GC3-rich nucleotide composition in PTBP1s is driven by local substitution processes, while the evidence in this direction is thinner for PTBP2-3. An actual lack of co-variation between the observed GC composition of PTBP2-3 and that of the surrounding non-coding genomic environment would raise questions about the origin of CUPrefs, warranting further research on a putative tissue-specific translational selection. Finally, we communicate an intriguing trend for the use of the UUG-Leu codon, which matches the trends of AT-ending codons.
Our results are compatible with a scenario in which a combination of directional mutation–selection processes would have differentially shaped CUPrefs of PTBPs in vertebrates: the observed GC-enrichment of PTBP1 in placental mammals may be linked to genomic location and to the strong and broad tissue-expression, while AT-enrichment of PTBP2 and PTBP3 would be associated with rare CUPrefs and thus, possibly to specialized spatio-temporal expression. Our interpretation is coherent with a gene subfunctionalisation process by differential expression regulation associated with the evolution of specific CUPrefs.

16.
The translation of poly(A)-rich and poly(A)-poor populations of Encephalomyocarditis viral RNA was studied in Ehrlich ascites cell-free extracts. The poly(A)-rich viral RNA was translated 2–3 times more efficiently than the poly(A)-poor RNA. Both viral RNA populations were found to be similar with respect to susceptibility to nuclease attack as well as their ability to initiate and carry out protein synthesis early in the translation reaction. Later in the reaction, however, translation of the poly(A)-poor RNA was markedly reduced whereas translation of the poly(A)-rich RNA continued. These results are consistent with the hypothesis that poly(A) plays a role in the re-utilization and longevity of mRNA.

17.
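To determine the codon usage pattern of the chloroplast genome of the Yao medicinal plant Zijiuniu (紫九牛) and its causes, this study took 50 protein-coding sequences from the Zijiuniu chloroplast genome and analyzed their codon bias using Codon W 1.4.2 and the online tools CUSP and Chips. The results showed that: (1) 29 codons had RSCU > 1, of which 28 end in A/U, indicating that synonymous codons in the chloroplast genome preferentially end in A/U. (2) The GC content by codon position was GC1 (47.38%) > GC2 (39.81%) > GC3 (29.60%), and 40 genes had ENC values greater than 45, indicating that codon bias in the Zijiuniu chloroplast genome is weak. (3) Neutrality plot analysis and ENC-plot analysis showed that codon bias in the Zijiuniu chloroplast genome is shaped by both selection and mutation. (4) Using constructed libraries of highly and lowly expressed genes, 15 optimal codons were identified: UUG, AUU, GUU, GUA, UCU, CCU, ACU, ACA, GCU, CAA, AAC, GAA, UGU, CGU and GGU. This study provides a basis for characterizing the Zijiuniu chloroplast genome and for analyzing its genetic diversity.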
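
The RSCU statistic reported in the abstract above is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally; values above 1 mark codons used more often than expected. The following Python sketch illustrates the calculation under stated assumptions: it uses the standard genetic code, skips stop codons and single-codon amino acids (Met, Trp), and is not the implementation used by Codon W. The function name `rscu` is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in UCAG order (an assumption of this sketch).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("T", "U")  # accept DNA or RNA input
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or len(codons) < 2 or total == 0:
            continue  # stops, Met/Trp, and unobserved families are skipped
        expected = total / len(codons)  # count under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    # Four alanine codons: GCU x3, GCC x1.
    for codon, value in sorted(rscu(["GCUGCUGCUGCC"]).items()):
        print(codon, round(value, 2))
```

Counting how many codons with RSCU > 1 end in A or U, as the abstract does, is then a simple filter over the returned dictionary.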

18.
19.
Histones are vital structural proteins of chromatin that influence its dynamics and function. The tissue-specific expression of histone variants has been shown to regulate the expression of specific genes and genomic stability in animal systems. Here we report on the characterization of five histone H3 variants expressed in the Lilium generative cell. The gcH3 and leH3 variants show unique sequence diversity by lacking a conserved lysine residue at position 9 (H3K9). The gH3 variant shares conserved structural features with centromeric H3 of Arabidopsis. The gH3 variant gene is strongly expressed in generative cells and gH3 histone is incorporated into generative cell chromatin. The lysine residue of H3 at position 4 (H3K4) is highly methylated in the nuclei of generative cells of mature pollen, while methylation of H3K4 is low in vegetative cell nuclei. Taken together, these results suggest that male gametic cells of Lilium have a unique chromatin state and that histone H3 variants and their methylation might be involved in gene regulation of male gametic cells. Accession numbers for the sequence data: The sequences reported in this paper have been deposited in the DDBJ database: gcH3 GC1174 (accession no. AB195644), gH3 GC1008 (accession no. AB195646), leH3 GC1126 (accession no. AB195648), soH3-1 GC0075 (accession no. AB195650), soH3-2 GC1661 (accession no. AB195652), genomic sequence of gcH3 (accession no. AB195645), genomic sequence of gH3 (accession no. AB195647), genomic sequence of leH3 (accession no. AB195649), genomic sequence of soH3-2 (accession no. AB195651), genomic sequence of soH3-2 (accession no. AB195653).

20.
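In this study, young leaves from greenhouse hydroponic cultures of tender shoots of two wild Elaeagnus angustifolia Linn. (沙枣) plants were used as material. Total DNA was extracted from each using the CTAB method and sequenced de novo with second-generation sequencing technology, and assembly yielded the complete chloroplast genome sequences of the two plants. Codon usage bias in the protein-coding genes and its causes were then analyzed in detail, laying a foundation for research on chloroplast genetic engineering and molecular phylogenetics in E. angustifolia. The results showed that: (1) The assembled E. angustifolia chloroplast genome was 150,546 bp in length, consisting of a large single-copy (LSC) region of 81,113 bp, a small single-copy (SSC) region of 25,494 bp, and a pair of inverted repeats (IRs) of 18,445 bp separating them; annotation yielded 132 genes, including 86 protein-coding genes, 38 tRNA genes and 8 rRNA genes. (2) The GC content at the third codon position (GC3) of the protein-coding genes was 28.47%, markedly lower than the GC content of the whole chloroplast genome (37%) and also lower than that at the first (GC1) and second (GC2) codon positions, indicating a preference for codons ending in A/T; among them, UCU, CCU, UGU, GCU, CUU, GAU, UCA and UAA were the optimal codons. (3) Relative synonymous codon usage (RSCU) analysis showed that codon usage patterns are not shaped by a single factor: codon bias is jointly affected by mutation, selection and other factors, and sequence differences arising from natural selection and expression affect codon bias more markedly than mutation does. Neutrality plot analysis, effective number of codons (ENC-plot) analysis and parity rule 2 (PR2-plot) analysis indicated that codon usage bias in the E. angustifolia chloroplast genome is more strongly influenced by selection. (4) Phylogenetic trees built from the chloroplast gene sequences of six Elaeagnaceae species and one jujube species using maximum likelihood, maximum parsimony and Bayesian methods were consistent with the clustering based on their codon usage bias, indicating that codon usage bias in chloroplast genomes is related to the phylogenetic relationships among species.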
