Similar articles
1.
We conducted a genome-wide analysis of variations in guanine plus cytosine (G+C) content at the third codon position at silent substitution sites of orthologous human and mouse protein-coding nucleotide sequences. Alignments of 3776 human protein-coding DNA sequences with mouse orthologs having >50 synonymous codons were analyzed, and nucleotide substitutions were counted by comparing sequences in the alignments extracted from gap-free regions. The G+C content at silent sites in these pairs of genes showed a strong negative correlation (r = -0.93). Some gene pairs showed significant differences in G+C content at the third codon position at silent substitution sites. For example, human thymine-DNA glycosylase was A+T-rich at the silent substitution sites, while the orthologous mouse sequence was G+C-rich at the corresponding sites. In contrast, human matrix metalloproteinase 23B was G+C-rich at silent substitution sites, while the mouse ortholog was A+T-rich. We discuss possible implications of this significant negative correlation of G+C content at silent sites.
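The two quantities this abstract relies on, G+C content by codon position and a correlation coefficient across gene pairs, can be sketched as follows. This is a simplified illustration and not the authors' pipeline: it counts every third codon position of an in-frame sequence rather than only silent substitution sites in gap-free alignments, and the function names `gc_by_codon_position` and `pearson_r` are hypothetical.

```python
from math import sqrt


def gc_by_codon_position(cds):
    """Return the G+C fraction at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at `pos`
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

In use, one would compute the third-position value for each human gene and each mouse ortholog, then pass the two lists to `pearson_r`.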

2.
Analysis of DNA sequences of 132 introns and 140 exons from 42 pairs of orthologous genes of mouse and rat was used to compare patterns of evolutionary change between introns and exons. The mean of the absolute difference in length (measured in base pairs) between the two species was nearly five times as high in the case of introns as in the case of exons. The average rate of nucleotide substitution in introns was very similar to the rate of synonymous substitution in exons, and both were about three times the rate of substitution at nonsynonymous sites in exons. The G+C contents of introns and exons of the same gene were correlated, but mean G+C content at the third positions of exons was significantly higher than that of introns or positions 1–2 of exons from the same gene. G+C content was conserved over evolutionary time, as indicated by strong correlations between mouse and rat, but the change in G+C content was greatest at position 3 of exons, intermediate in introns, and lowest at positions 1–2 of exons. Received: 23 December 1996 / Accepted: 1 April 1997

3.
Genome-wide analysis of sequence divergence patterns in 12,024 human-mouse orthologous pairs reveals, for the first time, that the trends in nucleotide and amino acid substitutions in orthologs of high and low GC composition are highly asymmetric and polarized in opposite directions. The entire dataset has been divided into three groups on the basis of the GC content at third codon sites of human genes: high, medium, and low. High-GC orthologs exhibit significant bias in favor of the replacements Thr --> Ala, Ser --> Ala, Val --> Ala, Lys --> Arg, Asn --> Ser, Ile --> Val, etc., from mouse to human, whereas in low-GC orthologs, the reverse trends prevail. In general, in the high-GC group, residues encoded by A/U-rich codons of mouse proteins tend to be replaced by residues encoded by relatively G/C-rich codons in their human orthologs, whereas the opposite trend is observed among the low-GC orthologous pairs. The medium-GC group shares some trends with the high-GC group and some with the low-GC group. The only significant trend common to all groups of orthologs, irrespective of their GC bias, is the (Asp)(Mouse) --> (Glu)(Human) replacement. At the nucleotide level, high-GC orthologs have undergone a large excess of (A/T)(Mouse) --> (G/C)(Human) substitutions over (G/C)(Mouse) --> (A/T)(Human) at each codon position, whereas for low-GC orthologs, the reverse is true.

4.
Ubiquitin is ubiquitous in all eukaryotes and its amino acid sequence shows extreme conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences coding for 13 ubiquitin genes from 11 species reported so far have been compiled and analyzed. The G+C content of the codon third base reveals a positive linear correlation with the genome G+C content of the corresponding species. The slope strongly suggests that the overall G+C content of codons of polyubiquitin genes clearly reflects the genome G+C content by AT/GC substitutions at the codon third position. The G+C content of the ubiquitin codon third base also shows a positive linear correlation with the overall G+C content of coding regions of compiled genes, indicating that the codon choices among synonymous codons reflect the average codon usage pattern of the corresponding species. On the other hand, the monoubiquitin gene, which differs from the polyubiquitin gene in gene organization, gene expression, and function of the encoded protein, shows a different codon usage pattern from that of the polyubiquitin gene. From comparisons of the levels of synonymous substitutions among ubiquitin repeats and the homology of the amino acid sequence of the tail of monomeric ubiquitin genes, we propose that the molecular evolution of ubiquitin genes occurred as follows: Multiple primitive ubiquitin sequences were dispersed across the genome in ancestral eukaryotes. Some of them, situated in a particular environment, fused with the tail sequence to produce monomeric ubiquitin genes that were maintained across species. After divergence of species, polyubiquitin genes were formed by duplication of the other primitive ubiquitin sequences on different chromosomes. Differences in the environments in which ubiquitin genes are embedded are reflected in the differences in codon choice and in gene expression pattern between poly- and monomeric ubiquitin genes.

5.
Genomes of the herpes simplex viruses are extremely GC-rich. The elevated G+C level in the genomes of the simplex viruses is a result of their long-term evolution under mutational pressure. We counted the rates of nucleotide substitutions from the gene encoding the major capsid protein (MCP) (G+C = 0.68, 3GC = 0.89) of human simplex virus 1 (HSV-1) to the MCP gene (G+C = 0.70, 3GC = 0.91) of HSV-2 (the first pair of genes) and from the same MCP gene of HSV-1 to the homologous gene (G+C = 0.73, 3GC = 0.99) from cercopithecine herpes virus 16 (the second pair of genes). The rates of transitions from A-T to G-C base pairs are 2.17-, 3.09-, and 1.27-fold higher in the first, second, and third codon positions, respectively, in the second pair of genes than in the first pair, although GC-richness grows by only 3%. This effect is due to the approximately 90% GC-richness of the third codon positions in all of these genes. Transitions caused by the strong mutational pressure (from A-T to G-C base pairs) have a low probability of occurring in the third positions but a high probability of occurring in the first and second positions. For the MCP gene of human herpes 3, the probability of a transition caused by mutational pressure occurring in the third codon position is 2.36 times higher than in the MCP gene of HSV-1 and 3 times higher than in the MCP gene of HSV-2. These data could explain the rarely occurring relapses of herpes zoster infection and the frequently occurring relapses of herpes simplex infection.

6.
Genomic GC (overall G+C content of the coding sequences) variations were reinvestigated between the orthologous genes of Mycobacterium tuberculosis and Mycobacterium leprae. It was observed that the overall genomic GC variation between the species mainly originates from the combined effects of GC(1) and GC(2) variations. However, orthologous codon pairs that encode identical amino acids with different codons (IA) are responsible for the genomic GC(3) variation between the organisms, whereas orthologous codons encoding different amino acids (DA) in the two organisms are responsible for the variation of GC(1) levels. Further analyses indicate that duets and quartets change the GC(3) levels in the same direction and with the same magnitude for the IA category, whereas for the DA category the GC(1) levels of duets decrease significantly from the overall GC(1) levels but the GC(1) levels of quartets increase significantly from the overall GC(1) levels. GC(3) levels of informational genes for the IA category decrease more rapidly than those of the other functional categories of genes. The biological implications of these results are discussed in this paper.

7.
A positive correlation was found between the G + C content at the codon third position in genes of vertebrates and the G + C content of the genome portion surrounding each gene. Exons of genes with a high G + C% at the codon third position are surrounded by G + C-rich introns and G + C-rich flanking sequences, and those with a low G + C% at that position by A + T-rich introns and flanking sequences. Analysis of G + C content distribution along DNA sequences using a DNA Sequence Data Bank supported the view that the vertebrate genome is a mosaic of regions with clear differences in their G + C content. The biological significance of the variation in G + C content throughout the vertebrate genome is discussed in connection with chromosomal banding.

8.
A compositional transition was previously detected by comparing orthologous coding sequences from cold- and warm-blooded vertebrates (see Bernardi, G., Hughes, S., Mouchiroud, D., 1997. The major compositional transitions in the vertebrate genome. J. Mol. Evol. 44, S44-S51 for a review). The transition is characterized by higher GC levels (GC is the molar ratio of guanine+cytosine in DNA) and, especially, by higher GC3 levels (GC3 is the GC level of third codon positions) in coding sequences from warm-blooded vertebrates. This transition essentially affects GC-rich genes, although the nucleotide substitution rate is of the same order of magnitude in both GC-poor and GC-rich genes. In order to understand the evolutionary basis of the changes, we have compared the hydrophobicity of orthologous proteins from Xenopus and human. Although the differences are small in proteins encoded by coding sequences ranging from 0 to 65% in GC3, they are large in the proteins encoded by sequences characterized by GC3 values higher than 65%. The latter proteins are more hydrophobic in human than in Xenopus.

9.
Thomas  James W. 《Mammalian genome》2003,14(10):673-678
Comparative mapping and sequencing of the mouse and human genomes have defined large, conserved chromosomal segments in which gene content and order are highly conserved. These regions span megabase-sized intervals and together comprise the vast majority of both genomes. However, the evolutionary relationships among the small remaining portions of these genomes are not as well characterized. Here we describe the sequencing and annotation of a 341-kb region of mouse Chromosome (Chr) 2 containing nine genes, including biliverdin reductase A (Blvra), and its comparison with the orthologous regions of the human and rat genomes. These analyses reveal that the known conserved synteny between mouse Chr 2 and human Chr 7 reflects an interval containing one gene (Blvra/BLVRA) that is, at most, just 34 kb in the mouse genome. In the mouse, this segment is flanked proximally by genes orthologous to human Chr 15q21 and distally by genes orthologous to human Chr 2q11. The observed differences between the human and mouse genomes likely resulted from one or more rearrangements in the rodent lineage. In addition to the resulting changes in gene order and location, these rearrangements also appear to have included genomic deletions that led to the loss of at least one gene in the rodent lineage. Finally, we have also identified a recent mouse-specific segmental duplication. These findings illustrate that small genomic regions outside the large mouse–human conserved segments can contain a single gene as well as sequences that are apparently unique to one genome. The nucleotide sequence data reported in this paper have been submitted to GenBank and assigned the accession numbers AC074224 and AC074041.

10.
BACKGROUND: Nucleotide substitution rates and G + C content vary considerably among mammalian genes. It has been proposed that the mammalian genome comprises a mosaic of regions - termed isochores - with differing G + C content. The regional variation in gene G + C content might therefore be a reflection of the isochore structure of chromosomes, but the factors influencing the variation of nucleotide substitution rate are still open to question. RESULTS: To examine whether nucleotide substitution rates and gene G + C content are influenced by the chromosomal location of genes, we compared human and murid (mouse or rat) orthologues known to belong to one of the chromosomal (autosomal) segments conserved between these species. Multiple members of gene families were excluded from the dataset. Sets of neighbouring genes were defined as those lying within 1 centiMorgan (cM) of each other on the mouse genetic map. For both synonymous substitution rates and G + C content at silent sites, neighbouring genes were found to be significantly more similar to each other than sets of genes randomly drawn from the dataset. Moreover, we demonstrated that the regional similarities in G + C content (isochores) and synonymous substitution rate were independent of each other. CONCLUSIONS: Our results provide the first substantial statistical evidence for the existence of a regional variation in the synonymous substitution rate within the mammalian genome, indicating that different chromosomal regions evolve at different rates. This regional phenomenon which shapes gene evolution could reflect the existence of 'evolutionary rate units' along the chromosome.
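The comparison of neighbouring gene sets against randomly drawn sets can be illustrated with a permutation test. The abstract does not state which statistic was used, so the sketch below assumes one for illustration: the mean within-set variance of a per-gene value (for example a synonymous substitution rate), compared with the same statistic after shuffling genes among sets of the same sizes. The function name `neighbour_similarity_test` and its parameters are hypothetical.

```python
import random
from statistics import pvariance


def neighbour_similarity_test(groups, n_perm=1000, seed=0):
    """Permutation test for within-set similarity.

    groups: list of lists, each inner list holding one value per gene
            for a set of neighbouring genes.
    Returns (observed mean within-set variance, one-sided p-value).
    A small p-value means neighbours are more alike than random sets.
    """
    def mean_within_variance(sets):
        return sum(pvariance(s) for s in sets) / len(sets)

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = mean_within_variance(groups)
    pooled = [value for group in groups for value in group]
    sizes = [len(group) for group in groups]

    at_least_as_similar = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        values = iter(pooled)
        shuffled = [[next(values) for _ in range(size)] for size in sizes]
        if mean_within_variance(shuffled) <= observed:
            at_least_as_similar += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_similar + 1) / (n_perm + 1)
    return observed, p_value
```

Shuffling within the pooled dataset keeps the set sizes fixed, so the null distribution differs from the observed data only in which genes sit together.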

11.
12.
D'Onofrio G  Ghosh TC 《Gene》2005,345(1):27-33
Fluctuations and increments of both C(3) and G(3) levels along the human coding sequences were investigated by comparing two sets of Xenopus/human orthologous genes. The first set of genes shows minor differences in the GC(3) levels; the second shows considerable increments of the GC(3) levels in the human genes. In both data sets, the fluctuations of C(3) and G(3) levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed a different increment of the C(3) and G(3) levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) of several amino acids was also affected during the compositional transition, showing that there is a correlation between RSCU and protein secondary structures in human genes. The importance of natural selection for the formation of the isochore organization of the human genome is discussed on the basis of these results.
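Relative synonymous codon usage is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal sketch under that standard definition follows; it assumes the standard genetic code and an in-frame DNA sequence, and the names `CODON_TABLE` and `rscu` are illustrative rather than taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in `cds`.

    RSCU = count(codon) * (number of synonymous codons) / count(amino acid).
    Stop codons and codons containing ambiguous bases are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

A value of 1 means a codon is used as often as expected under equal usage; values above 1 mark preferred codons.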

13.
14.
The gene for L-lactate dehydrogenase (LDH) (EC 1.1.1.27) of Thermus caldophilus GK24 was cloned in Escherichia coli using synthetic oligonucleotides as hybridization probes. The nucleotide sequence of the cloned DNA was determined. The primary structure of the LDH was deduced from the nucleotide sequence. The deduced amino acid sequence agreed with the NH2-terminal and COOH-terminal sequences previously reported and the determined amino acid sequences of the peptides obtained from trypsin-digested T. caldophilus LDH. The LDH comprised 310 amino acid residues and its molecular mass was determined to be 32,808. On alignment of the whole amino acid sequences, the T. caldophilus LDH showed about 40% identity with the Bacillus stearothermophilus, Lactobacillus casei and dogfish muscle LDHs. The T. caldophilus LDH gene was expressed with the E. coli lac promoter in E. coli, which resulted in the production of the thermophilic LDH. The gene for the T. caldophilus LDH showed more than 40% identity with those for the human and mouse muscle LDHs on alignment of the whole nucleotide sequences. The G + C content of the coding region for the T. caldophilus LDH was 74.1%, which was higher than that of the chromosomal DNA (67.2%). The G + C contents in the first, second and third positions of the codons used were 77.7%, 48.1% and 95.5% respectively. The high G + C content in the third base caused extremely non-random codon usage in the LDH gene. About half (48.7%) the codons in the LDH gene started with G, and hence there were relatively high contents of Val, Ala, Glu and Gly in the LDH. The contents of Pro, Arg, Ala and Gly, which have high G + C contents in their codons, were also high. Rare codons with U or A as the third base were sometimes used to avoid the TCGA sequence, the recognition site for the restriction endonuclease, TaqI. Two TCGA sequences were found only in the sequence of CTCGAG (XhoI site) in the sequenced region of the T. caldophilus DNA. There were three segments with similar sequences in the two 5' non-coding regions, probably the promoter and ribosome-binding regions, of the genes for the T. caldophilus LDH and the Thermus thermophilus 3-isopropylmalate dehydrogenase.

15.
The nucleotide sequence of the beta globin gene cluster of the prosimian Galago crassicaudatus has been determined. A total sequence spanning 41,101 bp contains and links together previously published sequences of the five galago beta-like globin genes (5'-epsilon-gamma-psi eta-delta-beta-3'). A computer-aided search for middle interspersed repetitive sequences identified 10 LINE (L1) elements, including a 5' truncated repeat that is orthologous to the full-length L1 element found in the human epsilon-gamma intergenic region. SINE elements that were identified included one Alu type I repeat, four Alu type II repeats, and two methionine tRNA-derived Monomer (type III) elements. Alu type II and Monomer sequences are unique to the galago genome. Structural analyses of the cluster sequence reveal that it is relatively A+T rich (about 62%) and that regions with high G+C content are associated primarily with globin coding regions. Comparative analyses with the beta globin cluster sequences of human, rabbit, and mouse reveal extensive sequence homologies in their genic regions, but only human, galago, and rabbit sequences share extensive intergenic sequence homologies. Divergence analyses of aligned intergenic and flanking sequences from orthologous human, galago, and rabbit sequences show a gradation in the rate of nucleotide sequence evolution along the cluster, where sequences 5' of the epsilon globin gene region show the least sequence divergence and sequences just 5' of the beta globin gene region show the greatest sequence divergence.

16.
Understanding the codon usage pattern of maize and the factors that shape it is important. In this study, trends in synonymous codon usage in maize were examined for the first time through multivariate statistical analysis of 7402 cDNA sequences. The results showed that gene positions on the primary axis were strongly negatively correlated with GC3s, the GC content of individual genes, and gene expression level as assessed by codon adaptation index (CAI) values. This indicates that nucleotide composition and gene expression level are the main factors shaping codon usage in maize, and that the variation in codon usage among genes may be due to mutational bias at the DNA level and natural selection acting at the level of mRNA translation. CDS length and the hydrophobicity of each protein were also each significantly correlated with gene positions on the primary axis, GC3s, and CAI values. We infer that gene length and the hydrophobicity of the encoded protein may play a minor role in shaping codon usage bias. In addition, 28 codons ending in G or C were defined as “optimal codons”, which may provide useful information for maize gene transformation and gene prediction.

17.
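Non-independence of nucleotide substitutions during DNA sequence evolution   Total citations: 4 (self-citations: 2; citations by others: 2)
Yang Ziheng (杨子恒) 《遗传学报》 (Acta Genetica Sinica) 1990,17(5):354-359
This paper reviews methods for estimating the number of nucleotide substitutions between DNA sequences and examines models of DNA evolution by comparing histone genes from seven species. Base composition at the third codon position of the H2A gene varied greatly among species and was strongly positively correlated with base composition at the first position of the H2A gene, at the first and third positions of the H4 gene, and in the sequences upstream and downstream of H2A, suggesting that species-specific regional constraints operate during DNA sequence evolution. Possible causes are an increase in GC content in higher eukaryotes, or chromosomal rearrangements that placed these homologous sequences in different isochores and thereby exposed them to different selective and mutational pressures. Correlation analysis of nucleotide substitutions at the positions within codons indicates that substitutions at different positions are not independent, possibly because a single substitution event changes more than one position. The implications of these results for the inference of phylogenetic trees are discussed.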

18.
Comparative genomics is a superior way to identify phylogenetically conserved features like genes or regions involved in gene regulation. The comparison of extended orthologous chromosomal regions should also reveal other characteristic traits essential for chromosome or gene function. In the present study we have sequenced and compared a region of conserved synteny from human chromosome 11p15.3 and mouse chromosome 7. In human, this region is known to contain several genes involved in the development of various disorders like Beckwith-Wiedemann overgrowth syndrome and other tumor diseases. Furthermore, in the neighboring chromosome region 11p15.5 extensive imprinting of genes has been reported which might extend to region 11p15.3. The analysis of approximately 730 kb in human and 620 kb in mouse led to the identification of eleven genes. All putative genes found in the mouse DNA were also present in the same order and orientation in the human chromosome. However, in the human DNA one putative gene of unknown function could be identified which is not present in the orthologous position of the mouse chromosome. The sequence similarity between human and mouse is higher in transcribed and exon regions than in non-transcribed segments. Dot plot analysis, however, reveals a surprisingly well-conserved sequence similarity over the entire analyzed region. In particular, the positions of CpG islands, short regions of very high GC content in the 5' region of putative genes, are similar in human and mouse. With respect to base composition, two distinct segments of significantly different GC content exist in human as well as in mouse. With a GC content of 45%, one segment would correspond to "isochore H1" and the other segment (39% GC in human, 40% GC in mouse) to "isochore L1/L2". The gene density (one gene per 66 kb) is slightly higher than the average calculated for the complete human genome (one gene per 90 kb). The comparison of the number and distribution of repetitive elements shows that the proportion of human DNA made up by interspersed repeats (43.8%) is significantly higher than in the corresponding mouse DNA (30.1%). This partly explains why the human DNA is longer between the landmark genes used to define the orthologous positions in human and mouse.

19.
20.
The complete nucleotide sequence of the Pseudomonas chromosomal gene coding for the enzyme carboxypeptidase G2 (CPG2) has been determined. The nucleotide sequence obtained has been confirmed by comparing the predicted amino acid sequence with that of randomly derived peptide fragments and by N-terminal sequencing of the purified protein. The gene has been shown to code for a 22 amino acid signal peptide at its N-terminus which closely resembles the signal peptides of other secreted proteins. An alternative 36 amino acid signal peptide which may function in Pseudomonas has also been identified. The codon utilisation of the gene is influenced by the high G + C (67.2%) content of the DNA and exhibits a 92.8% preference for codons ending in G or C. This unusual codon preference may contribute to the generally observed weak expression of Pseudomonas genes in Escherichia coli. A region of DNA upstream of the structural gene has also been sequenced and a ribosome binding site and two putative promoter sequences identified.
