Similar articles
20 similar articles found.
1.
The relationship between G + C content and codon usage in genes of the human, mouse, rat, bovine and chicken nuclear genomes was investigated. Correlation and linear regression analyses were carried out on plots that related the frequency of each codon within each synonymous codon group to the G + C content of the coding sequence as a whole. Under GC pressure, the C-ending codon is preferred in most of the quartet codon groups, except in the leucine and valine groups, where the G-ending codon is preferred. Among duets, the choice of codons specifying phenylalanine and glutamate shows the strongest dependence on G + C content. The relationship found between G + C content and codon usage in these genomes correlates with taxonomic distance.
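
The two quantities related in this abstract can be illustrated with a minimal sketch: the G + C content of a coding sequence, and the frequency of each codon within one synonymous group. The function names and the toy sequence are assumptions made for illustration; the abstract does not describe an implementation.

```python
from collections import Counter

# Valine is a quartet codon group (four synonymous codons).
VALINE_CODONS = ("GTT", "GTC", "GTA", "GTG")


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_group_frequencies(cds, group):
    """Frequency of each codon of `group` relative to the group's total count."""
    cds = cds.upper()
    # Read the sequence in frame, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in group)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in group}
    return {c: counts[c] / total for c in group}


# Hypothetical toy coding sequence, not real data.
toy_cds = "ATGGTGGTCGTGGTTGTGTAA"
freqs = codon_group_frequencies(toy_cds, VALINE_CODONS)
```

Repeating this for many genes would give one point per gene (G + C content against codon frequency), which is the kind of plot a correlation or regression analysis could then be run on.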

2.
The human genome is a mosaic of isochores, which are long DNA segments (300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs longer than 2 Mb in the human genome sequence are available at the online resource on isochore mapping: http://bioinfo2.ugr.es/isochores.

3.
Vertebrate genomes are composed of isochores, which are relatively long (>100 kb) regions with a relatively homogeneous (either GC-rich or AT-rich) base composition and rather sharp boundaries with neighboring isochores. Mammals and living archosaurs (birds and crocodilians) have heterogeneous genomes that include very GC-rich isochores. In sharp contrast, the genomes of amphibians and fishes are more homogeneous and have a lower overall GC content. Because DNA with higher GC content is more thermostable, the elevated GC content of mammalian and archosaurian DNA has been hypothesized to be an adaptation to higher body temperatures. This hypothesis can be tested by examining the structure of isochores across the reptilian clade, which includes the archosaurs, testudines (turtles), and lepidosaurs (lizards and snakes), because reptiles exhibit diverse body sizes, metabolic rates, and patterns of thermoregulation. This study focuses on a comparative analysis of a new set of expressed genes of the red-eared slider turtle and orthologs of the turtle genes in mammalian (human, mouse, dog, and opossum), archosaurian (chicken and alligator), and amphibian (western clawed frog) genomes. EST (expressed sequence tag) data from a turtle cDNA library enriched for genes that have specialized functions (developmental genes) revealed that using the GC content of the third codon position to examine isochore structure requires careful consideration of the types of genes examined. The more highly expressed genes (e.g., housekeeping genes) are more likely to be GC-rich than are genes with specialized functions. However, the set of highly expressed turtle genes demonstrated that the turtle genome has a GC content that is intermediate between the GC-poor amphibians and the GC-rich mammals and archosaurs. There was a strong correlation between the GC content of all turtle genes and the GC content of other vertebrate genes, with the slope of the line describing this relationship also indicating that the isochore structure of turtles is intermediate between that of amphibians and other amniotes. These data are consistent with some thermal hypotheses of isochore evolution, but we believe that the credible set of models for isochore evolution still includes a variety of models. These data expand the amount of genomic data available from reptiles upon which future studies of reptilian genomics can build.
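
The GC content of the third codon position mentioned in this abstract can be computed as in the short sketch below. The function name `gc3` and the toy sequence are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
def gc3(cds):
    """G + C fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop any trailing partial codon
    third = cds[2:usable:3]               # every third base, starting at index 2
    return sum(base in "GC" for base in third) / len(third)


# Hypothetical toy coding sequence, not real data.
toy_cds = "ATGGCCAAGTTTGGA"
value = gc3(toy_cds)
```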

4.
Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (>300 kb on average) relatively homogeneous in G+C. Important genome features are dependent on this isochore structure, e.g. genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of chromosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G+C plots. In contrast, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains having an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level, the boundary between them being the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of human large contigs. The relatively homogeneous regions we find show many of the features (G+C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available.
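
The core step of an entropy-based segmentation can be sketched as follows: find the cut point that maximizes the length-weighted Jensen-Shannon divergence between the left and right parts of a sequence. This is a simplified illustration under stated assumptions, not the authors' improved method; it omits the significance test and the recursion over scales, and all names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the base composition of `seq`."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two subsequences."""
    n = len(left) + len(right)
    return (shannon_entropy(left + right)
            - len(left) / n * shannon_entropy(left)
            - len(right) / n * shannon_entropy(right))


def best_split(seq, min_len=1):
    """Return (position, divergence) of the cut that maximizes the divergence."""
    best_pos, best_d = None, -1.0
    for i in range(min_len, len(seq) - min_len + 1):
        d = js_divergence(seq[:i], seq[i:])
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


# Hypothetical toy sequence: an AT-only half followed by a GC-only half.
toy = "ATATATATAT" + "GCGCGCGCGC"
pos, div = best_split(toy)
```

This brute-force scan recomputes the entropies at every candidate cut, which is fine for a toy input; running counts of each base would be the usual choice for long sequences.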

5.
Cao Peng, Dai Qinlong, Deng Cao, Zhao Xiang, Qin Shishan, Yang Jian, Ju Ran, Wang Zhiwen, Lu Guoqing, Gu Xiaodong, Yang Zhisong, Zhu Lifeng. Science China Life Sciences 2021, 64(10): 1765-1780
Animal body coverings provide protection and allow for adaptation to environmental pressures such as heat, ultraviolet radiation, water loss, and mechanical forces. Here, using a comparative genomics analysis of 39 mammal species spanning three skin covering types (hairless, scaly and spiny), we found that some genes (e.g., UVRAG, POLH, and XPC) involved in skin inflammation, skin innate immunity, and ultraviolet radiation damage repair were under selection in hairless ocean mammals (e.g., whales and manatees). These signatures might be associated with a high risk of skin diseases from pathogens and ultraviolet radiation. Moreover, the genomes from three spiny mammal species shared convergent genomic regions (EPHB2, EPHA4, and NIN) and unique positively selected genes (FZD6, INVS, and CDC42) involved in skin cell polarity, which might be related to the development of spines. In scaly mammals, the shared convergent genomic regions (e.g., FREM2) were associated with the integrity of the skin epithelium and epidermal adhesion. This study identifies potential convergent genomic features among distantly related mammals with the same skin covering type.

6.
The mammalian genome is not a random sequence but shows a specific, evolutionarily conserved structure that becomes manifest in its isochore pattern. Isochores, i.e. stretches of DNA with a distinct sequence composition and thus a specific GC content, cause the chromosomal banding pattern. This fundamental level of genome organization is related to several functional features, such as the replication timing of a DNA sequence. GC richness of genomic regions generally corresponds to an early replication time during S phase. Recently, we demonstrated this interdependency on a molecular level for an abrupt transition from a GC-poor isochore to a GC-rich one in the NF1 gene region; this isochore boundary also separates late from early replicating chromatin. Here, we analyzed another genomic region containing four isochores separated by three sharp isochore transitions. Again, the GC-rich isochores were found to replicate early and the GC-poor isochores late in S phase; one of the replication time zones was found to consist of a single replicon. At the boundaries between isochores, none of which show special sequence elements, the replication machinery stopped for several hours. Thus, our results emphasize the importance of isochores as functional genomic units, and of isochore transitions as genomic landmarks with a key function for chromosome organization and basic biological properties.

7.
Primary structure of the herpesvirus saimiri genome.   Total citations: 55 (self-citations: 41; citations by others: 14)
This report describes the complete nucleotide sequence of the genome of herpesvirus saimiri, the prototype of gammaherpesvirus subgroup 2 (rhadinoviruses). The unique low-G + C-content DNA region has 112,930 bp with an average base composition of 34.5% G + C and is flanked by about 35 noncoding high-G + C-content DNA repeats of 1,444 bp (70.8% G + C) in tandem orientation. We identified 76 major open reading frames and a set of seven U-RNA genes for a total of 83 potential genes. The genes are closely arranged, with only a few regions of sizable noncoding sequences. For 60 of the predicted proteins, homologous sequences are found in other herpesviruses. Genes conserved between herpesvirus saimiri and Epstein-Barr virus (gammaherpesvirus subgroup 1) show that their genomes are generally collinear, although conserved gene blocks are separated by unique genes that appear to determine the particular phenotype of these viruses. Several deduced protein sequences of herpesvirus saimiri without counterparts in most of the other sequenced herpesviruses exhibited significant homology with cellular proteins of known function. These include thymidylate synthase, dihydrofolate reductase, complement control proteins, the cell surface antigen CD59, cyclins, and G protein-coupled receptors. Searching for functional protein motifs revealed that the virus may encode a cytosine-specific methylase and a tyrosine-specific protein kinase. Several herpesvirus saimiri genes are potential candidates to cooperate with the gene for saimiri transformation-associated protein of subgroup A (STP-A) in T-lymphocyte growth stimulation.

8.
Isochore structures in the mouse genome   Total citations: 2 (self-citations: 0; citations by others: 2)
Zhang CT, Zhang R. Genomics 2004, 83(3): 384-394
The distribution of the G+C content in the mouse genome has been studied using a windowless technique. We have found that (i) abrupt variations of the G+C content from a GC-rich region to a GC-poor region, and vice versa, occur frequently at some sites along the sequence of the mouse genome; and (ii) long domains with relatively homogeneous G+C content (isochores) exist, which usually have sharp boundaries. Consequently, 28 isochores longer than 1 Mb have been identified in the mouse genome. A homogeneity index was used to quantify the variations of the G+C content within isochores. The precise boundaries, sizes, and G+C contents of these isochores have been determined. The windowless technique for the G+C content computation was also used to analyze the DNA sequence containing the mouse MHC region, which has a GC-poor isochore. This isochore is located at the central part of the sequence with boundaries at 468459 and 812716 bp, where the sequence is extended from the centromeric end to the telomeric end. In addition, the analysis of a segment of the rat genome shows that the rat genome also has clear isochore structures.

9.
The human genome is described in the literature as being composed of isochores, i.e., long (hundreds of kilobases) segments with a homogeneous (G + C) content. We calculated the (G + C) content variations along the DNA molecules of human chromosomes 21 and 22 and found the variations to be higher everywhere than in the randomized sequences. Hence the (G + C) content is certainly not homogeneous on the isochore scale in the two human chromosomes. In addition, we found no significant difference between the two human molecules and the genome of E. coli regarding the (G + C) content variations. Hence either no isochores are present in the DNA molecules of human chromosomes 21 and 22, or isochores are also present in the genome of Escherichia coli. In any case, the present communication demonstrates that isochores should be defined in unambiguous molecular terms if they are to be used for an up-to-date genome structure characterization.
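
A comparison of this kind, windowed (G + C) variation in a sequence against a randomized copy, can be pictured with the sketch below. The choice of non-overlapping windows, population variance as the measure of variation, a single shuffle as the randomization, and all names are assumptions for illustration; the abstract does not state the exact procedure.

```python
import random
import statistics


def window_gc(seq, window):
    """G + C fraction of consecutive non-overlapping windows."""
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def gc_variance_ratio(seq, window, seed=0):
    """Variance of windowed G + C in `seq` divided by that of a shuffled copy."""
    rng = random.Random(seed)
    shuffled = list(seq)
    rng.shuffle(shuffled)  # destroys long-range structure, keeps base counts
    observed = statistics.pvariance(window_gc(seq, window))
    randomized = statistics.pvariance(window_gc("".join(shuffled), window))
    return observed / randomized if randomized > 0 else float("inf")


# Hypothetical toy sequence with a GC-poor half and a GC-rich half.
toy = "AT" * 200 + "GC" * 200
ratio = gc_variance_ratio(toy, window=20)
```

A ratio well above 1 indicates more windowed variation than the shuffled sequence shows, which is the sense in which the abstract calls the (G + C) content "not homogeneous".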

10.
Since the G + C content of a gene is correlated with that of the isochore in which it resides, and early replicating isochores are thought to be relatively G + C rich, early replicating genes should also be rich in G + C. This hypothesis is tested on a sample of 44 mammalian genes for which replication time data and sequence information are available. Early replicating genes do not appear to be more G + C rich than late replicating genes; instead, there is considerable variation in the G + C content of genes replicated during both halves of S phase. These results show that both G + C rich and poor fractions of the genome are replicated early and late in the cell cycle, and suggest that isochores are not maintained by the replication of DNA sequences in compositionally biased free nucleotide pools.

11.
PIKE, L. M., HU, A., RENZAGLIA, K. S. & MUSICH, P. R., 1992. Liverwort genomes display extensive structural variations. Analyses of the total genomic DNA of eight species of liverworts and two species of green algae by thermal denaturation and CsCl buoyant density gradient centrifugation reveal a high degree of structural complexity and interspecific heterogeneity. The hepatic taxa exhibit two or more DNA components of varying base composition. Average G + C contents of total cellular DNA calculated from melting profiles are similarly variable, ranging from 38% to 53% G + C. The green alga Chara, a member of the ancestral line to land plants, shows similarities with liverworts in possessing multiple DNA components of comparable complexity, whereas Hydrodictyon DNA displays a single component. Detailed hybridization analyses of individual density gradient fractions using α-tubulin, rRNA and ribulose 1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene probes were performed to locate the low-copy number and moderately repetitive nuclear genes, and the chloroplast chromosome, respectively. The location of each gene within the density gradient is highly variable among the organisms examined; α-tubulin occurs in fractions ranging from 44–64% G + C, rDNA in 50–64% G + C fractions, and the rbcL gene is located in fractions from 30–59% G + C. For a given species, the two nuclear genes normally overlap in their distributions within the gradient. In most instances, neither gene occurs in the major DNA components, indicating that these components may contain repetitive DNAs. The observed variation in the density of the rbcL gene implies substantial reorganization of the chloroplast genome. The overall differences in the genomic components within and between taxa provide insight into the dynamics of DNA structure that have occurred during the extended evolutionary history of these organisms.

12.
T Ikemura, K Wada, S Aota. Genomics 1990, 8(2): 207-216
To determine the overall variation in the G+C% distribution over long ranges of the human genome, DNA sequences of human genes, which were closely linked genetically or physically, were surveyed from the GenBank Data Bank. A total of 72 sequences longer than 2 kb, which were mutually linked within 500 kb, were identified. The sequences belonged to 17 linkage groups and were ordered in each group according to their genetic positions. Analyses of the G+C% distribution along the ordered sequences showed that sequences within each group almost always had similar G+C% levels, but those belonging to different groups often had different levels. Similar analyses of more distantly linked sequences (e.g., greater than 10 Mb) showed mosaic structures of G+C% distribution. These findings are consistent with predictions made from the "isochore" structures found by CsCl equilibrium centrifugation, in that the structures having homogeneous base compositions stretched over at least several hundred kilobases. A possible boundary of the giant G+C% mosaic structures was identified between X-linked G6PD and F8C.

13.
Numerous microorganisms, including bacteria, yeasts, and molds, constitute the complex ecosystem present in milk and fermented dairy products. Our aim was to describe the bacterial ecosystem of various cheeses that differ by production technology and therefore by their bacterial content. For this purpose, we developed a rapid, semisystematic approach based on genetic profiling by temporal temperature gradient electrophoresis (TTGE) for bacteria with low-G+C-content genomes and denaturing gradient gel electrophoresis (DGGE) for those with medium- and high-G+C-content genomes. Bacteria in the unknown ecosystems were assigned an identity by comparison with a comprehensive bacterial reference database of approximately 150 species that included useful dairy microorganisms (lactic acid bacteria), spoilage bacteria (e.g., Pseudomonas and Enterobacteriaceae), and pathogenic bacteria (e.g., Listeria monocytogenes and Staphylococcus aureus). Our analyses provide a high resolution of bacteria comprising the ecosystems of different commercial cheeses and identify species that could not be discerned by conventional methods; at least two species, belonging to the Halomonas and Pseudoalteromonas genera, are identified for the first time in a dairy ecosystem. Our analyses also reveal a surprising difference in ecosystems of the cheese surface versus those of the interior; the aerobic surface bacteria are generally G+C rich and represent diverse species, while the cheese interior comprises fewer species that are generally low in G+C content. TTGE and DGGE have proven here to be powerful methods to rapidly identify a broad range of bacterial species within dairy products.

14.
Runs of homozygosity (ROHs) arise due to the transmission from parents to offspring of segments that are either identical by descent (IBD) or identical by state (IBS). The former is due to consanguineous matings whereas the latter is due to demographic processes. ROHs reduce individual nucleotide diversity (θ) as a function of homozygosity, and thus ROH distributions and θ are expected to vary among species because inbreeding levels, recombination rates, and demographic histories vary widely. To help interpret genetic diversity within and among species, we utilized genome sequence data from 78 mammalian species to compare θ and ROH burden (i.e., number and length of ROHs in the genome) among groups of mammals to assess genomic signatures of inbreeding. We compared θ and ROHs: (i) among threatened and non-threatened mammals to determine the significance of contemporary conservation status; (ii) among carnivorous and non-carnivorous mammals to determine the relevance of trophic effects; (iii) relative to body size because mutation rates generally vary with body mass; and (iv) across mammals from different latitudes to test for gradients in genomic diversity (e.g., due to effects of historic climatic regimes). Our results illustrate the considerable variance in genomic diversity across mammals, and that trophic level, body mass, and latitude have significant effects on θ and ROH burden. However, conservation status was not a reliable indicator of genomic diversity. We argue that genetic or genomic diversity should be an explicit component of conservation status, as such diversity is critical to the long-term sustainability of populations, and anticipate that ROHs will become more commonly used to estimate inbreeding in wild animals.
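
The idea of ROH burden (number and length of runs) can be illustrated with a deliberately simple detector that reports stretches of consecutive homozygous markers. This is a sketch under stated assumptions, not the study's pipeline: genotypes are coded as alternate-allele counts, no heterozygous calls or gaps are tolerated inside a run, and the function name, threshold, and data are hypothetical.

```python
def find_rohs(genotypes, positions, min_snps=3):
    """Return (start_pos, end_pos, n_snps) for runs of homozygous genotypes.

    `genotypes` holds alternate-allele counts per marker: 0 or 2 is
    homozygous, 1 is heterozygous.
    """
    runs, start = [], None
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i              # a new homozygous run begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None               # a heterozygous call ends the run
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1], len(genotypes) - start))
    return runs


# Hypothetical genotypes and base-pair positions, not real data.
genos = [0, 0, 2, 1, 0, 1, 2, 2, 2, 2]
pos = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
rohs = find_rohs(genos, pos)
```

The number of tuples returned and the sum of their spans correspond to the two components of ROH burden named in the abstract.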

15.
This study describes a novel approach to identify unique genomic DNA sequences from the unsequenced strain C. jejuni ATCC 43431 by comparison with the sequenced strain C. jejuni NCTC 11168. A shotgun DNA microarray was constructed by arraying 9,600 individual DNA fragments from a C. jejuni ATCC 43431 genomic library onto a glass slide. DNA fragments unique to C. jejuni ATCC 43431 were identified by competitive hybridization to the array with genomic DNA of C. jejuni NCTC 11168. The plasmids containing unique DNA fragments were sequenced, allowing the identification of up to 130 complete and incomplete genes. Potential biological roles were assigned to 66% of the unique open reading frames. The mean G+C content of these unique genes (26%) differs significantly from the G+C content of the entire C. jejuni genome (30.6%). This suggests that they may have been acquired through horizontal gene transfer from an organism with a G+C content lower than that of C. jejuni. Because the two C. jejuni strains differ by Penner serotype, a large proportion of the unique ATCC 43431 genes encode proteins involved in lipooligosaccharide and capsular biosynthesis, as expected. Several unique open reading frames encode enzymes which may contribute to genetic variability, i.e., restriction-modification systems and integrases. Interestingly, many of the unique C. jejuni ATCC 43431 genes show identity with a possible pathogenicity island from Helicobacter hepaticus and components of a potential type IV secretion system. In conclusion, this study provides a valuable resource to further investigate Campylobacter diversity and pathogenesis.

16.
The gram-negative enteric bacterium Proteus mirabilis is a frequent cause of urinary tract infections in individuals with long-term indwelling catheters or with complicated urinary tracts (e.g., due to spinal cord injury or anatomic abnormality). P. mirabilis bacteriuria may lead to acute pyelonephritis, fever, and bacteremia. Most notoriously, this pathogen uses urease to catalyze the formation of kidney and bladder stones or to encrust or obstruct indwelling urinary catheters. Here we report the complete genome sequence of P. mirabilis HI4320, a representative strain cultured in our laboratory from the urine of a nursing home patient with a long-term (≥30 days) indwelling urinary catheter. The genome is 4.063 Mb long and has a G+C content of 38.88%. There is a single plasmid consisting of 36,289 nucleotides. Annotation of the genome identified 3,685 coding sequences and seven rRNA loci. Analysis of the sequence confirmed the presence of previously identified virulence determinants, as well as a contiguous 54-kb flagellar regulon and 17 types of fimbriae. Genes encoding a potential type III secretion system were identified on a low-G+C-content genomic island containing 24 intact genes that appear to encode all components necessary to assemble a type III secretion system needle complex. In addition, the P. mirabilis HI4320 genome possesses four tandem copies of the zapE metalloprotease gene, genes encoding six putative autotransporters, an extension of the atf fimbrial operon to six genes, including an mrpJ homolog, and genes encoding at least five iron uptake mechanisms, two potential type IV secretion systems, and 16 two-component regulators.

17.
We report the isolation of the complete genes encoding nucleolin from rat and hamster. The DNA clones were obtained from partial genomic libraries by probing with a genomic DNA fragment containing the leader and promoter regions of the mouse nucleolin gene. We have determined the complete nucleotide sequence of the 5'-terminal region for the three rodent species. The sequenced regions extend over 1 kb downstream and upstream from the cap sites and include a conserved CpG island 1500 nucleotides (nt) long. The 5' end of the CpG island in each species has maintained a long alternating purine-pyrimidine sequence which could adopt a Z-DNA conformation. By sequence comparison, 42 blocks of homology are defined in the 5'-terminal region, of which 36 appear in the CpG island and contain numerous conserved CpG dinucleotides. Two blocks, 110 and 49 nt long, encompassing the cap sites and the region immediately upstream, respectively, present features characteristic of regulated genes: a possible TATA box (ATTA), two pyrimidine-rich nucleotide stretches and two inverted juxtaposed CCAAT-like boxes (GGTTGG). Furthermore, the adjacent upstream conserved region presents features characteristic of housekeeping genes: four G/C boxes, embedded in a high-G + C-content sequence, among them one presenting a perfect consensus Sp1-binding site (GCCCCGCCCC). Among unusual features, we report numerous large G + C-rich conserved sequences located in the first intron. One of these sequences contains two G/C boxes which border a sequence presenting a dyad symmetry (GCGCACGTGCTC). Our findings shed some light on the putative role of the CpG island. We show that CpG-rich sequence motifs are under strong selective pressure over the whole 5'-terminal region and are presumably involved in regulatory mechanisms.

18.
Haiminen N, Mannila H. Gene 2007, 394(1-2): 53-60
The isochore structure of a genome is observable by variation in the G+C (guanine and cytosine) content within and between the chromosomes. Describing the isochore structure of vertebrate genomes is a challenging task, and many computational methods have been developed and applied to it. Here we apply a well-known least-squares optimal segmentation algorithm to isochore discovery. The algorithm finds the best division of the sequence into k pieces, such that the segments are internally as homogeneous as possible. We show how this simple segmentation method can be applied to isochore discovery using as input the G+C content of sliding windows on the sequence. To evaluate the performance of this segmentation technique on isochore detection, we present results from segmenting previously studied isochore regions of the human genome. Detailed results on the MHC locus, on parts of chromosomes 21 and 22, and on a 100 Mb region from chromosome 1 are similar to previously suggested isochore structures. We also give results on segmenting all 22 autosomal human chromosomes. An advantage of this technique is that oversegmentation of G+C rich regions can generally be avoided. This is because the technique concentrates on greater global, instead of smaller local, differences in the sequence composition. The effect is further emphasized by a log-transformation of the data that lowers the high variance that is observed in G+C rich regions. We conclude that the least-squares optimal segmentation method is computationally efficient and yields results close to previous biologically motivated isochore structures.
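
A least-squares optimal division of a series into k contiguous pieces can be computed by dynamic programming, as in the minimal sketch below. It assumes the input is a list of windowed G+C values and uses a straightforward O(k n^2) table; the function name, the data, and the omission of the log-transformation are choices made for illustration rather than details taken from the paper.

```python
def segment_least_squares(values, k):
    """Split `values` into k contiguous segments minimizing squared error.

    Returns (total_error, boundaries), where boundaries are the start
    indices of segments 2..k. Requires 1 <= k <= len(values).
    """
    n = len(values)
    # Prefix sums give any segment's squared error in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):  # squared error of values[i:j] around its own mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: best error for the first j values using m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Walk back through the table to recover the segment boundaries.
    bounds, j = [], n
    for m in range(k, 1, -1):
        j = back[m][j]
        bounds.append(j)
    return cost[k][n], sorted(bounds)


# Hypothetical windowed G+C values (fractions), not real data.
gc_windows = [0.38, 0.40, 0.39, 0.55, 0.56, 0.54, 0.41, 0.40]
error, boundaries = segment_least_squares(gc_windows, 3)
```

Because the objective is a global sum of squared errors, the table prefers cuts at large compositional shifts over small local fluctuations, which matches the behavior the abstract describes.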

19.
DNA mismatch repair and synonymous codon evolution in mammals   Total citations: 4 (self-citations: 3; citations by others: 1)
It has been suggested that the differences in synonymous codon use between mammalian genes within a genome are due to differences in the efficiency of DNA mismatch repair. This hypothesis was tested by developing a model of mismatch repair, which was used to predict the expected relationship between the rate of substitution and G+C content at silent sites. It was found that the silent-substitution rate should decline with increasing G+C content over most of the G+C-content range, if it is assumed that mismatch repair is G+C biased, an assumption which is supported by data. This prediction was then tested on a set of 58 primate and artiodactyl genes. There was no evidence of a direct decline in substitution rate with increasing G+C content, for either twofold- or fourfold-degenerate sites. It was therefore concluded that variation in the efficiency of mismatch repair is not responsible for the differences in synonymous codon use between mammalian genes. In support of this conclusion, analysis of the model also showed that the parameter range over which mismatch repair can explain the differences in synonymous codon use between genes is very small.

20.
The genomes of eukaryotes are mosaics of isochores. These are long DNA stretches that are fairly homogeneous in base composition and that belong to a small number of families characterized by different ratios of GC to AT and different short-sequence patterns (i.e., different DNA structures that interact with different proteins). This genome organization led to two discoveries: (1) the genomic code, which refers to two correlations, that of the composition of coding and contiguous noncoding sequences, and that of coding sequences and the structural properties of the encoded proteins; and (2) the genome phenotypes, which correspond to the patterns of isochore families in the genomes. These patterns indicate that genome evolution may proceed either according to a conservative mode or to a transitional (isochore shifting) mode, apparently depending upon whether the environment is constant or shifting. According to the neoselectionist theory, natural selection is responsible for both modes.
