Similar literature
1.
Comparative genomics is a superior way to identify phylogenetically conserved features such as genes or regions involved in gene regulation. The comparison of extended orthologous chromosomal regions should also reveal other characteristic traits essential for chromosome or gene function. In the present study we have sequenced and compared a region of conserved synteny from human chromosome 11p15.3 and mouse chromosome 7. In human, this region is known to contain several genes involved in the development of various disorders such as Beckwith-Wiedemann overgrowth syndrome and other tumor diseases. Furthermore, extensive imprinting of genes has been reported in the neighboring chromosome region 11p15.5, which might extend to region 11p15.3. The analysis of approximately 730 kb in human and 620 kb in mouse led to the identification of eleven genes. All putative genes found in the mouse DNA were also present in the same order and orientation in the human chromosome. However, in the human DNA one putative gene of unknown function could be identified which is not present in the orthologous position of the mouse chromosome. The sequence similarity between human and mouse is higher in transcribed and exon regions than in non-transcribed segments. Dot plot analysis, however, reveals surprisingly well-conserved sequence similarity over the entire analyzed region. In particular, the positions of CpG islands, short regions of very high GC content in the 5' region of putative genes, are similar in human and mouse. With respect to base composition, two distinct segments of significantly different GC content exist in human as well as in mouse. With a GC content of 45%, one segment would correspond to "isochore H1" and the other segment (39% GC in human, 40% GC in mouse) to "isochore L1/L2". The gene density (one gene per 66 kb) is slightly higher than the average calculated for the complete human genome (one gene per 90 kb).
The comparison of the number and distribution of repetitive elements shows that the proportion of human DNA made up of interspersed repeats (43.8%) is significantly higher than in the corresponding mouse DNA (30.1%). This partly explains why the human DNA is longer between the landmark genes used to define the orthologous positions in human and mouse.

2.
The base compositional correlations that hold among various coding and noncoding regions of the canine genome have been analysed. The distribution pattern of genes, on the basis of GC(3) composition, shows a wide range similar to that observed in human. However, the maximum number of genes was observed in the range of 65-75% GC(3). The correlation between the canine coding DNA sequences and the different noncoding regions (introns and flanking regions) is found to be significant, and in many cases the degree of correlation is similar to that in the human genome. We found that these correlations are not limited to the GC content alone, but also hold at the level of the frequency of individual bases. The present study suggests that canines ideally belong to the predicted 'general mammalian pattern' of genome composition along with human beings.

3.
Jabbari K  Bernardi G 《Gene》2000,247(1-2):287-292
In the present work we show that in the Drosophila genome (which covers a 37-51% GC range at a DNA size of approx. 50 kb) a linear correlation holds between the GC (or GC(3)) levels of coding sequences and the GC levels of the long (>50 kb) genomic sequences embedding them. This correlation allows us to position the two compositional distributions of (a) coding sequences and (b) long DNA segments relative to each other and to calculate gene concentration across the compositional range of the Drosophila genome. Using this approach, we show that gene concentration increases with increasing GC of the regions embedding the genes, reaching a 7-fold higher level in the GC-richest regions compared with the GC-poorest regions. The gene distribution of the Drosophila genome is, therefore, similar to (although less striking than) that of the human genome, whereas it is very different from that of the Arabidopsis genome, which has about the same size as the Drosophila genome.

4.
Patterns of segmental duplication in the human genome   Cited by: 12 (self-citations: 0, citations by others: 12)
We analyzed the completed human genome for recent segmental duplications (size ≥ 1 kb and sequence similarity ≥ 90%). We found that approximately 4% of the genome is covered by duplications and that the extent of segmental duplication varies from 1% to 14% among the 24 chromosomes. Intrachromosomal duplication is more frequent than interchromosomal duplication in 15 chromosomes. The duplication frequencies in pericentromeric and subtelomeric regions are greater than the genome average by approximately threefold and fourfold, respectively. We examined factors that may affect the frequency of duplication in a region. Within individual chromosomes, the duplication frequency shows little correlation with local gene density, repeat density, recombination rate, and GC content, except on chromosomes 7 and Y. For the entire genome, the duplication frequency is correlated with each of the above factors. Based on known genes and Ensembl genes, the proportion of duplications containing complete genes is 3.4% and 10.7%, respectively. The proportion of duplications containing genes is higher in intrachromosomal than in interchromosomal duplications, and duplications containing genes have a higher sequence similarity and tend to be longer than duplications containing no genes. Our simulation suggests that many duplications containing genes have been selectively maintained in the genome.
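The selection criteria quoted in this abstract (size of at least 1 kb, sequence similarity of at least 90%) and the intra- versus interchromosomal split can be illustrated with a minimal Python sketch. The `Alignment` record and both function names are hypothetical, introduced only for illustration; the abstract does not describe the authors' actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one pairwise self-alignment of the genome."""
    chrom_a: str      # chromosome carrying the first copy
    chrom_b: str      # chromosome carrying the second copy
    length_bp: int    # aligned length in base pairs
    identity: float   # sequence identity as a fraction in [0, 1]


def is_segmental_duplication(aln, min_len=1000, min_identity=0.90):
    # Thresholds follow the abstract: size >= 1 kb and similarity >= 90%.
    return aln.length_bp >= min_len and aln.identity >= min_identity


def duplication_type(aln):
    # Copies on the same chromosome are intrachromosomal, otherwise interchromosomal.
    return "intrachromosomal" if aln.chrom_a == aln.chrom_b else "interchromosomal"
```

For example, a 5 kb alignment at 95% identity between chr7 and chr7 would pass the filter and be labelled intrachromosomal, while an 800 bp alignment would be rejected regardless of identity.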

5.
Genome properties of the diatom Phaeodactylum tricornutum   Cited by: 1 (self-citations: 0, citations by others: 1)
Diatoms are a ubiquitous class of microalgae of extreme importance for global primary productivity and for the biogeochemical cycling of minerals such as silica. However, very little is known about diatom cell biology or about their genome structure. For diatom researchers to take advantage of genomics and post-genomics technologies, it is necessary to establish a model diatom species. Phaeodactylum tricornutum is an obvious candidate because of its ease of culture and because it can be genetically transformed. Therefore, we have examined its genome composition by the generation of approximately 1,000 expressed sequence tags. Although more than 60% of the sequences could not be unequivocally identified by similarity to sequences in the databases, approximately 20% had high similarity with a range of genes defined functionally at the protein level. It is interesting that many of these sequences are more similar to animal than to plant counterparts. Base composition at each codon position and GC content of the genome were compared with Arabidopsis, maize (Zea mays), and Chlamydomonas reinhardtii. It was found that the distribution of GC within the coding sequences is as homogeneous in P. tricornutum as in Arabidopsis, but with a slightly higher GC content. Furthermore, we present evidence that the P. tricornutum genome is likely to be small (less than 20 Mb). Therefore, this combined information supports the development of this species as a model system for molecular-based studies of diatom biology. The nucleotide sequence data reported have been deposited in the GenBank Nucleotide Sequence Database (dbEST section) under accession nos. BI306757 through BI307753.

6.
7.
Isochores and tissue-specificity   Cited by: 15 (self-citations: 2, citations by others: 13)
The housekeeping (ubiquitously expressed) genes in the mammal genome were shown here to be on average slightly GC-richer than tissue-specific genes. Both housekeeping and tissue-specific genes occupy similar ranges of GC content, but the former tend to concentrate in the upper part of the range. In the human genome, tissue-specific genes show two maxima, GC-poor and GC-rich. The strictly tissue-specific human genes tend to concentrate in the GC-poor region; their distribution is left-skewed and thus reciprocal to the distribution of housekeeping genes. The intermediately tissue-specific genes show an intermediate GC content and a right-skewed distribution. In both human and mouse, genes specific for some tissues (e.g., parts of the central nervous system) have a higher average GC content than housekeeping genes. Since they are not transcribed in the germ line (in contrast to housekeeping genes), and therefore have a lower probability of inheritable gene conversion, this finding contradicts the biased gene conversion (BGC) explanation for elevated GC content in the heavy isochores of the mammal genome. Genes specific for germ-line tissues (ovary, testes) show a low average GC content, which also contradicts the BGC explanation. Both for the total data set and for most tissues taken separately, a weak positive correlation was found between gene GC content and expression level. The fraction of ubiquitously expressed genes is nearly 1.5-fold higher in the mouse than in the human. This suggests that mouse tissues are comparatively less differentiated (on the molecular level), which can be related to a less pronounced isochoric structure of the mouse genome.
In each separate tissue (in both species), tissue-specific genes do not form a clear-cut frequency peak (in contrast to housekeeping genes), but constitute a continuum with a gradually increasing degree of tissue-specificity, which probably reflects the path of cell differentiation and/or an independent use of the same protein in several unrelated tissues.

8.
Summary. We have analyzed the correlation that exists between the GC levels of third and first or second codon positions for about 1400 human coding sequences. The linear relationship that was found indicates that the large differences in GC level of third codon positions of human genes are paralleled by smaller differences in GC levels of first and second codon positions. Whereas third codon position differences correspond to very large differences in codon usage within the human genome, the first and second codon position differences correspond to smaller, yet very remarkable, differences in the amino acid composition of encoded proteins. Because GC levels of codon positions are linearly correlated with the GC levels of the isochores harboring the corresponding genes, both codon usage and amino acid composition are different for proteins encoded by genes located in isochores of different GC levels. Furthermore, we have also shown that a linear relationship with a unity slope and a correlation coefficient of 0.77 exists between GC levels of introns and exons from the 238 human genes currently available for this analysis. Introns are, however, about 5% lower in GC, on average, than exons from the same genes.

9.
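Correlation analysis of local GC levels in human protein-coding genes   Cited by: 2 (self-citations: 0, citations by others: 2)
Chen Xianggui  Hu Jun  Yang Xiao 《遗传》 (Hereditas (Beijing)) 2008,30(9):1169-1174
GC content is an important feature of the base composition of genomic DNA sequences and carries information about gene structure, function, and evolution. In this study, the DNA sequences of 7,992 non-redundant human protein-coding genes were extracted from public databases, and the local GC content of different gene regions and the correlations among them were analyzed. The results show that local GC content within genes is heterogeneous: the 5′ untranslated region has the highest GC level (62.56%), whereas the 3′ untranslated region has the lowest (43.97%). The GC content of the 3′ flanking sequence is a good proxy for the GC level of the long DNA segment in which the gene resides. Although the GC content of the open reading frame is higher than that of introns, the 3′ untranslated region, and the 3′ flanking sequence, the GC contents of these four regions are highly correlated with one another. The mean GC content at the third codon position (GC3) is 58.09%, significantly higher than that at the first and second codon positions, and it is highly correlated with the GC level of the open reading frame, with a correlation coefficient as high as 0.91. GC3 also correlates fairly strongly with the GC levels of introns, the 3′ untranslated region, and the 3′ flanking sequence; the slope of the linear regression of GC3 on the GC content of the 3′ flanking sequence is 1.25. GC3 can therefore serve as a sensitive indicator of variation in the GC level of the region in which a gene resides. In contrast, the GC levels of the first and second codon positions, the 5′ flanking sequence, and the 5′ untranslated region correlate only weakly with those of other gene regions. These results suggest that bases at the third codon position of protein-coding regions, in introns, in the 3′ untranslated region, and in the 3′ flanking sequence may have undergone similar evolutionary processes, whereas the first and second codon positions, the 5′ flanking sequence, and the 5′ untranslated region have undergone different mutation and selection owing to functional requirements.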
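Several abstracts in this list rely on the GC content at individual codon positions (GC1, GC2, GC3). As a minimal sketch of how such values can be computed from an in-frame coding sequence, the Python below may help; the function names are hypothetical and the code is not taken from any of the listed papers. It assumes the sequence starts at codon position 1 and ignores any trailing partial codon.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def codon_position_gc(cds):
    """Return (GC1, GC2, GC3) for an in-frame coding sequence.

    Assumes the first base of `cds` is codon position 1; a trailing
    partial codon, if any, is dropped before counting.
    """
    usable = len(cds) - len(cds) % 3
    cds = cds[:usable]
    # Slicing with a step of 3 collects every base at one codon position.
    return tuple(gc_fraction(cds[offset::3]) for offset in range(3))
```

For the toy sequence `ATGGCCTAA` (codons ATG, GCC, TAA), the third positions are G, C, A, so GC3 is 2/3, while GC1 and GC2 are each 1/3.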

10.
《Gene》1996,174(1):95-102
Linear correlations exist between the GC levels of third codon positions (GC3) of individual human genes and the GC levels of long genomic sequences and DNA molecules (50–100 kb in size) embedding the genes. These linear relationships allow the positioning of the GC3 histogram of cDNA sequences from the databases relative to the CsCl profile of human DNA. In turn, this allows an estimate of the relative concentrations of genes in genomic regions of different GC content. An estimate obtained by using current sequence data and Gaussian decompositions of the GC3 histogram and of the CsCl profile indicates that the GC-richest (non-ribosomal) component of the human genome is at least 17 times as gene-rich as the GC-poor regions. Moreover, our results suggest that the most recent physical maps of the human genome consisting of overlapping YACs cover less than 50% of the genes.

11.
Mott R  Flint J 《Genetics》2002,160(4):1609-1618
We describe a method to simultaneously detect and fine map quantitative trait loci (QTL) that is especially suited to the mapping of modifier loci in mouse mutant models. The method exploits the high level of historical recombination present in a heterogeneous stock (HS), an outbred population of mice derived from known founder strains. The experimental design is an F(2) cross between the HS and a genetically distinct line, such as one carrying a knockout or transgene. QTL detection is performed by a standard genome scan with approximately 100 markers and fine mapping by typing the same animals using densely spaced markers over those candidate regions detected by the scan. The analysis uses an extension of the dynamic-programming technique employed previously to fine map QTL in HS mice. We show by simulation that a QTL accounting for 5% of the total variance can be detected and fine mapped with >50% probability to within 3 cM by genotyping approximately 1500 animals.

12.
13.
In this report we present the results of the analysis of approximately 2.7 Mb of genomic information for the American mink (Neovison vison) derived through BAC end sequencing. Our study, which encompasses approximately 1/1000th of the mink genome, suggests that simple sequence repeats (SSRs) are less common in the mink than in the human genome, whereas the average GC content of the mink genome is slightly higher than that of its human counterpart. The 2.7 Mb mink genomic dataset also contained 2,416 repeat elements (retroids and DNA transposons) occupying almost 31% of the sequence space. Among repeat elements, LINEs were over-represented and endogenous viruses (aka LTRs) under-represented in comparison to the human genome. Finally, we present a virtual map of the mink genome constructed with reference to the human and canine genome assemblies using a comparative genomics approach and incorporating over 200 mink BESs with unique hits to the human genome.

14.
The honeybee (Apis mellifera) has a genome with a wide variation in GC content showing 2 clear modal GC values, in some ways reminiscent of an isochore-like structure. To gain insight into causes and consequences of this pattern, we used a comparative approach to study the genome-wide alignment of primarily coding sequence of A. mellifera with Drosophila melanogaster and Anopheles gambiae. The latter 2 species show a higher average GC content than A. mellifera and no indications of bimodality, suggesting that the GC-poor mode is a derived condition in honeybee. In A. mellifera, synonymous sites of genes generally adopt the GC content of the region in which they reside. A large proportion of genes in GC-poor regions have not been assigned to the honeybee assembly because of the low sequence complexity of their genome neighborhood. The synonymous substitution rate between A. mellifera and the other species is very close to saturation, but analyses of nonsynonymous substitutions as well as amino acid substitutions indicate that the GC-poor regions are not evolving faster than the GC-rich regions. We describe the codon usage and amino acid usage and show that they are remarkably heterogeneous within the honeybee genome between the 2 different GC regions. Specifically, the genes located in GC-poor regions show a much larger deviation in both codon usage bias and amino acid usage from the Dipterans than the genes located in the GC-rich regions.

15.
Jabbari K  Rayko E  Bernardi G 《Gene》2003,317(1-2):203-208
Since many gene duplications in the human genome are ancient duplications going back to the origin of vertebrates, the question may be asked about the fate of such duplicated genes at the compositional genome transitions that occurred between cold- and warm-blooded vertebrates. Indeed, at that transition, about half of the (GC-poor) genes of cold-blooded vertebrates (the genes of the gene-dense "ancestral genome core") underwent a GC enrichment to become the genes of the "genome core" of warm-blooded vertebrates. Since the compositional distribution of the human duplicated genes investigated (1111 pairs) mimics the general distribution of human genes (about 50% GC(3)-poor and 50% GC(3)-rich genes, the border being at 60% GC(3)), we considered two possibilities, namely that the compositional transition affected either (i) about half of the copies on a random basis, or (ii) preferentially only one copy of the duplicated genes. The two possibilities could be distinguished if each copy is put into one of two subsets according to its GC(3) level. Indeed, in the first case, the two distributions would be similar, whereas in the second case, the two distributions would be different, one copy having maintained the ancestral GC-poor composition, and one copy having undergone the compositional change. Using this approach, we could show that, by and large, one copy of the duplicated genes preferentially underwent the GC enrichment. This result implies that this copy, which had possibly acquired a different function and/or regulation, was preferentially translocated into the gene-dense compartment of the genome, the "ancestral genome core", namely the "gene space" which underwent the compositional transition at the emergence of warm-blooded vertebrates.

16.
Compositional evolution of noncoding DNA in the human and chimpanzee genomes   Cited by: 11 (self-citations: 0, citations by others: 11)
We have examined the compositional evolution of noncoding DNA in the primate genome by comparison of lineage-specific substitutions observed in 1.8 Mb of genomic alignments of human, chimpanzee, and baboon with 6542 human single-nucleotide polymorphisms (SNPs) rooted using chimpanzee sequence. The pattern of compositional evolution, measured in terms of the numbers of GC→AT and AT→GC changes, differs significantly between fixed and polymorphic sites, and indicates that there is a bias toward fixation of AT→GC mutations, which could result from weak directional selection or biased gene conversion in favor of high GC content. Comparison of the frequency distributions of a subset of the SNPs revealed no significant difference between GC→AT and AT→GC polymorphisms, although AT→GC polymorphisms in regions of high GC segregate at slightly higher frequencies on average than GC→AT polymorphisms, which is consistent with a fixation bias favoring high GC in these regions. However, the substitution data suggest that this fixation bias is relatively weak, because the compositional structure of the human and chimpanzee genomes is becoming homogenized, with regions of high GC decreasing in GC content and regions of low GC increasing in GC content. The rate and pattern of nucleotide substitution in 333 Alu repeats within the human-chimpanzee-baboon alignments are not significantly affected by the GC content of the region in which they are inserted, providing further evidence that, since the time of the human-chimpanzee ancestor, there has been little or no regional variation in mutation bias.
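The two change classes counted in this abstract can be tallied with a short Python sketch, assuming each change is already polarized as an (ancestral, derived) base pair. The function names and the "other" class are illustrative assumptions; the abstract does not specify how the authors handled changes that leave GC content unaltered.

```python
from collections import Counter

WEAK = {"A", "T"}     # bases forming two hydrogen bonds
STRONG = {"G", "C"}   # bases forming three hydrogen bonds


def classify_change(ancestral, derived):
    """Label a polarized single-base change by its effect on GC content."""
    a, d = ancestral.upper(), derived.upper()
    if a in STRONG and d in WEAK:
        return "GC->AT"
    if a in WEAK and d in STRONG:
        return "AT->GC"
    # A<->T and G<->C changes leave GC content unchanged.
    return "other"


def count_changes(pairs):
    """Count change classes over an iterable of (ancestral, derived) pairs."""
    return Counter(classify_change(a, d) for a, d in pairs)
```

Comparing the two counts separately for fixed substitutions and for polymorphisms is the kind of contrast the abstract describes; any statistical test on those counts is left out of this sketch.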

17.
Most proterminal regions of human chromosomes are GC-rich and gene-rich. Chromosome 3p is an exception. Its proterminal region is GC-poor, and likely to lose heterozygosity, thus causing a number of fatal diseases. Except for one gap left in the telomeric position, the proterminal region of human chromosome 3p has been completely sequenced. The detailed sequence analysis showed: (i) the GC content of this region was 38.5%, being the lowest among all the human proterminal regions; (ii) this region contained 20 known genes and 22 predicted genes, with an average gene size of 97.5 kb. The previously mapped gene Cntn3 was not found in this region, but was instead located at the 74 Mb position of human chromosome 3p; (iii) the interspersed repeats of this region were more active than the average level of the whole human genome, especially (TA)n, the content of which was twice the genome average; (iv) this region had a conserved synteny extending from 104.1 Mb to 112.4 Mb on mouse chromosome 6, which was 8% larger in size, not in accordance with the whole genome comparison, probably because the 3pter-p26 region was more likely to lose nucleotides and its mouse synteny had more active interspersed repeats.

18.
Most proterminal regions of human chromosomes are GC-rich and gene-rich. Chromosome 3p is an exception. Its proterminal region is GC-poor, and likely to lose heterozygosity, thus causing a number of fatal diseases. Except for one gap left in the telomeric position, the proterminal region of human chromosome 3p has been completely sequenced. The detailed sequence analysis showed: (i) the GC content of this region was 38.5%, being the lowest among all the human proterminal regions; (ii) this region contained 20 known genes and 22 predicted genes, with an average gene size of 97.5 kb. The previously mapped gene Cntn3 was not found in this region, but was instead located at the 74 Mb position of human chromosome 3p; (iii) the interspersed repeats of this region were more active than the average level of the whole human genome, especially (TA)n, the content of which was twice the genome average; (iv) this region had a conserved synteny extending from 104.1 Mb to 112.4 Mb on mouse chromosome 6, which was 8% larger in size, not in accordance with the whole genome comparison, probably because the 3pter-p26 region was more likely to lose nucleotides and its mouse synteny had more active interspersed repeats.

19.
A human-specific subfamily of Alu sequences.   Cited by: 22 (self-citations: 0, citations by others: 22)
Of a total of 500,000 Alu family members, approximately 500 are present as a human-specific (HS) subfamily. Each of the HS subfamily members shares a high degree of nucleotide identity and is not present at orthologous positions in other primate genomes, suggesting that HS subfamily members have recently inserted within the human genome. This confirms the hypothesis that the majority of Alu family members are amplified copies of a "master" gene(s). This master gene appears to be amplifying at a rate much slower than that seen earlier in primate evolution. Some of the HS Alu subfamily members have amplified so recently that they are dimorphic in the human population, making them a potentially powerful tool for studies of human populations.

20.
The pseudoautosomal regions represent blocks of sequence identity between the mammalian sex chromosomes. In humans, they reside at the ends of the X and Y chromosomes and encompass roughly 2.7 Mb (PAR1) and 0.33 Mb (PAR2). As a major asset of recently available sequence data, our view of their structural characteristics could be refined considerably. While PAR2 resembles the overall sequence composition of the X chromosome and exhibits only slightly elevated recombination rates, PAR1 is characterized by a significantly higher GC content and a completely different repeat structure. In addition, it exhibits one of the highest recombination frequencies throughout the entire human genome and, probably as a consequence of its structural features, displays a significantly faster rate of evolution. It therefore represents an exceptional model to explore the correlation between meiotic recombination and evolutionary forces such as gene mutation and conversion. At least twenty-nine genes lie within the human pseudoautosomal regions, and these genes exhibit 'autosomal' rather than sex-specific inheritance. All genes within PAR1 escape X inactivation and are therefore candidates for the etiology of haploinsufficiency disorders including Turner syndrome (45,X). However, the only known disease gene within the pseudoautosomal regions is the SHORT STATURE HOMEBOX (SHOX) gene, functional loss of which is causally related to various short stature conditions and disturbed bone development. Recent analyses have furthermore revealed that the phosphorylation-sensitive function of SHOX is directly involved in chondrocyte differentiation and maturation.
