Similar literature
 20 similar articles found (search time: 171 ms)
1.
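Compositional analysis of non-coding regions of eukaryotic DNA   Total citations: 4, self-citations: 0, citations by others: 4
At the whole-genome level, four methods (histograms, chaos game representation grayscale images, a distance-based dissimilarity measure, and an information-entropy dissimilarity measure) were used to study the short-nucleotide-sequence composition and compositional complexity of three kinds of regions (introns, intergenic DNA, and exons) in the DNA of Arabidopsis, nematode, and fruit fly. The results show the following. (a) Across genomes, regardless of gene number, the compositional complexity of the exons obtained with all four methods is fairly similar, whereas the compositional complexity of the non-coding regions is very large. This shows quantitatively that differences in complexity between species are reflected mainly in the non-coding regions rather than in the coding regions. (b) Within a single genome, the short-sequence compositional complexities of introns are similar, and those of exons and intergenic DNA are also similar. (c) Introns and intergenic DNA differ greatly in transcription, splicing, secondary structure, and other respects, yet they differ very little in short-nucleotide-sequence composition, which suggests that their differences in transcription, splicing, and secondary structure are not constrained through short-nucleotide-sequence composition.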
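
The abstract above compares regions by their short-sequence (k-mer) composition and an entropy-based measure. As a rough illustration only, the sketch below counts overlapping k-mers and computes the Shannon entropy of their frequency distribution. The function names `kmer_counts` and `shannon_entropy` are hypothetical, and plain Shannon entropy is an assumed stand-in, not the dissimilarity measure the authors used.

```python
from collections import Counter
from math import log2


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy, in bits, of a k-mer frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())
```

Under this assumption, a region whose k-mers are spread more evenly gets a higher entropy, so entropies of introns, exons, and intergenic DNA could be compared for the same k.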

2.
Fortes GG, Bouza C, Martínez P, Sánchez L. Genetica 2007, 129(3): 281-289
To review the general view of the different compositional structures of warm- and cold-blooded vertebrate genomes, we made use of the increasing number of genetic sequences, including coding (exon) and non-coding (intron) regions, that have been deposited in databases in recent years. The nucleotide distributions of the third codon positions (GC3) were analyzed in 1510 coding sequences (CDS) of fish, 1414 CDS of amphibians and 320 CDS of reptiles. The relationship between the GC content of 74, 56 and 25 CDS of fish, amphibians and reptiles, respectively, and that of their corresponding introns (GCI) was also considered. In accordance with recent data, sequence analysis showed the presence of very GC3-rich CDS in these poikilothermic vertebrates. However, very high diversity in compositional patterns was found among different orders of fish, amphibians and reptiles. Significant positive correlations between GC3 and GCI were also confirmed for the genes analyzed. Nevertheless, introns proved to be poorer in GC than their corresponding CDS, the difference being larger than in the human genome. Because of the limited number of available sequences that include both exons and introns, the results derived from them must be treated with caution. However, the indication that coding sequences are richer in GC than their corresponding introns could help explain the discrepancy between sequence analysis and the ultracentrifugation studies in cold-blooded vertebrates, which did not predict the existence of GC-rich isochores.
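
GC3 and GCI, as used in the abstract above, are simple base fractions. A minimal sketch follows, assuming the CDS is supplied in frame starting at its first codon; the function names `gc3` and `gc_content` are hypothetical and are not taken from the paper.

```python
def gc_content(seq):
    """Fraction of G or C bases in a sequence (e.g. an intron, for GCI)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = cds[2:usable:3]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return gc_content(thirds)
```

For example, `gc3("ATGGCCTAA")` looks only at the third bases G, C and A of the codons ATG, GCC and TAA.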

3.
Our previous work applied neural network techniques to the problem of discriminating open reading frame (ORF) sequences taken from introns versus exons. The method counted the codon frequencies in an ORF of a specified length, and then used this codon frequency representation of DNA fragments to train a neural net (essentially a Perceptron with a sigmoidal, or "soft step function", output) to perform this discrimination. After training, the network was then applied to a disjoint "predict" set of data to assess accuracy. The resulting accuracy in our previous work was 98.4%, exceeding accuracies reported in the literature at that time for other algorithms. Here, we report even higher accuracies stemming from calculations of mutual information (a correlation measure) of spatially separated codons in exons, and in introns. Significant mutual information exists in exons, but not in introns, between adjacent codons. This suggests that dicodon frequencies of adjacent codons are important for intron/exon discrimination. We report that accuracies obtained using a neural net trained on the frequency of dicodons is significantly higher at smaller fragment lengths than even our original results using codon frequencies, which were already higher than simple statistical methods that also used codon frequencies. We also report accuracies obtained from including codon and dicodon statistics in all six reading frames, i.e. the three frames on the original and complement strand. Inclusion of six-frame statistics increases the accuracy still further. We also compare these neural net results to a Bayesian statistical prediction method that assumes independent codon frequencies in each position. The performance of the Bayesian scheme is poorer than any of the neural based schemes, however many methods reported in the literature either explicitly, or implicitly, use this method. 
Specifically, Bayesian prediction schemes based on codon frequencies achieve 90.9% accuracy on 90-codon ORFs, while our best neural net scheme reaches 99.4% accuracy on 60-codon ORFs. "Accuracy" is defined as the average of the exon and intron sensitivities. Achievement of sufficiently high accuracies on short fragment lengths can be useful in providing a computational means of finding coding regions in unannotated DNA sequences such as those arising from the mega-base sequencing efforts of the Human Genome Project. We caution that the high accuracies reported here do not represent a complete solution to the problem of identifying exons in "raw" base sequences. The accuracies are considerably lower for exons of small length, although still higher than accuracies reported in the literature for other methods. Short exon lengths are not uncommon. (ABSTRACT TRUNCATED AT 400 WORDS)
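
The abstract defines "accuracy" as the average of the exon and intron sensitivities. The sketch below shows that definition on labeled predictions; the function names and the label strings "exon" and "intron" are assumptions made for illustration, not part of the original method.

```python
def sensitivity(true_labels, predicted_labels, cls):
    """Fraction of items whose true label is `cls` that were predicted as `cls`."""
    predictions = [p for t, p in zip(true_labels, predicted_labels) if t == cls]
    if not predictions:
        raise ValueError(f"no examples of class {cls!r}")
    return sum(p == cls for p in predictions) / len(predictions)


def mean_sensitivity(true_labels, predicted_labels):
    """Average of the exon sensitivity and the intron sensitivity."""
    exon = sensitivity(true_labels, predicted_labels, "exon")
    intron = sensitivity(true_labels, predicted_labels, "intron")
    return (exon + intron) / 2
```

Averaging the two sensitivities keeps the score from being dominated by whichever class has more examples in the test set.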

4.
The compositional properties of human genes   Total citations: 8, self-citations: 0, citations by others: 8
Summary. The present work represents the first attempt to study in greater detail previously proposed compositional correlations in genomes, based on a body of additional data relating to gene localizations as well as to extended flanking sequences extracted from gene banks. We have investigated the correlations that exist between (1) the GC levels of exons of human genes, and (2) the GC levels of either intergenic sequences or introns associated with the genes under consideration. In both cases, linear relationships with slopes close to unity were found. The similarity of the linear relationships indicates similar GC levels in intergenic sequences and introns located in the same isochores. Moreover, both intergenic sequences and introns showed GC levels 5–10% lower than the corresponding exons. The above findings considerably strengthen the previously drawn conclusion that coding and noncoding sequences (both inter- and intragenic) from the same isochores of the human genome are compositionally correlated. In addition, we find linear correlations between the GC levels of codon positions and of the intergenic sequences or introns associated with the corresponding genes, as well as among the GC levels of codon positions of genes.
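
A linear relationship "with slope close to unity" and a roughly constant offset can be checked with an ordinary least-squares fit. The sketch below is only an illustration: the function name `least_squares_line` is hypothetical, and the GC percentages in the usage note are invented values, not data from the study.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With made-up exon GC levels `[40, 50, 60]` and intron GC levels `[33, 43, 53]`, the fit has slope 1 and a negative intercept, which is the pattern of a unit slope with introns uniformly lower in GC.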

5.
Analysis of an artificial neural network trained to classify DNA as coding or non-coding revealed compositional differences between sequence parts translated into protein and those that were not. The 5' end of human introns was found to have a base composition that was non-random to an extent matching the non-randomness in the 3' end that contains the polypyrimidine tract. The prevailing nucleotides in the initial 50 nucleotides of human introns are guanine and cytosine; the trinucleotide GGG was found to occur almost four times as frequently as it would in sequences with a uniform distribution of the nucleotides. The initial part of terminal exons and their associated terminal introns were shown to have a very special base composition deviating strongly from the normal picture in other exons and introns.
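
The "almost four times" figure compares an observed trinucleotide frequency with the 1/64 expected when all four nucleotides are equally likely and independent. A minimal sketch of that ratio follows; the function name `trinucleotide_enrichment` is hypothetical, and counting over overlapping windows is an assumption about how the frequency was measured.

```python
def trinucleotide_enrichment(seq, word="GGG"):
    """Observed frequency of `word` over overlapping 3-base windows,
    divided by the 1/64 expected under a uniform nucleotide distribution."""
    seq = seq.upper()
    windows = len(seq) - 2
    if windows <= 0:
        raise ValueError("sequence shorter than three bases")
    observed = sum(seq[i:i + 3] == word for i in range(windows)) / windows
    expected = (1 / 4) ** 3
    return observed / expected
```

A return value near 4 for the first 50 intron bases would correspond to the enrichment described above.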

6.
7.
8.
Analysis of DNA sequences of 132 introns and 140 exons from 42 pairs of orthologous genes of mouse and rat was used to compare patterns of evolutionary change between introns and exons. The mean of the absolute difference in length (measured in base pairs) between the two species was nearly five times as high in the case of introns as in the case of exons. The average rate of nucleotide substitution in introns was very similar to the rate of synonymous substitution in exons, and both were about three times the rate of substitution at nonsynonymous sites in exons. The G+C contents of introns and exons of the same gene were correlated, but mean G+C content at the third positions of exons was significantly higher than that of introns or positions 1–2 of exons from the same gene. G+C content was conserved over evolutionary time, as indicated by strong correlations between mouse and rat, but the change in G+C content was greatest at position 3 of exons, intermediate in introns, and lowest at positions 1–2 of exons. Received: 23 December 1996 / Accepted: 1 April 1997

9.
We report here results which indicate (i) that the nuclear genomes of angiosperms are characterized by a compositional compartmentalization and an isochore structure; and (ii) that the nuclear genomes of some Gramineae exhibit strikingly different compositional patterns compared to those of many dicots. Indeed, the compositional distributions of nuclear DNA molecules (in the 50-100 kb size range) from three dicots (pea, sunflower and tobacco) and three monocots (maize, rice and wheat) were found to be centered around lower (41%) and higher (45% for rice, 48% for maize and wheat) GC levels, respectively (and to trail towards even higher GC values in maize and wheat). Experiments on gene localization in density gradient fractions showed a remarkable compositional homogeneity in vast (greater than 100-200 kb) regions surrounding the genes. On the other hand, the compositional distribution of coding sequences (GenBank and literature data) from dicots (several orders) was found to be narrow, symmetrical and centered around 46% GC, whereas that from monocots (essentially barley, maize and wheat) was broad, asymmetrical and characterized by an upward trend towards high GC values, with the majority of sequences between 60 and 70% GC. Introns exhibited a similar compositional distribution, but lower GC levels, compared to exons from the same genes.

10.
The nucleotide sequence of 6225 base pairs (bp) of Euglena gracilis chloroplast DNA, including the complete DNA sequence of the chloroplast-encoded ribulose-1,5-bisphosphate carboxylase large subunit gene along with the flanking DNA sequences, is presented. The gene is greater than 5.5 kilobase pairs in length and is organized as 10 exons coding for 475 amino acids, separated by 9 introns. The exons range in size from 45 to 438 bp, while the introns range in size from 382 to 568 bp. The introns have highly conserved boundary sequences with the consensus 5'-N GTGTGGATTT...(intron)...TTAATTTTAT N-3'. The introns are 82-85 mol% AT, with a pronounced T > A > G > C base bias in the RNA-like strand. They do not appear to encode any polypeptides. In addition, the introns have a conserved sequence 30-50 bp from their 3'-ends with the consensus 5'-TACAGTTTGAAAATGA-3'. The 5'-TACA sequence bears some homology to the 5'-end of the TACTAACA sequence found in a similar location in yeast nuclear mRNA introns. The conserved sequences of the Euglena rbcL introns may be indicative of a splicing mechanism similar to that of eucaryotic nuclear mRNA introns and group II mitochondrial introns.

11.
A low level of genetic variation has limited the application of molecular markers for characterizing important traits in cultivated tomato. To detect polymorphisms in tomato conserved ortholog sets (COS), expressed sequence tags (ESTs) were searched against tomato and Arabidopsis genomic sequences to define the positions of introns. Introns were amplified from 12 different accessions of tomato by polymerase chain reaction, and their nucleotide sequences were determined by sequencing. Results indicated a 71% success rate in amplifying introns from tomato genomic DNA with this approach. A total of 201 introns were sequenced from 86 COS unigenes. The intron positions and numbers were conserved between tomato and Arabidopsis, but average intron length was three times longer in tomato than in Arabidopsis. A total of 307 single nucleotide polymorphisms (SNPs) and 75 indels were detected in introns of 57 COS unigenes among 12 tomato lines. Within cultivated tomato germplasm, 172 SNPs and 47 indels were detected in introns of 33 COS unigenes. In addition, 41 SNPs were identified in the exons of 27 COS unigenes. The frequency of SNPs was 2.4 times higher in introns than in exons in the 22 COS unigenes having both intronic and exonic polymorphisms. These results indicate that intronic regions may contain enough variation to develop adequate marker resources for genome-wide analysis in cultivated tomato.

12.
A recombinant phage, SpC3, containing a 17 kb genomic DNA insert representing approximately 60% of the 3' portion of the sheep collagen alpha 2 gene, was evaluated by electron microscopic R loop analysis. A minimum of 17 intervening sequences (introns) and 18 alpha 2 coding sequences (exons) were mapped. With the exception of the 850 base pair exon located at the extreme 3' end of the insert, all exons contained 250 base pairs or less. The total length of all the exons in SpC3 was 3,014 base pairs. The lengths of the 17 introns ranged from 300 to 1600 base pairs; together, all of the introns comprised 14,070 base pairs of SpC3 DNA. Thus, the DNA region required for coding the interspersed 3 kb of alpha 2 collagen genetic information was 5.6-fold longer than the corresponding alpha 2 mRNA coding sequences.
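
The fold figure at the end of the abstract can be reproduced from the two totals it reports, assuming it is the ratio of the mapped genomic span (exons plus introns) to the exon total. The variable names below are illustrative only.

```python
# Totals reported in the abstract, in base pairs.
exon_total_bp = 3014
intron_total_bp = 14070

# Assumed definition: genomic span (exons + introns) relative to coding length.
genomic_span_bp = exon_total_bp + intron_total_bp
fold_longer = genomic_span_bp / exon_total_bp

print(f"{genomic_span_bp} bp genomic span, {fold_longer:.2f}-fold the exon total")
```

Under this assumption the ratio comes out just under 5.7, which matches the reported 5.6-fold if that figure was truncated rather than rounded.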

13.
We previously observed that Antarctic fish genes contain intron sequences of high A+T content (60-70% average A+T), in stark contrast with adjacent protein-coding sequences. Here, we report that this disparity in intron/exon base composition is a common feature among teleosts. We analyzed 483 teleost genomic DNA sequences, containing 2583 introns, from 80 teleost genera that populate polar, temperate, or tropical habitats. Eighty-nine percent of teleost introns display an A+T content between 50% and 84%, with a mean of 60% A+T. In contrast, only 37% of teleost exons have an A+T content greater than 50%, with a mean of 48% A+T. A comparison to homologous mammalian genes showed a striking difference; in this case, introns and exons have similar base compositions, averaging 45-47% A+T. This indicates that most teleost genes exhibit a large difference in base composition between their introns and exons. There was no correlation of teleost intron A+T content with intron length or habitat temperature range. Thus, teleost intron sequences tend to show the common feature of being much higher in A+T content than neighboring exons.

14.
Revisiting the problem of intron-exon identification, we use a principal component analysis (PCA) to classify DNA sequences and present first results that validate our approach. Sequences are translated into document vectors that represent their word content; a principal component analysis then defines Gaussian-distributed sequence classes. The classification uses word content and variation of word usage to distinguish sequences. We test our approach with several data sets of genomic DNA and are able to classify introns and exons with an accuracy of up to 96%. We compare the method with the best traditional coding measure, the non-overlapping hexamer frequency count, and find that the PCA method produces better results. We also investigate the degree of cross-validation between different data sets of introns and exons and find evidence that the quality of a data set can be detected.

15.
In this work, we investigated (1) the compositional distributions of all available nuclear coding sequences (and of their three codon positions) of six dicots and four Gramineae; this considerably expanded our knowledge about the differences previously seen between these two groups of plants; (2) the compositional correlations of homologous genes from dicots and from Gramineae, as well as from both groups; all correlations were characterized by very good coefficients, with slopes close to unity in the former two cases and very high in the last; (3) the compositional transition that accompanied the emergence of Gramineae from an ancestral monocot; (4) the compositional correlations between exons and introns, which were very good in Gramineae, but only poor to good in dicots; and (5) the compositional profiles of homologous genes from angiosperms, which were characterized by a series of peaks (exons) and valleys (introns) separated by 15–20% GC. The conservative and transitional modes of compositional evolution in plant genes and their general implications are discussed. Received: 24 June 1997 / Accepted: 20 August 1997

16.
The DNA sequence composition of 526 dicot and 345 monocot intron sequences has been characterized using computational methods. Splice site information content and bulk intron and exon dinucleotide composition were determined. Positions 4 and 5 of 5' splice sites contain different, statistically significant levels of information in the two groups. Basal levels of information in introns are higher in dicots than in monocots. Two dinucleotide groups, WW (AA, AU, UA, UU) and SS (CC, CG, GC, GG), have significantly different frequencies in exons and introns of the two plant groups. These results suggest that the mechanisms of splice-site recognition and binding may differ between dicot and monocot plants.
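
The WW and SS groups named in the abstract are fixed sets of dinucleotides, so their bulk frequencies are straightforward to tally. The sketch below is an assumption-laden illustration: the function name `ww_ss_fractions` is hypothetical, overlapping dinucleotides are counted, and DNA input is converted to the RNA alphabet so that T is treated as U.

```python
# Dinucleotide groups as listed in the abstract (RNA alphabet).
WW = {"AA", "AU", "UA", "UU"}
SS = {"CC", "CG", "GC", "GG"}


def ww_ss_fractions(seq):
    """Return (WW fraction, SS fraction) over all overlapping dinucleotides."""
    rna = seq.upper().replace("T", "U")
    pairs = [rna[i:i + 2] for i in range(len(rna) - 1)]
    if not pairs:
        raise ValueError("sequence shorter than two bases")
    ww = sum(pair in WW for pair in pairs) / len(pairs)
    ss = sum(pair in SS for pair in pairs) / len(pairs)
    return ww, ss
```

Computing the two fractions separately for exons and introns of each plant group would give the kind of comparison the abstract reports.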

17.
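Zhang Shu, Zhang Yongjie. Microbiology China (《微生物学通报》) 2015, 42(8): 1549-1560
[Objective] To analyze the molecular evolution of three nuclear protein-coding genes (csp1, MAT1-1-1 and MAT1-2-1) among different strains of the Chinese caterpillar fungus (冬虫夏草, Ophiocordyceps sinensis). [Methods] The csp1, MAT1-1-1 and MAT1-2-1 gene sequences were amplified from 125 samples of the fungus. The degree of sequence variation was compared between exons and introns and between the two mating-type genes; the topologies of phylogenetic trees built from different genes or gene regions were compared; and the selection pressure acting on the three genes and their DNA recombination were analyzed. [Results] The exon lengths of the three protein-coding genes were highly conserved among strains, with 4.5%–5.7% variable sites; intron lengths were either identical or different among strains, with 1.8%–22% variable sites. Of the two mating-type genes, MAT1-1-1 had a lower base variation rate than MAT1-2-1. The topologies of the trees built from exons and from introns differed clearly, as did those of the trees built from the exons of the two mating-type genes. All three protein-coding genes are under purifying selection. Recombination occurred between different DNA sites within genes, but no obvious recombination occurred among the three gene fragments. [Conclusion] Because different genes, and different regions of the same gene, show different evolutionary patterns in this fungus, studies of its evolution should use several different gene fragments in combination.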

18.
19.
The role that spliceosomal intronic structures play in evolution has only begun to be elucidated. Comparative genomic analyses of fungal snoRNA sequences, which are often contained within introns and/or exons, revealed that about one-third of snoRNA-associated introns in three major snoRNA gene clusters manifested polymorphisms, likely resulting from intron loss and gain events during fungal evolution. Genomic deletions can clearly be observed as one mechanism underlying intron and exon loss, as well as generation of complex introns where several introns lie in juxtaposition without intercalating exons. Strikingly, by tracking conserved snoRNAs in introns, we found that some introns had moved from one position to another by excision from donor sites and insertion into target sites elsewhere in the genome without needing transposon structures. This study revealed the origin of many newly gained introns. Moreover, our analyses suggested that intron-containing sequences were more prone to sustainable structural changes than DNA sequences without introns, due to the intron's ability to jump within the genome via unknown mechanisms. We propose that splicing-related structural features of introns serve as an additional motor to propel evolution.

20.