Similar Documents
20 similar documents found (search time: 15 ms)
1.
Despite constant improvement in prediction accuracy, gene-finding programs are still unable to provide automatic gene discovery with the desired correctness. This paper analyzes gene and intergenic sequences from the point of view of language analysis. Gene and intergenic regions are treated as two different subjects written in the four-letter alphabet {A,C,G,T}, and high-frequency simple sequences are taken as keywords. A measure, α(l(τ)), was introduced to describe the relative repeat ratio of simple sequences, and threshold values were determined for keyword selection. After "noise" was eliminated, 178 short sequences were selected as keywords. DNA sequences were then mapped into a 178-dimensional Euclidean space, and a support vector machine (SVM) was used to predict gene regions. Cross-validation showed that the program we developed could predict 93% of gene sequences with 7% false positives. When tested on a long genomic multi-gene sequence, our method improved nucleotide-level specificity by 21%, and over 60% of predicted genes corresponded to actual genes.
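As a rough illustration of the keyword-vector idea, the sketch below maps a DNA sequence to a vector of per-keyword frequencies. Several details are assumptions made for illustration: the keyword list, the function name `keyword_vector`, and the normalization (occurrences divided by the number of possible start positions). The abstract does not list the 178 keywords or say how counts were scaled. In a full pipeline, such vectors would then be passed to an SVM classifier, which is not shown here.

```python
from typing import Sequence


def keyword_vector(seq: str, keywords: Sequence[str]) -> list[float]:
    """Map a DNA sequence to one frequency per keyword (a hypothetical encoding).

    Each coordinate counts overlapping occurrences of a keyword and divides by
    the number of positions where a keyword of that length could start.
    """
    seq = seq.upper()
    vector = []
    for kw in keywords:
        k = len(kw)
        positions = len(seq) - k + 1
        if positions <= 0:  # sequence is shorter than the keyword
            vector.append(0.0)
            continue
        hits = sum(1 for i in range(positions) if seq[i:i + k] == kw)
        vector.append(hits / positions)
    return vector


# Hypothetical keywords; the study's 178 keywords are not given in the abstract.
KEYWORDS = ["ATG", "TAA", "GC", "AAAA"]

if __name__ == "__main__":
    print(keyword_vector("ATGGCGCTAAAAA", KEYWORDS))
```

Overlapping matches are counted, so `AAAA` occurs twice in `AAAAA`. Whether the original method counted overlaps is not stated.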

2.
3.
Studies on the beta-globin gene complex in the mouse have demonstrated the existence of repeated DNA sequences interspersed throughout the intergenic regions (1,2). These sequences are members of families of middle repetitive sequences and have been mapped to specific intergenic sites in the 60 kbp beta-globin complex. In this study we present evidence that members of this middle repetitive family of DNA sequences, the L1Md family, are interspersed throughout the mouse albumin and alpha-fetoprotein (AFP) gene complex. In the beta-globin complex, all such sequences are found in the intergenic regions. In contrast, these sequences are located within intron 12 of the albumin gene and intron 3 of the AFP gene. They also occur at two sites in the 13.5 kbp intergenic region that links the albumin gene to the AFP gene.

4.
5.
6.
Sequencing of the Saccharomyces cerevisiae nuclear and mitochondrial genomes provided a new background for studies on the evolution of these genomes. In this study, the mitochondrial genomes of a number of Saccharomyces yeasts were mapped by restriction enzyme analysis, their gene orders were determined, and two of the genes were sequenced. The genome organization (size, presence of intergenic sequences, and gene order), as well as polymorphism within the coding regions, indicates that Saccharomyces mtDNA molecules are dynamic structures that have undergone numerous changes during their evolution. Since the separation and sexual isolation of the different yeast lineages, the coding regions have been accumulating point mutations, presumably in a linear manner over time. However, the accumulation of other changes may not have been a simple function of time. The larger mtDNA molecules of Saccharomyces sensu stricto yeasts have acquired extensive intergenic sequences, including guanine-cytosine (GC)-rich clusters. They have also apparently rearranged their gene order at higher rates than the smaller mtDNAs of Saccharomyces sensu lato yeasts. Within the sensu stricto group, transposition has been the predominant mechanism for creating novel gene orders. The sensu lato yeasts, by contrast, could have used both transposition- and inversion-based mechanisms.

7.
8.
Intergenic regions of the Dictyostelium genome contain an extremely high proportion of AT base pairs. The intergenic regions that have been sequenced are predominantly composed of alternating runs of poly(dA) and poly(dT), and there is evidence that nucleosomes do not form on such sequences. We have identified two nuclear proteins, with molecular weights of 70,000 and 74,000 daltons, that bind only to the intergenic regions of a cloned Dictyostelium gene. Binding is specifically inhibited by synthetic poly(dA)·poly(dT) used as a competitor. These proteins may play a role in the chromosomal organization of intergenic regions in Dictyostelium discoideum.
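The sketch below makes "alternating runs of poly(dA) and poly(dT)" concrete. It scans a sequence for homopolymer tracts of A or T and reports what fraction of the sequence those tracts cover. The minimum run length of 5 and the function names `at_runs` and `run_coverage` are arbitrary, hypothetical choices, not values taken from the study.

```python
import re


def at_runs(seq: str, min_len: int = 5) -> list[tuple[int, int, str]]:
    """Return (start, end, base) for every run of at least min_len A's or T's.

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    # With min_len=5 this builds the pattern "A{5,}|T{5,}".
    pattern = re.compile(rf"A{{{min_len},}}|T{{{min_len},}}")
    return [(m.start(), m.end(), m.group()[0]) for m in pattern.finditer(seq.upper())]


def run_coverage(seq: str, min_len: int = 5) -> float:
    """Fraction of the sequence lying inside poly(A) or poly(T) runs."""
    if not seq:
        return 0.0
    covered = sum(end - start for start, end, _ in at_runs(seq, min_len))
    return covered / len(seq)


if __name__ == "__main__":
    example = "GAAAAATTTTTTCA"
    print(at_runs(example))
    print(round(run_coverage(example), 3))
```

Runs are matched greedily and without overlap, so an A-tract immediately followed by a T-tract is reported as two adjacent runs.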

9.
Four different intergenic regions of mitochondrial DNA (mt-IGS) were amplified and sequenced, together with a fragment of the intergenic spacer (IGS) region of the rDNA (rDNA-IGS) and a fragment of the ras-related protein (Ypt1) gene. The panel comprised 31 Phytophthora species representing the most significant forest pathogens and the breadth of diversity in the genus. Over 80 kbp of novel sequence was generated, and the alignments revealed highly variable regions (introns and non-coding regions) as well as conserved coding regions. The mitochondrial DNA regions had an A+T content ranging from 67.2% to 89.0% and were appropriate for diagnostic development and phylogeographic analysis. The rDNA-IGS fragment was less variable but still sufficient to discriminate among some important forest pathogens. The introns of the Ypt1 gene were sufficiently polymorphic for the development of molecular markers for almost all Phytophthora species. The more conserved flanking coding regions were appropriate for the design of Phytophthora genus-specific primers. In general, phylogenetic analysis of the sequence alignments grouped species into clades that matched those based on the ITS regions of the rDNA. In many cases the resolution was improved over ITS. In other cases, however, the sequences were too variable to align accurately and yielded phylograms inconsistent with other data. Key studies of intraspecific variation and primer specificity remain to be done. Even so, the research has already yielded an enormous dataset for the identification, detection, and study of the molecular evolution of Phytophthora species.

10.
11.
A common practice in computational genomic analysis is to use a set of "background" sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detecting cis-regulatory elements. Such background sequences are generally taken from regions of the genome presumed to be intergenic, or are generated synthetically by "shuffling" real sequences. The latter method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity, and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used them to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, for ab initio prediction of coding genes and transcribed regions, and for detecting non-coding genes. We found that RepeatMasker is more accurate than PClouds, that Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and that Infernal has a low false-positive rate for non-coding gene detection. A web service, source code, and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
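The contrast between shuffled and modeled background sequences can be sketched as follows. Shuffling preserves only single-base composition. Even a simple order-k Markov chain trained on a real intergenic sequence also reproduces short-range dependencies such as dinucleotide frequencies. This is a deliberately simplified stand-in and not the authors' method, which additionally models sequence complexity and interspersed repeat content. The function names below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def shuffle_sequence(seq: str, seed: int = 0) -> str:
    """Permute the bases: keeps base composition, destroys all local structure."""
    rng = random.Random(seed)
    return "".join(rng.sample(seq, len(seq)))


def train_markov(seq: str, order: int = 2) -> dict[str, Counter]:
    """Count which base follows each length-`order` context in the training sequence."""
    if order < 1:
        raise ValueError("order must be at least 1")
    counts: dict[str, Counter] = defaultdict(Counter)
    for i in range(len(seq) - order):
        counts[seq[i:i + order]][seq[i + order]] += 1
    return dict(counts)


def generate_markov(counts: dict[str, Counter], order: int, length: int, seed: int = 0) -> str:
    """Sample an artificial sequence from a trained order-`order` Markov chain."""
    if not counts:
        raise ValueError("empty model: training sequence was too short")
    rng = random.Random(seed)
    contexts = list(counts)
    out = list(rng.choice(contexts))  # start from a context seen in training
    while len(out) < length:
        followers = counts.get("".join(out[-order:]))
        if not followers:  # dead-end context: jump to a random seen context
            out.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*followers.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


if __name__ == "__main__":
    real = "ATATTTAAATATGCATTTAAAATATATTAGCAATTTATA"
    model = train_markov(real, order=2)
    print(shuffle_sequence(real))
    print(generate_markov(model, order=2, length=60))
```

A real negative-control generator would be trained on far longer intergenic sequence. It would also need to handle repeats separately, since a low-order Markov chain cannot reproduce long interspersed elements.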

12.
13.
14.
15.
To study genome evolution and diversity in barley (Hordeum vulgare), we sequenced and compared more than 300 kb of sequence spanning the Rph7 leaf rust disease resistance gene in two barley cultivars. Colinearity was restricted to five genic and two intergenic regions representing less than 35% of the two sequences. In each interval separating the seven conserved regions, the number and type of repetitive elements were completely different between the two homologous sequences, and a single gene was absent in one cultivar. In both cultivars, the nonconserved regions consisted of approximately 53% repetitive sequences. These were mainly long terminal repeat (LTR) retrotransposons that inserted less than 1 million years ago. PCR-based analysis of intergenic regions in 41 H. vulgare lines, at the Rph7 locus and at three other independent loci, indicated large haplotype variability in the cultivated barley gene pool. Together, our data indicate rapid and recent divergence at homologous loci in the genome of H. vulgare, possibly providing the molecular mechanism for the generation of high diversity in the barley gene pool. Finally, comparative analysis of gene composition in barley, wheat (Triticum aestivum), rice (Oryza sativa), and sorghum (Sorghum bicolor) suggested massive gene movements at the Rph7 locus in the Triticeae lineage.

16.
The aquatic larvae of the genus Chironomus (Diptera, Insecta) contain at least 12 different hemoglobin (Hb) variants in their hemolymph. In the present study we analysed the structure and part of the nucleotide sequence of an Hb gene cluster cloned from the genomic DNA of Chironomus thummi piger. The cluster probably contains six different genes, separated by intergenic regions of various lengths. The nucleotide sequence of three putative Hb genes, including the intergenic regions, is presented. The inferred amino-acid sequences clearly show that two of these putative genes code for subvariants of the Hb variant VIIB. The third gene codes for a previously unknown Hb protein. As already known for other chironomid Hb genes, the coding regions contain no introns.

17.
The nucleotide sequence of the beta globin gene cluster of the prosimian Galago crassicaudatus has been determined. The total sequence spans 41,101 bp and contains and links together previously published sequences of the five galago beta-like globin genes (5'-epsilon-gamma-psi eta-delta-beta-3'). A computer-aided search for middle interspersed repetitive sequences identified 10 LINE (L1) elements. These include a 5'-truncated repeat that is orthologous to the full-length L1 element found in the human epsilon-gamma intergenic region. The SINE elements identified included one Alu type I repeat, four Alu type II repeats, and two methionine tRNA-derived Monomer (type III) elements. Alu type II and Monomer sequences are unique to the galago genome. Structural analyses of the cluster sequence reveal that it is relatively A+T rich (about 62%) and that regions with high G+C content are associated primarily with globin coding regions. Comparative analyses with the beta globin cluster sequences of human, rabbit, and mouse reveal extensive sequence homologies in their genic regions, but only the human, galago, and rabbit sequences share extensive intergenic sequence homologies. Divergence analyses of aligned intergenic and flanking sequences from orthologous human, galago, and rabbit sequences show a gradation in the rate of nucleotide sequence evolution along the cluster. Sequences 5' of the epsilon globin gene region show the least divergence, and sequences just 5' of the beta globin gene region show the greatest.
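Composition figures such as "about 62% A+T" and "regions with high G+C content" are typically computed in two ways. One is a whole-sequence fraction; the other is a sliding-window profile along the sequence. A minimal sketch follows. The window and step sizes are arbitrary assumptions. Ambiguous bases such as N are excluded from the denominator, which is a common but not universal convention.

```python
def gc_fraction(seq: str) -> float:
    """G+C count divided by the number of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def at_fraction(seq: str) -> float:
    """A+T fraction over unambiguous bases; equals 1 - gc_fraction when any exist."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("A") + seq.count("T")) / acgt if acgt else 0.0


def gc_profile(seq: str, window: int = 500, step: int = 250) -> list[tuple[int, float]]:
    """(window start, G+C fraction) for each window along the sequence.

    If (len(seq) - window) is not a multiple of step, the last few bases
    fall outside every window.
    """
    last_start = max(len(seq) - window, 0)
    return [(s, gc_fraction(seq[s:s + window])) for s in range(0, last_start + 1, step)]


if __name__ == "__main__":
    print(at_fraction("AATTGCN"))
    print(gc_profile("AAAAGGGGCC", window=4, step=2))
```

In a profile like this, windows overlapping globin coding regions would be expected to stand out as G+C peaks, per the abstract's observation.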

18.
19.
Ometto L, Stephan W, De Lorenzo D. Genetics. 2005;169(3):1521-1527.
Our study of nucleotide sequence and insertion/deletion polymorphism in Drosophila melanogaster noncoding DNA provides evidence for selective pressures in both intergenic regions and introns (of the large size class). Intronic and intergenic sequences show a similar polymorphic deletion bias. Insertions have smaller sizes and higher frequencies than deletions, supporting the hypothesis that insertions are selected to compensate for the loss of DNA caused by deletion bias. Analysis of a simple model of selective constraints suggests that the blocks of functional elements located in intergenic sequences are on average larger than those in introns, while the length distribution of relatively unconstrained sequences interspaced between these blocks is similar in intronic and intergenic regions.

20.
A whole genome contains not only coding regions but also non-coding regions. These non-coding regions lie between the end of a given coding region and the beginning of the next one. For this reason, the information underlying gene regulation resides in these intergenic regions. There is no easy way to obtain intergenic regions from currently available databases. IntergenicDB was developed to integrate data on intergenic regions and their gene-related information from NCBI databases. The main goal of IntergenicDB is to offer a user-friendly database of intergenic sequences from bacterial genomes.

Availability

http://intergenicdb.bioinfoucs.com/
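The IntergenicDB entry defines intergenic regions as the stretches between the end of one coding region and the start of the next. That definition translates directly into interval arithmetic over gene coordinates. The sketch below rests on several assumptions:

- Coordinates are 1-based and inclusive, as in NCBI annotations.
- Overlapping genes are merged, and strand is ignored.
- The region that wraps around the origin of a circular bacterial chromosome is ignored.

The function names are hypothetical and unrelated to IntergenicDB's own code.

```python
def intergenic_intervals(genes: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Gaps between consecutive genes, as 1-based inclusive (start, end) pairs.

    Overlapping or abutting genes leave no gap. Regions before the first gene
    and after the last one are not reported.
    """
    ordered = sorted(genes)
    regions: list[tuple[int, int]] = []
    if not ordered:
        return regions
    covered_to = ordered[0][1]  # rightmost position covered by any gene so far
    for start, end in ordered[1:]:
        if start > covered_to + 1:
            regions.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    return regions


def extract(genome: str, regions: list[tuple[int, int]]) -> list[str]:
    """Slice each 1-based inclusive region out of the genome string."""
    return [genome[start - 1:end] for start, end in regions]


if __name__ == "__main__":
    genes = [(1, 100), (150, 300), (250, 400), (401, 500), (600, 700)]
    print(intergenic_intervals(genes))
```

Tracking the rightmost covered position is what makes overlapping genes, such as (150, 300) and (250, 400) above, collapse into a single covered block rather than producing a spurious gap.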


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417