Similar Articles
20 similar articles found
1.
With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) in genomic sequences. Despite much progress in the computational identification of protein coding regions over the last two decades, the performance and efficiency of prediction methods still need to be improved. In addition, it is essential to develop diverse prediction methods, since combining different methods may greatly improve prediction accuracy. This paper develops a new method to predict protein coding regions, based on the fact that most exon sequences have a 3-base periodicity while intron sequences lack this feature. The method computes the 3-base periodicity and the background noise of successive (stepwise) segments of the target DNA sequence from the nucleotide distributions in the three codon positions. Exons and introns can then be identified from the trend of the ratio of the 3-base periodicity to the background noise along the sequence. Case studies on genes from different organisms show that this method is an effective approach to exon prediction.
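As a rough illustration of the stepwise approach, the sketch below slides a window along a sequence and, for each window, computes the period-3 power directly from base counts at the three codon positions. It assumes the Voss indicator mapping and takes the background noise to be the mean spectral power over all frequencies, which for that mapping equals the window length by Parseval's theorem; the paper's own noise definition may differ, and the function names and default window and step sizes are illustrative only.

```python
from collections import Counter

BASES = "ACGT"


def period3_power(segment):
    """Period-3 power of a segment (Voss mapping), computed from codon-position counts."""
    counts = [Counter(segment[i::3]) for i in range(3)]  # positions 0, 1, 2 mod 3
    total = 0
    for b in BASES:
        f0, f1, f2 = (c[b] for c in counts)
        # |f0 + f1*w + f2*w^2|^2 with w = exp(-2j*pi/3)
        total += f0 * f0 + f1 * f1 + f2 * f2 - f0 * f1 - f1 * f2 - f0 * f2
    return total


def periodicity_to_noise(seq, window=120, step=30):
    """(start, ratio) per window; the noise is assumed to equal the window length."""
    ratios = []
    for start in range(0, len(seq) - window + 1, step):
        segment = seq[start:start + window]
        ratios.append((start, period3_power(segment) / window))
    return ratios


# A codon-like block followed by a block with no period-3 bias.
print(periodicity_to_noise("ATG" * 40 + "ACGT" * 30, window=120, step=120))
# [(0, 40.0), (120, 0.0)]
```

Exon candidates then appear as runs of windows with a high ratio; a window length that is a multiple of 3 keeps frequency 1/3 on an exact DFT bin.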

8.
The 3-base periodicity, which appears as a pronounced peak at frequency N/3 (where N is the length of the DNA sequence) in the Fourier power spectrum of protein coding regions, is used as a marker in gene-finding algorithms to distinguish protein coding regions (exons) from noncoding regions (introns) of genomes. In this paper, we show that this phenomenon results from a nonuniform distribution of nucleotides in the three codon positions: there is a linear correlation between the nucleotide distributions in the three codon positions and the power spectrum at frequency N/3. Furthermore, this study describes the relationship between the length of a DNA sequence, the variance of the nucleotide distributions, and the average Fourier power spectrum, which serves as the noise signal in gene-finding methods. These results provide an efficient way to compute both the Fourier power spectrum at N/3 and the noise signal in gene-finding methods from the nucleotide distributions in the three codon positions.
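The link between codon-position counts and the spectrum can be checked numerically. For the Voss indicator sequence of base b, the Fourier sum at frequency 1/3 depends only on n mod 3, so it collapses to f0 + f1·w + f2·w², where fi counts b at codon position i and w = exp(-2πi/3). Its squared magnitude is f0² + f1² + f2² − f0f1 − f1f2 − f0f2 = ½[(f0 − f1)² + (f1 − f2)² + (f2 − f0)²], which vanishes exactly when b is evenly spread over the three positions. The minimal sketch below, assuming the Voss mapping and a sequence over ACGT, compares this closed form with a term-by-term evaluation; the function names are hypothetical.

```python
import cmath


def power_at_one_third_direct(seq):
    """Sum over bases of |sum_n u_b[n] * exp(-2*pi*i*n/3)|^2, evaluated term by term."""
    total = 0.0
    for b in "ACGT":
        u = sum(cmath.exp(-2j * cmath.pi * n / 3) for n, ch in enumerate(seq) if ch == b)
        total += abs(u) ** 2
    return total


def power_at_one_third_from_counts(seq):
    """Same quantity from the base counts at codon positions 0, 1 and 2."""
    total = 0
    for b in "ACGT":
        f0, f1, f2 = (seq[i::3].count(b) for i in range(3))
        total += f0 * f0 + f1 * f1 + f2 * f2 - f0 * f1 - f1 * f2 - f0 * f2
    return total


seq = "ATGGCTGCAAAGCTGGCTTAA"
print(power_at_one_third_direct(seq), power_at_one_third_from_counts(seq))
```

When N is a multiple of 3, frequency 1/3 is exactly the DFT bin k = N/3, so the count-based formula gives the spectrum value at N/3 without computing any transform.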

9.
The information decomposition (ID) method has been used to search for dinucleotide periodicities, including latent ones, in plant genomes. In the nucleotide sequences of the genomes of various plants from the GenBank database, 14,766 sequences with a periodicity of two nucleotides were found at a high level of statistical significance. Classification of the periodicity matrices of the detected DNA sequences yielded 141 classes of dinucleotide periodicity. Since ID does not detect periodicities containing nucleotide deletions or insertions, modified profile analysis (MPA) was applied to the obtained classes to reveal DNA sequences with dinucleotide periodicities containing such deletions and insertions. The combined use of ID and MPA permitted the detection of 80,396 DNA sequences with dinucleotide periodicities in the genomes of various plants. The biological role of dinucleotide periodicity in the detected sequences is discussed.
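The periodicity matrix can be illustrated with a small sketch: for period 2 it is a 2 × 4 table counting each base at even and odd positions, and the mutual information between phase and base measures how far the table departs from independence. This is a simplified, hypothetical stand-in that assumes a plain mutual-information score (with the usual G statistic 2·N·I as a rough significance measure); the published ID method's exact statistic and significance thresholds are not reproduced here.

```python
import math


def periodicity_matrix(seq, period=2):
    """M[i][j]: count of base 'ACGT'[j] at positions congruent to i modulo period."""
    return [[seq[i::period].count(b) for b in "ACGT"] for i in range(period)]


def mutual_information(matrix):
    """Mutual information (in nats) between phase (row) and nucleotide (column)."""
    total = sum(map(sum, matrix))
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    info = 0.0
    for i, row in enumerate(matrix):
        for j, m in enumerate(row):
            if m:  # empty cells contribute nothing
                info += m / total * math.log(m * total / (row_sums[i] * col_sums[j]))
    return info


for s in ("AT" * 50, "AACC" * 25):
    m = periodicity_matrix(s)
    mi = mutual_information(m)
    print(m, round(mi, 4), "G =", round(2 * sum(map(sum, m)) * mi, 2))
```

"AT" repeats give the maximum ln 2 for two equally used bases, while "AACC" repeats have no period-2 phase preference (their periodicity is 4) and score zero.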

10.
Isolation and structure of a rhodopsin gene from D. melanogaster   Citations: 45 (self: 0; others: 45)
C. S. Zuker, A. F. Cowman, G. M. Rubin. Cell, 1985, 40(4): 851-858
Using a novel method for detecting cross-homologous nucleic acid sequences we have isolated the gene coding for the major rhodopsin of Drosophila melanogaster and mapped it to chromosomal region 92B8-11. Comparison of cDNA and genomic DNA sequences indicates that the gene is divided into five exons. The amino acid sequence deduced from the nucleotide sequence is 373 residues long, and the polypeptide chain contains seven hydrophobic segments that appear to correspond to the seven transmembrane segments characteristic of other rhodopsins. Three regions of Drosophila rhodopsin are highly conserved with the corresponding domains of bovine rhodopsin, suggesting an important role for these polypeptide regions.
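Hydrophobic segments such as putative transmembrane helices are commonly located with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 cutoff, which are common choices for transmembrane prediction but not necessarily what the authors used; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydrophobic_segments(protein, window=19, threshold=1.6):
    """(start, end) residue spans, 0-based and end-exclusive, covered by consecutive
    windows whose mean hydropathy exceeds the threshold."""
    scores = [sum(KD[aa] for aa in protein[i:i + window]) / window
              for i in range(len(protein) - window + 1)]
    segments, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            segments.append((start, i - 1 + window))  # last passing window ends here
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1 + window))
    return segments


print(hydrophobic_segments("D" * 20 + "L" * 25 + "D" * 20))  # [(15, 50)]
```

The reported span is wider than the 25-residue hydrophobic core because windows that only partly overlap it can still pass the cutoff; unknown residue letters raise a `KeyError`.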

11.
Intra- and intermolecular complementary contacts in RNA are not always perfect: a significant number of mismatched pairs is frequently found in naturally occurring RNA helices. The state of the art in studies of mismatched pairs and examples of imperfect complementarity are reviewed. Two more cases are revealed by nucleotide sequence analysis techniques: imperfect complementary contacts between the ends of intervening sequences in eukaryotic mRNA precursors, and a possible “stickiness” of mRNA to ribosomes. This “stickiness” might arise from a specific 3-base periodicity of protein coding sequences, which is found to be as universal as the genetic code itself. The imperfect complementary contacts between mRNA and rRNA that monitor the coding frame provide a structural basis for explaining leaky frameshift phenomena.
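To make imperfect complementarity concrete, the hypothetical helper below pairs two RNA strands antiparallel (both written 5'→3') and tallies Watson-Crick pairs, G·U wobble pairs and other mismatches. Treating G·U as a separate class is an assumption for illustration; it does not reproduce the paper's sequence-analysis techniques.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pairs(strand, partner):
    """Pair strand[i] with partner[-1 - i]; both strands are written 5'->3'."""
    if len(strand) != len(partner):
        raise ValueError("strands must have equal length")
    counts = {"watson_crick": 0, "wobble": 0, "mismatch": 0}
    for a, b in zip(strand.upper(), reversed(partner.upper())):
        if (a, b) in WATSON_CRICK:
            counts["watson_crick"] += 1
        elif (a, b) in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    return counts


print(classify_pairs("GAGCA", "AGCUU"))
# {'watson_crick': 3, 'wobble': 1, 'mismatch': 1}
```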

12.
Prokaryotic sequences carry more than just protein-coding information. Two 10- to 11-base periodic patterns are superimposed on the protein-coding message within the same sequence. Positional auto- and cross-correlation analysis of the sequences shows that these two patterns are a short-range counter-phase oscillation of AA and TT dinucleotides and a medium-range in-phase oscillation of the same dinucleotides, spanning distances of up to ∼30 and ∼100 bases, respectively. The short-range oscillation is encoded by the amino acid sequences themselves, apparently due to the presence of amphipathic α-helices in the proteins. The medium-range oscillation, related to DNA folding in the cell, is created largely by a special choice of bases in the third codon positions. Interestingly, the amino acid sequences contribute to that signal as well; that is, the amino acid sequences themselves are, to some extent, degenerate in a way that serves the same oscillating pattern associated with the degenerate third codon positions. [Reviewing Editor: Dr. Richard Kliman]
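Positional correlation here means counting how often one dinucleotide is followed by another at a given distance: autocorrelation when both are the same (AA/AA), cross-correlation when they differ (AA/TT). A minimal sketch, assuming raw pair counts without the normalization against expected frequencies that a real analysis would need; the function names are illustrative.

```python
def dinucleotide_positions(seq, dinucleotide):
    """Start positions of every (possibly overlapping) occurrence of a dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def positional_correlation(seq, first, second, max_distance):
    """counts[d - 1] = number of 'first' at i with 'second' at i + d, d = 1..max_distance."""
    starts = dinucleotide_positions(seq, first)
    targets = set(dinucleotide_positions(seq, second))
    return [sum(1 for i in starts if i + d in targets) for d in range(1, max_distance + 1)]


seq = ("AA" + "C" * 8) * 10  # an AA every 10 bases
counts = positional_correlation(seq, "AA", "AA", 12)
print(counts)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0]
```

In real sequences a 10- to 11-base pattern would show up as a weaker, repeated excess of counts at those distances rather than a single sharp spike.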

13.
The identification of gene coding regions in DNA sequences through digital signal processing techniques based on the so-called 3-base periodicity is an emerging problem in bioinformatics. The signal-to-noise ratio (SNR) of a DNA sequence is computed after mapping the symbolic DNA sequence into numerical sequences. Typical mapping schemes include the Voss, Z-curve and tetrahedron representations, which have been used to construct algorithms for detecting gene coding regions. In this paper, an extended definition of SNR is proposed that has lower computational cost and wider applicability than the original definitions. Furthermore, we analyze the SNRs of different mapping schemes and derive the general relationship between the Voss-based SNR and the SNRs of its general affine transformations. We conclude that the SNRs of the Z-curve and tetrahedron maps are also linearly proportional to that of the Voss map. This conclusion is not only instructive for the design of other affine transformations but also significant for understanding the role of the symbolic-to-numerical mapping in the detection of gene coding regions.
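The proportionality between mappings can be checked on the power at the N/3 bin. Because the four Voss indicators sum to 1 at every position, their DFTs a, c, g, t sum to zero at any nonzero bin; the Z-curve components then become X = 2(a + g), Y = 2(a + c), Z = 2(a + t), and |X|² + |Y|² + |Z|² = 4(|a|² + |c|² + |g|² + |t|²). The sketch below assumes the per-position Z-curve increments x = (A+G)−(C+T), y = (A+C)−(G+T), z = (A+T)−(G+C) rather than the cumulative curve, and a length divisible by 3; it compares spectral power at N/3 only, not the paper's extended SNR.

```python
import cmath


def dft_bin(x, k):
    """k-th DFT coefficient of a real sequence x."""
    n_len = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * n / n_len) for n, v in enumerate(x))


def voss_and_zcurve_power(seq):
    """Spectral power at bin N/3 under the Voss and Z-curve mappings."""
    n_len = len(seq)
    if n_len % 3:
        raise ValueError("sequence length must be a multiple of 3")
    k = n_len // 3
    a, c, g, t = ([1.0 if ch == b else 0.0 for ch in seq] for b in "ACGT")
    voss = sum(abs(dft_bin(u, k)) ** 2 for u in (a, c, g, t))
    x = [ai + gi - ci - ti for ai, ci, gi, ti in zip(a, c, g, t)]  # purine vs pyrimidine
    y = [ai + ci - gi - ti for ai, ci, gi, ti in zip(a, c, g, t)]  # amino vs keto
    z = [ai + ti - gi - ci for ai, ci, gi, ti in zip(a, c, g, t)]  # weak vs strong H-bonds
    zcurve = sum(abs(dft_bin(s, k)) ** 2 for s in (x, y, z))
    return voss, zcurve


voss, zcurve = voss_and_zcurve_power("ATGGCTGCAAAGCTGGCTTAA")
print(voss, zcurve, zcurve / voss)  # ratio is 4 up to floating-point rounding
```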

14.
The linear algebraic concept of a subspace plays a significant role in recent spectrum estimation techniques. In this article, the authors use the noise subspace concept to find hidden periodicities in DNA sequences. With the vast growth of genomic sequences, the demand for accurate identification of protein-coding regions in DNA is increasing. Several DNA feature-extraction techniques drawing on various fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. Coding segments are known to have a 3-base periodicity, while non-coding regions lack this feature. One of the most important subspace-based spectrum analysis techniques is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions while completely eliminating background noise. The proposed method has been compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is an effective and better-performing approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method.
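Several of the evaluation measures listed above have common conventions in gene-prediction benchmarking: nucleotide-level sensitivity TP/(TP+FN) and specificity TP/(TP+FP), and exon-level missing and wrong exon rates, i.e. the fraction of true exons overlapped by no prediction and the fraction of predicted exons overlapping no true exon. The sketch below follows those conventions as an assumption; the paper may define specificity, miss rate and wrong rate differently (for example at nucleotide level), and the function names are hypothetical.

```python
def nucleotide_sn_sp(true_mask, pred_mask):
    """Sensitivity TP/(TP+FN) and specificity TP/(TP+FP) over per-base coding labels."""
    tp = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    fn = sum(1 for t, p in zip(true_mask, pred_mask) if t and not p)
    fp = sum(1 for t, p in zip(true_mask, pred_mask) if p and not t)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


def overlaps(a, b):
    """True if half-open intervals [a0, a1) and [b0, b1) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def missing_and_wrong_exon_rates(true_exons, pred_exons):
    """Fraction of true exons with no overlapping prediction, and vice versa."""
    missed = sum(1 for t in true_exons if not any(overlaps(t, p) for p in pred_exons))
    wrong = sum(1 for p in pred_exons if not any(overlaps(p, t) for t in true_exons))
    me = missed / len(true_exons) if true_exons else 0.0
    we = wrong / len(pred_exons) if pred_exons else 0.0
    return me, we


true_mask = [False, False, True, True, True, True, False, False]
pred_mask = [False, True, True, True, False, False, False, False]
print(nucleotide_sn_sp(true_mask, pred_mask))  # (0.5, 0.6666666666666666)
print(missing_and_wrong_exon_rates([(2, 6), (10, 15)], [(1, 4), (20, 25)]))  # (0.5, 0.5)
```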

15.
The nucleotide sequences of two DNA segments from Pseudomonas sp. strain CBS3 that code for two different haloalkanoic acid halidohydrolases were determined. Two open reading frames with coding capacities of 227 amino acids (corresponding to a molecular mass of 25,401 Da) and 229 amino acids (corresponding to a molecular mass of 25,683 Da) were identified as the structural genes of 2-haloalkanoic acid dehalogenases I (dehCI) and II (dehCII) by comparison with the N-terminal amino acid sequences of these enzymes. Comparison of the two sequences revealed 45% homology at the DNA level and 37.5% homology at the amino acid level. No homology with other known protein or nucleotide sequences was found.
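Open reading frames such as the two dehalogenase genes are typically found by scanning each reading frame for an ATG start codon and reading to the first in-frame stop codon. A minimal forward-strand sketch under the standard genetic code (no reverse strand, no alternative start codons); the function name and the `min_codons` threshold are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Forward-strand ORFs as (start, end, n_codons): ATG to first in-frame stop.

    `end` is exclusive and includes the stop codon; `n_codons` counts the
    sense codons, i.e. the number of encoded amino acids.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):  # ran off the end without a stop codon
                break
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3, (j - i) // 3))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs, key=lambda orf: orf[2], reverse=True)


demo = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(demo, min_codons=1))  # [(2, 23, 6)]
```

With this convention a 227-codon ORF corresponds to a 227-residue polypeptide, matching the way coding capacity is quoted in the abstract.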

16.
Complementary and genomic DNA clones corresponding to the human serum amyloid P component (SAP) mRNA have been isolated and analyzed. The nucleotide sequences of the cDNA and the corresponding regions of the genomic SAP DNA reported here were identical, and revealed that, after the sequence coding for a signal peptide of 19 amino acids and the first two amino acids of the mature SAP protein, there is a single small intron of 115 base pairs (bp), followed by a nucleotide sequence coding for the remaining 202 amino acid residues. The SAP gene has an ATATAAA sequence 29 bp upstream of the cap site, but no CAAT box-like sequence. A possible polyadenylation signal sequence, ATTAAA, was found 28 bp upstream of the polyadenylation site. A comparison of the genomic SAP DNA sequence with that of human C-reactive protein (CRP) revealed a striking overall homology that was not uniform: several highly conserved regions were bounded by non-homologous regions. This comparison provides further support for the hypothesis that SAP and CRP are products of a gene duplication event.
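Distances such as "29 bp upstream of the cap site" depend on the counting convention. The hypothetical helper below measures from the first base of the closest complete motif lying entirely upstream of a site (0-based coordinates); other conventions, such as counting from the motif's last base, give different numbers.

```python
def upstream_distance(seq, motif, site):
    """Distance from the 5' end of the closest upstream motif to `site`, or None."""
    pos = seq.rfind(motif, 0, site)  # search only seq[0:site]
    return None if pos == -1 else site - pos


promoter = "G" * 10 + "ATATAAA" + "C" * 22 + "A" + "GGG"
print(upstream_distance(promoter, "ATATAAA", 39))  # 29
print(upstream_distance(promoter, "CAAT", 39))     # None
```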

17.
Cloning and structural analysis of DNA encoding an A2B1a subunit of glycinin   Citations: 10 (self: 0; others: 10)
The partial DNA sequences of a glycinin gene in a genomic clone and of a homologous cDNA clone were determined. They have nearly identical nucleotide sequences and encode the basic polypeptide and part of the acidic polypeptide for an A2B1a glycinin subunit. The protein primary structure deduced from the DNA sequence is in close agreement with the amino acid sequence of the subunit determined chemically, and it confirms the assignment of part of the amino acid sequence in the basic component where we were able to establish an overlap using conventional approaches. The coding part of the basic subunit is interrupted by a 625-base pair A + T-rich intron whose boundaries correlate with the established consensus sequences for exon-intron junctions. Comparison of the nucleotide sequence encoding the basic subunit of the pea legumin gene with that of the gene for the A2B1a subunit reveals 70% homology in the coding regions, although there is considerably less in the 3'-flanking regions.
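Two of the checks mentioned here are easy to sketch: whether an intron follows the common GT...AG boundary rule and how A+T-rich it is, and percent identity between two already-aligned sequences (one simple way of expressing homology percentages). Both functions are illustrative; the published comparison's alignment method and consensus criteria are not reproduced, and this percent identity skips gapped columns, which is itself a choice that changes the result.

```python
def intron_summary(intron):
    """(follows GT...AG rule, A+T fraction) for an intron given on the coding strand."""
    s = intron.upper()
    gt_ag = s.startswith("GT") and s.endswith("AG")
    at_fraction = (s.count("A") + s.count("T")) / len(s)
    return gt_ag, at_fraction


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical columns, ignoring columns with a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    return 100 * sum(x == y for x, y in columns) / len(columns)


print(intron_summary("GTAAATTTAG"))          # (True, 0.8)
print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```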

18.
Localizing triplet periodicity in DNA and cDNA sequences   Citations: 1 (self: 0; others: 1)

Background  

The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) due to the fact that coding exons consist of a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in non-coding regions. This property has been used to identify the locations of protein-coding genes in unannotated sequences. The method is fast and requires no training. However, the need to compute the Fourier transform across a segment (window) of arbitrary size limits the accuracy with which TP boundaries can be localized. Here, we report a technique that identifies these boundaries at higher resolution, and we use it to explore the biological correlates of TP regions in the genome of the model organism C. elegans.
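The resolution problem can be seen in a simple sliding-window version of the Fourier method. The sketch below, which assumes the Voss mapping and a hypothetical fixed window length, computes the normalized power at frequency 1/3 for every window start: windows wholly inside a period-3 block score high, windows wholly outside score essentially zero, and windows straddling a boundary give intermediate values, so a boundary can only be placed to within about one window length.

```python
import cmath


def period3_track(seq, window=30):
    """Power at frequency 1/3 (Voss mapping) divided by window length, per window start."""
    w = cmath.exp(-2j * cmath.pi / 3)
    track = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        power = sum(
            abs(sum(w ** n for n, ch in enumerate(segment) if ch == b)) ** 2
            for b in "ACGT"
        )
        track.append(power / window)
    return track


seq = "AT" * 30 + "ATG" * 20 + "AT" * 30  # period-3 block occupies positions 60..119
track = period3_track(seq)
print(round(track[0], 6), round(track[45], 3), round(track[75], 6))
```

Here the window starting at 45 half-overlaps the block and gives a value well below the plateau reached at start 75, illustrating how the window blurs the boundary.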

19.
Accurate identification of protein-coding regions (exons) in DNA sequences has been a challenging task in bioinformatics. In particular, coding regions exhibit a 3-base periodicity, which forms the basis of all exon identification methods. Many signal processing tools and techniques have been applied successfully to this task, but further improvement is still needed. In this paper, we introduce a new, promising model-independent time-frequency filtering technique based on the S-transform for accurate identification of coding regions. The S-transform is a powerful linear time-frequency representation that is useful for filtering in the time-frequency domain. The potential of the proposed technique has been assessed through a simulation study, and the results have been compared with those of existing methods on standard datasets. The comparative study demonstrates that the proposed method outperforms its counterparts in identifying coding regions.
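To show how the S-transform localizes period-3 energy, the sketch below evaluates a single voice of the discrete S-transform (Stockwell's frequency-domain form: shift the spectrum by the voice index n, multiply by the Gaussian exp(−2π²m²/n²), and inverse transform) at n = N/3 for each Voss indicator sequence. It is a plain O(N²) illustration assuming the Voss mapping and a length divisible by 3; it does not implement the paper's time-frequency filtering step, and the function names are hypothetical. At this voice the Gaussian time window has a standard deviation of N/n = 3 bases, which is why the energy follows block boundaries closely.

```python
import cmath
import math


def dft(x):
    """Unnormalized DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N)."""
    n_len = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n_len) for t, v in enumerate(x))
            for k in range(n_len)]


def s_transform_voice(x, n):
    """Discrete S-transform at frequency index n > 0, one complex value per sample."""
    n_len = len(x)
    spectrum = dft(x)
    offsets = [m if m <= n_len // 2 else m - n_len for m in range(n_len)]  # symmetric m
    gauss = [math.exp(-2 * math.pi ** 2 * m * m / (n * n)) for m in offsets]
    voice = []
    for j in range(n_len):
        acc = sum(spectrum[(m + n) % n_len] * g * cmath.exp(2j * math.pi * m * j / n_len)
                  for m, g in zip(offsets, gauss))
        voice.append(acc / n_len)
    return voice


def period3_energy(seq):
    """Sum over bases of |S(j, N/3)|^2 for the Voss indicator sequences."""
    n_len = len(seq)
    if n_len % 3:
        raise ValueError("sequence length must be a multiple of 3")
    voices = [s_transform_voice([1.0 if ch == b else 0.0 for ch in seq], n_len // 3)
              for b in "ACGT"]
    return [sum(abs(v[j]) ** 2 for v in voices) for j in range(n_len)]


seq = "AT" * 30 + "AAT" * 20 + "AT" * 30  # period-3 block at positions 60..119
energy = period3_energy(seq)
print(round(energy[10], 5), round(energy[90], 3))
```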

20.
Evolution of the fibronectin gene. Exon structure of cell attachment domain   Citations: 6 (self: 0; others: 6)
Genomic DNA coding for human fibronectin was identified from a human genomic library by screening with a cDNA clone that specifies the cell attachment domain of human fibronectin. Two clones that together provided more than 22 kilobase pairs of the fibronectin gene were isolated. The exons in this region correspond to approximately 40% of the coding region of the fibronectin gene. They code for the middle region of the polypeptide, which consists of homologous repeating segments of about 90 amino acids called type III homologies. The nucleotide sequence of the portion of the gene corresponding to the cell attachment domain showed that the Arg-Gly-Asp-Ser cell attachment site is encoded within a 165-base pair exon. This exon, together with a 117-base pair exon, codes for one homology unit. Analysis of the exon/intron organization in some of the neighboring homology units indicated a similar 2-exon structure. An exception to this pattern is a single large exon coding for a type III homology unit that, owing to alternative mRNA splicing, exists in some but not all fibronectin polypeptides. The introns separating the coding sequences for the type III homology units are located in conserved positions, whereas the introns that interrupt the coding sequence within the units are in variable positions, generating variation in the size of the homologous exons. This exon/intron organization suggests that the type III homology region of the fibronectin gene has evolved through a series of duplications of a primordial gene consisting of two exons. Specification of one of these homology units as the cell attachment domain has occurred within this exon/intron arrangement.
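The arithmetic behind the two-exon homology unit is quick to check: 165 + 117 = 282 bp, which is 94 codons, in line with type III units of about 90 amino acids. The hypothetical helper below computes the phase of each intron between consecutive exons (coding bases before it, modulo 3, where 0 means the intron falls between codons). It assumes the first exon starts on a codon boundary, which the abstract does not state.

```python
def intron_phases(exon_lengths):
    """Phases of the introns between consecutive exons, plus divmod(total length, 3).

    Assumes the first exon begins at a codon boundary.
    """
    phases, coding = [], 0
    for length in exon_lengths[:-1]:
        coding += length
        phases.append(coding % 3)
    return phases, divmod(sum(exon_lengths), 3)


print(intron_phases([165, 117]))  # ([0], (94, 0))
```

A zero remainder in the second value means the exons together encode a whole number of codons.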


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417