Similar articles
 20 similar articles found (search time: 0 ms)
1.
Sajid Marhon 《Bio Systems》2010,101(3):185-676
In a previous paper (Yin and Yau, 2005), a novel method was proposed to measure the power spectrum of a DNA sequence at frequency N/3 in order to distinguish protein-coding and non-coding regions in DNA sequences. This was accomplished by computing the distribution of the four nucleotides in the three reading frames (codon positions) and identifying variance as an indicator of 3-base periodicity. That work included an empirical justification for the claim that there exists a linear, 3:2 correlation between the variance and the power spectrum. In this note, we provide a theoretical justification for that observation in the form of a mathematical proof of this correlation. This work thus provides a more rigorous justification for the use of the variance instead of the more computationally expensive power spectrum, allowing users of this technique to apply it with absolute confidence that no compromise in accuracy is incurred.
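
The linear relation described in this abstract can be checked numerically. The sketch below is not taken from the paper. It assumes the common 0/1 indicator sequences for A, C, G and T, and it uses the sum of squared deviations of the three codon-position counts as a stand-in for the paper's variance measure, whose exact definition the abstract does not give. All function names are hypothetical. Under these assumptions the power at frequency N/3 equals 3/2 of that spread.

```python
import cmath


def codon_position_counts(seq):
    """Count each nucleotide at codon positions 0, 1, 2 (frame starts at index 0)."""
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i, base in enumerate(seq.upper()):
        if base in counts:
            counts[base][i % 3] += 1
    return counts


def power_at_one_third(seq):
    """Sum over bases of |DFT of the 0/1 indicator sequence|^2 at frequency N/3."""
    w = cmath.exp(-2j * cmath.pi / 3)  # e^{-2*pi*i*n/3} only depends on n mod 3
    total = 0.0
    for base in "ACGT":
        coeff = sum(w ** (i % 3) for i, b in enumerate(seq.upper()) if b == base)
        total += abs(coeff) ** 2
    return total


def position_spread(seq):
    """Sum over bases of squared deviations of the position counts from their mean.

    Assumption: this stands in for the 'variance' named in the abstract.
    """
    total = 0.0
    for freqs in codon_position_counts(seq).values():
        mean = sum(freqs) / 3
        total += sum((f - mean) ** 2 for f in freqs)
    return total


example = "ATGGCCATTGTAATGGGCCGCTGA"  # arbitrary illustrative sequence
ratio_holds = abs(power_at_one_third(example) - 1.5 * position_spread(example)) < 1e-9
```

The spread needs only twelve counters and one pass over the sequence, which is the practical appeal of replacing the spectrum with a count-based statistic.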

2.
Sánchez J 《Bioinformation》2011,6(9):327-329
All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher-order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9, etc. bases, and we have proposed an association between TBP and clustering of same-phase triplets. Here we investigated whether TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under a constant protein sequence, intercodon dinucleotide frequencies depend on the distribution of synonymous codons. Possible effects were therefore revealed by randomly exchanging synonymous codons without altering protein sequences and then documenting changes in TBP via the frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. Thus, intercodon C|A (where "|" indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed.
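
The codon-exchange step in this abstract can be illustrated with a short sketch. It is an assumption-laden illustration, not the authors' procedure: the abstract does not say whether codons are permuted among positions or redrawn from a usage table, so the code simply permutes the codons already present among positions that encode the same amino acid. The standard genetic code is used, and the function names are hypothetical.

```python
import random
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    """Translate an in-frame DNA sequence (uppercase ACGT) into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def shuffle_synonymous(seq, rng):
    """Permute codons among positions coding for the same amino acid.

    Assumption: a permutation of existing codons; the protein sequence and the
    overall codon counts are both preserved.
    """
    codons = split_codons(seq)
    slots = defaultdict(list)
    for index, codon in enumerate(codons):
        slots[CODON_TABLE[codon]].append(index)
    shuffled = list(codons)
    for positions in slots.values():
        pool = [codons[p] for p in positions]
        rng.shuffle(pool)
        for p, codon in zip(positions, pool):
            shuffled[p] = codon
    return "".join(shuffled)


native = "ATGCTGTTAGCAGCCCTTAAAGCGAAGTGA"  # arbitrary illustrative sequence
shuffled = shuffle_synonymous(native, random.Random(0))
```

Because only synonymous codons trade places, any change in intercodon dinucleotide frequencies between `native` and `shuffled` comes from codon order alone.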

3.
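A preliminary study of coding sequences lacking 3-base periodicity   Cited by: 4 (self-citations: 0, citations by others: 4)
Fourier spectrum analysis of 120 relatively short coding sequences (<1,200 bp) shows that 3-base periodicity is not always present in short coding sequences. Statistical analysis suggests that whether a coding sequence exhibits 3-base periodicity is related to the base composition and distribution of the sequence, the choice and order of amino acids in the encoded protein, and the usage of synonymous codons. In general, A+U content is higher than G+C content in non-period-3 sequences, whereas the opposite holds for period-3 sequences; bases are distributed more evenly across the three codon positions in non-period-3 sequences than in period-3 sequences; and codon and amino acid usage bias is weaker in non-period-3 sequences than in period-3 sequences. These phenomena should be taken fully into account when Fourier analysis is used to predict genes and exons in DNA sequences.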

4.
A simple model is put forward to explain the long-known three-base periodicity in coding DNA. We propose the concept of same-phase triplet clustering, i.e. a condition wherein a triplet appears several times in one phase without interruption by the two other possible phases. For instance, in the sequence (i): NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN (where N is any nucleotide but combinations producing TTG are excluded) there would be clustering of same-phase TTG because this triplet appears uninterruptedly in phase 2. In contrast, in the sequence (ii): TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN there is no same-phase clustering because neighboring TTGs are all in different phases. Observe also that in sequence (i) TTG triplets are separated by 3, 3 and 6 nucleotides (3n distances), while in sequence (ii) they are separated by 1, 4 and 5 nucleotides (non-3n distances). In this work, we demonstrate that in coding DNA the 3n distances generated by (i)-type sequences proportionally outnumber the non-3n distances generated by (ii)-type sequences; this condition would be the basis of three-base periodicity. Randomized sequences also contained (i)- and (ii)-type sequences, but clustering was statistically different. To test our model, we generated (i)-type sequences in a randomized sequence by inducing clustering of same-phase triplets. In agreement with the model, this sequence displayed three-base periodicity. Furthermore, two- and four-base periodicities could also be induced by artificially inducing clustering of duplets and tetraplets.
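
The two example sequences in this abstract can be reproduced with a few lines of code. The sketch below is illustrative only: it replaces every placeholder N with A (an arbitrary choice that creates no extra TTG), and it measures the distance between neighboring triplets as the number of nucleotides lying between them, which is the convention that yields the distances quoted in the abstract. Function names are hypothetical.

```python
def triplet_gaps(seq, triplet):
    """Number of nucleotides between each pair of neighboring occurrences of triplet.

    Occurrences are assumed not to overlap, which always holds for TTG.
    """
    starts = [i for i in range(len(seq) - 2) if seq[i:i + 3] == triplet]
    return [later - earlier - 3 for earlier, later in zip(starts, starts[1:])]


def fraction_3n(gaps):
    """Share of gaps that are multiples of three (the '3n distances')."""
    if not gaps:
        return 0.0
    return sum(1 for gap in gaps if gap % 3 == 0) / len(gaps)


def concrete(pattern):
    """Turn the abstract's notation into a plain sequence (assumption: N becomes A)."""
    return pattern.replace("_", "").replace("N", "A")


seq_i = concrete("NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN")
seq_ii = concrete("TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN")

gaps_i = triplet_gaps(seq_i, "TTG")    # [3, 3, 6]: same-phase clustering
gaps_ii = triplet_gaps(seq_ii, "TTG")  # [1, 4, 5]: neighbors in different phases
```

Applying `fraction_3n` to the gaps of every triplet in a real coding sequence and in a randomized copy would give one simple way to compare the two, in the spirit of the abstract's claim.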

5.
Accurate identification of protein-coding regions (exons) in DNA sequences has been a challenging task in bioinformatics. In particular, the coding regions have a 3-base periodicity, which forms the basis of all exon identification methods. Many signal processing tools and techniques have been applied successfully to the identification task, but further improvement is still needed. In this paper, we introduce a promising new model-independent time-frequency filtering technique based on the S-transform for accurate identification of the coding regions. The S-transform is a powerful linear time-frequency representation useful for filtering in the time-frequency domain. The potential of the proposed technique has been assessed through a simulation study, and the results obtained have been compared with existing methods using standard datasets. The comparative study demonstrates that the proposed method outperforms its counterparts in identifying the coding regions.

6.
Summary: The evolutionary history of the intracellular calcium-binding protein superfamily is well documented. The members of this gene family are all believed to be derived from a common ancestor, which, itself, was the product of two successive gene duplications. In this study, we have compared and analyzed the structures of the recently described genes coding for these proteins. We propose a series of evolutionary events, which include exon shuffling and intron insertion, that could account for the evolutionary origin of all the members of this superfamily. According to this hypothesis, the ancestral gene, a product of two successive duplications, consisted of at least four exons. Each exon coding for a peptide (a calcium-binding domain) was separated by an intron that had mediated the duplication. Each distinct lineage evolved from this ancestor by genomic rearrangement, with insertion of introns being a prominent feature.

7.
8.
9.
10.
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a 'coding statistic' is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C. elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
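
The decomposition step in this abstract can be sketched as a two-component normal mixture fit. The abstract does not say how the authors performed the decomposition, so the expectation-maximization (EM) procedure below is a swapped-in, generic technique, and the input values are made-up stand-ins for a per-window coding statistic. Function names and parameters are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_normals(values, iterations=200, min_var=1e-6):
    """Fit weights, means and variances of a two-normal mixture by EM.

    Assumption: components are initialised at the smallest and largest value.
    """
    n = len(values)
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_var)
    weights = [0.5, 0.5]
    means = [min(values), max(values)]
    variances = [overall_var, overall_var]
    for _ in range(iterations):
        # E-step: probability that each value belongs to each component.
        resp = []
        for v in values:
            p = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate each component from its weighted values.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            spread = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(spread, min_var)
    return weights, means, variances


# Made-up window statistics: a quarter of the windows near 0, the rest near 10.
low = [-0.2, -0.1, 0.0, 0.1, 0.2]
high = [9.8, 9.9, 10.0, 10.1, 10.2] * 3
weights, means, variances = fit_two_normals(low + high)
```

If the higher-mean component is taken to represent coding windows, its fitted weight serves as the coding-density estimate for the sample; which component is "coding" is itself an assumption that depends on the statistic chosen.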

11.
12.
13.
In previous work (E. S. Tessman and P. K. Peterson, J. Bacteriol. 163:677-687 and 688-695, 1985), we isolated many novel protease-constitutive (Prtc) recA mutants, i.e., mutants in which the RecA protein was always in the protease state without the usual need for DNA damage to activate it. Most Prtc mutants were recombinase positive and were designated Prtc Rec+; only a few Prtc mutants were recombinase negative, and those were designated Prtc Rec-. We report changes in DNA sequence of the recA gene for several of these mutants. The mutational changes clustered at three regions on the linear RecA polypeptide. Region 1 includes amino acid residues 25 through 39, region 2 includes amino acid residues 157 through 184, and region 3 includes amino acid residues 298 through 301. The in vivo response of these Prtc mutants to different effectors suggests that the RecA effector-binding sites have been altered. In particular we propose that the mutations may define single-stranded DNA- and nucleoside triphosphate-binding domains of RecA, that polypeptide regions 1 and 3 comprise part of the single-stranded DNA-binding domain, and that polypeptide regions 2 and 3 comprise part of the nucleoside triphosphate-binding domain. The overlapping of single-stranded DNA- and nucleoside triphosphate-binding domains in region 3 can explain previously known complex allosteric effects. Each of four Prtc Rec- mutants sequenced was found to contain a single amino acid change, showing that the change of just one amino acid can affect both the protease and recombinase activities and indicating that the functional domains for these two activities of RecA overlap. A recA promoter-down mutation was isolated by its ability to suppress the RecA protease activity of one of our strong Prtc mutants.

14.
Y Tsujimoto  Y Suzuki 《Cell》1979,18(2):591-600
Sequence analysis of the cloned genomic fibroin gene and cDNA containing the sequence complementary to fibroin mRNA has been carried out for the regions covering the 5′ flanking, mRNA coding, entire intervening sequence and its borders, and fibroin coding sequences. The sequences determined on the gene extend from nucleotide −552 to +1497 (assigning +1 to the cap locus); sequence analysis of the cDNA has confirmed our previous mapping of the cap locus (Tsujimoto and Suzuki, 1979). Comparison of the nucleotide sequence of the genomic gene with that of cDNA has confirmed the existence of an intervening sequence 970 bases long. The sequence comparison also pinpointed the 5′ coding-intervening junction at +64–66 and the 3′ intervening-coding junction at +1034–1036. Both the 5′ and 3′ junctions of the fibroin gene (insect) share homologous segments of about 10 nucleotides each with the published sequences of β-globin (mammal), immunoglobulin (mammal) and ovalbumin (avian) genes. A long inverted repeat sequence (17 of 23 base match) has been found next to the junctions within the intervening sequence of the fibroin gene. The repetitious sequence that codes for the Gly-Ala peptide characteristic of fibroin protein begins at position +1448. The characteristics of the N terminal portion of fibroin protein (or its precursor) are discussed, as are the features of the 5′ flanking sequence of the gene and the mRNA sequence (with special attention to the putative promoter sequence for the gene), the possible secondary structure and a sequence complementary to the 3′ end of 18S ribosomal RNA at the 5′ proximal region of fibroin mRNA.

15.
In total, 472,288 regions of triplet periodicity were found in 578,868 genes from KEGG databank version 29 and classified. A new concept of triplet periodicity class and a measure of similarity between periodicity classes were introduced. Overall, 2520 classes were created and contained 94% of the triplet periodicity cases found. A similar correlation between the triplet periodicity and reading frame was observed for 92% of triplet periodicity regions contained in different classes. The remaining triplet periodicity regions displayed a shift of the reading frame relative to that common for the majority of genes belonging to the same triplet periodicity class. The hypothetical amino acid sequences were deduced from the periodicity regions according to the reading frame characteristic of the given triplet periodicity class. BLAST analysis demonstrated that 2660 hypothetical amino acid sequences display a statistically significant similarity to proteins from the UniProt databank. It was supposed that 8% of the triplet periodicity regions contained in the classes have frameshift mutations. The triplet periodicity classes can be used to identify the coding regions in genes and to search for frameshift mutations.

16.
Most proterminal regions of human chromosomes are GC-rich and gene-rich. Chromosome 3p is an exception. Its proterminal region is GC-poor, and likely to lose heterozygosity, thus causing a number of fatal diseases. Except for one gap left at the telomeric position, the proterminal region of human chromosome 3p has been completely sequenced. The detailed sequence analysis showed: (i) the GC content of this region was 38.5%, being the lowest among all the human proterminal regions; (ii) this region contained 20 known genes and 22 predicted genes, with an average gene size of 97.5 kb. The previously mapped gene Cntn3 was not found in this region, but instead located at the 74 Mb position of human chromosome 3p; (iii) the interspersed repeats of this region were more active than the average level of the whole human genome, especially (TA)n, the content of which was twice the genome average; (iv) this region had a conserved synteny extending from 104.1 Mb to 112.4 Mb on mouse chromosome 6, which was 8% larger in size, not in accordance with the whole genome comparison, probably because the 3pter-p26 region was more likely to lose nucleotides and its mouse synteny had more active interspersed repeats.

17.
Most proterminal regions of human chromosomes are GC-rich and gene-rich. Chromosome 3p is an exception. Its proterminal region is GC-poor, and likely to lose heterozygosity, thus causing a number of fatal diseases. Except for one gap left at the telomeric position, the proterminal region of human chromosome 3p has been completely sequenced. The detailed sequence analysis showed: (i) the GC content of this region was 38.5%, being the lowest among all the human proterminal regions; (ii) this region contained 20 known genes and 22 predicted genes, with an average gene size of 97.5 kb. The previously mapped gene Cntn3 was not found in this region, but instead located at the 74 Mb position of human chromosome 3p; (iii) the interspersed repeats of this region were more active than the average level of the whole human genome, especially (TA)n, the content of which was twice the genome average; (iv) this region had a conserved synteny extending from 104.1 Mb to 112.4 Mb on mouse chromosome 6, which was 8% larger in size, not in accordance with the whole genome comparison, probably because the 3pter-p26 region was more likely to lose nucleotides and its mouse synteny had more active interspersed repeats.

18.
A cotton Ltp3 gene and its 5′ and 3′ flanking regions have been cloned with a PCR-based genomic DNA walking method. The amplified 2.6 kb DNA fragment contains sequences corresponding to GH3 cDNA, which has been shown to encode a lipid transfer protein (LTP3). The gene has an intron of 80 bp, which is located in the region corresponding to the C-terminus of LTP3. The Ltp3 promoter was systematically analyzed in transgenic tobacco plants by employing the Escherichia coli β-glucuronidase gene (GUS) as a reporter. The results of histochemical and fluorogenic GUS assays indicate that the 5′ flanking region of the Ltp3 gene contains cis-elements conferring the trichome-specific activity of the Ltp3 promoter.

19.
The HLA region harbors some of the most polymorphic loci in the human genome. Among them is the class II locus HLA-DRB1, with more than 400 known alleles. The age of the polymorphism and the rate at which new alleles are generated at HLA loci have caused much controversy over the years. Previous studies have mostly been restricted to the 270 base pairs that constitute the second exon and represent the most variable part of the gene. Here, we investigate the evolutionary history of the HLA-DRB1 locus on the basis of an analysis of 15 genomic full-length alleles (10-15 kb). In addition, the variation in 49 complete coding sequences and 322 exon 2 sequences was analyzed. When excluding exon 2 from the analysis, the diversity at the synonymous sites was found to be similar to the intron diversity. The overall diversity in the noncoding region was also similar to the genome average. The DRB1*03 lineage has been found in human, chimpanzee, bonobo, gorilla, and orangutan. An ancestral "proto HLA-DRB1*03 lineage" appeared to have diverged in the last 5 million years into the human-specific lineages *08, *11, *13, and *14. With the exception of exon 2, both the coding and the noncoding diversity suggest a recent origin (<1 million years ago) for most of the alleles at the HLA-DRB1 locus. Sites encoding amino acids involved in antigen binding [antigen recognizing sites (ARS)] appear to have a more ancient origin. Taken together, the recent origin of most alleles, the high diversity between allelic lineages, and the ancient origin of sequence motifs in exon 2 are consistent with a relatively rapid generation of novel alleles by gene conversion-like events.

20.
Weining Chen  Seow Fong Yap  Louis Lim   《Gene》1996,180(1-2):217-219
When screening a Caenorhabditis elegans genomic library using the human Rac1 cDNA as probe, a hybridizing fragment of 2.7 kb was isolated which contained four exons with high sequence similarity to CeRac1, coding for the nematode homologue of the Ras-related small GTP-binding protein Rac1. The putative translational product of 195 amino acids (aa) from the exons displayed 88% identity to the sequence of CeRac1. Interestingly, three alterations were found in the N-terminal ‘effector domain’ (residues 22–45) which hitherto was identical among all known Rac p21s, suggesting that CeRac2 might have different targets/functions for nematode development. Additionally, an insertion of 4 aa was found in the hypervariable region at the C terminus of CeRac2.
