Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Previous studies have shown that the identification and analysis of both abundant and rare k-mers or “DNA words of length k” in genomic sequences using suitable statistical background models can reveal biologically significant sequence elements. Other studies have investigated the uni/multimodal distribution of k-mer abundances or “k-mer spectra” in different DNA sequences. However, the existing background models are affected to varying extents by compositional bias. Moreover, the distribution of k-mer abundances in the context of related genomes has not been studied previously. Here, we present a novel statistical background model for calculating k-mer enrichment in DNA sequences based on the average of the frequencies of the two (k-1)-mers for each k-mer. Comparison of our null model with the commonly used ones, including Markov models of different orders and the single mismatch model, shows that our method is more robust to compositional AT-rich bias and detects many additional, repeat-poor over-abundant k-mers that are biologically meaningful. Analysis of overrepresented genomic k-mers (4≤k≤16) from four yeast species using this model showed that the fraction of overrepresented DNA words falls linearly as k increases; however, a significant number of overabundant k-mers exists at higher values of k. Finally, comparative analysis of k-mer abundance scores across four yeast species revealed a mixture of unimodal and multimodal spectra for the various genomic sub-regions analyzed.
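
The background model in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so the ratio used here (observed k-mer frequency divided by the mean frequency of its prefix and suffix (k-1)-mers) is an assumption, and the function names `kmer_frequencies` and `enrichment_scores` are hypothetical.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Return the relative frequency of every overlapping k-mer in seq."""
    total = len(seq) - k + 1
    if k < 1 or total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def enrichment_scores(seq, k):
    """Score each k-mer against the mean frequency of its two (k-1)-mers.

    Assumed form of the score (not stated in the abstract):
        observed frequency / mean(prefix frequency, suffix frequency)
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    observed = kmer_frequencies(seq, k)
    sub = kmer_frequencies(seq, k - 1)
    scores = {}
    for kmer, freq in observed.items():
        # The prefix and suffix (k-1)-mers always occur in seq, so the
        # lookups below cannot fail.
        background = (sub[kmer[:-1]] + sub[kmer[1:]]) / 2
        scores[kmer] = freq / background
    return scores


scores = enrichment_scores("ATATATAT", 2)
```

For the toy sequence above, "AT" occurs in 4 of 7 dinucleotide positions and both A and T have frequency 0.5, so "AT" scores higher than "TA". A real analysis would also need to handle both strands and ambiguous bases, which this sketch ignores.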

2.
The study of a few genes has permitted the identification of three elements that constitute a yeast polyadenylation signal: the efficiency element (EE), the positioning element and the actual site for cleavage and polyadenylation. In this paper we perform an analysis of oligonucleotide composition on the sequences located downstream of the stop codon of all yeast genes. Several oligonucleotide families appear over-represented with a high significance (referred to herein as ‘words’). The family with the highest over-representation includes the oligonucleotides shown experimentally to play a role as EEs. The word with the highest score is TATATA, followed, among others, by a series of single-nucleotide variants (TATGTA, TACATA, TAAATA ...) and one-letter shifts (ATATAT). A position analysis reveals that those words have a high preference to be in 3′ flanks of yeast genes and there they have a very uneven distribution, with a marked peak around 35 bp after the stop codon. Of the predicted ORFs, 85% show one or more of those sequences. Similar results were obtained using a data set of EST sequences. Other clusters of over-represented words are also detected, namely T- and A-rich signals. Using these results and previously known data we propose a general model for the 3′ trailers of yeast mRNAs.

3.
The exact lengths of linker DNAs connecting adjacent nucleosomes specify the intrinsic three-dimensional structures of eukaryotic chromatin fibers. Some studies suggest that linker DNA lengths preferentially occur at certain quantized values, differing one from another by integral multiples of the DNA helical repeat, approximately 10 bp; however, studies in the literature are inconsistent. Here, we investigate linker DNA length distributions in the yeast Saccharomyces cerevisiae genome, using two novel methods: a Fourier analysis of genomic dinucleotide periodicities adjacent to experimentally mapped nucleosomes and a duration hidden Markov model applied to experimentally defined dinucleosomes. Both methods reveal that linker DNA lengths in yeast are preferentially periodic at the DNA helical repeat (approximately 10 bp), obeying the forms 10n+5 bp (integer n). This 10 bp periodicity implies an ordered superhelical intrinsic structure for the average chromatin fiber in yeast.

4.
5.
6.
A 2225 bp cDNA, designated RPA1, was isolated from an Oryza sativa cDNA library. Analysis revealed a 1761 bp coding sequence with 15 non-identical repeat units. The ORF encoded the A regulatory subunit of protein phosphatase 2A (PP2A-A) as ascertained by complementation of the yeast tpd3 mutant defective in this gene. The corresponding genomic DNA from a rice genome BAC library revealed that the gene contains eleven introns. The rice genome contains only a single copy of this gene as judged by Southern blot analysis. The PP2A protein is highly conserved in nature; the rice protein shows 88% amino acid identity with its counterparts in Arabidopsis or Nicotiana tabacum.

7.
J Kreike, M Schulze, F Ahne, B F Lang. The EMBO Journal, 1987, 6(7): 2123-2129
We have cloned a 1.6-kb fragment of yeast nuclear DNA, which complements pet- mutant MK3 (mrs1). This mutant was shown to be defective in mitochondrial RNA splicing: the excision of intron 3 from the mitochondrial COB pre-RNA is blocked. The DNA sequence of the nuclear DNA fragment revealed two open reading frames (ORF1 with 1092 bp; ORF2 with 735 bp) on opposite strands, which overlap by 656 bp. As shown by in vitro mutagenesis, ORF1, but not ORF2, is responsible for complementation of the splice defect. Hence, ORF1 represents the nuclear MRS1 gene. Disruption of the gene (both ORFs) in the chromosomal DNA of the respiratory competent yeast strain DBY747 (long form COB gene) leads to a stable pet- phenotype and to the accumulation of the same mitochondrial RNA precursors as in strain MK3. The amino acid sequence of the putative ORF1 product does not exhibit any homology with other known proteins, except for a small region of homology with the gene product of another nuclear yeast gene involved in mitochondrial RNA splicing, CBP2. The function of the MRS1 (ORF1) gene in mitochondrial RNA splicing and the significance of the overlapping ORFs in this gene are discussed.

8.
9.
10.
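Cloning of human SBK1 cDNA and screening of its interacting proteins   Total citations: 1 (self-citations: 0, citations by others: 1)
The cDNA sequence of human SBK1 (Homo sapiens SH3-binding domain kinase 1) was cloned for the first time, and the genomic DNA sequence of human SBK1 was obtained by in silico cloning using bioinformatics methods. Human SBK1 is the ortholog of rodent SBK1; the two genes have similar genomic structures, each containing four exons. The human sbk1 ORF is 1,275 bp long and encodes 424 amino acids, whereas the rodent ORF is 1,254 bp long and encodes 417 amino acids. The coding regions share 87.7% nucleotide sequence identity and 95.7% amino acid sequence identity, and both proteins have a PV-rich region at the carboxyl terminus that is predicted to bind proteins containing SH3 domains. The 1,610 bp sbk1 cDNA sequence obtained by RT-PCR was used to search the EST database and was extended in silico, finally yielding a full-length human sbk1 mRNA sequence of about 5 kb, the same size as the full-length rodent sbk1 mRNA. Comparative genomics showed that UniGene cluster Hs.97837 actually represents the 3′ UTR of the sbk1 gene (UniGene cluster Hs.460471) rather than a new UniGene cluster. Using the yeast two-hybrid technique with SBK1 as the bait, two interacting proteins were obtained: the epidermal growth factor receptor EGFR and the orphan nuclear receptor NR4A1. The specific functional relationships among them remain to be studied.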
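
The ORF lengths and protein lengths quoted in this abstract can be checked with a short calculation. This is a minimal sketch that assumes the reported ORF length includes the stop codon (the abstract does not say so explicitly); the function name `residues_from_orf` is hypothetical.

```python
def residues_from_orf(orf_bp, includes_stop=True):
    """Return the number of encoded amino acids for an ORF of orf_bp base pairs.

    Assumption: when includes_stop is True, the final codon is a stop codon
    and does not contribute a residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


print(residues_from_orf(1275))  # 424, the human sbk1 ORF
print(residues_from_orf(1254))  # 417, the second ORF in the abstract
```

Under that assumption both reported pairs (1,275 bp and 424 residues; 1,254 bp and 417 residues) are internally consistent.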

11.
Summary: The pathogenic yeast, Candida albicans, is insensitive to the anti-mitotic drug, benomyl, and to the dihydrofolate reductase inhibitor, methotrexate. Genes responsible for the intrinsic drug resistance were sought by transforming Saccharomyces cerevisiae, a yeast sensitive to both drugs, with genomic C. albicans libraries and screening on benomyl or methotrexate. Restriction analysis of plasmids isolated from benomyl- and methotrexate-resistant colonies indicated that both phenotypes were encoded by the same DNA fragment. Sequence analysis showed that the fragments were nearly identical and contained a long open reading frame of 1694 bp (ORF1) and a small ORF of 446 bp (ORF2) within ORF1 on the opposite strand. By site-directed mutagenesis, it was shown that ORF1 encoded both phenotypes. The protein had no sequence similarity to any known proteins, including β-tubulin, dihydrofolate reductase, and the P-glycoprotein of the multi-drug resistance family. The resistance gene was detected in several C. albicans strains and in C. stellatoidea by DNA hybridization and by the polymerase chain reaction.

12.
Summary: A DNA fragment conferring resistance to zinc and cadmium ions in the yeast Saccharomyces cerevisiae was isolated from a library of yeast genomic DNA. Its nucleotide sequence revealed the presence of a single open reading frame (ORF; 1326 bp) having the potential to encode a protein of 442 amino acid residues (molecular mass of 48.3 kDa). A frameshift mutation introduced within the ORF abolished resistance to heavy metal ions, indicating the ORF is required for resistance. Therefore, we termed it the ZRC1 (zinc resistance conferring) gene. The deduced amino acid sequence of the gene product predicts a rather hydrophobic protein with six possible membrane-spanning regions. While multiple copies of the ZRC1 gene enable yeast cells to grow in the presence of 40 mM Zn²⁺, a level at which wild-type cells cannot survive, the disruption of the chromosomal ZRC1 locus, though not a lethal event, makes cells more sensitive to zinc ions than are wild-type cells.

13.
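Sequence and structure of a novel gene of the armyworm nuclear polyhedrosis virus   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper reports the cloning and physical map of a 5.5 kb EcoRV/XhoI fragment of TsNPV DNA. The complete sequence of an 819 bp fragment within it was determined. At the 5′ end of the complete 346 bp ORF, in addition to a TATA box and a CAAT box, there are the typical early-gene promoter elements ACGT and a GC unit, as well as the conserved late-gene transcription initiation sequence TTAAG. The predicted ORF product is a protein of 114 amino acids with a molecular mass of 13 kD, and it was therefore named p13. At the C-terminus and N-terminus of p13 there are, respectively, one leucine zipper and two leucine-zipper-like structures (which we call the leucine-variant structure and the LVT repeat structure). Starting from the stop codon there is a double hairpin-loop structure. The 5′ regulatory elements of the p13 gene and their arrangement closely resemble those of the apoptosis-inhibitor gene (p35) of BmNPV and AcNPV, but p13 shows no homology to known baculovirus gene sequences or amino acid sequences. It is therefore a novel early-late gene, and its regulatory elements may have important functions.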

14.
15.
16.
17.
18.
The open reading frame (ORF) of the human Tom20 gene (hTom20) was amplified by PCR from a HeLa cDNA library using primers based on the sequence of HUMRSC145 and cloned into a pET15b vector. Amplification of human genomic DNA using these primers yielded a DNA fragment of the same size as that of the ORF of hTom20 cDNA. Sequencing of this fragment revealed that: (1) it has the same number of base pairs as the ORF of hTom20 cDNA (438 bp); and (2) the two sequences differ by 14 single base pair substitutions (97% similarity) causing eight changes in the amino acid sequence and two premature stop codons. Further amplification of human genomic DNA adaptor-ligated libraries using primers based on HUMRSC145 revealed three different sequence-related genomic regions; one corresponding to the fragment referred above, another corresponding to the hTom20 gene, and a third fragment of which the sequence differs from the ORF of hTom20 cDNA by only 22 base pair substitutions and a deletion of 4 bp. We conclude that, in addition to the hTom20 gene, there are two genomic DNA sequences (Ψ1Tom20 and Ψ2Tom20) that are processed pseudogenes of hTom20. Aspects concerning their evolutionary origin are discussed. Received: 12 September 1997 / Accepted: 29 November 1997
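
The 97% figure quoted in this abstract follows from the stated length and substitution count. This is a minimal sketch of that arithmetic, assuming an ungapped comparison of two equal-length sequences; the function names are hypothetical and the sequences themselves are not given in the abstract.

```python
def count_substitutions(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(length_bp, substitutions):
    """Return the percentage of positions that are identical."""
    if length_bp <= 0 or not 0 <= substitutions <= length_bp:
        raise ValueError("invalid length or substitution count")
    return 100.0 * (length_bp - substitutions) / length_bp


# 438 bp ORF with 14 single-base substitutions, as stated in the abstract.
print(round(percent_identity(438, 14), 1))  # 96.8
```

The value 96.8% rounds to the 97% reported. The third fragment, with 22 substitutions and a 4 bp deletion, would need a gapped alignment and is outside the scope of this sketch.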

19.
20.