Similar articles
 Found 20 similar articles (search time: 328 ms)
1.
Identifying protein-coding regions in DNA sequences is an active problem in computational biology. In this study, we present a self-adaptive spectral rotation (SASR) approach, which visualizes coding regions in DNA sequences based on the triplet periodicity property, without any preceding training process. It is intended to support rough prediction of coding regions when the extra information needed to train other leading methods is unavailable. In this approach, at each position in the DNA sequence, a Fourier spectrum is calculated from the posterior subsequence. From these spectra, a random walk in the complex plane is generated as the SASR's graphic output. Applications of the SASR to real DNA data show that patterns in the graphic output reveal the locations of the coding regions and the frame shifts between them: arcs indicate coding regions, stable points indicate non-coding regions, and the shapes of corners reveal frame shifts. Tests on a genomic data set from Saccharomyces cerevisiae reveal that the graphic patterns for coding and non-coding regions differ to a great extent, so that the coding regions can be visually distinguished. Meanwhile, a time cost test shows that the SASR can be easily implemented with a computational complexity of O(N).
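
The triplet periodicity signal that entry 1 builds on can be illustrated with a short Python sketch. This is not the SASR algorithm itself. It is the common textbook measure that sums, over the four bases, the squared magnitude of the Fourier coefficient at frequency 1/3 of each base's indicator sequence. The function names `period3_power` and `sliding_period3`, the normalization by window length, and the window and step sizes are assumptions made for illustration.

```python
import cmath


def period3_power(seq):
    """Period-3 Fourier power of a DNA string, normalized by its length.

    For each base, the positions where it occurs are summed as unit
    phasors at frequency 1/3; the squared magnitudes are then added.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / len(seq)


def sliding_period3(seq, window=30, step=30):
    """Period-3 power in successive windows (hypothetical default sizes)."""
    return [
        period3_power(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy input: a homopolymer stretch followed by a perfectly periodic one.
    toy = "A" * 60 + "ATG" * 20
    print(sliding_period3(toy))
```

On such a toy input the homopolymer windows give a value near zero and the perfectly periodic windows give a high value. Real coding regions show a much weaker but still detectable contrast.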

2.
3.
4.
5.
6.
7.
We introduce a novel approach for detecting possible mutations that lead to a reading frame (RF) shift in a gene. Deletions and insertions in DNA coding regions are significant events for genes because an RF shift alters an extensive region of the amino acid sequence encoded by the gene. The suggested method is based on the phenomenon of triplet periodicity (TP) in the coding regions of genes and its relative resistance to substitutions in the DNA sequence. We attempted to extend 326,933 regions of continuous TP found in genes from the KEGG databank by considering possible insertions and deletions. In total, we found 824 genes where such an extension was possible and statistically significant. We then generated amino acid sequences according to the active (KEGG's) and hypothetically ancient RFs in order to find confirmation of a shift at the protein level. Of these, 64 sequences have protein similarities only for the ancient RF, 176 only for the active RF, 3 for both, and 581 have no protein similarity at all. We aimed to establish a lower bound for the number of genes in which a shift between RF and TP is possible. Further ways to increase the number of revealed RF shifts are discussed.
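
Entry 7 notes that a reading frame shift changes an extensive stretch of the encoded protein. The following minimal Python sketch illustrates that effect on a made-up 18-nucleotide sequence by translating it before and after a single-base deletion. The sequence, the helper names `translate` and `delete_base`, and the use of the standard genetic code are assumptions for illustration, not part of the cited method.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate complete codons starting at offset `frame`; 'X' marks unknown codons."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def delete_base(dna, pos):
    """Return `dna` with the base at 0-based index `pos` removed."""
    return dna[:pos] + dna[pos + 1:]


if __name__ == "__main__":
    gene = "ATGGCTGAAAAACTGTAA"      # hypothetical toy sequence
    mutant = delete_base(gene, 4)    # single-base deletion in codon 2
    print(translate(gene))           # MAEKL*
    print(translate(mutant))         # MVKNC
```

Every codon downstream of the deletion is read differently and the stop codon is lost. A substitution, by contrast, changes at most one residue.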

8.
DNA sequences, potentially coding for histidine-rich proteins, were isolated from a P. falciparum genomic library using an oligonucleotide probe consisting of histidine codon repeats. Sequencing revealed that the different DNA fragments contain long repetitive regions very homologous to the probe. One clone was fully sequenced and contains two open reading frames that overlap in the repetitive region but are located on opposite strands. Analysis suggests that both are coding. One frame could code for a small histidine-rich protein, the other for a protein containing many aspartic acid residues. Southern blotting revealed that these sequences are conserved in all three P. falciparum strains studied.

9.
10.
Recently we described the isolation of c-mos (rat). The gene belongs to the family of oncogenes. Two facts render c-mos unique among the oncogenes: (a) it does not contain intervening sequences and (b) its expression was never detected in a large number of normal mouse tissues examined. We undertook the sequence analysis of c-mos (rat) in order to compare it to the nucleotide sequences published for c-mos (mouse), c-mos (human), c-src and bovine protein kinase. c-mos (rat) contains an open reading frame of 1017 nucleotides, coding for a polypeptide of 339 amino acids. c-mos (rat) makes use of the same ATG that defines the N-terminus of the c-mos (human) protein. By comparing all c-mos sequences available, we found sequences with high mutational rates to be confined to certain domains. This comparison, together with data on the biological activities of the cloned DNA, allowed us to tentatively define regions involved in (a) function(s) of c-mos other than transformation.

11.
Cloning of foreign DNA fragments for coding sequence analysis in Escherichia coli usually involves sets of three vectors. To simplify this, we constructed an expression vector named pMFV7 containing three ATG codons in different frames downstream of a Shine-Dalgarno sequence, assuming that the ribosome can use any of the three start codons in an alternative manner. Translation beginning at any of the start codons would drive the expression of any coding fragment cloned downstream. To test the feasibility of this proposal, we cloned DNA fragments of the lacZ gene in each of the possible reading frames downstream from the pMFV7 start codons. Sequence analysis of the N-terminal regions around the fusion sites indicates that ribosomes indeed initiate translation at each of the three initiation codons. In one case, levels of beta-galactosidase activity depended largely on the N-terminus of the translation products. We conclude that pMFV7 may be useful for expressing coding sequences regardless of their reading frame.

12.
By analyses of short DNA sequences, we have deduced the overall arrangement of genes in the (A + T)-rich coding sequences of herpesvirus saimiri (HVS) relative to the arrangements of homologous genes in the (G + C)-rich coding sequences of the Epstein-Barr virus (EBV) genome and the (A + T)-rich sequences of the varicella-zoster virus (VZV) genome. Fragments of HVS DNA from 13 separate sites within the 111 kilobase pairs of the light DNA coding sequences of the genome were subcloned into M13 vectors, and sequences of up to 350 bases were determined from each of these sites. Amino acid sequences predicted for fragments of open reading frames defined by these sequences were compared with a library of the protein sequences of major open reading frames predicted from the complete DNA sequences of VZV and EBV. Of the 13 short amino acid sequences obtained from HVS, only 3 were recognizably homologous to proteins encoded by VZV, but all 13 HVS sequences were unambiguously homologous to gene products encoded by EBV. The HVS reading frames identified by this method included homologs of the major capsid polypeptides, glycoprotein H, the major nonstructural DNA-binding protein, thymidine kinase, and the homolog of the regulatory gene product of the BMLF1 reading frame of EBV. Locally as well as globally, the order and relative orientation of these genes resembled that of their homologs on the EBV genome. Despite the major differences in their nucleotide compositions and in the nature and arrangements of reiterated DNA sequences, the genomes of the lymphotropic herpesviruses HVS and EBV encode closely related proteins, and they share a common organization of these coding sequences which differs from that of the neurotropic herpesviruses, VZV and herpes simplex virus.

13.
14.
15.
Mutations in fii or tolA of the fii-tolA-tolB gene cluster at 17 min on the Escherichia coli map render cells tolerant to high concentrations of the E colicins and do not allow the DNA of infecting single-stranded filamentous bacteriophages to enter the bacterial cytoplasm. The nucleotide sequence of a 1,854-base-pair DNA fragment carrying the fii region was determined. This sequence predicts three open reading frames sequentially coding for proteins of 134, 230, and 142 amino acids, followed by the potential start of the tolA gene. Oligonucleotide mutagenesis of each open reading frame and maxicell analysis demonstrated that all open reading frames are expressed in vivo. Sequence analysis of mutant fii genes identified the 230-amino acid protein as the fii gene product. Chromosomal insertion mutations were constructed in each of the two remaining open reading frames. The phenotype resulting from an insertion of the chloramphenicol gene into the gene coding for the 142-amino acid protein is identical to that of mutations in fii and tolA. This gene is located between fii and tolA, and we propose the designation of tolQRA for this cluster in which tolQ is the former fii gene and tolR is the new open reading frame. The protein products of this gene cluster play an important role in the transport of large molecules such as the E colicins and filamentous phage DNA into the bacterium.

16.
We cloned and sequenced the gene coding for the polypeptide of a halorhodopsin in Natronobacterium pharaonis (named here pharaonis halorhodopsin). Peptide sequencing of cyanogen bromide fragments, and immunoreactions of the protein and synthetic peptides derived from the COOH-terminal gene sequence, confirmed that the open reading frame is the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences, as well as those for other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences (mutations/nucleotide at codon positions which do not result in amino acid changes) were calculated. These indicate very considerable evolutionary distance between each pair of genes. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures. Conserved and conservatively replaced amino acid residues in all three proteins identify general features essential for ion-motive bacterial rhodopsins, responsible for overall structure and chromophore properties. Comparison of the bacteriorhodopsin sequence with those of the two halorhodopsins, on the other hand, identifies features involved in their specific (proton and chloride ion) transport functions.

17.
It has been hypothesized that a large fraction of the 24% noncoding DNA in R. prowazekii consists of degraded genes. This hypothesis has been based on the relatively high G+C content of noncoding DNA. However, a comparison with other genomes also having a low overall G+C content shows that this argument would also apply to other bacteria. To test this hypothesis, we study the coding potential in sets of genes, pseudogenes, and intergenic regions. We find that the correlation function and the χ²-measure are clearly indicative of the coding function of genes and pseudogenes. However, both coding potentials give almost no indication of a preexisting reading frame in the remaining 23% of noncoding DNA. We simulate the degradation of genes due to single-nucleotide substitutions and insertions/deletions and quantify the number of mutations required to remove indications of the reading frame. We discuss a reduced selection pressure as another possible origin of this comparatively large fraction of noncoding sequences. Received: 27 December 1999 / Accepted: 5 July 2000
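
The simulated degradation described in entry 17 can be sketched roughly in Python as follows. The abstract does not give the mutation model or the coding potentials in code form, so everything here is an assumption for illustration: the function names `mutate` and `period3_power`, the `indel_fraction` parameter, the equal split between insertions and deletions, and the use of period-3 Fourier power as a stand-in for a reading frame signal.

```python
import cmath
import random


def period3_power(seq):
    """Length-normalized Fourier power at frequency 1/3, summed over bases."""
    if not seq:
        return 0.0
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, b in enumerate(seq)
            if b == base
        )
        total += abs(coeff) ** 2
    return total / len(seq)


def mutate(seq, n_mutations, indel_fraction=0.2, seed=0):
    """Apply random point mutations to `seq` (hypothetical mutation model).

    Each mutation is a substitution with probability 1 - indel_fraction,
    otherwise an insertion or a deletion with equal probability.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    s = list(seq)
    for _ in range(n_mutations):
        if not s:
            break
        pos = rng.randrange(len(s))
        r = rng.random()
        if r >= indel_fraction:
            # Substitution: always pick a base different from the current one.
            s[pos] = rng.choice([b for b in bases if b != s[pos]])
        elif r < indel_fraction / 2:
            s.insert(pos, rng.choice(bases))
        else:
            del s[pos]
    return "".join(s)


if __name__ == "__main__":
    gene = "ATG" * 30  # toy, perfectly periodic stand-in for a gene
    for n in (0, 5, 20, 80):
        print(n, round(period3_power(mutate(gene, n)), 2))
```

Repeating such a loop over many seeds and recording the smallest number of mutations at which the signal falls below a chosen threshold would give a crude analogue of the quantity the abstract describes.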

18.
We classified 472,288 regions of triplet periodicity found in 578,868 genes from release 29 of the KEGG databank. A new concept of a triplet periodicity class and a measure of similarity between classes are introduced. In total, 2520 classes were created, containing 94% of the triplet periodicity regions found. For 92% of the triplet periodicity regions contained in classes, an identical linkage of triplet periodicity to the reading frame is observed. For the remaining triplet periodicity regions, a shift was observed between the reading frame of the gene and the reading frame common to the majority of genes in the same triplet periodicity class. These periodicity regions were translated into hypothetical amino acid sequences according to the reading frame defined by the triplet periodicity class. Using the BLAST program, we showed that 2660 hypothetical amino acid sequences have statistically significant similarity to proteins from the UniProt databank. We suppose that 8% of the triplet periodicity regions that joined classes mutated by means of a reading frame shift. The classes of triplet periodicity created can be used to identify coding regions of genes as well as to search for mutations arising from reading frame shifts.
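
The linkage between triplet periodicity and reading frame in entry 18 can be illustrated with a small Python sketch. It counts bases by codon position in a 3 x 4 matrix and finds the cyclic shift of rows that best aligns a query region with a reference. This is an illustrative simplification, not the class construction or similarity measure of the cited work; the names `tp_matrix` and `phase_shift` and the dot-product score are assumptions.

```python
COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}


def tp_matrix(seq):
    """3 x 4 matrix of base counts by position modulo 3 (rows) and base (columns)."""
    counts = [[0] * 4 for _ in range(3)]
    for k, base in enumerate(seq.upper()):
        if base in COLUMN:  # skip ambiguous symbols such as N
            counts[k % 3][COLUMN[base]] += 1
    return counts


def phase_shift(reference, query):
    """Return s in {0, 1, 2} such that query row (i + s) % 3 best matches reference row i.

    The score is a plain dot product of count matrices (an assumed,
    simplified similarity measure).
    """
    ref = tp_matrix(reference)
    qry = tp_matrix(query)

    def score(s):
        return sum(
            ref[i][b] * qry[(i + s) % 3][b]
            for i in range(3)
            for b in range(4)
        )

    return max(range(3), key=score)


if __name__ == "__main__":
    reference = "ATG" * 20
    print(phase_shift(reference, "ATG" * 20))  # 0: same frame
    print(phase_shift(reference, "GAT" * 20))  # 1: one base inserted upstream
    print(phase_shift(reference, "TGA" * 20))  # 2: one base deleted upstream
```

A non-zero shift between a region and the consensus of its class is the kind of signal the abstract interprets as a possible reading frame shift.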

19.
The DNA sequences of the entire coding regions of the A and C type variable surface protein genes from Paramecium tetraurelia, stock 51, have been determined. The 8151 nucleotide open reading frame of the A gene contains several tandem repeats of 210 nucleotides within the central portion of the molecule as well as a periodic structure defined by cysteine residues. The 6699 nucleotide open reading frame of the C gene does not contain any identifiable tandem repeats or internal similarity but maintains a periodicity based on the cysteine residue spacing. The deduced amino acid sequences encoded by the two genes are most similar within the 600 amino-terminal and 600 carboxyl-terminal amino acid residues; the central portions show only limited sequence similarity. We conclude that internal repeats are not a conserved feature of variable surface proteins in Paramecium and discuss the possible importance of the regular pattern of cysteine residues.

20.
We have isolated recombinant DNA clones which include cDNA and chromosomal DNA sequences of the major heat shock-inducible gene of Drosophila. With the cDNA fragments used as specific hybridization probes, DNA:DNA reassociation and in situ hybridization analysis demonstrated that the DNA sequences are repeated approximately 7 times in the haploid Drosophila genome, and that gene sequences are present at both the 87A and 87C loci on the cytological map. The cloned cDNA and homologous cloned chromosomal DNA hybridized to mRNA which translated in vitro into the major 70K heat shock-specific protein. Here we summarize a study of the organization of genes coding for the 70K heat shock-specific protein contained in the two recombinant chromosomal DNA plasmids pG3 and pG5. On the basis of R loop hybridization experiments and restriction enzyme analysis, we conclude that a 14 kb fragment, G3, contains three copies of the gene coding for the 70K protein. A second 9.2 kb fragment, G5, contains one copy of the gene coding for the 70K protein. Hybridization of labeled poly(A)-containing RNA to restriction endonuclease-cleaved DNA indicates that the mRNA coding regions in G3 and G5 are each approximately 2100 bp long. The three tandemly repeated genes of G3 are separated by approximately 1400 bp of spacer DNA. The two internal spacer regions in G3 appear to be identical, whereas differences in restriction enzyme sites indicate that the sequences adjacent to the cluster differ from the internal spacer and from each other.
