Similar articles
2.
Structure of the rat prolactin gene   Cited by: 17 (self-citations: 0; others: 17)
The organization and sequence of the rat preprolactin gene have been investigated. Analysis of two different plasmids containing pituitary cDNA inserts has provided the complete 681-nucleotide coding sequence of preprolactin as well as 17 nucleotides preceding the initiation codon and 90 nucleotides following the termination codon. Digestion of rat chromosomal DNA with the restriction endonuclease Eco RI followed by size fractionation and hybridization to a labeled prolactin cDNA probe has demonstrated that prolactin genomic sequences are located on 6.0-, 3.9-, and 2.9-kilobase fragments. The 6.0- and 3.9-kilobase fragments were isolated from a library of cloned rat DNA fragments. The sequence of more than 1800 nucleotides of the cloned DNA has been determined. The sequenced region contains coding regions of 180 and 189 nucleotides which specify the COOH-terminal 123 amino acids of the 227-amino-acid sequence of rat preprolactin. These coding regions are separated by an intervening sequence of 597 nucleotides. At least one other large intervening sequence separates this region from the region coding for the NH2-terminal portion of preprolactin. Hybridization experiments suggested that the intervening sequences of the rat prolactin gene contain DNA sequences which are repeated elsewhere in the rat genome.

4.
Oligonucleotide and codon frequencies have been determined in published sequences of E. coli DNA totaling 103,100 bp with 18,459 reading-frame trinucleotides, corresponding to 2.5% of the total genome. Dinucleotide frequencies are in excellent agreement with those determined by nearest-neighbor chemical analysis, indicating that the computer count of a limited sample is a good representation of the overall frequencies in total genomic DNA. The distinctive nonrandom codon pattern is found to be uniformly distributed and contributes to a distinctive nonrandom oligonucleotide pattern, enabling correlations between frequency levels to be extended beyond reading-frame sequences. Correlation analysis indicates a surprisingly high degree of correlation everywhere in the genome. Coefficients of correlation between oligonucleotide frequencies overall and those in specific segments vary as follows: primary strands of individual coding sequences > 0.9 > lambda DNA > noncoding, non-RNA > φX174 DNA > complementary strands > RNA genes ≈ 0.6 > transposon-insertion elements > T7 DNA ≫ eukaryotic sequences ≈ 0. It is concluded that this high degree of oligonucleotide and codon correspondence in E. coli reflects the widespread distribution of remnants of an early and slowly changing codon pattern that has been continually dispersed by duplication-divergence processes, leading to the present genome.
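The correlation analysis above compares oligonucleotide frequency profiles of the whole genome with those of individual segments. A minimal sketch of that kind of computation, assuming overlapping k-mer counts normalized to relative frequencies and a plain Pearson coefficient (the abstract does not state the exact normalization); the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ACGT"


def oligo_frequencies(seq: str, k: int = 2) -> list[float]:
    """Relative frequencies of all 4**k oligonucleotides, in a fixed (lexicographic) order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(BASES, repeat=k)]
    total = sum(counts[w] for w in words) or 1  # words containing N etc. are ignored
    return [counts[w] / total for w in words]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; raises ZeroDivisionError if a vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A segment is then scored as `pearson(oligo_frequencies(genome), oligo_frequencies(segment))`; values near 1 mean the segment shares the genome-wide pattern.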

5.
Complementary and genomic DNA clones corresponding to the human C-reactive protein (CRP) mRNA and structural gene have been analyzed and compared. Nucleotide sequencing of the coding regions of both cDNA and genomic DNA revealed an additional 19-amino-acid peptide not described in the published CRP amino acid sequence. The CRP gene contains a single 278-base-pair intron within the codon specifying the third residue of mature CRP. The intron contains a repetitive sequence (GT)15G(GT)3 which is similar to structures capable of adopting the Z-DNA form. A comparison of CRP coding and amino acid sequences with those of serum amyloid P component revealed striking overall homology which was not uniform: a region of limited conservation is bounded by two highly conserved regions.
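Tandem dinucleotide runs such as the (GT)15G(GT)3 tract in the CRP intron can be located with a regular expression. A minimal sketch; the helper name `find_repeats` and the three-copy minimum are illustrative choices, not part of the study.

```python
import re


def find_repeats(seq: str, unit: str = "GT", min_copies: int = 3) -> list[tuple[int, str]]:
    """Return (start, run) for each non-overlapping tandem run of `unit` with >= min_copies copies."""
    # e.g. unit="GT", min_copies=3 gives the pattern (?:GT){3,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]
```

Applied to a sequence containing (GT)15G(GT)3, it reports the 15-copy run and the 3-copy run separately, because the single G breaks the tandem pattern.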

6.
With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein-coding regions (exons) in genomic sequences. Despite much progress in the computational identification of protein-coding regions during the last two decades, the performance and efficiency of prediction methods still need to be improved. In addition, it is essential to develop diverse prediction methods, since combining different methods may greatly improve prediction accuracy. This paper develops a new method for predicting protein-coding regions, based on the fact that most exon sequences exhibit a 3-base periodicity whereas intron sequences do not. The method computes the 3-base periodicity and the background noise of stepwise DNA segments of the target DNA sequence, using the nucleotide distributions at the three codon positions. Exon and intron sequences can then be identified from trends in the ratio of the 3-base periodicity to the background noise along the sequence. Case studies on genes from different organisms show that this method is an effective approach to exon prediction.
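One way to compute a 3-base periodicity from nucleotide counts at the three codon positions alone is to evaluate the discrete Fourier power of the four base-indicator sequences at frequency 1/3. The sketch below assumes that definition, and takes the "background noise" to be the mean spectral power, which by Parseval's theorem equals the segment length for a pure ACGT segment; the paper's exact noise definition may differ. Function names, window size and step are hypothetical.

```python
import cmath


def periodicity_ratio(segment: str) -> float:
    """Power at frequency 1/3 divided by the mean spectral power of the segment."""
    seg = segment.upper()
    n = len(seg)
    if n == 0:
        return 0.0
    omega = cmath.exp(-2j * cmath.pi / 3)  # DFT kernel at frequency 1/3
    power = 0.0
    for base in "ACGT":
        # counts[j] = occurrences of `base` at positions i with i % 3 == j
        counts = [seg[j::3].count(base) for j in range(3)]
        power += abs(counts[0] + counts[1] * omega + counts[2] * omega ** 2) ** 2
    # Parseval: summed over all n frequencies, the four indicator spectra give n * n,
    # so the mean power (the assumed noise level) is n.
    noise = n
    return power / noise


def sliding_ratios(seq: str, window: int = 120, step: int = 3) -> list[tuple[int, float]]:
    """(window start, ratio) along the sequence; sustained high values suggest exons."""
    return [
        (start, periodicity_ratio(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A perfectly periodic segment such as "ATG" repeated gives a large ratio, while a period-4 repeat such as "ACGT" repeated gives a ratio near zero.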

7.
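A New Measure of DNA Sequence Information   Cited by: 4 (self-citations: 3; others: 1)
Based on information theory, a new method for measuring the information in DNA sequences is proposed, yielding information measures at four levels of a DNA sequence: Ib, If(1), If(2) and If(3). These four measures can be used to quantify the information content of the base sequence, the codon sequence, the protein-coding sequence and the functional protein sequence of DNA, respectively. Results computed for two short protein-coding DNA sequences from the mitochondrial genome of M. edulis, and for simulated DNA sequences composed of degenerate codon groups of different degeneracy, show that these information measures can indeed be used to reveal …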
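The abstract does not define Ib or If(1) to If(3). As a rough stand-in for the first two levels only, the sketch below computes the Shannon entropy of the base distribution and of the non-overlapping codon distribution; this is an assumption for illustration and not the authors' measures.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols) -> float:
    """Shannon entropy, in bits per symbol, of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def base_entropy(seq: str) -> float:
    return shannon_entropy(seq.upper())


def codon_entropy(seq: str) -> float:
    """Entropy of non-overlapping codons read from position 0 (trailing bases ignored)."""
    seq = seq.upper()
    return shannon_entropy(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
```

With equal base frequencies `base_entropy` reaches its maximum of 2 bits; codon entropy is at most 6 bits, and biased codon usage lowers it.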

8.
The long terminal repeat (LTR) region of mouse mammary tumor virus (MMTV) is known to contain an open reading frame of sufficient length to code for a protein of 36,000 Mr. The coding capacity of the 3' sequences of MMTV genomic RNA has been demonstrated by in vitro translation studies, which have reported the synthesis of four related proteins: p36, p24, p21, and p18. These proteins are overlapping translation products of the same open reading frame, with the smaller ones initiating at internal methionine codons. From the predicted amino acid sequence of the LTR protein, we selected a region likely to be antigenic, obtained a synthetic peptide of that region, and raised antiserum to the peptide. The antipeptide serum specifically immunoprecipitated all four proteins from in vitro translated genomic 3' MMTV RNA, plus an additional one of 32,000 Mr. Published sequence data of MMTV LTRs show an internal AUG codon at a position which could initiate a protein of 32,000 Mr. The three smaller in vitro translation products (p24, p21, and p18) were consistently synthesized in much greater amounts than the p36 or p32 protein. The relative amount of each in vitro synthesized protein from genomic MMTV RNA could be predicted and was in good agreement with the postulated effect of flanking nucleotides on the efficiency of the respective AUG initiation codon. Polyadenylated RNAs, isolated from various mouse tissues, were selected by hybridization to plasmid DNA containing MMTV LTR sequences immobilized on nitrocellulose. In vitro translation of hybrid-selected mRNAs isolated from BALB/c mouse lactating mammary glands and carcinogen-induced mammary tumors, followed by immunoprecipitation with antipeptide serum, revealed that only one polypeptide, the 36,000 Mr species, was synthesized by the MMTV LTR-specific mRNA.
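The nested products p36 to p18 arise from in-frame AUG codons inside one open reading frame, with yields attributed to the flanking nucleotides. The sketch below lists each in-frame ATG (DNA alphabet), the length of the product it would start, and whether its context is "strong". Using Kozak's rule (a purine at -3 and G at +4, with the A of ATG as +1) as the context test is an assumption; the paper's own scoring is not given. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_starts(seq: str, frame_start: int = 0) -> list[tuple[int, int, bool]]:
    """Return (position, product length in amino acids, strong_context) per in-frame ATG."""
    seq = seq.upper()
    # End of the reading frame: first in-frame stop codon, else the last complete codon.
    end = frame_start + (len(seq) - frame_start) // 3 * 3
    for i in range(frame_start, end, 3):
        if seq[i:i + 3] in STOP_CODONS:
            end = i
            break
    starts = []
    for i in range(frame_start, end, 3):
        if seq[i:i + 3] != "ATG":
            continue
        minus3 = seq[i - 3] if i >= 3 else ""       # position -3
        plus4 = seq[i + 3] if i + 3 < len(seq) else ""  # position +4
        strong = minus3 in ("A", "G") and plus4 == "G"
        starts.append((i, (end - i) // 3, strong))
    return starts
```

For `"GCCACCATGGCTTCTATGTCCTAA"` read from position 6, the first ATG starts a 5-residue product in strong context and the internal ATG a 2-residue product in weak context.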

9.
An original tetrahedral representation of the Genetic Code (GC) that better describes its structure, degeneracy and evolutionary trends is defined. The possibility of reducing the dimension of the representation by projecting the GC tetrahedron onto an adequately oriented plane is also analyzed, leading to some equivalent complex representations of the GC. On this basis, optimal symbolic-to-digital mappings of the linear nucleic acid strands into real or complex genomic signals are derived at the nucleotide, codon and amino acid levels. By converting the sequences of nucleotides and polypeptides into digital genomic signals, this approach offers the possibility of using a large variety of signal processing methods for their handling and analysis. It is also shown that some essential features of the nucleotide sequences can be better extracted using this representation. Specifically, the paper reports for the first time the existence of a global helicoidal wrapping of the complex representations of the bases along DNA sequences, a large-scale trend of genomic signals. New tools for genomic signal analysis, including the use of phase, aggregated phase, unwrapped phase, sequence path, stem representation of components' relative frequencies, and analysis of transitions, are introduced at the nucleotide, codon and amino acid levels, and in a multiresolution approach.
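The phase-based tools named above can be sketched once bases are mapped to complex numbers. The abstract does not give the mapping; the values below (A = 1+j, C = -1-j, G = -1+j, T = 1-j, one base per quadrant) are an assumption and should be checked against the paper's projected tetrahedron. Function names are hypothetical.

```python
import cmath
import math

# Assumed complex mapping of the four bases.
BASE_TO_COMPLEX = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}


def genomic_signal(seq: str) -> list[complex]:
    return [BASE_TO_COMPLEX[b] for b in seq.upper()]


def sequence_path(signal: list[complex]) -> list[complex]:
    """Running sum of the signal: the walk traced by the sequence in the complex plane."""
    path, total = [], 0j
    for z in signal:
        total += z
        path.append(total)
    return path


def aggregated_phase(signal: list[complex]) -> list[float]:
    """Running sum of the phase of each base."""
    out, total = [], 0.0
    for z in signal:
        total += cmath.phase(z)
        out.append(total)
    return out


def unwrapped_phase(signal: list[complex]) -> list[float]:
    """Phase made continuous by taking each base-to-base step in (-pi, pi]."""
    phases = [cmath.phase(z) for z in signal]
    out = phases[:1]
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        while step > math.pi:
            step -= 2 * math.pi
        while step <= -math.pi:
            step += 2 * math.pi
        out.append(out[-1] + step)
    return out
```

A steady drift of the unwrapped phase over long stretches is the kind of large-scale trend the abstract describes as helicoidal wrapping.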

11.
With the completion of the genomes of humans and a few model organisms, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools that can easily identify structures and extract features from DNA sequences. One of the more important classes of structure in a DNA sequence is repeat-related. Repeats often have to be masked before protein-coding regions along a DNA sequence can be identified or before redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics; it has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene-finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires approximately 6N bytes of memory and computational time of order N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N without any prior knowledge of the consensus sequence of those features, and hence enables us to carry out sequence analysis on the whole-genome scale on a PC.
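The abstract does not define recurrence time precisely; a common reading, assumed here, is the distance between successive occurrences of the same k-mer. This dictionary-based sketch runs in roughly O(N·k) time and is not the authors' N log N algorithm, nor does it reproduce their codon index. Function names are hypothetical.

```python
from collections import Counter, defaultdict


def recurrence_times(seq: str, k: int) -> dict[str, list[int]]:
    """For every k-mer, the distances between its successive occurrences."""
    last_seen: dict[str, int] = {}
    times: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times[word].append(i - last_seen[word])
        last_seen[word] = i
    return dict(times)


def recurrence_histogram(seq: str, k: int) -> Counter:
    """Histogram of all recurrence times; peaks reveal periodicities and repeats."""
    hist: Counter = Counter()
    for ts in recurrence_times(seq, k).values():
        hist.update(ts)
    return hist
```

In coding DNA one would expect an excess of recurrence times that are multiples of 3, while tandem repeats show up as a sharp peak at the repeat unit length.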

14.

Background  

Codon substitution probabilities are used in many types of molecular evolution studies such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA and thus provide an alternative to parameterized models of codon evolution.

15.
N. Mounier, J. C. Prudhomme. Biochimie, 1986, 68(9): 1053-1061
To study the regulation of the gene(s) coding for the actin present in the microfilaments involved in the secretion of silk, we have probed a Bombyx mori genomic library with a Drosophila actin cDNA clone and selected 16 recombinant phages. They correspond to 3 different genomic fragments, each containing a distinct actin coding sequence. Southern blots of genomic DNA probed with the cloned genes show that in Bombyx mori there are at least 5 different actin genomic sequences. Two cloned genes, A1 and A2, hybridize to a 1.7 kb mRNA abundant in the carcass of the larva and thus probably code for muscle-type actin. The third cloned gene, A3, hybridizes to two mRNAs of about 1.8 kb present in the silk gland and thus probably encodes a cytoplasmic actin. The coding sequence of this gene has been sequenced: it is almost identical to the Drosophila cytoplasmic actin genes, but it has a single intron of 92 nucleotides within codon 116, a position not observed in any other organism.

17.
Interpolated Markov chains for eukaryotic promoter recognition.   Cited by: 9 (self-citations: 0; others: 9)
MOTIVATION: We describe a new content-based approach for the detection of promoter regions of eukaryotic protein-encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in microbial gene finding (Salzberg et al., Nucleic Acids Res., 26, 544-548, 1998). Here, we present new methods for an automated estimation of optimal interpolation parameters and show how the IMCs can be applied to detect promoters in contiguous DNA sequences. Our interpolation approach can also be employed to obtain a reliable scoring function for human coding DNA regions, and the trained models can easily be incorporated in the general framework for gene recognition systems. RESULTS: A 5-fold cross-validation evaluation of our IMC approach on a representative sequence set yielded a mean correlation coefficient of 0.84 (promoter versus coding sequences) and 0.53 (promoter versus non-coding sequences). Applied to the task of eukaryotic promoter region identification in genomic DNA sequences, our classifier identifies 50% of the promoter regions in the sequences used in the most recent review and comparison by Fickett and Hatzigeorgiou (Genome Res., 7, 861-878, 1997), while having a false-positive rate of 1/849 bp.
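The core idea of an interpolated Markov chain is to blend predictions from chains of order 0 up to some maximum order, so that short contexts back up sparse long ones. A minimal sketch, assuming fixed interpolation weights and add-pseudocount estimates; the paper instead estimates optimal interpolation parameters automatically. The class and function names are hypothetical.

```python
from collections import defaultdict
from math import log


class InterpolatedMarkovChain:
    """Linear interpolation of Markov chains of order 0..max_order over ACGT (fixed weights)."""

    def __init__(self, max_order=2, weights=None, pseudocount=1.0):
        self.max_order = max_order
        # Equal weight per order unless the caller supplies weights (an assumption).
        self.weights = weights or [1.0 / (max_order + 1)] * (max_order + 1)
        self.pseudocount = pseudocount
        # counts[k][context][base]: how often `base` followed the k-long `context`.
        self.counts = [defaultdict(lambda: defaultdict(float)) for _ in range(max_order + 1)]

    def train(self, sequences):
        for seq in sequences:
            seq = seq.upper()
            for i, base in enumerate(seq):
                for k in range(min(i, self.max_order) + 1):
                    self.counts[k][seq[i - k:i]][base] += 1
        return self

    def prob(self, context, base):
        """Interpolated P(base | context); orders longer than the context are skipped."""
        p = total_w = 0.0
        for k, w in enumerate(self.weights):
            if k > len(context):
                break
            table = self.counts[k].get(context[len(context) - k:], {})
            seen = sum(table.values())
            p += w * (table.get(base, 0.0) + self.pseudocount) / (seen + 4 * self.pseudocount)
            total_w += w
        return p / total_w

    def log_likelihood(self, seq):
        seq = seq.upper()
        return sum(
            log(self.prob(seq[max(0, i - self.max_order):i], base))
            for i, base in enumerate(seq)
        )


def log_odds(seq, model_a, model_b):
    """Positive when `seq` is better explained by model_a than by model_b."""
    return model_a.log_likelihood(seq) - model_b.log_likelihood(seq)
```

With one model trained on promoter sequences and one on coding (or non-coding) sequences, `log_odds` gives a window score that can be thresholded along genomic DNA.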

18.
MOTIVATION: Prediction of the coding potential of stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences stems from the structure of coding sequences, which are organized in codons, and from their biased codon usage. For statistical reasons, the longer the sequence, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential. RESULTS: Here, we present novel approaches that specifically aim at better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths, and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes. SUPPLEMENTARY DATA: http://bioinformatics.psb.ugent.be/.
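To make "complementary sequence features" concrete, the sketch below extracts a few simple, hypothetical features from a short fragment: overall GC content, GC at third codon positions, and in-frame stop codon counts for the three forward frames. These are illustrative choices, not the feature set used in the paper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_fraction(s: str) -> float:
    return sum(b in "GC" for b in s) / len(s) if s else 0.0


def coding_features(fragment: str) -> dict[str, float]:
    """A few illustrative, mutually complementary features of a short DNA fragment."""
    s = fragment.upper()
    # In-frame stop codons for each of the three forward reading frames.
    stops = [sum(s[i:i + 3] in STOP_CODONS for i in range(f, len(s) - 2, 3)) for f in range(3)]
    return {
        "gc": gc_fraction(s),
        # GC at third codon positions, assuming the fragment starts at a codon boundary.
        "gc3": gc_fraction(s[2::3]),
        "stops_frame0": stops[0],
        "stops_frame1": stops[1],
        "stops_frame2": stops[2],
        # The true frame of a coding fragment has no internal stops.
        "min_stops": min(stops),
    }
```

Feature vectors like this, computed per length class, would then be fed to a classifier and pruned by feature selection, in the spirit of the length-specific models the abstract describes.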

19.
gm: a practical tool for automating DNA sequence analysis   Cited by: 1 (self-citations: 0; others: 1)
The gm (gene modeler) program automates the identification of candidate genes in anonymous genomic DNA sequence data. gm accepts sequence data, organism-specific consensus matrices and codon asymmetry tables, and a set of parameters as input; it returns a set of models describing the structures of candidate genes in the sequence and a corresponding set of predicted amino acid sequences as output. gm is implemented in C and has been tested on Sun, VAX, Sequent, MIPS and Cray computers. It is capable of analyzing sequences of several kilobases containing multi-exon genes in >1 min execution time on a Sun 4/60. Received on December 4, 1989; accepted on February 28, 1990

20.
The DNA sequence organization of the protein-encoding region of the gene for silk fibroin has been analyzed. The accompanying paper (Manning, R. F., and Gage, L. P. (1980) J. Biol. Chem. 255, 9451-9457) shows that the total length of the gene and its protein, as well as the pattern of restriction sites in the gene, is highly polymorphic among inbred stocks of Bombyx mori. In this paper, those features of fibroin gene structure which are invariant among these alleles are presented. Fibroin is composed primarily of relatively short "crystalline" and "amorphous" peptides of known sequence whose arrangement in the protein is unknown. Knowledge of the codons most commonly used in fibroin mRNA allowed particular restriction enzymes to be used to determine the nature and organization of crystalline and amorphous coding sequences in the fibroin gene. Three restriction endonucleases were identified that cleave sequences coding for amorphous-region peptides. Their cleavage pattern revealed that the repetitive coding sequence of the gene core (approximately 15 kilobases) is divided into at least 10 large crystalline coding domains interrupted by smaller amorphous coding domains. Many restriction endonucleases do not cleave the fibroin core at all, including three with four-base recognition sequences. Specific deductions about codon usage and repetitive-sequence homogeneity in the gene follow from these results. One novel finding is the rigorous exclusion of the glycine codon GGA before serine codons, even though this glycine codon is used frequently before alanine codons. The sequence homogeneity and the regularly alternating arrangement of crystalline and amorphous coding sequences of the gene are discussed in terms of the function of fibroin protein and the evolution of highly repetitive DNA.
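The GGA-before-serine observation is a codon-context statistic. Given an in-frame coding sequence, a hypothetical helper can tally which amino acid class follows each GGA codon; the serine and alanine codon sets below are those of the standard genetic code.

```python
SERINE = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}
ALANINE = {"GCT", "GCC", "GCA", "GCG"}


def codon_following_counts(cds: str, codon: str = "GGA") -> dict[str, int]:
    """Count how often `codon` is followed, in frame, by a serine, alanine or other codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = {"Ser": 0, "Ala": 0, "other": 0}
    for first, second in zip(codons, codons[1:]):
        if first != codon:
            continue
        if second in SERINE:
            counts["Ser"] += 1
        elif second in ALANINE:
            counts["Ala"] += 1
        else:
            counts["other"] += 1
    return counts
```

On the fibroin coding sequence, the reported pattern would appear as a "Ser" count near zero for GGA alongside a substantial "Ala" count, while other glycine codons (for example GGT) could be checked the same way for comparison.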
