
1.

Multivariate entropy distance method for prokaryotic gene identification

Ouyang Z Zhu H Wang J She ZS 《Journal of bioinformatics and computational biology》2004,2(2):353-373


2.

Structure of the rat prolactin gene (Total citations: 17, self-citations: 0, citations by others: 17)

E J Gubbins R A Maurer M Lagrimini C R Erwin J E Donelson 《The Journal of biological chemistry》1980,255(18):8655-8662

The organization and sequence of the rat preprolactin gene have been investigated. Analysis of two different plasmids containing pituitary cDNA inserts has provided the complete 681-nucleotide coding sequence of preprolactin as well as 17 nucleotides preceding the initiation codon and 90 nucleotides following the termination codon. Digestion of rat chromosomal DNA with the restriction endonuclease EcoRI followed by size fractionation and hybridization to a labeled prolactin cDNA probe has demonstrated that prolactin genomic sequences are located on 6.0-, 3.9-, and 2.9-kilobase fragments. The 6.0- and 3.9-kilobase fragments were isolated from a library of cloned rat DNA fragments. The sequence of more than 1800 nucleotides of the cloned DNA has been determined. The sequenced region contains coding regions of 180 and 189 nucleotides which specify the COOH-terminal 123 amino acids of the 227-amino-acid sequence of rat preprolactin. These coding regions are separated by an intervening sequence of 597 nucleotides. At least one other large intervening sequence separates this region from the region coding for the NH2-terminal portion of preprolactin. Hybridization experiments suggested that the intervening sequences of the rat prolactin gene contain DNA sequences which are repeated elsewhere in the rat genome.

3.

Truncated phenylalanine ammonia-lyase expression in tomato (Lycopersicon esculentum). (Total citations: 6, self-citations: 0, citations by others: 6)

S W Lee J Robb R N Nazar 《The Journal of biological chemistry》1992,267(17):11824-11830


4.

Degrees of divergence in the E. coli genome from correlations between dinucleotide, trinucleotide and codon frequencies

P W Hinds R D Blake 《Journal of biomolecular structure & dynamics》1984,2(1):101-118

Oligonucleotide and codon frequencies have been determined in published sequences of E. coli DNA totaling 103,100 bp with 18,459 reading frame trinucleotides, corresponding to 2.5% of the total genome. Dinucleotide frequencies are in excellent agreement with those determined by nearest neighbor chemical analysis, indicating the computer count of a limited sampling to be a good representation of the overall frequencies in total genomic DNA. The distinctive nonrandom codon pattern is found to be uniformly distributed and contributes to a distinctive nonrandom oligonucleotide pattern, enabling correlations between frequency levels to be extended beyond reading frame sequences. Correlation analysis indicates a surprisingly high degree of correlation everywhere in the genome. Coefficients of correlation between oligonucleotide frequencies overall and those in specific segments vary as follows: primary strands of individual coding sequences > 0.9 > lambda DNA > noncoding, non-RNA > φX174 DNA > complementary strands > RNA genes ≅ 0.6 > transposon-insertion elements > T7 DNA ≫ eukaryotic sequences ≅ 0. It is concluded that this high degree of oligonucleotide and codon correspondence in E. coli reflects the widespread distribution of remnants of an early and slowly changing codon pattern that has been continually dispersed by duplication-divergence processes, leading to the present genome.
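
The kind of comparison described in this abstract (correlating oligonucleotide frequencies in one segment with those of a reference) can be sketched in a few lines. This is only an illustrative sketch, not the authors' procedure: the function names `kmer_frequencies` and `pearson` are hypothetical, overlapping windows are assumed, and a plain Pearson coefficient is assumed as the correlation measure.

```python
from collections import Counter
from itertools import product
from math import sqrt

def kmer_frequencies(seq, k=2):
    """Relative frequency of every k-mer over ACGT, counted in overlapping windows.

    Assumption: windows containing other symbols (e.g. N) are simply ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length frequency vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant vector; 0.0 is a convention here
    return cov / sqrt(vx * vy)
```

A segment would then be compared with the pooled reference via `pearson(kmer_frequencies(segment), kmer_frequencies(reference))`, with `k=2` or `k=3` for dinucleotides or trinucleotides.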

5.

Characterization of genomic and complementary DNA sequence of human C-reactive protein, and comparison with the complementary DNA sequence of serum amyloid P component (Total citations: 17, self-citations: 0, citations by others: 17)

P Woo J R Korenberg A S Whitehead 《The Journal of biological chemistry》1985,260(24):13384-13388

Complementary and genomic DNA clones corresponding to the human C-reactive protein (CRP) mRNA and structural gene have been analyzed and compared. Nucleotide sequencing of the coding regions of both cDNA and genomic DNA revealed an additional 19 amino acid peptide not described in the published CRP amino acid sequence. The CRP gene contains a single 278 base pair intron within the codon specifying the third residue of mature CRP. The intron contains a repetitive sequence (GT)15G(GT)3 which is similar to structures capable of adopting the Z-DNA form. A comparison of CRP coding and amino acid sequences with those of serum amyloid P component revealed striking overall homology which was not uniform: a region of limited conservation is bounded by two highly conserved regions.

6.

Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence (Total citations: 2, self-citations: 0, citations by others: 2)

Yin C Yau SS 《Journal of theoretical biology》2007,247(4):687-694

With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) from genomic sequences. Although considerable progress has been made in the computational identification of protein coding regions during the last two decades, the performance and efficiency of the prediction methods still need to be improved. In addition, it is indispensable to develop different prediction methods, since combining different methods may greatly improve the prediction accuracy. A new method to predict protein coding regions is developed in this paper based on the fact that most exon sequences have a 3-base periodicity, while intron sequences do not have this feature. The method computes the 3-base periodicity and the background noise of the stepwise DNA segments of the target DNA sequences using nucleotide distributions in the three codon positions of the DNA sequences. Exon and intron sequences can be identified from trends of the ratio of the 3-base periodicity to the background noise in the DNA sequences. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.
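
As a rough illustration of how a 3-base periodicity signal can be obtained from nucleotide counts at the three codon positions, the sketch below uses the standard identity that the Fourier power of a base's indicator sequence at frequency 1/3 equals c0² + c1² + c2² − c0·c1 − c1·c2 − c0·c2, where c0, c1, c2 are the counts of that base at positions 0, 1, 2 modulo 3. This is an assumption-laden sketch, not the paper's exact formula: the background-noise term of the abstract is not reproduced, the names `periodicity3_power` and `sliding_periodicity` are hypothetical, and the default window and step sizes are arbitrary.

```python
def periodicity3_power(segment):
    """Total Fourier power at frequency 1/3, summed over the four bases.

    Computed from the counts of each base at positions 0, 1, 2 modulo 3.
    """
    counts = {b: [0, 0, 0] for b in "ACGT"}
    for i, base in enumerate(segment.upper()):
        if base in counts:  # symbols other than ACGT are skipped
            counts[base][i % 3] += 1
    power = 0
    for c0, c1, c2 in counts.values():
        power += c0 * c0 + c1 * c1 + c2 * c2 - c0 * c1 - c1 * c2 - c0 * c2
    return power

def sliding_periodicity(seq, window=351, step=3):
    """(start, power / window) for each stepwise segment of the sequence."""
    return [
        (start, periodicity3_power(seq[start:start + window]) / window)
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A perfectly codon-periodic segment such as `"ACG" * 10` gives a large power, while `"ACGT" * 3`, whose bases fall evenly on all three positions, gives zero.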

7.

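A new measure of DNA sequence information (Total citations: 4, self-citations: 3, citations by others: 1)

Tan Yuande (谭远德) 《Journal of Biomathematics (生物数学学报)》2000,15(1):45-54

Based on information theory, a new method for measuring the information in DNA sequences is presented, yielding information measures at four levels of a DNA sequence: Ib, If(1), If(2) and If(3). These four measures can be used to quantify, respectively, the information content of the DNA base sequence, the codon sequence, the protein-coding sequence and the functional protein sequence. Results computed from two relatively short protein-coding DNA sequences in the mitochondrial genome of M. edulis, and from simulated DNA sequences built from degenerate codon groups of different degeneracy, show that these information measures can indeed be used to reveal …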

8.

Proteins encoded by the long terminal repeat region of mouse mammary tumor virus: identification by hybrid-selected translation. (Total citations: 7, self-citations: 6, citations by others: 1)

J Racevskis O Prakash 《Journal of virology》1984,51(3):604-610

The long terminal repeat (LTR) region of mouse mammary tumor virus (MMTV) is known to contain an open reading frame of sufficient length to code for a protein of 36,000 Mr. The coding capacity of the 3' sequences of MMTV genomic RNA has been demonstrated by in vitro translation studies, which have reported the synthesis of four related proteins: p36, p24, p21, and p18. These proteins are overlapping translation products of the same open reading frame, with the smaller ones initiating at internal methionine codons. From the predicted amino acid sequence of the LTR protein, we have selected a region likely to be antigenic, obtained a synthetic peptide of that region, and raised antiserum to the peptide. The antipeptide serum specifically immunoprecipitated all four proteins from in vitro translated genomic 3' MMTV RNA, plus an additional one of 32,000 Mr. Published sequence data of MMTV LTRs show an internal AUG codon at a position which could initiate a protein of 32,000 Mr. The three smaller in vitro translation products (p24, p21, and p18) were consistently synthesized in much greater amounts than the p36 or p32 protein. The relative amount of each in vitro synthesized protein from genomic MMTV RNA could be predicted and was in good agreement with the postulated effect of flanking nucleotides on the efficiency of the respective AUG initiation codon. Polyadenylated RNAs, isolated from various mouse tissues, were selected by hybridization to plasmid DNA containing MMTV LTR sequences immobilized on nitrocellulose. In vitro translation of hybrid-selected mRNAs isolated from BALB/c mouse lactating mammary glands and carcinogen-induced mammary tumors, followed by immunoprecipitation with antipeptide serum, revealed that only one polypeptide was synthesized by the MMTV LTR-specific mRNA, the 36,000 Mr species.

9.

Conversion of nucleotides sequences into genomic signals

Cristea PD 《Journal of cellular and molecular medicine》2002,6(2):279-303

An original tetrahedral representation of the Genetic Code (GC) that better describes its structure, degeneration and evolution trends is defined. The possibility to reduce the dimension of the representation by projecting the GC tetrahedron on an adequately oriented plane is also analyzed, leading to some equivalent complex representations of the GC. On these bases, optimal symbolic-to-digital mappings of the linear, nucleic acid strands into real or complex genomic signals are derived at nucleotide, codon and amino acid levels. By converting the sequences of nucleotides and polypeptides into digital genomic signals, this approach offers the possibility to use a large variety of signal processing methods for their handling and analysis. It is also shown that some essential features of the nucleotide sequences can be better extracted using this representation. Specifically, the paper reports for the first time the existence of a global helicoidal wrapping of the complex representations of the bases along DNA sequences, a large scale trend of genomic signals. New tools for genomic signal analysis, including the use of phase, aggregated phase, unwrapped phase, sequence path, stem representation of components' relative frequencies, as well as analysis of the transitions are introduced at the nucleotide, codon and amino acid levels, and in a multiresolution approach.
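
To make the idea of a symbolic-to-digital mapping concrete, the sketch below converts a nucleotide string into a complex signal and derives aggregated and unwrapped phase from it. The abstract does not give the mapping, so the assignment of bases to the four points ±1 ± j used here is an assumption for illustration only, as are the function names and the exact definitions of the two phase quantities.

```python
import cmath
import math

# Assumed mapping for illustration; the abstract does not state the paper's own mapping.
MAPPING = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}

def to_signal(seq):
    """Map a nucleotide string to a list of complex samples (unknown symbols are skipped)."""
    return [MAPPING[b] for b in seq.upper() if b in MAPPING]

def aggregated_phase(signal):
    """Running sum of the phase of each sample."""
    total, out = 0.0, []
    for z in signal:
        total += cmath.phase(z)
        out.append(total)
    return out

def unwrapped_phase(signal):
    """Running phase where each step between samples is reduced to (-pi, pi]."""
    out, prev, total = [], None, 0.0
    for z in signal:
        ph = cmath.phase(z)
        if prev is None:
            total = ph
        else:
            d = ph - prev
            while d > math.pi:
                d -= 2 * math.pi
            while d <= -math.pi:
                d += 2 * math.pi
            total += d
        prev = ph
        out.append(total)
    return out
```

Once the sequence is a numeric signal, ordinary signal-processing tools (filtering, spectra, trend fitting on the phase curves) can be applied to it, which is the point the abstract makes.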

10.

Characterisation of saporin genes: in vitro expression and ribosome inactivation

Anthony P. Fordham-Skelton Philip N. Taylor Martin R. Hartley Ronald R. D. Croy 《Molecular & general genetics : MGG》1991,229(3):460-466


11.

Recurrence time statistics: versatile tools for genomic DNA sequence analysis

Cao Y Tung WW Gao JB Qi Y 《Journal of bioinformatics and computational biology》2005,3(3):677-696

With the completion of the human and a few model organisms' genomes, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires approximately 6N bytes of memory and a computational time of N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N without any prior knowledge of the consensus sequence of those features, and hence enables us to carry out sequence analysis on the whole genomic scale on a PC.
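
The basic quantity behind recurrence time statistics, the distance between successive occurrences of the same word, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it uses a hash table of last-seen positions for fixed-length words, and the function name `recurrence_times` and the word length `k` are hypothetical choices.

```python
from collections import Counter

def recurrence_times(seq, k=3):
    """Histogram of distances between successive occurrences of each length-k word."""
    last_seen = {}   # word -> start position of its most recent occurrence
    hist = Counter() # recurrence time -> number of times it was observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            hist[i - last_seen[word]] += 1
        last_seen[word] = i
    return hist
```

Peaks in the histogram point to periodic or repeat-related structure; for instance, a strong excess at recurrence times that are multiples of 3 would be consistent with codon structure.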

12.

Theatre: A software tool for detailed comparative analysis and visualization of genomic sequence (Total citations: 1, self-citations: 0, citations by others: 1)

Edwards YJ Carver TJ Vavouri T Frith M Bishop MJ Elgar G 《Nucleic acids research》2003,31(13):3510-3517


13.

Computational identification of transcriptional regulatory elements in DNA sequence

GuhaThakurta D 《Nucleic acids research》2006,34(12):3585-3598


14.

Empirical codon substitution matrix

Adrian Schneider Gina M Cannarozzi Gaston H Gonnet 《BMC bioinformatics》2005,6(1):134

Background

Codon substitution probabilities are used in many types of molecular evolution studies such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA and thus provide an alternative to parameterized models of codon evolution.
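
The general idea of an empirical matrix, tallying observed codon pairs in aligned coding sequences and normalizing the tallies, can be sketched as below. This is a minimal sketch under assumptions, not the estimation procedure of the paper: pairwise alignments are assumed to be in frame, gapped codons are skipped, counts are made symmetric, and the function names are hypothetical.

```python
from collections import defaultdict

def codon_substitution_counts(aligned_pairs):
    """Symmetric counts of aligned codon pairs from in-frame pairwise alignments."""
    counts = defaultdict(int)
    for a, b in aligned_pairs:
        if len(a) != len(b):
            raise ValueError("aligned sequences must have equal length")
        for i in range(0, len(a) - len(a) % 3, 3):
            ca, cb = a[i:i + 3].upper(), b[i:i + 3].upper()
            if "-" in ca or "-" in cb:
                continue  # skip codons that contain alignment gaps
            counts[(ca, cb)] += 1
            counts[(cb, ca)] += 1  # count both directions (identical pairs count twice)
    return counts

def substitution_probabilities(counts):
    """Row-normalize the counts: estimated P(codon b | codon a) for observed pairs."""
    row_totals = defaultdict(int)
    for (ca, _), n in counts.items():
        row_totals[ca] += n
    return {(ca, cb): n / row_totals[ca] for (ca, cb), n in counts.items()}
```

Real empirical matrices are built from very large alignment sets and account for evolutionary distance between the aligned sequences; none of that is attempted here.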

15.

Isolation of actin genes in Bombyx mori: the coding sequence of a cytoplasmic actin gene expressed in the silk gland is interrupted by a single intron in an unusual position (Total citations: 7, self-citations: 0, citations by others: 7)

N Mounier J C Prudhomme 《Biochimie》1986,68(9):1053-1061

To study the regulation of the gene(s) coding for the actin present in the microfilaments involved in the secretion of silk, we have probed a Bombyx mori genomic library with a Drosophila actin cDNA clone and selected 16 recombinant phages. They correspond to 3 different genomic fragments, each containing a distinct actin coding sequence. Southern blots of genomic DNA probed with the cloned genes show that in Bombyx mori there are at least 5 different actin genomic sequences. Two cloned genes, A1 and A2, hybridize to a 1.7 kb long mRNA abundant in the carcass of the larva and thus probably code for muscle type actin. The third cloned gene, A3, hybridizes to two mRNAs of about 1.8 kb present in the silk gland and thus probably encodes a cytoplasmic actin. The coding sequence of this gene has been sequenced: it is almost identical to the Drosophila cytoplasmic actin genes but it has a single intron of 92 nucleotides within codon 116, a position not observed in any other organism.

16.

Degrees of Divergence in the E. coli Genome From Correlations Between Dinucleotide, Trinucleotide and Codon Frequencies

Philip W. Hinds R. D. Blake 《Journal of biomolecular structure & dynamics》2013,31(1):101-118

Abstract

Oligonucleotide and codon frequencies have been determined in published sequences of E. coli DNA totaling 103,100 bp with 18,459 reading frame trinucleotides, corresponding to 2.5% of the total genome. Dinucleotide frequencies are in excellent agreement with those determined by nearest neighbor chemical analysis, indicating the computer count of a limited sampling to be a good representation of the overall frequencies in total genomic DNA. The distinctive nonrandom codon pattern is found to be uniformly distributed and contributes to a distinctive nonrandom oligonucleotide pattern, enabling correlations between frequency levels to be extended beyond reading frame sequences. Correlation analysis indicates a surprisingly high degree of correlation everywhere in the genome. Coefficients of correlation between oligonucleotide frequencies overall and those in specific segments vary as follows: primary strands of individual coding sequences > 0.9 > lambda DNA > noncoding, non-RNA > φX174 DNA > complementary strands > RNA genes ≅ 0.6 > transposon-insertion elements > T7 DNA ≫ eukaryotic sequences ≅ 0. It is concluded that this high degree of oligonucleotide and codon correspondence in E. coli reflects the widespread distribution of remnants of an early and slowly changing codon pattern that has been continually dispersed by duplication-divergence processes, leading to the present genome.

17.

Interpolated Markov chains for eukaryotic promoter recognition. (Total citations: 9, self-citations: 0, citations by others: 9)

U Ohler S Harbeck H Niemann E Nöth M G Reese 《Bioinformatics (Oxford, England)》1999,15(5):362-369

MOTIVATION: We describe a new content-based approach for the detection of promoter regions of eukaryotic protein encoding genes. Our system is based on three interpolated Markov chains (IMCs) of different order which are trained on coding, non-coding and promoter sequences. It was recently shown that the interpolation of Markov chains leads to stable parameters and improves on the results in microbial gene finding (Salzberg et al., Nucleic Acids Res., 26, 544-548, 1998). Here, we present new methods for an automated estimation of optimal interpolation parameters and show how the IMCs can be applied to detect promoters in contiguous DNA sequences. Our interpolation approach can also be employed to obtain a reliable scoring function for human coding DNA regions, and the trained models can easily be incorporated in the general framework for gene recognition systems. RESULTS: A 5-fold cross-validation evaluation of our IMC approach on a representative sequence set yielded a mean correlation coefficient of 0.84 (promoter versus coding sequences) and 0.53 (promoter versus non-coding sequences). Applied to the task of eukaryotic promoter region identification in genomic DNA sequences, our classifier identifies 50% of the promoter regions in the sequences used in the most recent review and comparison by Fickett and Hatzigeorgiou (Genome Res., 7, 861-878, 1997), while having a false-positive rate of 1/849 bp.
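
The interpolation idea, mixing the predictions of Markov chains of several orders so that sparse high-order contexts fall back on better-estimated low-order ones, can be sketched as follows. This is a simplified sketch under assumptions and not the paper's system: the interpolation weights are fixed and uniform by default (the paper estimates optimal parameters automatically), add-one smoothing is assumed, input is assumed to contain only A, C, G, T, and the class name `InterpolatedMarkovChain` is hypothetical.

```python
from collections import defaultdict
from math import log

class InterpolatedMarkovChain:
    """Mixture of order-0..max_order Markov chains with fixed interpolation weights."""

    def __init__(self, max_order=2, weights=None, alphabet="ACGT"):
        self.max_order = max_order
        self.alphabet = alphabet
        # Assumption: uniform weights unless given; real systems fit these to data.
        self.weights = weights or [1.0 / (max_order + 1)] * (max_order + 1)
        # counts[order][context][symbol] -> number of observations
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def train(self, sequences):
        for seq in sequences:
            seq = seq.upper()
            for i, sym in enumerate(seq):
                for order in range(min(i, self.max_order) + 1):
                    self.counts[order][seq[i - order:i]][sym] += 1

    def _order_prob(self, order, context, sym):
        table = self.counts[order].get(context, {})
        total = sum(table.values())
        return (table.get(sym, 0) + 1) / (total + len(self.alphabet))  # add-one smoothing

    def prob(self, history, sym):
        usable = min(len(history), self.max_order)
        w = self.weights[:usable + 1]
        norm = sum(w)  # renormalize when the history is shorter than max_order
        return sum(w[o] / norm * self._order_prob(o, history[len(history) - o:], sym)
                   for o in range(usable + 1))

    def log_likelihood(self, seq):
        seq = seq.upper()
        return sum(log(self.prob(seq[max(0, i - self.max_order):i], seq[i]))
                   for i in range(len(seq)))
```

A classifier in the spirit of the abstract would train one such model per class (for example promoter, coding, non-coding) and assign a sequence to the class whose model gives the highest log-likelihood.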

18.

In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists

Saeys Y Rouzé P Van de Peer Y 《Bioinformatics (Oxford, England)》2007,23(4):414-420

MOTIVATION: Prediction of the coding potential for stretches of DNA is crucial in gene calling and genome annotation, where it is used to identify potential exons and to position their boundaries in conjunction with functional sites, such as splice sites and translation initiation sites. The ability to discriminate between coding and non-coding sequences relates to the structure of coding sequences, which are organized in codons, and to their biased usage. For statistical reasons, the longer the sequences, the easier it is to detect this codon bias. However, in many eukaryotic genomes, where genes harbour many introns, both introns and exons might be small and hard to distinguish based on coding potential. RESULTS: Here, we present novel approaches that specifically aim at a better detection of coding potential in short sequences. The methods use complementary sequence features, combined with identification of which features are relevant in discriminating between coding and non-coding sequences. These newly developed methods are evaluated on different species, representative of four major eukaryotic kingdoms, and extensively compared to state-of-the-art Markov models, which are often used for predicting coding potential. The main conclusions drawn from our analyses are that (1) combining complementary sequence features clearly outperforms current Markov models for coding potential prediction in short sequence fragments, (2) coding potential prediction benefits from length-specific models, and these models are not necessarily the same for different sequence lengths and (3) comparing the results across several species indicates that, although our combined method consistently performs extremely well, there are important differences across genomes. SUPPLEMENTARY DATA: http://bioinformatics.psb.ugent.be/.

19.

gm: a practical tool for automating DNA sequence analysis (Total citations: 1, self-citations: 0, citations by others: 1)

Fields C.A.; Soderlund C.A. 《Bioinformatics (Oxford, England)》1990,6(3):263-270

The gm (gene modeler) program automates the identification of candidate genes in anonymous, genomic DNA sequence data. gm accepts sequence data, organism-specific consensus matrices and codon asymmetry tables, and a set of parameters as input; it returns a set of models describing the structures of candidate genes in the sequence and a corresponding set of predicted amino acid sequences as output. gm is implemented in C, and has been tested on Sun, VAX, Sequent, MIPS and Cray computers. It is capable of analyzing sequences of several kilobases containing multi-exon genes in >1 min execution time on a Sun 4/60. Received on December 4, 1989; accepted on February 28, 1990

20.

Internal structure of the silk fibroin gene of Bombyx mori. I. The fibroin gene consists of a homogeneous alternating array of repetitious crystalline and amorphous coding sequences (Total citations: 5, self-citations: 0, citations by others: 5)

L P Gage R F Manning 《The Journal of biological chemistry》1980,255(19):9444-9450

The DNA sequence organization of the protein encoding region of the gene for silk fibroin has been analyzed. The accompanying paper (Manning, R. F., and Gage, L. P. (1980) J. Biol. Chem. 255, 9451-9457) shows that the total length of the gene, and its protein, as well as the pattern of restriction sites in the gene are highly polymorphic among inbred stocks of Bombyx mori. In this paper, those features of fibroin gene structure which are invariant among these alleles are presented. Fibroin is composed primarily of relatively short "crystalline" and "amorphous" peptides of known sequence whose arrangement in the protein is unknown. Knowledge of the codons most commonly used in fibroin mRNA allowed utilization of particular restriction enzymes as a means for determining the nature and organization of crystalline and amorphous coding sequences in the fibroin gene. Three restriction endonucleases were identified that cleave sequences coding for amorphous region peptides. Their cleavage pattern revealed that the repetitive coding sequence of the gene core (approximately 15 kilobases) is divided into at least 10 large crystalline coding domains interrupted by smaller amorphous coding domains. Many restriction endonucleases do not cleave the fibroin core at all, three of them with four base recognition sequences. Specific deductions as to codon usage and repetitive sequence homogeneity in the gene follow from these results. One novel finding is the rigorous exclusion of the glycine codon GGA prior to serine codons even though this glycine codon is used frequently prior to alanine codons. The sequence homogeneity and the regularly alternating arrangement of crystalline and amorphous coding sequences of the gene are discussed in terms of the function of fibroin protein and the evolution of highly repetitive DNA.