Similar articles
 20 similar articles found (search time: 46 ms)
1.
We have written a computer program, BIGPROBE, which facilitates the design of long nucleic acid probes from the partial or complete amino acid sequence of a protein. BIGPROBE relies upon information on codon usage, intercodon dinucleotide frequency, and potential probe self-complementarity. We have examined the accuracy with which the program predicts coding sequences using sample human and rat genes and probe lengths of 30-60 nucleotides. Rat probe sequences selected by BIGPROBE using either codon usage or dinucleotide frequency data alone averaged 86-92% homology with the known exons of the corresponding gene sequences. Predictive accuracy with rat gene probes could be improved to 89-94%, depending upon probe length, by applying codon usage and dinucleotide frequency data in combination. Similar accuracy was achieved for human genes.

2.
F Daldal. Gene, 1984, 28(3):337-342
The nucleotide sequence of a 1.3-kb DNA fragment containing the entire pfkB gene which codes for Pfk-2 of Escherichia coli, a minor phosphofructokinase (Pfk) enzyme, is reported. The Pfk-2 protein subunit is encoded by 924 bp, has 308 amino acids and an Mr of 33 000. Like other weakly expressed E. coli genes the codon usage in the pfkB gene is random; there is no strong bias for the usage of major tRNA isoaccepting species, and the codon preference rules of Grosjean and Fiers [Gene, 18 (1982) 199-209] are followed. This is the first report of the complete gene sequence of a phosphofructokinase.

3.
4.
The codon adaptation index (CAI) values of all protein-coding sequences of the full-length cDNA libraries of Mus musculus were computed based on the RIKEN mouse full-length cDNA library. We have also computed the extent of consensus in flanking sequences of the initiator ATG codon based on the 'relative entropy' values of respective nucleotide positions (from -20 to +12 bp relative to the initiator ATG codon) for each group of genes classified by CAI values. With regard to the two nucleotide positions (-3 and +4) known to be highly conserved in Kozak's consensus sequence, a clear correlation between CAI values and relative entropy values was observed at position -3 but this was not significant at position +4, although a significant correlation was found at position -1 of the consensus sequence. Further, although no correlation was observed at any additional positions, relative entropy values were very high at positions -4, -6, and -8 in genes with high CAI values. These findings suggest that the extent of conservation in the flanking sequence of the initiator ATG codon including Kozak's consensus sequence was an important factor in modulation of the translation efficiency as well as synonymous codon usage bias particularly in highly expressed genes.
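The per-position 'relative entropy' mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not give its formula, so this assumes the usual Kullback-Leibler form (in bits) against a uniform background of 0.25 per base; the function names relative_entropy and positional_relative_entropy are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"


def relative_entropy(column, background=None):
    """Relative entropy (bits) of one alignment column versus a background.

    Assumption: Kullback-Leibler divergence with a uniform background unless
    another background distribution is supplied by the caller.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    counts = Counter(column.upper())
    total = sum(counts[b] for b in BASES)  # ignore non-ACGT symbols
    if total == 0:
        raise ValueError("column contains no A/C/G/T characters")
    score = 0.0
    for b in BASES:
        p = counts[b] / total
        if p > 0:  # 0 * log(0) is treated as 0
            score += p * math.log2(p / background[b])
    return score


def positional_relative_entropy(aligned_seqs):
    """Relative entropy for every column of equal-length aligned sequences."""
    return [relative_entropy("".join(col)) for col in zip(*aligned_seqs)]
```

Under these assumptions, a fully conserved column scores 2 bits and a column with all four bases at equal frequency scores 0, so high values mark strongly conserved flanking positions.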

5.
6.
Complete chromosome/genome sequences available from humans, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae were analyzed for the occurrence of mono-, di-, tri-, and tetranucleotide repeats. In all of the genomes studied, dinucleotide repeat stretches tended to be longer than other repeats. Additionally, tetranucleotide repeats in humans and trinucleotide repeats in Drosophila also seemed to be longer. Although the trends for different repeats are similar between different chromosomes within a genome, the density of repeats may vary between different chromosomes of the same species. The abundance or rarity of various di- and trinucleotide repeats in different genomes cannot be explained by nucleotide composition of a sequence or potential of repeated motifs to form alternative DNA structures. This suggests that in addition to nucleotide composition of repeat motifs, characteristic DNA replication/repair/recombination machinery might play an important role in the genesis of repeats. Moreover, analysis of complete genome coding DNA sequences of Drosophila, C. elegans, and yeast indicated that expansions of codon repeats corresponding to small hydrophilic amino acids are tolerated more, while strong selection pressures probably eliminate codon repeats encoding hydrophobic and basic amino acids. The locations and sequences of all of the repeat loci detected in genome sequences and coding DNA sequences are available at http://www.ncl-india.org/ssr and could be useful for further studies.
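As a rough illustration of scanning a sequence for mono- to tetranucleotide repeats, the following Python sketch uses a regular-expression back-reference. It is not the authors' pipeline: the minimum copy number (min_copies) and the function names are assumptions made for the example, and motifs that are themselves repetitions of a shorter unit (such as "AA" or "ATAT") are skipped so that each tract is reported once under its shortest unit.

```python
import re


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_repeats(seq, min_copies=3, max_unit=4):
    """Return (start, motif, copies) for tandem repeats with unit length 1-4.

    min_copies is an illustrative threshold, not a value from the abstract.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # One unit of length k followed by at least (min_copies - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits
```

For example, find_repeats("GGCACACACATT") reports a single dinucleotide tract: motif "CA", four copies, starting at zero-based offset 2.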

7.
Dinucleotide frequencies are useful for characterizing consensus elements as a minimum unit of nucleotide sequence because the neighborhood relations of nucleotide sequences are reflected in dinucleotides. Using a consensus score based on dinucleotide frequencies and intra-species codon usage heterogeneity, denoted by the Z1 parameter, we report the relationship between nucleotide conservation at the translation initiation sites of genes in the Escherichia coli K-12 genome (W3110) and codon usage in its downstream genes. Significant positive correlations were obtained in three regions centered at -13, -4, and +7, which correspond to the Shine-Dalgarno element, the A + T element immediately upstream of the translation initiation site, and the downstream box, respectively.

8.
All established methods for detecting positive selection at the molecular level rely on comparisons between nucleotide sequences. An exceptional method that purports to detect selection on the basis of a single genomic sequence has recently been proposed. This method uses a measure called "codon volatility," defined for each codon as the ratio between the number of nonsynonymous codons that differ from the codon under study at a single nucleotide position and the number of sense codons that differ from the codon under study at a single nucleotide position. Here, we examine various properties of codon volatility and its derivatives and use simulation of evolutionary processes to determine whether they can be used to detect selective pressures. Codons for only four amino acids (glycine, leucine, arginine, and serine) show any variation in codon volatility. Thus, codon volatility is mainly a proxy for amino acid usage, rather than for codon usage, with 65% of all synonymous changes and 27% of all nonsynonymous changes being undetectable by this measure. Genes identified by the volatility method as being subject to positive selection tend to have idiosyncratic amino acid compositions (e.g., they are glycine rich or arginine poor). An additional property of codon volatility is the near zero variance of its mean expectation, which translates into overestimated statistical significance estimates, especially in the absence of corrections for multiple comparisons. A comparison with measures of selection inferred through comparative methodology reveals no relationship between the results of the two methods. Finally, we show that codon volatility can increase in the absence of positive Darwinian selection; that is, increased codon volatility is not indicative of positive selection.
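The definition of codon volatility quoted in this abstract is concrete enough to sketch in Python. The example below assumes the standard genetic code and counts only sense (non-stop) single-nucleotide neighbours in both the numerator and the denominator; the names volatility and variable_amino_acids are hypothetical helpers, not code from the paper.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def volatility(codon):
    """Nonsynonymous sense neighbours divided by all sense neighbours."""
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("volatility is not defined for stop codons")
    neighbours = [
        codon[:i] + b + codon[i + 1:]
        for i in range(3)
        for b in BASES
        if b != codon[i]
    ]
    sense = [n for n in neighbours if CODE[n] != "*"]
    nonsynonymous = [n for n in sense if CODE[n] != aa]
    return Fraction(len(nonsynonymous), len(sense))


def variable_amino_acids():
    """Amino acids whose synonymous codons differ in volatility."""
    values = {}
    for codon, aa in CODE.items():
        if aa != "*":
            values.setdefault(aa, set()).add(volatility(codon))
    return sorted(aa for aa, vals in values.items() if len(vals) > 1)
```

Under this reading of the definition, variable_amino_acids() returns G, L, R and S, which agrees with the abstract's statement that only glycine, leucine, arginine, and serine codons vary in volatility. For instance, GGA has volatility 5/8 (one of its neighbours is the stop codon TGA) whereas GGC has 6/9.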

9.
CUTG (codon usage tabulated from GenBank) is a comprehensive database for codon usage. The codon usage for each full-length protein gene has been calculated using the nucleotide sequence obtained from the GenBank sequence database. The sum of the codon use of each organism has also been calculated. The data files can be obtained from anonymous ftp sites of DDBJ, DISC and EBI. The list of codon usage of genes in organisms was made searchable by name of organism through a web site, http://www.dna.affrc.go.jp/~nakamura/CUTG.html. The compilation is synchronized with each major release of GenBank.
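The kind of tabulation this abstract describes (codon counts per gene, then summed per organism) can be sketched in a few lines of Python. This is only an illustration of the idea, not the CUTG software; the function names and the per-thousand normalisation are assumptions made for the example, and input sequences are assumed to be in-frame coding sequences.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in one in-frame coding sequence (trailing bases ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def organism_usage(genes):
    """Sum the per-gene codon counts over all genes of one organism."""
    total = Counter()
    for cds in genes:
        total += codon_usage(cds)
    return total


def per_thousand(counts):
    """Express codon counts as frequency per 1000 codons."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons counted")
    return {codon: 1000.0 * n / total for codon, n in counts.items()}
```

For the toy sequence "ATGGCCGCCTAA", codon_usage gives GCC twice and ATG and TAA once each, so per_thousand reports 500.0 for GCC.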

10.
Along the gene, nucleotides in various codon positions tend to exert a slight but observable influence on the nucleotide choice at neighboring positions. Such context biases are different in different organisms and can be used as genomic signatures. In this paper, we will focus specifically on the dinucleotide composed of a third codon position nucleotide and its succeeding first position nucleotide. Using the 16 possible dinucleotide combinations, we calculate how well individual genes conform to the observed mean dinucleotide frequencies of an entire genome, forming a distance measure for each gene. It is found that genes from different genomes can be separated with a high degree of accuracy, according to these distance values. In particular, we address the problem of recent horizontal gene transfer, and how imported genes may be evaluated by their poor assimilation to the host's context biases. By concentrating on the third- and succeeding first position nucleotides, we eliminate most spurious contributions from codon usage and amino-acid requirements, focusing mainly on mutational effects. Since imported genes are expected to converge only gradually to genomic signatures, it is possible to question whether a gene present in only one of two closely related organisms has been imported into one organism or deleted in the other. Striking correlations between the proposed distance measure and poor homology are observed when Escherichia coli genes are compared to Salmonella typhi, indicating that sets of outlier genes in E. coli may contain a high number of genes that have been imported into E. coli, and not deleted in S. typhi. Received: 16 January 2001 / Accepted: 30 August 2001
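The dinucleotide described here (third codon position followed by the first position of the next codon) can be tallied as in the Python sketch below. The abstract does not state which distance measure the authors used, so the Euclidean distance here is an assumption chosen purely for illustration, and context_frequencies and context_distance are hypothetical names.

```python
import math
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]
DINUC_SET = set(DINUCS)


def context_frequencies(cds):
    """Frequencies of the 16 dinucleotides spanning codon positions 3|1."""
    cds = cds.upper()
    # Position 3 of each codon paired with position 1 of the next codon.
    pairs = [cds[i + 2] + cds[i + 3] for i in range(0, len(cds) - 3, 3)]
    counts = Counter(p for p in pairs if p in DINUC_SET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence too short or contains no valid bases")
    return {d: counts[d] / total for d in DINUCS}


def context_distance(gene_freq, genome_freq):
    """Assumed measure: Euclidean distance between two frequency vectors."""
    return math.sqrt(
        sum((gene_freq[d] - genome_freq[d]) ** 2 for d in DINUCS)
    )
```

In use, genome_freq would be the mean of context_frequencies over all genes of a genome, and genes with unusually large distances would be the candidates for recent import.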

11.
We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship.

12.
The complete nucleotide sequence (21,359 bp) of the mitochondrial DNA of the rhacophorid frog Rhacophorus schlegelii was determined. The gene content, nucleotide composition, and codon usage of this genome corresponded to those typical of vertebrates. However, the Rh. schlegelii genome was unusually large due to the inclusion of two control regions and the accumulation of lengthy repetitive sequences in these regions. The two control regions had 97% sequence similarity over 1,510 bp, suggesting the occurrence of concerted sequence evolution. Comparison of the gene organizations among anuran species revealed that the mitochondrial gene arrangement of Rh. schlegelii diverged from that of typical vertebrates but was similar to that of Buergeria buergeri. The positions of the tRNA-Leu(CUN) and tRNA-Thr genes were exchanged between Rh. schlegelii and B. buergeri. Based on parsimonious consideration and the basal phylogenetic position of B. buergeri, these genes seemed to have been rearranged in an ancestral lineage leading to Rh. schlegelii.

13.
Complementary DNA sequence data of the ΦX174, fd, f1, G4, M13, MS2, λ and T7 phages of Escherichia coli are analysed at mono-, di-, tri- and tetranucleotide levels. Our analysis shows that (i) mononucleotides have certain preferences to occur at specific positions X1, X2, X3 of the codon; (ii) these nucleotides interact nonlinearly to form a dinucleotide, and this dinucleotide also interacts nonlinearly with a third nucleotide to form a codon; (iii) however, nonlinear interactions are negligible at the tetranucleotide level, suggesting that coding regions of complementary DNA are Markov chains of order two. Trinucleotide potential values in three frames have suggested that at least thirteen different trinucleotides can be used as markers to locate coding regions in DNA of prokaryotes; (iv) parallel paired codons are expressed in such a way that one of the codons in the pair is expressed with high frequency while the other is expressed with low frequency. On the other hand, the complementary codon pairs are expressed with a small frequency difference; (v) in the synonymous codon groups, codons ending with T are found to be expressed with higher frequency.

14.
The nucleotide sequence of the structural gene (nifH) of nitrogenase reductase (Fe protein) from R. meliloti 41 with its flanking ends is reported. The amino acid sequence of nitrogenase reductase was deduced from the DNA sequence. The predicted R. meliloti nitrogenase reductase protein consists of 297 amino acid residues, has a molecular weight of 32,740 daltons and contains 5 cysteine residues. The codon usage in the nifH gene is presented. In the 5' flanking region, sequences resembling consensus sequences of bacterial control regions were found. Comparison of the R. meliloti nifH nucleotide and amino acid sequences with those from different nitrogen-fixing organisms showed that the amino acid sequences are more conserved than the nucleotide sequences. This structural conservation of nitrogenase reductase may be related to its function and may explain the conservation of the nifH gene during evolution.

15.
The complete nucleotide sequence of complementary DNA coding for a variant surface glycoprotein (VSG 117) of Trypanosoma brucei has been determined and compared with amino acid sequence data for the mature protein. This has revealed several interesting and novel features about the synthesis and processing of VSG 117: (1) the primary translation product of the VSG 117 gene includes hydrophobic extensions at both the NH2 and COOH termini that are not found on mature VSG 117; (2) the glycosylated residue at the mature COOH terminus is aspartate, a residue that is not known to be glycosylated in any other system; (3) the nucleotide sequence shows an unusual dinucleotide frequency and codon usage for the gene.

16.
The nucleotide sequence of the adult chicken alpha-globin genes (total citations: 25; self-citations: 0; citations by others: 25)
The complete nucleotide sequence is reported of the two adult chicken alpha-globin genes, alpha A and alpha D. These two genes, expressed in a 3:1 ratio, respectively, in adult red cells, are widely divergent, suggesting that they have evolved separately for several hundred million years. Although the genes are closely linked in the chicken chromosome, the nucleotide sequences determined clearly rule out any recent gene conversion events. As expected, both genes contain two relatively short intervening sequences. The 3' intron of the alpha D gene begins with the dinucleotide GC rather than the typical GT. Extensive flanking sequences are reported for both genes. The chromosomal sequences of the two genes are compared to each other and to sequenced mammalian alpha-globin genes.

17.
Summary: Vitreoscilla hemoglobin is involved in oxygen metabolism of this bacterium, possibly in an unusual role for a microbe. We have isolated the Vitreoscilla hemoglobin structural gene from a pUC19 genomic library using mixed oligodeoxynucleotide probes based on the reported amino acid sequence of the protein. The gene is expressed in Escherichia coli from its natural promoter as a major cellular protein. The nucleotide sequence, which is in complete agreement with the known amino acid sequence of the protein, suggests the existence of promoter and ribosome binding sites with a high degree of homology to consensus E. coli upstream sequences. In the case of at least some amino acids, a codon usage bias can be detected which is different from the biased codon usage pattern in E. coli. The downstream sequence exhibits homology with the 3' end sequences of several plant leghemoglobin genes. E. coli cells expressing the gene contain greater than fivefold more heme than controls.

18.
Summary: Based on the rates of synonymous substitution in 42 protein-coding gene pairs from rat and human, a correlation is shown to exist between the frequency of the nucleotides in all positions of the codon and the synonymous substitution rate. The correlation coefficients were positive for A and T and negative for C and G. This means that AT-rich genes accumulate more synonymous substitutions than GC-rich genes. Biased patterns of mutation could not account for this phenomenon. Thus, the variation in synonymous substitution rates and the resulting unequal codon usage must be the consequence of selection against A and T in synonymous positions. Most of the variation in rates of synonymous substitution can be explained by the nucleotide composition in synonymous positions. Codon-anticodon interactions, dinucleotide frequencies, and contextual factors influence neither the rates of synonymous substitution nor codon usage. Interestingly, the nucleotide in the second position of codons (always a nonsynonymous position) was found to affect the rate of synonymous substitution. This finding links the rate of nonsynonymous substitution with the synonymous rate. Consequently, highly conservative proteins are expected to be encoded by genes that evolve slowly in terms of synonymous substitutions, and are consequently highly biased in their codon usage.

19.
The nucleotide sequence of the ppc gene, the structural gene for phosphoenolpyruvate carboxylase [EC 4.1.1.31], of Escherichia coli K-12 was determined. The gene codes for a polypeptide comprising 883 amino acid residues with a calculated molecular weight of 99,061. The amino acid sequence deduced from the nucleotide sequence was entirely consistent with the protein chemical data obtained with the purified enzyme, including the NH2- and COOH-terminal sequences and amino acid composition. The coding region is preceded by two putative ribosome binding sites, and is followed closely by a good representative of rho-independent terminator. The codon usage in the ppc gene suggests a moderate expression of the gene. The secondary structure of the enzyme was predicted from the deduced amino acid sequence.

20.
C Grabau, J E Cronan Jr. Nucleic Acids Research, 1986, 14(13):5449-5460
The entire nucleotide sequence of the poxB (pyruvate oxidase) gene of Escherichia coli K-12 has been determined by the dideoxynucleotide (Sanger) sequencing of fragments of the gene cloned into a phage M13 vector. The gene is 1716 nucleotides in length and has an open reading frame which encodes a protein of Mr 62,018. This open reading frame was shown to encode pyruvate oxidase by alignment of the amino acid sequences deduced for the amino and carboxy termini and several internal segments of the mature protein with sequences obtained by amino acid sequence analysis. The deduced amino acid sequence of the oxidase was not unusually rich in hydrophobic sequences despite the peripheral membrane location and lipid binding properties of the protein. The codon usage of the oxidase gene was typical of a moderately expressed protein. The deduced amino acid sequence shares homology with the large subunits of the acetohydroxy acid synthase isozymes I, II, and III, encoded by the ilvB, ilvG, and ilvI genes of E. coli.
