Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Sorimachi K, Okayasu T. Amino Acids, 2008, 34(4): 661-668
When the contents of the four nucleotides (G, C, T and A) were plotted against the content of each nucleotide, their relationships were clearly expressed by a linear formula, y = αx + β, in both the coding and non-coding regions. This linear relationship was obtained from the complete single-stranded DNA. Similarly, nucleotide contents at all three codon positions were expressed by linear regression lines based on the content of each nucleotide. In addition, the usages of all 64 codons were also expressed by linear formulas against nucleotide content. Thus, the nucleotide content not only in coding sequence but also in non-coding sequence can be expressed by a linear formula, y = αx + β, in 145 organisms (112 bacteria, 15 archaea and 18 eukaryotes). Based on these results, any one of the ratios C/T, G/T, C/A or G/A can essentially be used to estimate all four nucleotide contents in the complete single-stranded DNA, and the determination of any ratio of two kinds of nucleotides can essentially be used to estimate the four nucleotide contents, the nucleotide contents at the three codon positions, and the distribution of the 64 codons in the coding region. The maximum and minimum values of G content were ∼0.35 and ∼0.15, respectively, among the organisms examined. Codon evolution occurs according to linear formulas between these two values.

2.
One of the main advantages of de novo gene synthesis is that it frees the researcher from any limitations imposed by the use of natural templates. To make the most of this opportunity, efficient algorithms are needed to calculate a coding sequence that combines different requirements, such as adapted codon usage or avoidance of restriction sites, in the best possible way. We present an algorithm in which a "variation window" covering several amino acid positions slides along the coding sequence. Candidate sequences are built comprising the already optimized part of the complete sequence and all possible combinations of synonymous codons representing the amino acids within the window. The candidate sequences are assessed with a quality function, and the first codon of the best candidate's variation window is fixed. Subsequently the window is shifted by one codon position. As an example of freely accessible software implementing the algorithm, we present the Mr. Gene web application. Additionally, two experimental applications of the algorithm are shown.
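The sliding "variation window" described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the Mr. Gene implementation: the codon table is a small subset of the standard genetic code, and the integer weights, the penalty value and the names `SYNONYMS`, `WEIGHTS`, `quality` and `optimize` are all assumptions made up for the example.

```python
from itertools import product

# Small subset of the standard genetic code (synonymous codons per residue).
SYNONYMS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTC", "TTT"],
}

# Hypothetical codon-usage weights (illustrative integers, not measured data).
WEIGHTS = {"ATG": 10, "AAA": 7, "AAG": 3, "GAA": 7, "GAG": 3, "TTC": 6, "TTT": 4}


def quality(dna, forbidden):
    """Assumed quality function: codon weights minus a penalty per forbidden site."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    score = sum(WEIGHTS[c] for c in codons)
    score -= 100 * sum(dna.count(site) for site in forbidden)
    return score


def optimize(protein, window=3, forbidden=("GAATTC",)):
    """Slide a variation window along the protein, fixing one codon per step."""
    fixed = ""  # already optimized part of the coding sequence
    for pos in range(len(protein)):
        residues = protein[pos:pos + window]
        # Candidates: fixed prefix plus every synonymous combination in the window.
        candidates = (
            fixed + "".join(combo)
            for combo in product(*(SYNONYMS[aa] for aa in residues))
        )
        best = max(candidates, key=lambda cand: quality(cand, forbidden))
        # Keep only the first codon of the best candidate's window.
        fixed = best[:len(fixed) + 3]
    return fixed


print(optimize("MEF"))
```

With these made-up weights, choosing the highest-weighted codon for each residue independently would give ATGGAATTC, which contains the EcoRI site GAATTC; the window lets the quality function see that combination before the codon for E is fixed.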

3.
Abstract

Extensive DNA sequence analysis of three eukaryotes, S. cerevisiae, C. elegans, and D. melanogaster, reveals two different AA/TT periodical patterns associated with nucleosome positioning. The first pattern is the counter-phase oscillation of AA and TT dinucleotides, which has frequently been considered the nucleosome DNA pattern. This represents sequence rule I for chromatin structure. The second pattern is the in-phase oscillation of the AA and TT dinucleotides with the same nucleosome DNA period, 10.4 bases. This pattern apparently corresponds to curved DNA, which also participates in nucleosome formation, and represents sequence rule II for chromatin. The positional correlations of AA and TT dinucleotides also indicate that the nucleosomes are separated by specific linker sizes (preferably 8, 18,… bases), dictated by the steric exclusion rules. Thus, the sequence positions of neighboring nucleosomes are correlated, and this represents sequence rule III.
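A positional correlation between two dinucleotides, of the kind this abstract refers to, can be approximated by a histogram of distances between their occurrences. The sketch below is only an illustration under that assumption; the function names `positions` and `distance_histogram` and the `max_dist` cutoff are hypothetical and are not taken from the paper.

```python
from collections import Counter


def positions(seq, dinucleotide):
    """Start indices of every (possibly overlapping) occurrence of a dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def distance_histogram(seq, first, second, max_dist=30):
    """Count distances d = j - i between `first` at i and `second` at j, 1 <= d <= max_dist."""
    seq = seq.upper()
    starts = positions(seq, first)
    ends = positions(seq, second)
    hist = Counter()
    for i in starts:
        for j in ends:
            d = j - i
            if 1 <= d <= max_dist:
                hist[d] += 1
    return hist


# Synthetic toy sequence (not real data): AA repeated every 10 bases.
toy = "AA" + "G" * 8 + "AA" + "G" * 8 + "AA"
print(sorted(distance_histogram(toy, "AA", "AA").items()))
```

On real nucleosome sequences one would look for peaks in such a histogram at multiples of the period; the toy sequence here only shows the bookkeeping.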

4.
Evolution of the cytochrome b gene of mammals   (Total citations: 99; self-citations: 0; citations by others: 99)
Summary: With the polymerase chain reaction (PCR) and versatile primers that amplify the whole cytochrome b gene (∼1140 bp), we obtained 17 complete gene sequences representing three orders of hoofed mammals (ungulates) and dolphins (cetaceans). The fossil record of some ungulate lineages allowed estimation of the evolutionary rates for various components of the cytochrome b DNA and amino acid sequences. The relative rates of substitution at first, second, and third positions within codons are in the ratio 10 to 1 to at least 33. For deep divergences (>5 million years) it appears that both replacements and silent transversions in this mitochondrial gene can be used for phylogenetic inference. Phylogenetic findings include the association of (1) cetaceans, artiodactyls, and perissodactyls to the exclusion of elephants and humans, (2) pronghorn and fallow deer to the exclusion of bovids (i.e., cow, sheep, and goat), (3) sheep and goat to the exclusion of other pecorans (i.e., cow, giraffe, deer, and pronghorn), and (4) advanced ruminants to the exclusion of the chevrotain and other artiodactyls. Comparisons of these cytochrome b sequences support current structure-function models for this membrane-spanning protein. The part of the outer surface that includes the Qo redox center is more constrained than the remainder of the molecule, namely, the transmembrane segments and the surface that protrudes into the mitochondrial matrix. Many of the amino acid replacements within the transmembrane segments are exchanges between hydrophobic residues (especially leucine, isoleucine, and valine). Replacement changes at first and second positions of codons approximate a negative binomial distribution, similar to other protein-coding sequences. At four-fold degenerate positions of codons, the nucleotide substitutions approximate a Poisson distribution, implying that the underlying mutational spectrum is random with respect to position.

5.
Minicircular plastid DNA in the dinoflagellate Amphidinium operculatum   (Total citations: 1; self-citations: 0; citations by others: 1)
Plastid DNA was purified from the dinoflagellate Amphidinium operculatum. The genes atpB, petD, psaA, psbA and psbB have been shown to reside on single-gene minicircles of a uniform size of 2.3–2.4 kb. The psaA and psbB genes lack conventional initiation codons in the expected positions, and may use GTA for translation initiation. There are marked biases in codon preference. The predicted PsbA protein lacks the C-terminal extension which is present in all other photosynthetic organisms except Euglena gracilis, and there are other anomalies elsewhere in the predicted amino acid sequences. The non-coding regions of the minicircles contain a "core" region, which includes a number of stretches that are highly conserved across all minicircles, and modular regions that are conserved within subsets of the minicircles. Received: 8 September 1999 / Accepted: 10 November 1999

6.
Summary: We have analyzed the correlation that exists between the GC levels of third and first or second codon positions for about 1400 human coding sequences. The linear relationship that was found indicates that the large differences in GC level of third codon positions of human genes are paralleled by smaller differences in GC levels of first and second codon positions. Whereas third codon position differences correspond to very large differences in codon usage within the human genome, the first and second codon position differences correspond to smaller, yet very remarkable, differences in the amino acid composition of encoded proteins. Because GC levels of codon positions are linearly correlated with the GC levels of the isochores harboring the corresponding genes, both codon usage and amino acid composition are different for proteins encoded by genes located in isochores of different GC levels. Furthermore, we have also shown that a linear relationship with a unity slope and a correlation coefficient of 0.77 exists between GC levels of introns and exons from the 238 human genes currently available for this analysis. Introns are, however, about 5% lower in GC, on average, than exons from the same genes.
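The GC levels at first, second and third codon positions (often written GC1, GC2 and GC3) that this abstract correlates can be computed from a coding sequence as in the following sketch. It assumes the sequence starts in frame and ignores any trailing partial codon; the function name `gc_by_codon_position` is hypothetical.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): the fraction of G or C at each codon position."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the 0-based codon position
    return tuple(c / n_codons for c in counts)


# Toy in-frame sequence: codons ATG, GCC, AAA.
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAA")
print(gc1, gc2, gc3)
```

Computing these three values per gene and plotting GC3 against GC1 or GC2 across many genes would be the starting point for the kind of linear relationship the abstract reports.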

7.
Although most codon third bases appear to be functionless, the synonymous codons so defined exhibit a strikingly nonrandom distribution (codon bias) within human and other genes. To examine this phenomenon further, we generated a database of DNA sequences encoding human transmembrane cell-surface receptor proteins. Using this database we show here that the guanine and cytosine content of codon third bases (GC3) varies intragenically with the nature of the specified receptor domains (transmembrane > extracellular > intracellular domains; p < 0.001), the phenotype of the encoded amino acids (hydrophobic > hydrophilic > neutral amino acids; p < 0.001), and the receptor affiliation of the transmembrane domain superfamily (G-protein-coupled receptors > receptor tyrosine kinases; p < 0.001). Within gene regions specifying transmembrane domains, GC3 declines as domain functionality becomes redundant with increasing hydrophobicity (p < 0.001). Codons containing the second-base cytosine (XCZ, which encodes neutral amino acids) are selectively depleted of third-base adenine content (A3: XCA codons) when encoding transmembrane domain residues, consistent with positive selection for transitional mutation of XCG to XTG (which encodes hydrophobic amino acids) rather than to the synonymous XCA. Supporting this XCG → XTG mechanism of codon bias, the G3:A3 ratio of codons specifying the transmembrane amino acid glycine (GGZ) is intermediate between that of its functional homolog alanine (GCZ) and that of hydrophobic valine (GTZ), even though the C3:T3 ratios are similar. Conversely, nearest-neighbor analysis of third bases 5′ to codons specifying valine and leucine (CTZ) confirms a significant difference in C3:T3 but not G3:A3 ratios (i.e., C3/G1 T3/G1 > C3/A1; p < 0.001), consistent with the functionally advantageous retention of hydrophobic residues. These data raise the possibility that patterns of intragenic codon bias reflect a balance between negative and positive selection, suggesting in turn that analysis of codon third-base usage may help to predict the functional significance of encoded products. Supplementary information: Current address: (K. Lin) College of Life Sciences, Beijing Normal University, Beijing 100875, China

8.
Codon usage and base composition in sequences from the A + T-rich genome of Rickettsia prowazekii, a member of the alpha Proteobacteria, have been investigated. Synonymous codon usage patterns are roughly similar among genes, even though the data set includes genes expected to be expressed at very different levels, indicating that translational selection has been ineffective in this species. However, multivariate statistical analysis differentiates genes according to their G + C contents at the first two codon positions. To study this variation, we have compared the amino acid composition patterns of 21 R. prowazekii proteins with that of a homologous set of proteins from Escherichia coli. The analysis shows that individual genes have been affected by biased mutation rates to very different extents, with genes encoding proteins highly conserved among other species being the least affected. Overall, protein coding and intergenic spacer regions have G + C content values of 32.5% and 21.4%, respectively. Extrapolation from these values suggests that R. prowazekii has around 800 genes and that 60–70% of the genome may be coding. Correspondence to: S.G.E. Andersson

9.
It has been suggested by Robert Rosen (Bull. Math. Biophysics, 22, 227–255, 1960) that multiple alleles or pseudoalleles correspond to multiple sites of degenerate states of some quantum mechanical observable which acts as a source of primary genetic information. It is pointed out here that if the quantum mechanical states are determined by the different sequences of the purine and pyrimidine bases in the DNA molecule, the expected number of pseudoalleles would be much too large. The expected number is considerably reduced if we assume that a quantum mechanical state determines the coupling between a molecule of transfer RNA and the corresponding amino acid.

10.
We analyzed occurrences of bases in 20,352 introns, in exons of 25,574 protein-coding genes, and among the three codon positions in the protein-coding sequences. The nucleotide sequences originated from the whole spectrum of organisms from bacteria to primates. The analysis revealed the following: (1) In most exons, adenine dominates over thymine. In other words, adenine and thymine are distributed asymmetrically between the exon and the complementary strand, and the coding sequence is mostly located in the adenine-rich strand. (2) Thymine dominates over adenine not only in the strand complementary to the exon but also in introns. (3) A general bias is further revealed in the distribution of adenine and thymine among the three codon positions in the exons, where adenine dominates over thymine in the second and mainly the first codon position while the reverse holds in the third codon position. The product (A1/T1) × (A2/T2) × (T3/A3) is smaller than one in only a few analyzed genes. Correspondence to: J. Kypr
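The product (A1/T1) × (A2/T2) × (T3/A3) mentioned at the end of this abstract can be computed for a single coding sequence as sketched below, where A1, T1 and so on are taken to be the counts of adenine and thymine at each codon position. The sketch assumes an in-frame sequence; the function name `at_skew_product` is hypothetical.

```python
def at_skew_product(cds):
    """Compute (A1/T1) * (A2/T2) * (T3/A3) from A and T counts per codon position."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    a = [0, 0, 0]
    t = [0, 0, 0]
    for i in range(usable):
        if cds[i] == "A":
            a[i % 3] += 1
        elif cds[i] == "T":
            t[i % 3] += 1
    if 0 in (t[0], t[1], a[2]):
        raise ValueError("a required count is zero; the product is undefined")
    return (a[0] / t[0]) * (a[1] / t[1]) * (t[2] / a[2])


# Toy in-frame sequence: codons ATG, AAA, TTT, TCA.
print(at_skew_product("ATGAAATTTTCA"))
```

For the toy sequence the counts are A1 = 2, T1 = 2, A2 = 1, T2 = 2, A3 = 2 and T3 = 1, so the product is 1 × 0.5 × 0.5 = 0.25; a value above one would indicate the bias the abstract describes.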

11.
Amino acid residues arginine (R) and lysine (K) have similar physicochemical characteristics and are often mutually substituted during evolution without affecting protein function. Statistical examinations of human proteins show that more R than K residues are used in the proximity of R residues, whereas more K than R are used near K residues. This biased use occurs on both a global and a local scale (shorter than ∼100 residues). Even within a given exon, G + C-rich and A + T-rich short DNA segments preferentially encode R and K, respectively. The biased use of R and K on a local scale is also seen in Saccharomyces cerevisiae and Caenorhabditis elegans, which lack global-scale mosaic structures with varying GC%, or isochores. Besides R and K, several amino acids are also used with a positive or negative correlation with the local GC% of third codon bases. The local-scale, or "within-gene"-scale, heterogeneity of the DNA sequence may influence the sequence of the encoded protein segment. Received: 2 March 1998 / Accepted: 23 April 1998

12.
This paper analyses the compositional correlations that hold in the chicken genome. Significant linear correlations were found among the regions studied (coding sequences and their first, second, and third codon positions; 5′ and 3′ flanking regions; and introns), as is the case in the human genome. We found that these compositional correlations are not limited to global GC levels but even extend to individual bases. Furthermore, an analysis of 1037 coding sequences has confirmed a correlation among GC3, GC2, and GC1. The implications of these results are discussed. Received: 9 December 1998 / Accepted: 18 April 1999

13.
A novel lectin was isolated from Bryopsis plumosa (Hudson) Agardh, characterized, and named BPL-3. This lectin showed specificity for N-acetyl-D-galactosamine as well as N-acetyl-D-glucosamine and agglutinated human erythrocytes of all blood types, showing a slight preference for type A. SDS-PAGE and MALDI-TOF MS data showed that BPL-3 was a monomeric protein with a molecular weight of 11.5 kDa. BPL-3 was a non-glycoprotein with a pI value of ∼7.0. It was stable at high temperatures up to 70°C and exhibited optimum activity at pH 5.5–10. The N-terminal and internal amino acid sequences of the lectin were determined by Edman degradation and enzymatic digestion, and showed no sequence homology to any other reported proteins. The full sequence of the cDNA encoding this lectin was obtained by PCR using a cDNA library, with degenerate primers designed from the N-terminal amino acid sequence. The cDNA was 622 bp in size and contained a single ORF encoding the lectin precursor. This lectin showed the same sugar specificity as a previously reported lectin, Bryohealin, which is involved in protoplast regeneration of B. plumosa. However, the amino acid sequences of the two lectins were completely different. Homology analysis of the full cDNA sequence of BPL-3 showed that it might belong to the H lectin group, which was originally isolated from Roman snails.

14.
Patterns in codon usage were examined for the coding regions of the 23 known lepidopteran hemolymph proteins. Coding triplets are GC rich at the third position, and a significant linear relationship between GC content of silent and nonsilent (replacement) sites was demonstrated. Intron GC content was significantly lower than in coding regions, and no relationship between intron GC content and the same at silent and nonsilent sites was found. Though hemolymph proteins are all produced by the same tissue (fat body), significantly less bias was observed when all moth sequences were pooled than when sequences of the two major species were analyzed separately, as predicted by the genome hypothesis. In cases where no statistically significant bias was observed, polar or acidic basic amino acids were almost exclusively involved. Calculation of codon adaptation indices (CAI) was of limited value in quantifying the degree of codon bias, which probably reflects the complexity of multicellular-organism life cycles and the changing patterns of gene expression over different developmental stages. Correspondence to: D.R. Frohlich

15.
Positional distributions of various dinucleotides in experimentally derived human nucleosome DNA sequences are analyzed. Nucleosome positioning in this species is found to depend largely on GG and CC dinucleotides periodically distributed along the nucleosome DNA sequence, with a period of 10.4 bases. The GG and CC dinucleotides oscillate in counterphase, i.e., their respective preferred positions are shifted about a half-period from one another, as was observed earlier for AA and TT dinucleotides. Other purine-purine and pyrimidine-pyrimidine dinucleotides (RR and YY) display the same periodical and counterphase pattern. The dominance of oscillating GG and CC dinucleotides in human nucleosomes and the contribution of AG(CT), GA(TC), and AA(TT) suggest a general nucleosome DNA sequence pattern: counterphase oscillation of RR and YY dinucleotides. AA and TT dinucleotides, commonly accepted as major players, are only weak contributors in the case of human nucleosomes.

16.
Summary: Parsimony trees relating DNA sequences coding for lysozymes c and α-lactalbumins suggest that the gene duplication that allowed lactalbumin to evolve from lysozyme preceded the divergence of mammals and birds. Comparisons of the amino acid sequences of additional lysozymes and lactalbumins are consistent with this view. When all base positions are considered, the probability that the duplication leading to the lactalbumin gene occurred after the start of mammalian evolution is estimated to be 0.05–0.10. Elimination of the phylogenetic noise generated by fast evolution and compositional bias at third positions of codons reduced this probability to 0.002–0.03. Thus the gene duplication may have long preceded the acquisition of lactalbumin function.

17.
The complete macronuclear DNA polymerase α gene, previously sequenced in Oxytricha nova, has been cloned from a genomic macronuclear library and sequenced for the hypotrich O. trifallax. Macronuclear DNA clones of DNA polymerase α encoding ∼1000 amino acids, or approximately two-thirds of the open reading frame, have been obtained by PCR and sequenced for Halteria grandinella, Holosticha species, Paraurostyla viridis, Pleurotricha lanceolata, Stylonychia lemnae Teller, Sty. mytilus, Uroleptus gallina, and Urostyla grandis. Phylogenetic relationships inferred from DNA polymerase α amino acid sequences have been used to clarify taxonomic relationships previously determined by morphology of the cell cortex. Hypotrich phylogenies based on DNA polymerase α amino acid sequences are incongruent with morphological and other molecular phylogenies. Based upon these data, we assert that, contrary to morphological data, O. nova and O. trifallax are different species, and we propose that the oligotrich Halteria grandinella be reclassified as a hypotrich. This work also extends the available database of eukaryotic DNA polymerase α sequences, and suggests new amino acid sequence targets for mutagenesis experiments to continue the functional dissection of DNA pol α biochemistry at the molecular level. Received: 7 January 1997 / Accepted: 7 April 1997

18.
A number of polyketide synthase gene sequences from Aspergillus ochraceus were isolated by both SSH-PCR and degenerate PCR. The deduced amino acid sequences of the corresponding cloned pks DNA fragments were then aligned with the amino acid sequences of other polyketide synthase enzymes. One of these pks genes is essential for ochratoxin A biosynthesis (OTA-PKS). The OTA-PKS was most similar to methylsalicylic acid synthase (MSAS) type PKS proteins based on the alignment of the ketosynthase domains, whereas when the acyl transferase domains were aligned it appeared to be more similar to PKS enzymes from Cochliobolus heterostrophus. The three PKS proteins identified by degenerate PCR were all from different PKS types: one was an MSAS type enzyme, the second was similar to the PKS proteins involved in lovastatin biosynthesis, and the third was not similar to any of the other phylogenetic groupings. Data are presented which suggest that the use of phylogenetic analysis to predict the function of PKS proteins/genes is likely to be significantly enhanced by analyzing more than one domain of the protein. Presented at the EU-USA Bilateral Workshop on Toxigenic Fungi & Mycotoxins, New Orleans, USA, July 5–7, 2005. Financial support: Irish Government under the National Development Plan 2000–2006

19.
Abstract: Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that combining the nucleic acid and the translated amino acid coded character states into the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and the overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. Second, one character set may display no information concerning a phylogenetic hypothesis while the other character set may impart information to a hypothesis. These two possibilities are cases of non-independence; however, we argue that congruence in such cases can be thought of as increasing the weight of the particular phylogenetic hypothesis that is supported by those characters. In the third case, the two sources of character information for a particular codon may be entirely incongruent with respect to phylogenetic hypotheses concerning the taxa examined. In this last case the two character sets are independent in that information from neither can predict the character states of the other. Examples of these possibilities are discussed, and the general applicability of combining these two sources of information for protein coding genes is presented using sequences from the homeobox region of 46 homeobox genes from Drosophila melanogaster to develop a hypothesis of genealogical relationship of these genes in this large multigene family.

20.
In Penaeus vannamei, α-amylase is the most important glucosidase and is present as at least two major isoenzymes which have been purified. In order to obtain information on their structure, a hepatopancreas cDNA library constructed in phage lambda-Zap II (Stratagene) was screened using a synthetic oligonucleotide based on the amino acid sequence of a V8 staphylococcal protease peptide of P. vannamei α-amylase. Three clones were selected. AMY SK 37 (EMBL sequence accession number: X 77318) is the most complete of the analyzed clones and was completely sequenced. It contains the complete cDNA sequence coding for one of the major isoenzymes of shrimp amylase. The deduced amino acid sequence shows the existence of a 511-residue-long pre-enzyme containing a highly hydrophobic signal peptide of 16 amino acids. Northern hybridization of total RNA with the amylase cDNA confirms the size of the messenger at around 1,600 bases. AMY SK 28, which contains the complete mature sequence of amylase, belonged to the same family, characterized by a common 3′ terminus, and presented four amino acid changes. Some other variants of this family were also partially sequenced. AMY SK 20 was found to encode a minor variant of the protein with a different 3′ terminus and 57 amino acid changes. Phylogenetic analysis established with the conserved amino acid regions of the (β/α) eight-barrel domain and with the total sequence of P. vannamei showed close evolutionary relationships with mammals (59–63% identity) and with insect α-amylase (52–62% identity). The use of conserved sequences increased the level of similarity but did not alter the ordering of the groupings. Location of the secondary structure elements confirmed the high level of sequence similarity of shrimp α-amylase with pig α-amylase. Correspondence to: A. Van Wormhoudt

