1.

Amino acid vs. nucleotide characters: challenging preconceived notions

Simmons MP Ochoterena H Freudenstein JV 《Molecular phylogenetics and evolution》2002,24(1):78-90

The 567-terminal analysis of atpB, rbcL, and 18S rDNA was used as an empirical example to test the use of amino acid vs. nucleotide characters for protein-coding genes at deeper taxonomic levels. Nucleotides for atpB and rbcL had 6.5 times the amount of possible synapomorphy as amino acids. Based on parsimony analyses with unordered character states, nucleotides outperformed amino acids for all three measures of phylogenetic signal used (resolution, branch support, and congruence with independent evidence). The nucleotide tree was much more resolved than the amino acid tree, for both large and small clades. Nearly twice the percentage of well-supported clades resolved in the 18S rDNA tree were resolved using nucleotides (91.8%) relative to amino acids (49.2%). The well-supported clades resolved by both character types were much better supported by nucleotides (98.7% vs. 83.8% average jackknife support). The faster evolving nucleotides with a smaller average character-state space outperformed the slower evolving amino acids with a larger average character-state space. Nucleotides outperformed amino acids even with 90% of the terminals deleted. The lack of resolution on the amino acid trees appears to be caused by a lack of congruence among the amino acids, not a lack of replacement substitutions.

2.

Nucleotide composition as a driving force in the evolution of retroviruses (cited 4 times: 0 self-citations, 4 by others)

Edward C. Bronson John N. Anderson 《Journal of molecular evolution》1994,38(5):506-532

3.

Conflict between Amino Acid and Nucleotide Characters (cited 5 times: 0 self-citations, 5 by others)

Mark P. Simmons Helga Ochoterena John V. Freudenstein 《Cladistics : the international journal of the Willi Hennig Society》2002,18(2):200-206

Slowly evolving characters, such as amino acids and replacement substitutions, have generally been favored over faster evolving characters for inferring phylogenetic relationships. However, amino acids constitute composite characters and, because of the degenerate genetic code, are subject to convergence. Based on an analysis of atpB and rbcL in 567 seed plants, we show that silent substitutions may be more phylogenetically informative than replacement substitutions and that artifacts caused by composite characters and/or convergence cause clades on amino acid trees to conflict with nucleotide trees and independent evidence. These findings indicate that coding nucleotide sequences only as amino acid characters for phylogenetic analysis provides little benefit and may yield misleading results.

4.

Relative benefits of amino‐acid, codon, degeneracy, DNA, and purine‐pyrimidine character coding for phylogenetic analyses of exons

Mark P. Simmons 《Journal of Systematics and Evolution》2017,55(2):85-109

Both traditional and 10 more recent methods of coding characters from exons of protein‐coding genes are reviewed. The more recent methods collectively blur the distinction between nucleotide and amino‐acid coding and enable investigators to carefully quantify the effects of different sources of phylogenetic signal as well as their potential biases. Codon models, which explicitly model silent and replacement substitutions, are a major advance and are expected to be broadly useful for simultaneously inferring recent and ancient divergences, unlike amino‐acid coding. Degeneracy coding, wherein ambiguity codes are used to eliminate silent substitutions at the individual‐nucleotide level, has clear advantages over scoring amino‐acid characters. Nucleotide, codon, and amino‐acid models are now directly comparable with easy‐to‐use programs, and widely used phylogenetics programs can analyze partitioned supermatrices that incorporate all three types of model. Therefore, it should become standard practice to test among these alternative model types before conducting parametric phylogenetic analyses. An earlier study of 78 protein‐coding genes from 360 green‐plant plastid genomes is used as an empirical example with which to quantify the relative performance of alternative character‐coding methods using five quantification measures. Codon models were selected as having the best fit to the data, yet were outperformed by nucleotide models for all five quantification measures. Third‐codon positions were found to be an important source of phylogenetic signal and even outperformed analyses of first and second positions for some measures. Degeneracy coding generally performed at least as well as amino‐acid coding and is an arguably more effective alternative.
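Degeneracy coding can be illustrated with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1) and a deliberately simplified rule: each codon is replaced, position by position, by the IUPAC ambiguity code for the set of nucleotides found at that position among all codons for the same amino acid (or stop). This is an illustration of the idea only, not a reimplementation of any published degeneracy-coding scheme, which may treat six-fold degenerate amino acids such as serine differently; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order with
# the first position varying slowest, which is exactly the order product() yields.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_codon(codon: str) -> str:
    """Replace each position by the ambiguity code of all synonymous codons."""
    aa = CODON_TABLE[codon.upper()]
    synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
    return "".join(IUPAC[frozenset(c[i] for c in synonyms)] for i in range(3))


def degeneracy_code(cds: str) -> str:
    """Degeneracy-code an in-frame coding sequence (length a multiple of 3)."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(degenerate_codon(cds[i:i + 3]) for i in range(0, len(cds), 3))


print(degeneracy_code("ATGCTGTTA"))  # prints ATGYTNYTN
```

Under this rule the silent change between the leucine codons CTG and TTA leaves the coded character unchanged (`YTN`), while ATG, the only methionine codon, is kept as-is.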

5.

Phylogenetic inference using non‐redundant coding of dependent characters versus alternative approaches for protein‐coding genes

Mark P. Simmons Li‐Bing Zhang Kai F. Müller 《Cladistics : the international journal of the Willi Hennig Society》2011,27(2):186-196

Contemporary molecular phylogenetic analyses often encompass a broad range of taxonomic diversity while maintaining high levels of sampling within each major taxon. To help maximize phylogenetic signal in such studies, one may analyse multiple levels of characters simultaneously. We test the performance of both the original and the modified versions of non‐redundant coding of dependent characters (NRCDC) relative to commonly applied alternative character‐sampling strategies using codon‐based simulations under a range of conditions. Both original and modified NRCDC generally outperformed other character‐sampling strategies that only sampled characters at one level (nucleotides or amino acids) over a broader range of simulation parameters than any of the alternative character‐sampling strategies with respect to both overall success of resolution and averaged overall success of resolution in the parsimony‐based analyses. Based on theoretical considerations and the results of our simulations, we encourage application and further testing of modified NRCDC in parsimony‐based molecular phylogenetic analyses that sample exons of protein‐coding genes. We expect that modified NRCDC will generally increase both accuracy and branch‐support over commonly applied alternative character‐sampling strategies when analysed using the same phylogenetic inference method, particularly in studies that sample both closely and distantly related taxa with clades representing both ancient and recent divergences. © The Willi Hennig Society 2010.

6.

On the correlation between composition and site-specific evolutionary rate: implications for phylogenetic inference

Gowri-Shankar V Rattray M 《Molecular biology and evolution》2006,23(2):352-364

Model-based phylogenetic reconstruction methods traditionally assume homogeneity of nucleotide frequencies among sequence sites and lineages. Yet, heterogeneity in base composition is a characteristic shared by most biological sequences. Compositional variation in time, reflected in the compositional biases among contemporary sequences, has already been extensively studied, and its detrimental effects on phylogenetic estimates are known. However, fewer studies have focused on the effects of spatial compositional heterogeneity within genes. We show here that different sites in an alignment do not always share a unique compositional pattern, and we provide examples where nucleotide frequency trends are correlated with the site-specific rate of evolution in RNA genes. Spatial compositional heterogeneity is shown to affect the estimation of evolutionary parameters. With standard phylogenetic methods, estimates of equilibrium frequencies are found to be biased towards the composition observed at fast-evolving sites. Conversely, the ancestral composition estimates of some time-heterogeneous but spatially homogeneous methods are found to be biased towards frequencies observed at invariant and slow-evolving sites. The latter finding challenges the result of a previous study arguing against a hyperthermophilic last universal ancestor from the low apparent G + C content of its rRNA sequences. We propose a new model to account for compositional variation across sites. A Gaussian process prior is used to allow for a smooth change in composition with evolutionary rate. The model has been implemented in the phylogenetic inference software PHASE, and Bayesian methods can be used to obtain the model parameters. The results suggest that this model can accurately capture the observed trends in present-day RNA sequences.

7.

Heterogeneity of nucleotide frequencies among evolutionary lineages and phylogenetic inference

Rosenberg MS Kumar S 《Molecular biology and evolution》2003,20(4):610-621

A major assumption of many molecular phylogenetic methods is the homogeneity of nucleotide frequencies among taxa, which refers to the equality of the nucleotide frequency bias among species. Changes in nucleotide frequency among different lineages in a data set are thought to lead to erroneous phylogenetic inference because unrelated clades may appear similar because of evolutionarily unrelated similarities in nucleotide frequencies. We tested the effects of the heterogeneity of nucleotide frequency bias on phylogenetic inference, along with the interaction between this heterogeneity and stratified taxon sampling, by means of computer simulations using evolutionary parameters derived from genomic databases. We found that the phylogenetic trees inferred from data sets simulated under realistic, observed levels of heterogeneity for mammalian genes were reconstructed with accuracy comparable to those simulated with homogeneous nucleotide frequencies; the results hold for Neighbor-Joining, minimum evolution, maximum parsimony, and maximum-likelihood methods. The LogDet distance method, specifically designed to deal with heterogeneous nucleotide frequencies, does not perform better than distance methods that assume substitution pattern homogeneity among sequences. In these specific simulation conditions, we did not find a significant interaction between phylogenetic accuracy and substitution pattern heterogeneity among lineages, even when the taxon sampling is increased.
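The LogDet family of distances handles unequal base composition by working with determinants of frequency matrices. As a hedged sketch, the paralinear form of this distance can be written as d = -1/4 [ln det(J) - 1/2 ln(det(Πx) det(Πy))], where J is the 4×4 matrix of joint base frequencies at aligned sites and Πx, Πy are diagonal matrices of the base frequencies of each sequence. Other LogDet variants use different normalizations, so this is not necessarily the exact form used in the study; the helper names are hypothetical, and sites with gaps or ambiguity codes are simply skipped.

```python
import math

BASES = "ACGT"


def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def paralinear_distance(x: str, y: str) -> float:
    """Paralinear (LogDet-family) distance between two aligned DNA sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned (equal length)")
    # Keep only sites where both sequences carry an unambiguous base.
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper()) if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    index = {base: i for i, base in enumerate(BASES)}
    # joint[i][j]: frequency of sites with base i in x and base j in y.
    joint = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        joint[index[a]][index[b]] += 1.0 / len(pairs)
    freq_x = [sum(row) for row in joint]                              # diagonal of Pi_x
    freq_y = [sum(joint[i][j] for i in range(4)) for j in range(4)]  # diagonal of Pi_y
    det_j = determinant(joint)
    det_x, det_y = math.prod(freq_x), math.prod(freq_y)
    if det_j <= 0.0 or det_x == 0.0 or det_y == 0.0:
        raise ValueError("distance undefined: det(J) <= 0 or a base is absent")
    return -0.25 * (math.log(det_j) - 0.5 * (math.log(det_x) + math.log(det_y)))


print(paralinear_distance("AACCGGTTAC", "AACCGGTTCA"))  # ln(3)/4, about 0.2747
```

For identical sequences J equals Πx and Πy, so the distance is zero; the toy pair above differs at two of ten sites and gives ln(3)/4.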

8.

Phylogeny of the Ascaridoidea (Nematoda: Ascaridida) based on three genes and morphology: hypotheses of structural and sequence evolution

Nadler SA Hudspeth DS 《The Journal of parasitology》2000,86(2):380-393

Ascaridoid nematodes parasitize the gastrointestinal tract of vertebrate definitive hosts and are represented by more than 50 described genera. We used 582 nucleotides (83% of the coding sequence) of the mitochondrial gene cytochrome oxidase subunit 2, in combination with published small- and large-subunit nuclear rDNA sequences (2,557 characters) and morphological data (20 characters), to produce a phylogenetic hypothesis for representatives of this superfamily. This combined evidence phylogeny strongly supported clades that, with 1 exception, were consistent with Fagerholm's 1991 classification. Parsimony mapping of character states on the combined evidence tree was used to develop hypotheses for the evolution of morphological, life history, and amino acid characters. This analysis of character evolution revealed that certain key features that have been used by previous workers for developing taxonomic and evolutionary hypotheses represent plesiomorphic states. Cytochrome oxidase subunit 2 nucleotides show a strong compositional bias to A+T and a substitution bias to thymine. These biases are most apparent at third positions of codons and 4-fold degenerate sites, which is consistent with the nonrandom substitution pattern of A+T pressure. Despite nucleotide bias, cytochrome oxidase amino acid sequences show conservation and retention of critical functional residues, as inferred from comparisons to other organisms.
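The A+T bias by codon position and at 4-fold degenerate sites can be measured directly from an in-frame coding sequence. The Python sketch below assumes the standard genetic code for simplicity; the gene studied here is mitochondrial, and mitochondrial genomes use other translation tables that change some 4-fold sites, so the `AMINO_ACIDS` string would need to be swapped for real data. The function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in TCAG order (first position slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Two-base codon prefixes for which all four third-position bases encode the same
# amino acid; the third position of such codons is a 4-fold degenerate site.
FOURFOLD_PREFIXES = {
    a + b
    for a in BASES
    for b in BASES
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
}


def at_fraction(bases) -> float:
    """Fraction of A or T among unambiguous bases (NaN if there are none)."""
    called = [b for b in bases if b in "ACGT"]
    if not called:
        return float("nan")
    return sum(b in "AT" for b in called) / len(called)


def at_profile(cds: str) -> dict:
    """A+T content at codon positions 1-3 and at 4-fold degenerate sites."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    profile = {f"pos{i + 1}": at_fraction([c[i] for c in codons]) for i in range(3)}
    profile["fourfold"] = at_fraction([c[2] for c in codons if c[:2] in FOURFOLD_PREFIXES])
    return profile


print(at_profile("ATGGCTAAATTAGGA"))
# {'pos1': 0.6, 'pos2': 0.6, 'pos3': 0.8, 'fourfold': 1.0}
```

Under the standard code there are eight 4-fold prefixes (TC, CT, CC, CG, AC, GT, GC, GG); in the toy sequence only the GCT and GGA codons contribute 4-fold sites.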

9.

The complete mitogenome of Rhodeus uyekii (Cypriniformes, Cyprinidae).

Byong-Chul Kim Tae-Wook Kang Moo-Sang Kim Chang-Bae Kim 《DNA sequence》2006,17(3):181-186

The complete nucleotide sequence of the mitochondrial genome of R. uyekii, with a total size of 16,817 bp, has been determined using long PCR. The mitogenome of R. uyekii, encoding 13 putative proteins, two ribosomal RNAs and 22 tRNAs, shows the typical teleost mitogenome structure. Nucleotide composition, amino acid composition and codon usage are within the range of values estimated from other teleost mitogenomes. In the AT-rich region of R. uyekii, several conserved blocks previously identified in vertebrates are observed. R. uyekii, a Korean endemic species, belongs to the cyprinid fishes, for which nine mitogenomes are available. To understand the phylogenetic relationships of Cypriniformes from the known mitogenome information, we analysed Cypriniformes mitogenomes based on protein-coding gene sequences. Although this study gives a more resolved picture of phylogenetic interrelationships among cyprinid fishes, further study with more comprehensive taxon sampling of mitogenomes is strongly needed.

10.

The partial mitochondrial genome of the Cephalothrix rufifrons (Nemertea, Palaeonemertea): characterization and implications for the phylogenetic position of Nemertea

Turbeville JM Smith DM 《Molecular phylogenetics and evolution》2007,43(3):1056-1065

A continuous 10.1 kb fragment of the Cephalothrix rufifrons (Nemertea, Palaeonemertea) mitochondrial genome was sequenced and characterized to further assess organization of protostome mitochondrial genomes and evaluate the phylogenetic potential of gene arrangement and amino acid characters. The genome is A-T rich (72%), and this biased base composition is partly reflected in codon usage. Inferred tRNA secondary structures are typical of those reported for other metazoan mitochondrial DNAs. The arrangement of the 26 genes contained in the fragment exhibits marked similarity to those of many protostome taxa, most notably molluscs with highly conserved arrangements and a phoronid. Separate and simultaneous phylogenetic analyses of inferred amino acid sequences and gene adjacencies place the nemertean within the protostomes among coelomate lophotrochozoan taxa, but do not find a well-supported sister taxon link.

11.

Mitochondrial phylogeny of Anura (Amphibia): a case study of congruent phylogenetic reconstruction using amino acid and nucleotide characters (cited 3 times: 0 self-citations, 3 by others)

Gissi C San Mauro D Pesole G Zardoya R 《Gene》2006,366(2):228-237

We explore whether phylogenetic analyses of the same sequence data set at the amino acid and nucleotide level are able to recover congruent topologies, as well as the advantages and limitations of both alternative approaches. As a case study, mitochondrial protein-coding genes were used to discern among competing hypotheses on the phylogenetic relationships of major anuran amphibian lineages. To properly address this phylogenetic question, the complete nucleotide sequences of the mitochondrial genomes of two archaeobatrachian species, Ascaphus truei and Pelobates cultripes, were determined anew. Bayesian and maximum likelihood phylogenetic inferences of the same sequence data set were performed based on both amino acid and nucleotide characters, with the latter analysed either as codons or as a reduced data set of first+second (P12) codon positions. In addition, likelihood-based ratio tests were performed to evaluate the support of alternative topologies. The different data sets arrived at congruent and highly supported topologies, suggesting a similar phylogenetic resolving power of the two character types provided that correctly selected sites and appropriate evolutionary models are used. The reconstructed anuran mitochondrial phylogeny supports the paraphyly of Archaeobatrachia, with Ascaphus as sister group to all the remaining anurans, and Pelobates as sister group of Neobatrachia. However, the employed tree reconstruction methods and likelihood-based ratio tests seemed to be negatively affected by the fast evolving sequences of neobatrachians, suggesting that the phylogeny of Anura here presented is not definitive, and needs further investigation using an extended taxon sampling.

12.

Sequence of the T4 recombination gene, uvsX, and its comparison with that of the recA gene of Escherichia coli. (cited 11 times: 4 self-citations, 7 by others)

H Fujisawa T Yonesaki T Minagawa 《Nucleic acids research》1985,13(20):7473-7481

We have determined the nucleotide sequence of the uvsX gene of bacteriophage T4, which is involved in DNA recombination and damage repair and whose product catalyzes in vitro recombination-related reactions in a manner analogous to the E. coli recA gene product. The coding region consisted of 1170 nucleotides directing the synthesis of a polypeptide 390 amino acids in length with a calculated molecular weight of 43,760. The amino acid composition, the sequence of seven NH2-terminal amino acids and the molecular weight of the protein deduced from the nucleotide sequence were consistent with the data from the analysis of the purified uvsX protein. The nucleotide sequence and the deduced amino acid sequence were compared with those of the recA gene. Although no significant homology was found between the nucleotide sequences, the amino acid sequences included 23% identical and 15% conservatively substituted residues.

13.

Compositional Bias May Affect Both DNA-Based and Protein-Based Phylogenetic Reconstructions

Foster PG Hickey DA 《Journal of molecular evolution》1999,48(3):284-290

It is now well-established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered to be more reliable than those derived from the corresponding DNA sequences because it is believed that the use of encoded protein sequences circumvents the problems caused by nucleotide compositional biases in the DNA sequences. There exists, however, a correlation between AT/GC bias at the nucleotide level and content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias not only may affect phylogenetic analysis based on DNA sequences, but also drives a protein bias which may affect analyses based on protein sequences. We present a striking example where common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the use of the LogDet/paralinear transform nor removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias continually group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased. Received: 19 February 1998 / Accepted: 28 August 1998
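The link between nucleotide bias and amino-acid content can be made concrete by reading it off the genetic code. This Python sketch assumes the standard code (the study used animal mitochondrial sequences, whose codes differ) and a strict, hypothetical rule: an amino acid counts as AT-rich (or GC-rich) if every one of its codons carries only A/T (or only G/C) at codon positions 1 and 2. Arginine, often grouped with the GC-rich amino acids, is excluded by this strict rule because of its AGA/AGG codons.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in TCAG order (first position slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STANDARD_RESIDUES = set(AMINO_ACIDS) - {"*"}


def amino_acids_restricted_to(allowed: str) -> set:
    """Amino acids all of whose codons carry only `allowed` bases at positions 1 and 2."""
    codons_by_aa = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # ignore stop codons
            codons_by_aa.setdefault(aa, []).append(codon)
    return {
        aa
        for aa, codons in codons_by_aa.items()
        if all(c[0] in allowed and c[1] in allowed for c in codons)
    }


AT_RICH = amino_acids_restricted_to("AT")  # F, Y, M, I, N, K
GC_RICH = amino_acids_restricted_to("GC")  # G, A, P


def group_fractions(protein: str) -> tuple:
    """Fractions of residues belonging to the AT-rich and GC-rich groups."""
    residues = [aa for aa in protein.upper() if aa in STANDARD_RESIDUES]
    if not residues:
        raise ValueError("no standard amino-acid residues")
    n = len(residues)
    return (sum(aa in AT_RICH for aa in residues) / n,
            sum(aa in GC_RICH for aa in residues) / n)


print(sorted(AT_RICH), sorted(GC_RICH))
print(group_fractions("MKFAGPLL"))  # (0.375, 0.375)
```

Comparing these two fractions across taxa is one simple way to see whether an A+T-biased genome has shifted the amino-acid content of its proteins.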

14.

The position of the Hymenoptera within the Holometabola as inferred from the mitochondrial genome of Perga condei (Hymenoptera: Symphyta: Pergidae)

Castro LR Dowton M 《Molecular phylogenetics and evolution》2005,34(3):469-479

We sequenced most of the mitochondrial genome of the sawfly Perga condei (Insecta: Hymenoptera: Symphyta: Pergidae) and tested different models of phylogenetic reconstruction in order to resolve the position of the Hymenoptera within the Holometabola, using mitochondrial genomes. The mitochondrial genome sequenced for P. condei had less compositional bias and slower rates of molecular evolution than the honeybee, as well as a less rearranged genome organization. Phylogenetic analyses showed that, when using mitochondrial genomes, both adequate taxon sampling and more realistic models of analysis are necessary to resolve relationships among insect orders. Both parsimony and Bayesian analyses performed better when nucleotide instead of amino acid sequences were used. In particular, this study supports the placement of the Hymenoptera as sister group to the Mecopterida.

15.

How can third codon positions outperform first and second codon positions in phylogenetic inference? An empirical example from the seed plants

Simmons MP Zhang LB Webb CT Reeves A 《Systematic biology》2006,55(2):245-258

Greater phylogenetic signal is often found in parsimony-based analyses of third codon positions of protein-coding genes relative to their corresponding first and second codon positions, even for early-derived ("basal") clades. We used the Soltis et al. (2000; Bot. J. Linn. Soc. 133:381-461) data matrix of atpB and rbcL from 567 seed plants to quantify how each of six factors (observed character-state space, frequencies of observed character states, substitution probabilities among nucleotides, rate heterogeneity among sites, overall rate of evolution, and number of parsimony-informative characters) contributed to this phenomenon. Each of these six factors was estimated from the original data matrix for parsimony-informative third codon positions considered separately from first and second codon positions combined. One of the most parsimonious trees found was used as the constraint topology; branch lengths were estimated using likelihood-based distances, and characters were simulated on this tree. Differential frequencies of observed character states were found to be the most limiting of the factors simulated for all three codon positions. Differential frequencies of observed character states and differential substitution probabilities among states were relatively advantageous for first and second codon positions. In contrast, differential numbers of observed character states, differential rate heterogeneity among sites, the greater number of parsimony-informative characters, and the higher overall rate of evolution were relatively advantageous for third codon positions. The amount of possible synapomorphy was predictive of the overall success of resolution.
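Comparisons like this one start by partitioning an alignment by codon position. A minimal Python sketch, assuming an in-frame alignment stored as a hypothetical `taxon -> sequence` dict, splits it into first-plus-second (P12) and third (P3) positions and counts parsimony-informative characters, using the usual definition that at least two character states must each occur in at least two taxa. The four-taxon alignment is invented for illustration.

```python
from collections import Counter


def split_by_codon_position(alignment: dict) -> dict:
    """Split an in-frame coding alignment into P12 (positions 1+2) and P3 partitions."""
    p12 = {t: "".join(b for i, b in enumerate(s) if i % 3 != 2) for t, s in alignment.items()}
    p3 = {t: s[2::3] for t, s in alignment.items()}
    return {"P12": p12, "P3": p3}


def is_parsimony_informative(column) -> bool:
    """True if at least two states each occur in at least two taxa (gaps/ambiguities ignored)."""
    counts = Counter(b for b in column if b in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(partition: dict) -> int:
    """Number of parsimony-informative columns in an aligned partition."""
    seqs = [s.upper() for s in partition.values()]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    return sum(is_parsimony_informative(column) for column in zip(*seqs))


alignment = {
    "taxon1": "ATGAAACTT",
    "taxon2": "ATGAAGCTC",
    "taxon3": "ATGAAACTC",
    "taxon4": "ATGAAGCTT",
}
parts = split_by_codon_position(alignment)
print({name: count_informative(p) for name, p in parts.items()})  # {'P12': 0, 'P3': 2}
```

In the toy data all variation sits at third positions, so P3 carries every informative character; with real atpB or rbcL data each partition would then be analysed or simulated separately.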

16.

Resolving Discrepancy between Nucleotides and Amino Acids in Deep-Level Arthropod Phylogenomics: Differentiating Serine Codons in 21-Amino-Acid Models

Andreas Zwick Jerome C. Regier Derrick J. Zwickl 《PloS one》2012,7(11)

Background

In a previous study of higher-level arthropod phylogeny, analyses of nucleotide sequences from 62 protein-coding nuclear genes for 80 panarthropod species yielded significantly higher bootstrap support for selected nodes than did amino acids. This study investigates the cause of that discrepancy.

Methodology/Principal Findings

The hypothesis is tested that failure to distinguish the serine residues encoded by two disjunct clusters of codons (TCN, AGY) in amino acid analyses leads to this discrepancy. In one test, the two clusters of serine codons (Ser1, Ser2) are conceptually translated as separate amino acids. Analysis of the resulting 21-amino-acid data matrix shows striking increases in bootstrap support, in some cases matching that in nucleotide analyses. In a second approach, nucleotide and 20-amino-acid data sets are artificially altered through targeted deletions, modifications, and replacements, revealing the pivotal contributions of distinct Ser1 and Ser2 codons. We confirm that previous methods of coding nonsynonymous nucleotide change are robust and computationally efficient by introducing two new degeneracy coding methods. We demonstrate for degeneracy coding that neither compositional heterogeneity at the level of nucleotides nor codon usage bias between Ser1 and Ser2 clusters of codons (or their separately coded amino acids) is a major source of non-phylogenetic signal.

Conclusions

The incongruity in support between amino-acid and nucleotide analyses of the aforementioned arthropod data set is resolved by showing that “standard” 20-amino-acid analyses yield lower node support specifically when serine provides crucial signal. Separate coding of Ser1 and Ser2 residues yields support commensurate with that found by degenerated nucleotides, without introducing phylogenetic artifacts. While exclusion of all serine data leads to reduced support for serine-sensitive nodes, these nodes are still recovered in the ML topology, indicating that the enhanced signal from Ser1 and Ser2 is not qualitatively different from that of the other amino acids.
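The 21-amino-acid coding described in this abstract can be sketched as an ordinary translation that splits serine by codon cluster. The Python sketch below assumes the standard genetic code and uses the hypothetical state labels `S1` (TCN codons) and `S2` (AGT/AGC codons); a real analysis program would need its own symbol for the 21st state, which is not specified here.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in TCAG order (first position slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_21(cds: str) -> list:
    """Translate codons, splitting serine into Ser1 (TCN) and Ser2 (AGY)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    states = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon, "X")  # "X" for codons with gaps or ambiguity codes
        if aa == "S":
            # Serine codons are TCN or AGY, so anything not starting with TC is AGY.
            aa = "S1" if codon.startswith("TC") else "S2"
        states.append(aa)
    return states


print(translate_21("ATGTCAAGTTCGAGC"))  # ['M', 'S1', 'S2', 'S1', 'S2']
```

Because TCN and AGY codons differ at both of their first two positions, a change between the clusters needs at least two nucleotide substitutions, which is the signal that 20-state serine coding discards.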

17.

Human apolipoprotein E mRNA. cDNA cloning and nucleotide sequencing of a new variant (cited 22 times: 0 self-citations, 22 by others)

J W McLean N A Elshourbagy D J Chang R W Mahley J M Taylor 《The Journal of biological chemistry》1984,259(10):6498-6504

The complete nucleotide sequences of three cloned cDNAs corresponding to human liver apolipoprotein E (apo-E) mRNA were determined. Analysis of the longest cDNA showed that it contained 1157 nucleotides of mRNA sequence with a 5'-terminal nontranslated region of 61 nucleotides, a signal peptide region corresponding to 18 amino acids, a mature protein region corresponding to 299 amino acids, and a 3'-terminal nontranslated region of 142 nucleotides. The inferred amino acid sequences from two cDNAs were identical and corresponded to the amino acid sequence for plasma apo-E3 that has been reported previously (Rall, S. C., Jr., Weisgraber, K. H., and Mahley, R. W. (1982) J. Biol. Chem. 257, 4171-4178). The third cDNA differed from the other two cDNAs in five nucleotide positions. Three of these differences occurred in the third nucleotide position of amino acid codons, resulting in no change in the corresponding amino acids at residues Val-85, Ser-223, and Gln-248. The other two altered nucleotides occurred in the first nucleotide position of codons, leading to changes in the amino acids encoded. In the variant sequence, a threonine replaced the normal alanine at residue 99 and a proline replaced the normal alanine at residue 152. We have concluded that the human liver donor was heterozygous for the epsilon 3 genotype. The variant cDNA corresponds to a new, previously undescribed variant form of apo-E in which the amino acid substitutions of the protein are electrophoretically silent; it would probably be undetectable by standard apo-E phenotyping methods. The amino acid substitution at position 152 occurs in a region of apo-E that appears to be important for receptor binding, and it may have clinical significance.
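Classifying cDNA differences as silent or amino-acid-changing, as done here for the five variant positions, amounts to a codon-by-codon comparison. The Python sketch below assumes the standard genetic code and two in-frame sequences of equal length; the toy sequences are invented and are not the apo-E sequences, though the first difference mimics an Ala-to-Thr change at a first codon position. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in TCAG order (first position slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_differences(ref: str, var: str) -> list:
    """List each differing site with its codon position and whether the encoded amino acid changes."""
    ref, var = ref.upper(), var.upper()
    if len(ref) != len(var) or len(ref) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    differences = []
    for i, (a, b) in enumerate(zip(ref, var)):
        if a == b:
            continue
        start = i - i % 3  # first base of the codon containing site i
        aa_ref = CODON_TABLE[ref[start:start + 3]]
        aa_var = CODON_TABLE[var[start:start + 3]]
        differences.append({
            "site": i + 1,
            "codon": start // 3 + 1,
            "codon_position": i % 3 + 1,
            "change": f"{a}->{b}",
            "amino_acids": f"{aa_ref}->{aa_var}",
            "silent": aa_ref == aa_var,
        })
    return differences


for d in classify_differences("GCTGTTAGC", "ACTGTCAGC"):
    print(d)
```

If a codon carries more than one difference, each site is labelled with the amino-acid change of the whole codon, so multi-hit codons would need separate treatment in a careful analysis.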

18.

Cloning and expression of the G protein Rab3a cDNA (cited 2 times: 0 self-citations, 2 by others)

康巧华 陈巧林 季清洲 任宏伟 茹炳根 《Chinese Journal of Biochemistry and Molecular Biology》2002,18(1):75-79

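Using PCR, the complete coding region of Rab3a cDNA was amplified from total human placental cDNA. Sequence analysis showed that the amplified Rab3a cDNA carried 5 nucleotide variations, but the translated amino acid sequence was identical to the published one. The amplified Rab3a cDNA was cloned into the prokaryotic fusion expression vector pGEX-4T-1 and expressed in E. coli BL21 under IPTG induction. To further characterize the expression product, the purified Rab3a protein was analyzed by SDS-PAGE, N-terminal amino acid sequencing, mass-spectrometric molecular weight determination and amino acid composition analysis. The expressed protein had a molecular weight of about 25 kD and the N-terminal amino acid sequence MASATDSR, and amino acid composition analysis indicated that the Rab3a protein had been correctly expressed.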

19.

Cloning and evolutionary analysis of SAMDC gene homologous sequences from cruciferous plants (cited 5 times: 0 self-citations, 5 by others)

丁淑丽 卢钢 李建勇 任彦 曹家树 《Hereditas (Beijing)》2007,29(1):109-117

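S-adenosylmethionine decarboxylase (SAMDC) is a key enzyme in plant polyamine biosynthesis. Specific primers were designed from conserved regions of the SAMDC coding sequences reported in GenBank, and PCR was used to newly clone and isolate homologous SAMDC sequences from 14 species in 6 genera of Brassicaceae. Comparative analysis showed that these homologous sequences share more than 87% similarity and their deduced amino acid sequences more than 90%. Interspecific differences were 0.2% to 10.1% for nucleotides and 0.3% to 6.6% for amino acids; intergeneric differences, excluding the "Yuanbai" (round white) radish, were 4.9% to 13.6% and 3.1% to 10.3%, respectively. Differences in the SAMDC nucleotide sequences and their putative encoded amino acid sequences are larger between genera than between species, so the gene may be useful for classification at the intergeneric level. Because the differences among amino acid sequences were much smaller than those among nucleotide sequences, neighbor-joining (NJ) and minimum-evolution (ME) trees were constructed from the nucleotide sequences. The trees show that Brassica is most closely related to Raphanus, followed in order by Barbarea, Rorippa and Arabidopsis, with Capsella the most distant. These results help clarify the phylogenetic relationships among cruciferous plants at the molecular level and may provide a theoretical basis for the use of their germplasm resources.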
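The abstract does not say how the percentage differences were computed; assuming uncorrected p-distances (the proportion of aligned sites that differ), the contrast between nucleotide and amino-acid divergence can be sketched in Python as below. The standard genetic code is assumed, the function names are hypothetical, and the toy sequences are invented.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in TCAG order (first position slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def p_distance(x: str, y: str, skip: str = "-?") -> float:
    """Proportion of compared aligned sites that differ; sites with gaps or missing data are skipped."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper()) if a not in skip and b not in skip]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def translate(cds: str) -> str:
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


nt_x, nt_y = "ATGCTGAAA", "ATGTTAAAG"
print(p_distance(nt_x, nt_y))                        # 3 of 9 sites differ
print(p_distance(translate(nt_x), translate(nt_y)))  # both translate to MLK, so 0.0
```

All three nucleotide differences in the toy pair are synonymous, which illustrates why amino-acid distances can be much smaller than nucleotide distances for the same gene.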

20.

Rat beta casein cDNA: sequence analysis and evolutionary comparisons. (cited 10 times: 6 self-citations, 4 by others)

D E Blackburn A A Hobbs J M Rosen 《Nucleic acids research》1982,10(7):2295-2307

The complete sequence of a 1072 nucleotide rat beta-casein cDNA insertion in the hybrid plasmid pC beta 23 has been determined. Primer extension was employed to determine the sequence of an additional 82 5'-terminal nucleotides in beta-casein mRNA. Rat beta-casein mRNA consists of a 696 nucleotide coding region, flanked by 52 nucleotide 5' and 406 nucleotide 3' noncoding regions, including a 40 nucleotide poly(A) tail. The derived 216 amino acid sequence of rat beta-casein was compared to the previously determined sequences of beta-caseins from several other species. Approximately 38% of the amino acids have been conserved among the rat, ovine, bovine and human sequences and these conserved amino acids occurred in clusters throughout the protein. One such cluster containing the majority of the potential casein phosphorylation sites was located near the amino terminus. Contrary to the considerable divergence observed for the processed beta-casein, 14 of 15 amino acids in the signal peptide sequence of the precasein were identical between the rat and ovine caseins.
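The lengths reported in this abstract are mutually consistent, as a quick check shows. The check assumes that the 1072-nucleotide cDNA insert and the 82 nucleotides obtained by primer extension together span the whole mRNA, that the 696-nucleotide coding region ends in a 3-nucleotide stop codon, and that the derived 216-residue sequence is the mature protein, to which the 15-residue signal peptide is added.

```latex
\begin{aligned}
1072 + 82 &= 1154 \ \text{nt (cDNA insert + primer extension)} \\
52 + 696 + 406 &= 1154 \ \text{nt (5' noncoding + coding + 3' noncoding)} \\
\tfrac{696}{3} &= 232 \ \text{codons} = 15 \ (\text{signal}) + 216 \ (\text{mature}) + 1 \ (\text{stop})
\end{aligned}
```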