Similar articles
20 similar articles found (search time: 62 ms)
1.
Claims of intron-structure correlations have played a major role in debates surrounding split gene origins. In the formative (as opposed to disruptive or "insertional") model of split gene origins, introns represent the scars of chimaeric gene assembly. When analyzed retrospectively, formative introns should tend to fall between modular units, if such units exist, or at least to exhibit a preference for sites favorable to chimaera formation. However, there is another possible source of preferences: under a disruptive model of split gene origins, fortuitous intron-structure correlations may arise because the gain of introns is biased with respect to flanking nucleotide sequences. To investigate the extent to which a sequence-biased intron gain model may account for the present-day distribution of introns, data on over 10,000 introns in eukaryotic protein-coding genes were integrated with structural data from a set of 1,851 nonredundant protein chains. The positions of introns with respect to secondary structures, solvent accessibility, and so-called "modules" were evaluated relative to the expectations of a null model, a disruptive model based on amino acid frequencies at splice junctions, and a formative model defined relative to these. The null model can be excluded for most structural features and is highly improbable when intron sites are grouped by reading frame phase. Phase-dependent correlations with secondary structure and side-chain surface accessibility are particularly strong. However, these phase-dependent correlations are explained largely by the sequence-based disruptive model.

2.
Theories regarding the evolution of spliceosomal introns differ in the extent to which the distribution of introns reflects either a formative role in the evolution of protein-coding genes or the adventitious gain of genetic elements. Here, systematic methods are used to assess the causes of the present-day distribution of introns in 10 families of eukaryotic protein-coding genes comprising 1,868 introns in 488 distinct alignment positions. The history of intron evolution inferred using a probabilistic model that allows ancestral inheritance of introns, gain of introns, and loss of introns reveals that the vast majority of introns in these eukaryotic gene families were not inherited from the most recent common ancestral genes, but were gained subsequently. Furthermore, among inferred events of intron gain that meet strict criteria of reliability, the distribution of sites of gain with respect to reading-frame phase shows a 5:3:2 ratio of phases 0, 1 and 2, respectively, and exhibits a nucleotide preference for MAG GT (positions -3 to +2 relative to the site of gain). The nucleotide preferences of intron gain may prove to be the ultimate cause for the phase bias. The phase bias of intron gain is sufficient to account quantitatively for the well-known 5:3:2 bias in phase frequencies among extant introns, a conclusion that holds even when taxonomic heterogeneity in phase patterns is considered. Thus, intron gain accounts for the vast majority of extant introns and for the bias toward phase 0 introns that previously was interpreted as evidence for ancient formative introns.

3.
The sequence of the apocytochrome b (cob) gene of Neurospora crassa has been determined. The structural gene is interrupted by two intervening sequences of approximately 1260 bp each. The polypeptide encoded by the exons shows extensive homology with the cob proteins of Aspergillus nidulans and Saccharomyces cerevisiae (79% and 60%, respectively). The two introns are, however, located at sites different from those of introns in the cob genes of A. nidulans and S. cerevisiae (which contain highly homologous introns at the same site within the gene). The introns share several short regions of sequence homology (10-12 bp long) with each other and with other fungal mitochondrial introns. Moreover, the second intron contains a 50 nucleotide long sequence that is highly homologous with sequences within every ribosomal intron of fungal mitochondria sequenced to date. The conserved sequences may allow the formation of a core secondary structure, which is nearly identical in many mitochondrial introns. The conserved secondary structure may be required for intron splicing. The second intron contains an open reading frame, continuous with the preceding exon, of approximately 290 codons. Two stretches of 10 amino acid residues, conserved in many introns, are present in the open reading frame.

4.
The Exon/Intron (ExInt) database incorporates information on the exon/intron structure of eukaryotic genes. Features in the database include: intron nucleotide sequence, amino acid sequence of the corresponding protein, position of the introns at the amino acid level and intron phase. From ExInt, we have also generated four additional databases each with ExInt entries containing predicted introns, introns experimentally defined, organelle introns or nuclear introns. ExInt is accessible through a retrieval system with pointers to GenBank. The database can be searched by keywords, locus name, NID, accession number or length of the protein. ExInt is freely accessible at http://intron.bic.nus.edu.sg/exint/exint.html

5.
Overman SA, Thomas GJ. Biochemistry, 1999, 38(13): 4018-4027
The study of filamentous virus structure by Raman spectroscopy requires accurate band assignments. In previous work, site- and residue-specific isotope substitutions were implemented to elucidate definitive assignments for Raman bands arising from vibrational modes of the alpha-helical coat protein main chain and aromatic side chains in the class I filamentous phage, fd [Overman, S. A., and Thomas, G. J., Jr. (1995) Biochemistry 34, 5440-5451; Overman, S. A., and Thomas, G. J., Jr. (1998) Biochemistry 37, 5654-5665]. Here, we extend the previous methods and expand the assignment scheme to identify Raman markers of nonaromatic side chains of the coat protein in the native fd assembly. This has been accomplished by Raman analysis of 11 different fd isotopomers selectively incorporating deuterium at specific sites in either alanine, aspartic acid, glutamic acid, glycine, isoleucine, leucine, lysine, serine, or valine residues of the coat protein. Raman markers are also identified for the corresponding deuterated side chains. In combination with previous assignments, the results provide a comprehensive understanding of coat protein contributions to the Raman signature of the fd virion and validate Raman markers assigned to the packaged single-stranded DNA genome. The findings described here show that nonaromatic side chains contribute prolifically to the fd Raman signature, that marker bands for specific nonaromatics differ in general from those observed in corresponding polypeptides and amino acids, and that the frequencies and intensities of many nonaromatic markers are sensitive to secondary and higher-order structures. Nonaromatic markers within the 1200-1400 cm⁻¹ interval also interfere seriously with the diagnostic Raman amide III band that is normally exploited in secondary structure analysis. Implications of these findings for the assessment of protein conformation by Raman spectroscopy are considered.

6.
7.
8.
Post-translational lysine methylation and acetylation are two major modifications of lysine residues. They play critical roles in various biological processes, especially in gene regulation. Identification of protein methylation and acetylation sites would be a foundation for understanding their modification dynamics and molecular mechanisms. This work presents a method called PLMLA that incorporates protein sequence information, secondary structure and amino acid properties to predict methylation and acetylation of lysine residues in whole protein sequences. We apply an encoding scheme based on grouped weight and position weight amino acid composition to extract sequence information and physicochemical properties around lysine sites. The prediction accuracies for methyllysine and acetyllysine are 83.02% and 83.08%, respectively. Feature analysis reveals that methyllysine is likely to occur in the coil region of the protein, whereas acetyllysine tends to occur in the helix region. The upstream residues away from the central site may be close to methylated lysine in three-dimensional structure and have a significant influence on methyllysine, while the positively charged residues may have a significant influence on acetyllysine. The online service is available at http://bioinfo.ncu.edu.cn/inquiries_PLMLA.aspx.

9.
10.
A few nucleotide sites of nuclear exons that flank introns are often conserved. A hypothesis has suggested that these sites, called "proto-splice sites," are remnants of recognition signals for the insertion of introns in the early evolution of eukaryotic genes. This notion of proto-splice sites has been an important basis for the insertional theory of introns. This hypothesis predicts that the distribution of proto-splice sites would determine the distribution of intron phases, because the positions of introns are just a subset of the proto-splice sites. We previously tested this prediction by examining the proportions of the phases of proto-splice sites, revealing nothing in these proportion distributions similar to observed proportions of intron phases. Here, we provide a second independent test of the proto-splice site hypothesis, with regard to its prediction that the proto-splice sites would mimic intron phase correlations, using a CDS database we created from GenBank. We tested four hypothetical proto-splice sites G / G, AG / G, AG / GT, and C/AAG / R. Interestingly, while G / G and AG / GT site phase distributions are not consistent with actual introns, we observed that AG / G and C/AAG / R sites have a symmetric phase excess. However, the patterns of the excess are quite different from the actual intron phase distribution. In addition, particular amino acid repeats in proteins were found to partially contribute to the excess of symmetry at these two types of sites. The phase associations of all four sites are significantly different from those of intron phases. Furthermore, a general model of intron insertion into proto-splice sites was evaluated by Monte Carlo simulation to investigate the probability that the random insertion of introns into AG / G and C/AAG / R sites could generate the observed intron phase distribution. The simulation showed that (1) no observed correlation of intron phases was statistically consistent with the phase distribution of proto-splice sites in the simulated virtual genes; (2) most conservatively, no simulation in 10,000 Monte Carlo experiments gave a pattern with an excess of symmetric (1, 1) exons larger than those of (0, 0) and (2, 2), a major statistical feature of intron phase distribution that is consistent with the directly observed cases of exon shuffling. Thus, these results reject the null hypothesis that introns are randomly inserted into preexisting proto-splice sites, as suggested by the insertional theory of introns.
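The phase bookkeeping behind a test like this can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `motif_phase_counts` and the toy sequence are hypothetical, the coding sequence is assumed to start at the first base of a codon, and degenerate symbols such as R are not expanded. It only tallies, by phase, the positions where an exact left/right motif such as AG / G occurs.

```python
def motif_phase_counts(cds, left, right):
    """Count candidate insertion sites by phase for the motif left|right.

    A site with i nucleotides upstream of the insertion point has phase
    i % 3: 0 = between codons, 1 = after the first base of a codon,
    2 = after the second base. Assumes cds starts in frame.
    """
    counts = [0, 0, 0]
    for i in range(len(left), len(cds) - len(right) + 1):
        if cds[i - len(left):i] == left and cds[i:i + len(right)] == right:
            counts[i % 3] += 1
    return counts


if __name__ == "__main__":
    toy_cds = "ATGAGGTTAGGCAAGGTA"  # made-up sequence, for illustration only
    print(motif_phase_counts(toy_cds, "AG", "G"))
```

Comparing such counts with the phases of annotated introns, or drawing random subsets of the matching sites as simulated insertions, would be one way to approximate the comparison the abstract describes.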

11.
The DNA sequence of the cob region of the Schizosaccharomyces pombe mitochondrial DNA has been determined. The cytochrome b structural gene is interrupted by an intron of 2526 base-pairs, which has an open reading frame of 2421 base-pairs in phase with the upstream exon. The position of the intron differs from those found in the cob genes of Saccharomyces cerevisiae, Aspergillus nidulans or Neurospora crassa. The Sch. pombe cob intron has the potential of assuming an RNA secondary structure almost identical to that proposed for the first two cox1 introns (group II) in S. cerevisiae and the p1-cox1 intron in Podospora anserina. It has most of the consensus nucleotides in the central core structure described for this group of introns and its comparison with other group II introns allows the identification of an additional conserved nucleotide stretch. A comparison of the predicted protein sequences of group II intronic coding regions reveals three highly conserved blocks showing pairwise amino acid identities of 34 to 53%. These regions comprise over 50% of the coding length of the intron but do not include the 5' region, which has strong secondary structural features. In addition to the potential intron folding, long helical structures involving repetitive sequences can be formed in the flanking cob exon regions. A comparison of the Sch. pombe cytochrome b sequence with those available from other organisms indicates that Sch. pombe is evolutionarily distant from both budding yeasts and filamentous fungi. As was seen for the Sch. pombe cox1 gene (Lang, 1984), the cob exons are translated using the universal genetic code and this distinguishes Sch. pombe mitochondria from all other fungal and animal mitochondrial systems.

12.
Group II introns are ribozymes that catalyze a splicing reaction with the same chemical steps as spliceosome-mediated splicing. Many group II introns have lost the capacity to self-splice while acquiring compensatory interactions with host-derived protein cofactors. Degenerate group II introns are particularly abundant in the organellar genomes of plants, where their requirement for nuclear-encoded splicing factors provides a means for the integration of nuclear and organellar functions. We present a biochemical analysis of the interactions between a nuclear-encoded group II splicing factor and its chloroplast intron target. The maize (Zea mays) protein Chloroplast RNA Splicing 1 (CRS1) is required specifically for the splicing of the group II intron in the chloroplast atpF gene and belongs to a plant-specific protein family defined by a recently recognized RNA binding domain, the CRM domain. We show that CRS1's specificity for the atpF intron in vivo can be explained by CRS1's intrinsic RNA binding properties. CRS1 binds in vitro with high affinity and specificity to atpF intron RNA and does so through the recognition of elements in intron domains I and IV. These binding sites are not conserved in other group II introns, accounting for CRS1's intron specificity. In the absence of CRS1, the atpF intron has little uniform tertiary structure even at elevated [Mg2+]. CRS1 binding reorganizes the RNA, such that intron elements expected to be at the catalytic core become less accessible to solvent. We conclude that CRS1 promotes the folding of its group II intron target through tight and specific interactions with two peripheral intron segments.

13.
Susceptibility to ecotropic murine leukemia viruses (MLV) is restricted to mice and rats at the level of virus binding to the host cell receptor. Asparagine 232, valine 233, tyrosine 235, and glutamic acid 237 in the third extracellular domain (EL3) of the receptor are critical determinants of the host range difference between mice and humans. However, placing these residues in the human homolog confers only partial binding, indicating that other divergent sequences are involved. We sought to determine if the other sequences lie within or outside EL3. Here we report the identification of lysine 234 as another critical residue that influences virus binding and infection, as well as evidence that the unidentified sequences lie outside EL3. Each of the four basic residues in the third extracellular domain was changed to an acidic residue and initially examined in combination with a change at position 235 or position 237. Substitution of lysine 211, 215, or 222 combined with substitution of the critical tyrosine 235 or glutamic acid 237 did not affect virus infection. However, combined substitution of lysine 234, a conserved residue between mice and humans, and tyrosine 235 resulted in a marked decrease in virus infection and binding. A lysine 234 change alone reduced virus binding, contrary to previous observations that at least two of the other four residues must be changed before binding is reduced. Interestingly, there was no decrease in infection when lysine 234 was replaced in combination with glutamic acid 237. This result suggests that residue 234 may act by influencing the local structure of residues 233 to 235, whereas the presence of a glycine at position 236 may prevent this influence from extending to residue 237. With this report, the involvement of all the residues divergent between mice and humans in the third extracellular domain has been ruled out, suggesting that as yet unidentified determinants lie in other extracellular domains.

14.
We have constructed all single base substitutions in almost all of the highly conserved residues of the Tetrahymena self-splicing intron. Mutation of highly conserved residues almost invariably leads to loss of enzymatic activity. In many cases, activity could be regained by making additional mutations that restored predicted base-pairings; these second site suppressors in general confirm the secondary structure derived from phylogenetic data. At several positions, our suppression data can be most readily explained by assuming non-Watson-Crick base-pairings. In addition to the requirements imposed by the secondary structure, the sequence of the intron is constrained by "negative interactions", the exclusion of particular nucleotide sequences that would form undesirable secondary structures. A comparison of genetic and phylogenetic data suggests sites that may be involved in tertiary structural interactions.

15.
Can Codon Usage Bias Explain Intron Phase Distributions and Exon Symmetry?   Cited by: 1 (self-citations: 0; cited by others: 1)
More introns exist between codons (phase 0) than between the first and the second bases (phase 1) or between the second and the third base (phase 2) within the codon. Many explanations have been suggested for this excess of phase 0. It has, for example, been argued to reflect an ancient utility for introns in separating exons that code for separate protein modules. There may, however, be a simple, alternative explanation. Introns typically require, for correct splicing, particular nucleotides immediately 5′ in exons (typically a G) and immediately 3′ in the following exon (also often a G). Introns therefore tend to be found between particular nucleotide pairs (e.g., G|G pairs) in the coding sequence. If, owing to bias in usage of different codons, these pairs are especially common at phase 0, then intron phase biases may have a trivial explanation. Here we take codon usage frequencies for a variety of eukaryotes and use these to generate random sequences. We then ask about the phase of putative intron insertion sites. Importantly, in all simulated data sets intron phase distribution is biased in favor of phase 0. In many cases the bias is of the magnitude observed in real data and can be attributed to codon usage bias. It is also known that exons may carry either the same phase (symmetric) or different phases (asymmetric) at the opposite ends. We simulated a distribution of different types of exons using frequencies of introns observed in real genes assuming random combination of intron phases at the opposite sides of exons. Surprisingly, the simulated pattern was quite similar to that observed. In the simulants we typically observe a prevalence of symmetric exons carrying phase 0 at both ends, which is common for eukaryotic genes. However, at least in some species, the extent of the bias in favor of symmetric (0,0) exons is not as great in simulants as in real genes. These results emphasize the need to construct a biologically relevant null model of successful intron insertion. Reviewing Editor: Dr. Manyuan Long
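The simulation idea in this abstract (draw codons according to usage frequencies, then ask at which phase G|G pairs fall) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the function names and the four-codon usage table are invented, and real codon usage tables would have to be supplied by the reader.

```python
import random


def random_cds(codon_usage, n_codons, rng):
    """Draw n_codons codons independently, weighted by their usage frequency."""
    codons = list(codon_usage)
    weights = [codon_usage[c] for c in codons]
    return "".join(rng.choices(codons, weights=weights, k=n_codons))


def gg_phase_counts(cds):
    """Count G|G pairs by phase (0 = between codons, 1 and 2 = within a codon)."""
    counts = [0, 0, 0]
    for i in range(1, len(cds)):
        if cds[i - 1] == "G" and cds[i] == "G":
            counts[i % 3] += 1  # i nucleotides lie upstream of the junction
    return counts


if __name__ == "__main__":
    # Hypothetical toy usage table; the frequencies are not real data.
    toy_usage = {"GGT": 0.2, "AAG": 0.3, "GAA": 0.3, "CTG": 0.2}
    rng = random.Random(0)  # fixed seed so a run is reproducible
    print(gg_phase_counts(random_cds(toy_usage, 1000, rng)))
```

In this toy table, codons ending in G are often followed by codons starting with G, so G|G pairs accumulate at phase 0; whether real codon usage produces a bias of the observed magnitude is the empirical question the abstract addresses.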

16.
Intron/exon structure of the chicken pyruvate kinase gene   Cited by: 15 (self-citations: 0; cited by others: 15)
Lonberg N, Gilbert W. Cell, 1985, 40(1): 81-90
The chicken pyruvate kinase gene is interrupted by at least ten introns, including nine introns within the coding region. We compare the structure of this gene with the three-dimensional protein structure of the homologous cat muscle enzyme. The introns are not randomly placed; they divide the coding sequence into fairly uniformly sized pieces encoding discrete elements of secondary structure. The introns tend to fall at interruptions between stretches of alpha-helix or beta-sheet residues, and each of the six exons that contribute to the barrel-shaped central domain includes one or two repeats of a simple unit, an alpha-helix plus a beta strand. This structure suggests that introns were not inserted into a previously uninterrupted coding sequence, but instead are products of the evolution of the first pyruvate kinase gene. We have found some sequence homology between a segment of pyruvate kinase and the structurally homologous mononucleotide binding fold of alcohol dehydrogenase. The superposition of these two regions aligns an intron from the maize alcohol dehydrogenase gene four nucleotides from an intron in the chicken pyruvate kinase gene.

17.
The overlapping ND4L and ND5 genes of Neurospora crassa mitochondria are interrupted by one and two intervening sequences, respectively, of about 1,490, 1,408 and 1,135 bp in length. All three intervening sequences are class I introns and as such have the potential to fold into the conserved secondary structure that has been proposed for the majority of fungal mitochondrial introns. They contain long open reading frames (ORFs; from 306 to 425 codons long) that are continuous and in frame with the upstream exon sequences. These ORFs contain the conserved decapeptide-encoding sequences that are characteristic of the ORFs present in most class I introns. Extensive homology exists among the ORFs encoded by the ND4L intron, ND5 intron 1, and the second intron of the N. crassa oli2 gene. Also, internal repeats of about 130 amino acid residues are present twice in each of these three ORFs, suggesting that a duplication event may have occurred in the formation of these ORFs. The ND4L intron shares extensive homology (at the levels of both primary and proposed secondary structures) with the self-splicing intervening sequence present in the Tetrahymena nuclear rRNA gene. This homology includes but is not limited to the core secondary structure, as peripheral structural elements are also conserved in the two introns.

18.
We report the nucleotide sequence of the chloroplast psbA gene encoding the 32 kilodalton protein of photosystem II from Chlamydomonas moewusii. Like its land plant homologues, this green algal protein consists of 353 amino acids. The C. moewusii psbA gene is composed of three exons containing 252, 11 and 90 codons and of two group I introns containing 2363 and 1807 nucleotides. Each of the introns features an internal open reading frame (ORF) that potentially encodes a basic protein of more than 300 residues. The primary sequences of the putative intron-encoded proteins are unrelated and none of them shares conserved elements with any of the proteins predicted from the group I intron sequences published so far. The first C. moewusii intron is inserted at the same position as the fourth intron of the psbA gene from Chlamydomonas reinhardtii; the second intron lies at a novel site downstream of this position. On the basis of their RNA secondary structures, the C. moewusii introns 1 and 2 can be assigned to subgroups IA and IB, respectively. However, intron 1 is not typical of subgroup IA introns, its most unusual feature being the location of the ORF in the "loop L5" region. To our knowledge, this is the first time that an ORF is located in this region of the group I intron structure.

19.
The analysis of conformations corresponding to continuous amino acid repeat peptides (CARPs) comprising six or more residues in proteins of known three-dimensional structure revealed that alanine, glycine, glutamic acid, proline, valine, histidine, aspartic acid, glutamine and lysine were associated as repeating amino acid residues. Alanine, glycine and histidine CARPs were most common, although the histidine hexapeptide and large CARPs mainly correspond to affinity tags and are not part of the native protein sequence. The Ala and Glu CARPs were observed as part of a helix, a coil, or a combination of these conformations. The octapeptide Ala CARP in six-hairpin glycosidases was observed as part of strand and coil conformation. The Gly and Pro CARPs were mainly associated with coil conformation. The majority of the coil regions in CARPs contained beta- and gamma-turn structural motifs. The conformations of the Asp, Glu and Lys hexapeptide or larger CARPs were not defined in the corresponding protein three-dimensional structures analyzed. The longest CARP of known conformation was observed for alanine as a decapeptide in a lysozyme-like protein that corresponds to helix. A feature of CARPs is that a majority are exposed to solvent, with accessible surface area greater than 200 Å² in the protein three-dimensional structure.

20.
Racemization of the amino acid residues of alpha-melanotropin was measured after exposure of the peptide to alkali for various lengths of time. Rates of racemization were then compared to the rate of transformation by alkali of alpha-melanotropin into a hormone with prolonged melanotropic activity. When in vitro prolongation became maximal, serine, methionine, histidine, phenylalanine and arginine were racemized 50-70%, glutamic acid, tyrosine and tryptophan 30-40% and lysine, proline and valine 10% or less. Racemization of a particular amino acid residue in alpha-melanotropin could not be associated with induction of prolongation of activity. Rather, partial racemization at multiple sites in the molecule seems almost as effective as extensive or total racemization of a single residue in producing a hormone with prolonged biological effects.
