Similar articles
Found 20 similar articles (search time: 15 ms)
1.
The question of whether natural selection favors genetic stability or genetic variability is a fundamental problem in evolutionary biology. Bioinformatic analyses demonstrate that selection favors genetic stability by avoiding unstable nucleotide sequences in protein encoding DNA. Yet, such unstable sequences are maintained in several DNA repair genes, thereby promoting breakdown of repair and destabilizing the genome. Several studies have therefore argued that selection favors genetic variability at the expense of stability. Here we propose a new evolutionary mechanism, with supporting bioinformatic evidence, that resolves this paradox. Combining the concepts of gene-dependent mutation biases and meiotic recombination, we argue that unstable sequences in the DNA mismatch repair (MMR) genes are maintained by their own phenotype. In particular, we predict that human MMR maintains an overrepresentation of mononucleotide repeats (monorepeats) within and around the MMR genes. In support of this hypothesis, we report a 31% excess in monorepeats in 250 kb regions surrounding the seven MMR genes compared to all other RefSeq genes (1.75 vs. 1.34%, P = 0.0047), with a particularly high content in PMS2 (2.41%, P = 0.0047) and MSH6 (2.07%, P = 0.043). Based on a mathematical model of monorepeat frequency, we argue that the proposed mechanism may suffice to explain the observed excess of repeats around MMR genes. Our findings thus indicate that unstable sequences in MMR genes are maintained through evolution by the MMR mechanism. The evolutionary paradox of genetically unstable DNA repair genes may thus be explained by an equilibrium in which the phenotype acts back on its own genotype.
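
The monorepeat content and the 31% excess quoted in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how long a run must be to count as a monorepeat, so the `min_len` threshold and both function names below are assumptions made for illustration only, not the authors' method.

```python
import re


def monorepeat_fraction(seq: str, min_len: int = 6) -> float:
    """Fraction of bases lying in single-nucleotide runs of at least min_len.

    min_len is a hypothetical threshold; the abstract does not state one.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    covered = sum(len(m.group(0)) for m in pattern.finditer(seq))
    return covered / len(seq)


def relative_excess(observed: float, baseline: float) -> float:
    """Relative excess of observed over baseline (0.31 means 31% more)."""
    return observed / baseline - 1.0


# Toy sequence: six A's in ten bases.
print(monorepeat_fraction("AAAAAAGCGT"))
# Percentages reported in the abstract: 1.75% around MMR genes vs. 1.34% elsewhere.
print(round(relative_excess(1.75, 1.34), 2))
```

The second call reproduces the reported figure: 1.75 / 1.34 is about 1.31, that is, roughly a 31% excess.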

2.
A measles virus (MV) genome originally derived from brain cells of a subacute sclerosing panencephalitis patient expressed in IP-3-Ca cells an unstable MV matrix protein and was unable to produce virus particles. Transfection of this MV genome into other cell lines did not relieve these defects, showing that they are ultimately encoded by viral mutations. However, these defects were partially relieved in a weakly infectious virus which emerged from IP-3-Ca cells and which produced a matrix protein of intermediate stability. The sequences of several cDNAs related to the unstable and intermediately stable matrix proteins showed many differences in comparison with a stable matrix protein sequence and even appreciable heterogeneity among themselves. Nevertheless, partial restoration of matrix protein stability could be ascribed to a single additional amino acid change. From an examination of additional genes, we estimated that, on average, each MV genome in IP-3-Ca cells differs from the others in 30 to 40 of its 16,000 bases. The role of extreme variability of RNA virus genomes in persistent viral infections is discussed in the context of the pathogenesis of subacute sclerosing panencephalitis and of other human diseases of suspected viral etiology.

3.
Young ET  Sloan JS  Van Riper K 《Genetics》2000,154(3):1053-1068
The genome of Saccharomyces cerevisiae contains numerous unstable microsatellite sequences. Mononucleotide and dinucleotide repeats are rarely found in ORFs, and when present in an ORF are frequently located in an intron or at the C terminus of the protein, suggesting that their instability is deleterious to gene function. DNA trinucleotide repeats (TNRs) are found at a higher-than-expected frequency within ORFs, and the amino acids encoded by the TNRs represent a biased set. TNRs are rarely conserved between genes with related sequences, suggesting high instability or a recent origin. The genes in which TNRs are most frequently found are related to cellular regulation. The protein structural database is notably lacking in proteins containing amino acid tracts, suggesting that they are not located in structured regions of a protein but are rather located between domains. This conclusion is consistent with the location of amino acid tracts in two protein families. The preferred location of TNRs within the ORFs of genes related to cellular regulation together with their instability suggest that TNRs could have an important role in speciation. Specifically, TNRs could serve as hot spots for recombination leading to domain swapping, or mutation of TNRs could allow rapid evolution of new domains of protein structure.

4.
Polyglutamine repeats within proteins are common in eukaryotes and are associated with neurological diseases in humans. Many are encoded by tandem repeats of the codon CAG that are likely to mutate primarily by replication slippage. However, a recent study in the yeast Saccharomyces cerevisiae has indicated that many others are encoded by mixtures of CAG and CAA which are less likely to undergo slippage. Here we attempt to estimate the proportions of polyglutamine repeats encoded by slippage-prone structures in species currently the subject of genome sequencing projects. We find a general excess over random expectation of polyglutamine repeats encoded by tandem repeats of codons. We nevertheless find many repeats encoded by nontandem codon structures. Mammals and Drosophila display extreme opposite patterns. Drosophila contains many proteins with polyglutamine tracts but these are generally encoded by interrupted structures. These structures may have been selected to be resistant to slippage. In contrast, mammals (humans and mice) have a high proportion of proteins in which repeats are encoded by tandem codon structures. In humans, these include most of the triplet expansion disease genes. Received: 17 August 2000 / Accepted: 20 November 2000
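
The distinction this abstract draws between tandem (single-codon) and interrupted (mixed CAG/CAA) encodings of a polyglutamine tract can be sketched in a few lines of Python. The function name, the in-frame reading from position 0, and the `min_codons` cutoff are all assumptions for illustration; the abstract does not give the study's actual criteria.

```python
GLN_CODONS = ("CAG", "CAA")  # the two glutamine codons


def glutamine_codon_runs(cds: str, min_codons: int = 4):
    """Return (first_codon_index, codons, kind) for each run of glutamine codons.

    kind is "tandem" when the run repeats a single codon and "interrupted"
    when CAG and CAA are mixed. min_codons is a hypothetical cutoff.
    """
    cds = cds.upper()
    # Split into in-frame codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    runs = []
    i = 0
    while i < len(codons):
        if codons[i] in GLN_CODONS:
            j = i
            while j < len(codons) and codons[j] in GLN_CODONS:
                j += 1
            run = codons[i:j]
            if len(run) >= min_codons:
                kind = "tandem" if len(set(run)) == 1 else "interrupted"
                runs.append((i, run, kind))
            i = j
        else:
            i += 1
    return runs


# Toy coding sequence: one pure CAG run and one mixed CAG/CAA run.
toy = "ATG" + "CAG" * 5 + "GCT" + "CAGCAACAGCAA" + "TAA"
for start, run, kind in glutamine_codon_runs(toy):
    print(start, len(run), kind)
```

On the toy sequence, the five-codon CAG run is reported as tandem (the slippage-prone structure) and the CAG/CAA mixture as interrupted.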

5.
H Li  J Liu  K Wu  Y Chen 《PloS one》2012,7(7):e41167
Glutamine tandem repeats are common in eukaryotic proteins. Although some studies have proposed that replication slippage plays an important role in shaping these repeats, the role of natural selection in glutamine tandem repeat evolution is somewhat unclear. In this study, we identified all of the glutamine tandem repeats containing four or more glutamines in human proteins and then estimated the nonsynonymous (d(N)) and synonymous (d(S)) substitution rates for the regions flanking the glutamine tandem repeats and the proteins containing them. The results indicated that most of the proteins containing polyglutamine (polyQ) tracts of four or more glutamines have undergone purifying selection, and that the purifying selection for the regions flanking the repeats is weaker. Additionally, we observed that the conserved repeats were under stronger selection constraints than the nonconserved repeats. Interestingly, we found that there was a higher level of purifying selection for the regions flanking the polyQ tracts encoded by pure CAG codons compared with those encoded by mixed codons. Based on our findings, we propose that selection has played a more important role than was previously speculated in constraining the expansion of polyQ tracts encoded by pure codons.

6.
Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that in general the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate upon the entire sequence in which they are embedded. With additional evidence to suggest a corresponding elevation in positive selection we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.

7.
Accurate base-assignment in repeat regions of a whole genome shotgun assembly is an unsolved problem. Since reads in repeat regions cannot be easily attributed to a unique location in the genome, current assemblers may place these reads arbitrarily. As a result, the base-assignment error rate in repeats is likely to be much higher than that in the rest of the genome. We developed an iterative algorithm, EULER-AIR, that is able to correct base-assignment errors in finished genome sequences in public databases. The Wolbachia genome is among the best finished genomes. Using this genome project as an example, we demonstrated that EULER-AIR can 1) discover and correct base-assignment errors, 2) provide accurate read assignments, 3) utilize finishing reads for accurate base-assignment, and 4) provide guidance for designing finishing experiments. In the genome of Wolbachia, EULER-AIR found 16 positions with ambiguous base-assignment and two positions with erroneous bases. Besides Wolbachia, many other genome sequencing projects have significantly fewer finishing reads and, hence, are likely to contain more base-assignment errors in repeats. We demonstrate that EULER-AIR is a software tool that can be used to find and correct base-assignment errors in a genome assembly project.

8.
9.
An HJ  Lee D  Lee KH  Bhak J 《BMC genomics》2004,5(1):97-5

Background  

A significant portion (about 8% in the human genome) of mammalian mRNA sequences contains AU (adenine and uracil) rich elements or AREs at their 3' untranslated regions (UTR). These mRNA sequences are usually stable. However, an increasing number of observations have been made of unstable species, possibly depending on certain elements such as Alu repeats. ARE motifs are repeats of the tetramer AUUU and a monomer A at the end of the repeats ((AUUU)nA). The importance of AREs in biology is that they make certain mRNA unstable. Proto-oncogenes, such as c-fos, c-myc, and c-jun in humans, are associated with AREs. Although it has been known that the increased number of ARE motifs caused the decrease of the half-life of mRNA containing ARE repeats, the exact mechanism is as yet unknown. We analyzed the occurrences of AREs and Alu and propose a possible mechanism for how human mRNA could acquire and keep AREs at its 3' UTR originating from Alu repeats.
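
The (AUUU)nA motif described in this abstract maps directly onto a regular expression. The following Python sketch is only an illustration of that pattern: the function name and the `min_repeats` parameter are assumptions, and it says nothing about how the authors actually scanned 3' UTRs.

```python
import re


def find_are_motifs(utr: str, min_repeats: int = 1):
    """Find (AUUU)nA motifs; returns (start, motif, n) for each match.

    Accepts RNA or DNA letters (T is read as U). Matches are greedy and
    non-overlapping, so adjacent AUUU units are reported as one motif.
    """
    rna = utr.upper().replace("T", "U")
    pattern = re.compile(r"(?:AUUU){%d,}A" % min_repeats)
    hits = []
    for m in pattern.finditer(rna):
        motif = m.group(0)
        n = (len(motif) - 1) // 4  # number of AUUU units before the final A
        hits.append((m.start(), motif, n))
    return hits


# Toy 3' UTR fragment containing (AUUU)3A.
print(find_are_motifs("GGAUUUAUUUAUUUACC"))
```

Raising `min_repeats` restricts the search to longer elements, which matches the abstract's point that the number of ARE motifs is what correlates with mRNA half-life.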

10.
Huntley MA  Golding GB 《Genetics》2004,166(3):1141-1154
Proteins associated with disease and development of the nervous system are thought to contain repetitive, simple sequences. However, genome-wide surveys for simple sequences within proteins have revealed that repetitive peptide sequences are the most frequent shared peptide segments among eukaryotic proteins, including those of Saccharomyces cerevisiae, which has few to no specialized developmental and neurological proteins. It is therefore of interest to determine if these specialized proteins have an excess of simple sequences when compared to other sets of compositionally similar proteins. We have determined the relative abundance of simple sequences within neurological proteins and find no excess of repetitive simple sequence within this class. In fact, polyglutamine repeats that are associated with many neurodegenerative diseases are no more abundant within neurological specialized proteins than within nonneurological collections of proteins. We also examined the codon composition of serine homopolymers to determine what forces may play a role in the evolution of extended homopolymers. Codon type homogeneity tends to be favored, suggesting replicative slippage instead of selection as the main force responsible for producing these homopolymers.

11.
12.
Transcription of a satellite DNA in the newt   Total citations: 7 (self-citations: 0, citations by others: 7)

13.
Alternative splicing has been recognized as a major mechanism by which protein diversity is increased without significantly increasing genome size in animals and has crucial medical implications, as many alternative splice variants are known to cause diseases. Despite the importance of knowing what structural changes alternative splicing introduces to the encoded proteins for the consideration of its significance, the problem has not been adequately explored. Therefore, we systematically examined the structures of the proteins encoded by the alternative splice variants in the HUGE protein database derived from long (>4 kb) human brain cDNAs. Limiting our analyses to reliable alternative splice junctions, we found alternative splice junctions to have a slight tendency to avoid the interior of SCOP domains and a strong statistically significant tendency to coincide with SCOP domain boundaries. These findings reflect the occurrence of some alternative splicing events that utilize protein structural units as a cassette. However, 50 cases were identified in which SCOP domains are disrupted in the middle by alternative splicing. In six of the cases, insertions are introduced at the molecular surface, presumably affecting protein functions, while in 11 of the cases alternatively spliced variants were found to encode pairs of stable and unstable proteins. The mRNAs encoding such unstable proteins are much less abundant than those encoding stable proteins and tend not to have corresponding mRNAs in non-primate species. We propose that most unstable proteins encoded by alternative splice variants lack normal functions and are an evolutionary dead-end.

14.
Local folding in mRNAs is closely associated with biological functions. In this study, we reveal the whole distribution of local thermodynamic stability in the complete genome of the poliovirus P3/Leon/37 and the single-stranded RNA sequence that corresponds to the nucleotide sequence of the complete genome (1 667 867 bp) of Helicobacter pylori (H. pylori) strain 26695. Local thermodynamic stability in the RNA sequences is measured by two standard z-scores, significance score and stability score. To estimate the distribution of thermodynamic stability, a model based on the non-central Student's t distribution has been developed. Significant patterns of extremes that are either much more stable or unstable than expected by chance are detected. Our results indicate that the highly stable and statistically more significant folding regions are predominantly in non-coding sequences in the two genome sequences. Moreover, the highly unstable folding regions, on the contrary, are predominantly in the protein coding sequences of H. pylori. The observed differences across the complete genomic sequences are statistically very significant by a chi-square test. These extreme patterns may be useful in searching for target sequences for long-chain antisense RNA and for locating potential RNA functional elements involved in the regulation of gene expression including translation, mRNA localization and metabolism.
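
The two z-scores named in this abstract can be sketched generically in Python. The abstract does not define them, so this sketch assumes one common reading: the significance score compares a segment's folding energy with energies of shuffled versions of the same segment, and the stability score compares it with energies of other same-length segments from the same sequence. The function name and all energy values below are hypothetical; no folding program is called.

```python
from statistics import mean, stdev


def z_score(value: float, reference: list[float]) -> float:
    """Standard z-score of value against a reference sample.

    Uses the sample standard deviation (n - 1 denominator).
    """
    return (value - mean(reference)) / stdev(reference)


# Hypothetical folding energies in kcal/mol; real values would come from
# an RNA folding program, which is outside this sketch.
segment_energy = -30.0
shuffled_energies = [-20.0, -22.0, -18.0, -20.0]  # assumed: shuffled copies of the segment
window_energies = [-25.0, -27.0, -23.0, -25.0]    # assumed: other windows of equal length

significance = z_score(segment_energy, shuffled_energies)
stability = z_score(segment_energy, window_energies)
print(round(significance, 3), round(stability, 3))
```

Under this reading, a strongly negative score marks a segment that folds more stably than its reference set, and a strongly positive one marks an unusually unstable segment.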

15.
A survey of polypeptides encoded by RNA isolated from the submandibular glands of members of the Muridae (species of Mus and Rattus), in conjunction with cDNA cloning, has identified a class of salivary proteins that we term "spot proteins." Although clearly homologous, these proteins show dramatic differences between species in their polypeptide length. On the basis of the sequence of the corresponding clones, it is inferred that the rat spot 1 protein has a size of 6,370 daltons (Da), whereas that of the inbred mouse spot 1 is 11,603 Da. A second component is expressed in some stocks and strains of Mus, and this spot 2 protein has a size of up to 19,212 Da. The sizes of the corresponding mRNAs show parallel differences, and the variation in the sizes of mRNAs in different species of Mus correlates with the pattern of speciation, the size increasing with increased relatedness to inbred mice. The spot protein sequence comprises three domains: an N-terminal domain rich in hydroxy and acidic amino acids, a central domain consisting of repeats of a 9-amino-acid sequence, and a C-terminal domain that in the mouse is very basic. Variation in the number of repeats largely accounts for the differences in size between the mouse and rat mRNAs and their encoded polypeptides, and the coding sequence appears to have been expanding during speciation in the Muridae. There is extensive divergence in sequence between the mouse and rat mRNAs and their encoded proteins. The pattern of amino acid replacements and nucleotide substitutions is consistent with little, if any, selection constraint on the precise sequence of the spot proteins, suggesting that it is the overall architecture of the molecule, rather than the precise structure, that is important for function. There is strong evidence for a gene conversion event having occurred between the two mouse sequences. 
Frequent recombination by unequal crossing-over between spot protein coding sequences, if it occurs between active and silent genes, could account not only for the expansion in their size but also for their rapid divergence.

16.
Mitochondria are the site for the citric acid cycle and oxidative phosphorylation (OXPHOS), the final steps of ATP synthesis via cellular respiration. Each mitochondrion contains its own genome; in vertebrates, this is a small, circular DNA molecule that encodes 13 subunits of the multiprotein OXPHOS electron transport complexes. Vertebrate lineages vary dramatically in metabolic rates; thus, functional constraints on mitochondrial‐encoded proteins likely differ, potentially impacting mitochondrial genome evolution. Here, we examine mitochondrial genome evolution in salamanders, which have the lowest metabolic requirements among tetrapods. We show that salamanders experience weaker purifying selection on protein‐coding sequences than do frogs, a comparable amphibian clade with higher metabolic rates. In contrast, we find no evidence for weaker selection against mitochondrial genome expansion in salamanders. Together, these results suggest that different aspects of mitochondrial genome evolution (i.e., nucleotide substitution, accumulation of noncoding sequences) are differently affected by metabolic variation across tetrapod lineages.

17.
A census of protein repeats.   Total citations: 20 (self-citations: 0, citations by others: 20)
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences are favored that contain small and relatively water-soluble residues. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.

18.
The evolutionary expansion of CAG repeats in human triplet expansion disease genes is intriguing because of their deleterious phenotype. In the past, this expansion has been suggested to reflect a broad genomewide expansion of repeats, which would imply that mutational and evolutionary processes acting on repeats differ between species. Here, we tested this hypothesis by analyzing repeat- and flanking-sequence evolution in 28 repeat-containing genes that had been sequenced in humans and mice and by considering overall lengths and distributions of CAG repeats in the two species. We found no evidence that these repeats were longer in humans than in mice. We also found no evidence for preferential accumulation of CAG repeats in the human genome relative to mice from an analysis of the lengths of repeats identified in sequence databases. We then investigated whether sequence properties, such as base and amino acid composition and base substitution rates, showed any relationship to repeat evolution. We found that repeat-containing genes were enriched in certain amino acids, presumably as the result of selection, but that this did not reflect underlying biases in base composition. We also found that regions near repeats showed higher nonsynonymous substitution rates than the remainder of the gene and lower nonsynonymous rates in genes that contained a repeat in both the human and the mouse. Higher rates of nonsynonymous mutation in the neighborhood of repeats presumably reflect weaker purifying selection acting in these regions of the proteins, while the very low rate of nonsynonymous mutation in proteins containing a CAG repeat in both species presumably reflects a high level of purifying selection. Based on these observations, we propose that the mutational processes giving rise to polyglutamine repeats in human and murine proteins do not differ. 
Instead, we propose that the evolution of polyglutamine repeats in proteins results from an interplay between mutational processes and selection.

19.
We have used Fragmentation Sequencing logic to analyse the repetition structure of several large human genomic genes. The method, based on a proposed laboratory scheme for DNA sequencing, detects short sequences which are repeated near, but not necessarily adjacent, to each other (cryptically simple DNA). We find a low frequency of such repeats. There is a slight excess of such repeats in introns over exons, and a slight but significant excess in genomic DNA over random DNA, confirming that cryptically simple sequences are over-represented in the genome. The analysis suggests that Fragmentation Sequencing will be a suitable method for sequencing large mammalian genes.

20.