Similar literature
1.
We assess the similarity of base substitution processes, described by empirically derived 4 × 4 matrices, using chi-square homogeneity tests. Such significance analyses allow us to assess variation in sequence evolution across sites and we apply them to matrices derived from noncoding sites in different contexts in grass chloroplast DNA. We show that there is statistically significant variation in rates and patterns of mutation among noncoding sites in different contexts and then demonstrate a similar and significant influence of context on substitutions at fourfold degenerate sites of coding regions from grass chloroplast DNA. These results show that context has the same general effect on substitution bias in coding and noncoding DNA: the A+T content of flanking bases is correlated with rate of substitution, transition bias, and GC → AT pressure, while the number of flanking pyrimidines on a single strand is correlated with a mutational bias, or skew, toward pyrimidines. Despite the similarity in general trends, however, when we compare coding and noncoding matrices we find that there is a statistically significant difference between them even when we control for context. Most noticeably, fourfold degenerate sites in coding sequences are undergoing substitution at a higher rate and there are also significant differences in the relationship between pyrimidine skew and the number of flanking pyrimidines. Possible reasons for the differences between coding and noncoding sites are discussed. Furthermore, our analysis illustrates a simple statistical method for comparing substitution processes across sites, allowing us to better study variation in evolutionary processes across a genome. [Reviewing Editor: Dr. Martin Kreitman]
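The kind of comparison this abstract describes can be illustrated with a minimal sketch. It assumes each matrix holds raw substitution counts and that two matrices are compared as the two rows of a 2 × k contingency table, one column per matrix cell. The function name is hypothetical, and the abstract does not say how the authors pooled or excluded cells, so this is one plausible reading rather than their procedure.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square homogeneity statistic for two count matrices.

    counts_a, counts_b: lists of rows with identical shapes (e.g. 4 x 4).
    Returns (statistic, degrees_of_freedom) for the implied 2 x k table.
    """
    a = [c for row in counts_a for c in row]
    b = [c for row in counts_b for c in row]
    if len(a) != len(b):
        raise ValueError("matrices must have the same shape")

    # Drop cells that are empty in both matrices: their expected count is 0.
    pairs = [(x, y) for x, y in zip(a, b) if x + y > 0]
    total_a = sum(x for x, _ in pairs)
    total_b = sum(y for _, y in pairs)
    if total_a == 0 or total_b == 0:
        raise ValueError("each matrix needs at least one observation")
    grand = total_a + total_b

    stat = 0.0
    for x, y in pairs:
        col = x + y
        exp_a = total_a * col / grand  # expected count if processes are identical
        exp_b = total_b * col / grand
        stat += (x - exp_a) ** 2 / exp_a + (y - exp_b) ** 2 / exp_b

    # A 2 x k table has (2 - 1) * (k - 1) degrees of freedom.
    return stat, len(pairs) - 1
```

The statistic would then be compared against a chi-square distribution with the returned degrees of freedom; a p-value routine is left out here to keep the sketch dependency-free.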

2.
Mutation and evolution of microsatellite loci in Neurospora   Cited by: 5 (self-citations: 0; citations by others: 5)
Dettman JR, Taylor JW. Genetics, 2004, 168(3): 1231-1248
The patterns of mutation and evolution at 13 microsatellite loci were studied in the filamentous fungal genus Neurospora. First, a detailed investigation was performed on five microsatellite loci by sequencing each microsatellite, together with its nonrepetitive flanking regions, from a set of 147 individuals from eight species of Neurospora. To elucidate the genealogical relationships among microsatellite alleles, repeat number was mapped onto trees constructed from flanking-sequence data. This approach allowed the potentially convergent microsatellite mutations to be placed in the evolutionary context of the less rapidly evolving flanking regions, revealing the complexities of the mutational processes that have generated the allelic diversity conventionally assessed in population genetic studies. In addition to changes in repeat number, frequent substitution mutations within the microsatellites were detected, as were substitutions and insertion/deletions within the flanking regions. By comparing microsatellite and flanking-sequence divergence, clear evidence of interspecific allele length homoplasy and microsatellite mutational saturation was observed, suggesting that these loci are not appropriate for inferring phylogenetic relationships among species. In contrast, little evidence of intraspecific mutational saturation was observed, confirming the utility of these loci for population-level analyses. Frequency distributions of alleles within species were generally consistent with the stepwise mutational model. By comparing variation within species at the microsatellites and the flanking sequence, estimated microsatellite mutation rates were approximately 2500 times greater than mutation rates of flanking DNA and were consistent with estimates from yeast and fruit flies. A positive relationship between repeat number and variance in repeat number was significant across three genealogical depths, suggesting that longer microsatellite alleles are more mutable than shorter alleles. To test if the observed patterns of microsatellite variation and mutation could be generalized, an additional eight microsatellite loci were characterized and sequenced from a subset of the same Neurospora individuals.

3.
The influence of local base composition on mutations in chloroplast DNA (cpDNA) is studied in detail and the resulting, empirically derived, mutation dynamics are used to analyze both base composition and codon usage bias. A 4 × 4 substitution matrix is generated for each of the 16 possible flanking base combinations (contexts) using 17,253 noncoding sites, 1309 of which are variable, from an alignment of three complete grass chloroplast genome sequences. It is shown that substitution bias at these sites is correlated with flanking base composition and that the A+T content of these flanking sites as well as the number of flanking pyrimidines on the same strand appears to have general influences on substitution properties. The context-dependent equilibrium base frequencies predicted from these matrices are then applied to two analyses. The first examines whether or not context dependency of mutations is sufficient to generate average compositional differences between noncoding cpDNA and silent sites of coding sequences. It is found that these two classes of sites exist, on average, in very different contexts and that the observed mutation dynamics are expected to generate significant differences in overall composition bias that are similar to the differences observed in cpDNA. Context dependency, however, cannot account for all of the observed differences: although silent sites in coding regions appear to be at the equilibrium predicted, noncoding cpDNA has a significantly lower A+T content than expected from its own substitution dynamics, possibly due to the influence of indels. The second study examines the codon usage of low-expression chloroplast genes. When context is accounted for, codon usage is very similar to what is predicted by the substitution dynamics of noncoding cpDNA. However, certain codon groups show significant deviation when followed by a purine in a manner suggesting some form of weak selection other than translation efficiency. Overall, the findings indicate that a full understanding of mutational dynamics is critical to understanding the role selection plays in generating composition bias and sequence structure.
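As a rough illustration of how equilibrium base frequencies can be predicted from a substitution matrix, the sketch below iterates a row-stochastic 4 × 4 matrix until the base composition stops changing. It assumes rows are the "from" base and columns the "to" base in the order A, C, G, T, with each row summing to 1; the function name and tolerance are hypothetical, and the abstract does not state which numerical method the authors used.

```python
def equilibrium_frequencies(P, tol=1e-12, max_iter=100_000):
    """Stationary distribution pi of a row-stochastic matrix P (pi = pi P).

    P[i][j] is the per-step probability that base i becomes base j.
    Uses plain power iteration starting from a uniform composition.
    """
    n = len(P)
    for row in P:
        if len(row) != n or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("P must be square with rows summing to 1")

    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of the process: new_pi[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")
```

Applied once per flanking context, this would give one predicted composition per context, which is the quantity the abstract compares with observed A+T content.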

4.
A beneficial mutation that has nearly but not yet fixed in a population produces a characteristic haplotype configuration, called a partial selective sweep. Whether nonadaptive processes might generate similar haplotype configurations has not been extensively explored. Here, we consider 5 population genetic data sets taken from regions flanking high-frequency transposable elements in North American strains of Drosophila melanogaster, each of which appears to be consistent with the expectations of a partial selective sweep. We use coalescent simulations to explore whether incorporation of the species' demographic history, purifying selection against the element, or suppression of recombination caused by the element could generate putatively adaptive haplotype configurations. Whereas most of the data sets would be rejected as nonneutral under the standard neutral null model, only the data set for which there is strong external evidence in support of an adaptive transposition appears to be nonneutral under the more complex null model and in particular when demography is taken into account. High-frequency, derived mutations from a recently bottlenecked population, such as we study here, are of great interest to evolutionary genetics in the context of scans for adaptive events; we discuss the broader implications of our findings in this context.

5.
Microsatellites are a major component of the human genome, and their evolution has been much studied. However, the evolution of microsatellite flanking sequences has received less attention, with reports of both high and low mutation rates and of a tendency for microsatellites to cluster. From the human genome we generated a database of many thousands of (AC)n flanking sequences within which we searched for common characteristics. Sequences flanking microsatellites of similar length show remarkable levels of convergent evolution, indicating shared mutational biases. These biases extend 25–50 bases either side of the microsatellite and may therefore affect more than 30% of the entire genome. To explore the extent and absolute strength of these effects, we quantified the observed convergence. We also compared homologous human and chimpanzee loci to look for evidence of changes in mutation rate around microsatellites. Most models of DNA sequence evolution assume that mutations are independent and occur randomly. Allowances may be made for sites mutating at different rates and for general mutation biases such as the faster rate of transitions over transversions. Our analysis suggests that these models may be inadequate, in that proximity to even very short microsatellites may alter the rate and distribution of mutations that occur. The elevated local mutation rate combined with sequence convergence, both of which we find evidence for, also provide a possible resolution for the apparently contradictory inferences of mutation rates in microsatellite flanking sequences.
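Building a collection of (AC)n flanking sequences of the kind described here could start from something like the following sketch. It scans a sequence for uninterrupted AC runs and returns each run with its flanks. The function name, the `min_repeats` threshold and the default flank length of 50 bases (taken from the upper end of the 25–50 base range quoted above) are assumptions for illustration; the abstract does not describe the authors' actual extraction pipeline.

```python
import re


def ac_repeats_with_flanks(seq, min_repeats=5, flank=50):
    """Find uninterrupted (AC)n runs and return them with flanking sequence.

    Only the AC phase on the given strand is matched; (CA)n, (GT)n and (TG)n
    would need their own patterns or a reverse-complement pass.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?:AC){%d,}" % min_repeats)
    hits = []
    for match in pattern.finditer(seq):
        start, end = match.span()
        hits.append({
            "start": start,                          # 0-based start of the run
            "repeats": (end - start) // 2,           # n in (AC)n
            "left": seq[max(0, start - flank):start],
            "right": seq[end:end + flank],
        })
    return hits
```

Grouping the returned records by `repeats` would then allow flanks of similar-length microsatellites to be compared position by position.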

6.
Exposure to tobacco carcinogens is the major cause of human lung cancer, but even heavy smokers have only about a 10% lifetime risk of developing lung cancer. Currently used screening processes, based largely on age and exposure status, have proven to be of limited clinical utility in predicting cancer risk. More precise methods of assessing an individual's risk of developing lung cancer are needed. Because of their sensitivity to DNA damage, microsatellites are potentially useful for the assessment of somatic mutational load in normal cells. We assessed mutational load using hypermutable microsatellites in buccal cells obtained from lung carcinoma cases and controls to test if such a measure could be used to estimate lung cancer risk. There was no significant association between smoking status and mutation frequency with any of the markers tested. No significant association between case status and mutation frequency was observed. Age was significantly related to mutation frequency in the microsatellite marker D7S1482. These observations indicate that somatic mutational load, as measured using mutation frequency of microsatellites in buccal cells, increases with increasing age but that subjects who develop lung cancer have a mutational load similar to that of those who remain cancer free. This finding suggests that microsatellite mutation frequency in buccal cells may not be a promising biomarker for lung cancer risk.

7.
Somatic mutations in cancer genomes are associated with DNA replication timing (RT) and chromatin accessibility (CA); however, these observations are based on normal tissues and cell lines, while primary cancer epigenomes remain uncharacterised. Here we use machine learning to model megabase-scale mutation burden in 2,500 whole cancer genomes and 17 cancer types via a compendium of 900 CA and RT profiles covering primary cancers, normal tissues, and cell lines. CA profiles of primary cancers, rather than those of normal tissues, are most predictive of regional mutagenesis in most cancer types. Feature prioritisation shows that the epigenomes of matching cancer types and organ systems are often the strongest predictors of regional mutation burden, highlighting disease-specific associations of mutational processes. The genomic distributions of mutational signatures are also shaped by the epigenomes of matched cancer and tissue types, with SBS5/40, carcinogenic and unknown signatures most accurately predicted by our models. In contrast, fewer associations of RT and regional mutagenesis are found. Lastly, the models highlight genomic regions with overrepresented mutations that dramatically exceed epigenome-derived expectations and show a pan-cancer convergence to genes and pathways involved in development and oncogenesis, indicating the potential of this approach for coding and non-coding driver discovery. The association of regional mutational processes with the epigenomes of primary cancers suggests that the landscape of passenger mutations is predominantly shaped by the epigenomes of cancer cells after oncogenic transformation.

8.
Morton BR, Bi IV, McMullen MD, Gaut BS. Genetics, 2006, 172(1): 569-577
We examine variation in mutation dynamics across a single genome (Zea mays ssp. mays) in relation to regional and flanking base composition using a data set of 10,472 SNPs generated by resequencing 1776 transcribed regions. We report several relationships between flanking base composition and mutation pattern. The A + T content of the two sites immediately flanking the mutation site is correlated with rate, transition bias, and GC → AT pressure. We also observe a significant CpG effect, or increase in transition rate at CpG sites. At the regional level we find that the strength of the CpG effect is correlated with regional A + T content, ranging from a 1.7-fold increase in transition rate in relatively G + C-rich regions to a 2.6-fold increase in A + T-rich regions. We also observe a relationship between locus A + T content and GC → AT pressure. This regional effect is in opposition to the influence of the two immediate neighbors in that GC → AT pressure increases with increasing locus A + T content but decreases with increasing flanking base A + T content and may represent a relationship between genome location and mutation bias. The data indicate multiple context effects on mutations, resulting in significant variation in mutation dynamics across the genome.

9.
The role of Fenton oxidants in DNA damage, aging, and cancer is appreciated, but not well understood. Six potential iron-binding (PIB) DNA motifs were previously identified as sites of preferential strand cleavage. Since DNA-metal binding domains are a known determinant of oxidative DNA damage, and the location of strand breaks explains where oxidant attack occurs, we sought to determine whether the likelihood of base change mutations is a function of neighboring PIB motifs. We developed a sliding window function that computes the density of PIB motifs on both strands, within 4-12 bp, for each location along a target gene. This range of window sizes reflects known diffusion distances of Fenton reaction products. Using mutational databases, odds of mutation at each base were calculated relative to PIB motif density, for all PIB motif types in aggregate, or for individual PIB motifs. Using mutational data from lacI transgenic animals, we observed a non-random distribution of PIB motifs, associated with increased odds of mutation, showing a strand bias. Sensitivity analysis confirmed that the optimum association between PIB motif density and mutations occurs when a 7 bp radius is used for the window size. Randomly simulated mutations showed no association with PIB motif density. When the method was applied to human TP53 mutation data, we saw similar results, but no strand bias. As PIB motif density rises, linear trends are observed for increasing odds of mutation. Sensitivity analysis revealed associations between PIB motifs and GC → AT transitions and GC → TA transversions, the most commonly observed types of mutations arising from oxidative DNA damage. DNA-metal binding motifs are found in a wide variety of biological contexts, including many where conformational sensitivity to redox state is important. These techniques can help elucidate how DNA-iron-binding may affect lesions and subsequent mutations from multiple agents.
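A sliding-window motif density of the general kind described in this abstract might look like the sketch below. The abstract does not list the six PIB motifs or define exactly how a motif is assigned to a window, so both are assumptions here: motifs are supplied by the caller (the motif used in any test is a placeholder, not a real PIB motif), and a motif counts toward a position when its start lies within `radius` bases of that position. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def motif_starts(seq, motifs):
    """Forward-strand start indices of every motif found on either strand.

    A reverse-strand occurrence is located by searching the forward strand
    for the motif's reverse complement. Palindromic motifs are counted once.
    """
    patterns = set(motifs) | {reverse_complement(m) for m in motifs}
    starts = []
    for pattern in patterns:
        i = seq.find(pattern)
        while i != -1:
            starts.append(i)
            i = seq.find(pattern, i + 1)  # allow overlapping occurrences
    return starts


def motif_density(seq, motifs, radius=7):
    """Number of motif starts within `radius` bases of each position in seq."""
    seq = seq.upper()
    starts = motif_starts(seq, motifs)
    return [sum(1 for s in starts if abs(s - i) <= radius)
            for i in range(len(seq))]
```

The default `radius=7` mirrors the 7 bp radius reported as optimal above; repeating the calculation for radii from 4 to 12 would correspond to the sensitivity analysis the abstract mentions.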

10.
Background: Cancer sequencing projects are now measuring somatic mutations in large numbers of cancer genomes. A key challenge in interpreting these data is to distinguish driver mutations, mutations important for cancer development, from passenger mutations that have accumulated in somatic cells but without functional consequences. A common approach to identify genes harboring driver mutations is a single gene test that identifies individual genes that are recurrently mutated in a significant number of cancer genomes. However, the power of this test is reduced by: (1) the necessity of estimating the background mutation rate (BMR) for each gene; (2) the mutational heterogeneity in most cancers, meaning that groups of genes (e.g. pathways), rather than single genes, are the primary target of mutations. Results: We investigate the problem of discovering driver pathways, groups of genes containing driver mutations, directly from cancer mutation data and without prior knowledge of pathways or other interactions between genes. We introduce two generative models of somatic mutations in cancer and study the algorithmic complexity of discovering driver pathways in both models. We show that a single gene test for driver genes is highly sensitive to the estimate of the BMR. In contrast, we show that an algorithmic approach that maximizes a straightforward measure of the mutational properties of a driver pathway successfully discovers these groups of genes without an estimate of the BMR. Moreover, this approach is also successful in the case when the observed frequencies of passenger and driver mutations are indistinguishable, a situation where single gene tests fail. Conclusions: Accurate estimation of the BMR is a challenging task. Thus, methods that do not require an estimate of the BMR, such as the ones we provide here, can give increased power for the discovery of driver genes.

11.
Some aspects of microsatellite evolution, such as the role of base substitutions, are far from being fully understood. To examine the significance of base substitutions underlying the evolution of microsatellites, we explored the nature and the distribution of interruptions in dinucleotide repeats from the human genome. The frequencies that we inferred in the repetitive sequences were statistically different from the frequencies observed in other noncoding sequences. Additionally, we detected that the interruptions tended to lie towards the ends of the microsatellites and showed 5'-3' asymmetry. In all the estimates, nucleotides forming the same repetitive motif seem to be affected by different base substitution rates in AC and AG. This tendency itself could generate patterning and similarity in flanking sequences and reconcile these phenomena with the high mutation rate found in flanking sequences without invoking convergent evolution. Nevertheless, our data suggest that there is a regional bias in the substitution pattern of microsatellites. The accumulation of random substitutions alone cannot explain the heterogeneity and the asymmetry of interruptions found in this study or the relative frequency of different compound microsatellites in the human genome. Therefore, we cannot rule out the possibility of a mutational bias leading to convergent or parallel evolution in flanking sequences.

12.
Single base substitutions constitute the most frequent type of human gene mutation and are a leading cause of cancer and inherited disease. These alterations occur non-randomly in DNA, being strongly influenced by the local nucleotide sequence context. However, the molecular mechanisms underlying such sequence context-dependent mutagenesis are not fully understood. Using bioinformatics, computational and molecular modeling analyses, we have determined the frequencies of mutation at G•C bp in the context of all 64 5′-NGNN-3′ motifs that contain the mutation at the second position. Twenty-four datasets were employed, comprising >530,000 somatic single base substitutions from 21 cancer genomes, >77,000 germline single-base substitutions causing or associated with human inherited disease and 16.7 million benign germline single-nucleotide variants. In several cancer types, the number of mutated motifs correlated both with the free energies of base stacking and the energies required for abstracting an electron from the target guanines (ionization potentials). Similar correlations were also evident for the pathological missense and nonsense germline mutations, but only when the target guanines were located on the non-transcribed DNA strand. Likewise, pathogenic splicing mutations predominantly affected positions in which a purine was located on the non-transcribed DNA strand. Novel candidate driver mutations and tissue-specific mutational patterns were also identified in the cancer datasets. We conclude that electron transfer reactions within the DNA molecule contribute to sequence context-dependent mutagenesis, involving both somatic driver and passenger mutations in cancer, as well as germline alterations causing or associated with inherited disease.
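The "all 64 5′-NGNN-3′ motifs" mentioned above follow from three free positions with four bases each (4³ = 64). The sketch below enumerates them and tallies how often each occurs in a sequence, which is the sort of background count a per-motif mutation frequency would be normalised against. The function names are hypothetical, and only the given strand is scanned; handling guanines on the opposite strand is left out.

```python
from collections import Counter
from itertools import product


def ngnn_motifs():
    """All 64 tetranucleotides with G fixed at the second position."""
    return [f"{a}G{b}{c}" for a, b, c in product("ACGT", repeat=3)]


def count_ngnn(seq):
    """Count occurrences of each NGNN motif on the given strand of seq.

    Windows overlap, so every G with one base before it and two after it
    contributes exactly one count.
    """
    seq = seq.upper()
    counts = Counter({motif: 0 for motif in ngnn_motifs()})
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if window in counts:  # skips windows without G at position 2, or with N
            counts[window] += 1
    return counts
```

Dividing the number of observed mutations at the central G of each motif by these counts would give a per-motif mutation frequency, under the assumption that the scanned sequence matches the region in which mutations were ascertained.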

13.
Cancer is the result of mutagenic processes that can be inferred from tumor genomes by analyzing rate spectra of point mutations, or “mutational signatures”. Here we present SparseSignatures, a novel framework to extract signatures from somatic point mutation data. Our approach incorporates a user-specified background signature, employs regularization to reduce noise in non-background signatures, uses cross-validation to identify the number of signatures, and is scalable to large datasets. We show that SparseSignatures outperforms current state-of-the-art methods on simulated data using a variety of standard metrics. We then apply SparseSignatures to whole genome sequences of pancreatic and breast tumors, discovering well-differentiated signatures that are linked to known mutagenic mechanisms and are strongly associated with patient clinical features.

14.
The nature and extent of mutational pleiotropy remain largely unknown, despite the central role that pleiotropy plays in many areas of biology, including human disease, agricultural production, and evolution. Here, we investigate the variation in 11,604 gene expression traits among 41 mutation accumulation (MA) lines of Drosophila serrata. We first confirmed that these expression phenotypes were heritable, detecting genetic variation in 96% of them in an outbred, natural population of D. serrata. Among the MA lines, 3385 (29%) of expression traits were variable, with a mean mutational heritability of 0.0005. In most traits, variation was generated by mutations of relatively small phenotypic effect; putative mutations with effects of greater than one phenotypic standard deviation were observed for only 8% of traits. With most (71%) traits unaffected by any mutation, our data provide no support for universal pleiotropy. We further characterized mutational pleiotropy in the 3385 variable traits, using sets of 5, randomly assigned, traits. Covariance among traits chosen at random with respect to their biological function is expected only if pleiotropy is extensive. Taking an analytical approach in which the variance unique to each trait in the random 5-trait sets was partitioned from variance shared among traits, we detected significant (at 5% false discovery rate) mutational covariance in 21% of sets. This frequency of statistically supported covariance implied that at least some mutations must pleiotropically affect a substantial number of traits (>70; 0.6% of all measured traits).

15.
The codon bias in Escherichia coli for all two-fold degenerate amino acids was studied as dependent on the context from the six bases in the nearest surrounding codons. By comparing the results in genes at different expression levels, effects that are due to differences in mutation rates can be distinguished from those that are due to selection. Selective effects on the codon bias are found mostly from the first neighbouring base in the 3' direction, while neighbouring bases further away influence mostly the mutational bias. In some cases it is also possible to identify specific molecular processes, repair or avoidance of frameshift, that lead to the context dependence of the bias.

16.
The spectrum of mutations discovered in cancer genomes can be explained by the activity of a few elementary mutational processes. We present a novel probabilistic method, EMu, to infer the mutational signatures of these processes from a collection of sequenced tumors. EMu naturally incorporates the tumor-specific opportunity for different mutation types according to sequence composition. Applying EMu to breast cancer data, we derive detailed maps of the activity of each process, both genome-wide and within specific local regions of the genome. Our work provides new opportunities to study the mutational processes underlying cancer development. EMu is available at http://www.sanger.ac.uk/resources/software/emu/.

17.
Paul Little, Li Hsu, Wei Sun. Biometrics, 2023, 79(3): 2705-2718
Somatic mutations in cancer patients are inherently sparse and potentially high dimensional. Cancer patients may share the same set of deregulated biological processes perturbed by different sets of somatically mutated genes. Therefore, when assessing the associations between somatic mutations and clinical outcomes, gene-by-gene analysis is often under-powered because it does not capture the complex disease mechanisms shared across cancer patients. Rather than testing genes one by one, an intuitive approach is to aggregate somatic mutation data of multiple genes to assess their joint association with clinical outcomes. The challenge is how to aggregate such information. Building on the optimal transport method, we propose a principled approach to estimate the similarity of somatic mutation profiles of multiple genes between tumor samples, while accounting for gene–gene similarities defined by gene annotations or empirical mutational patterns. Using such similarities, we can assess the associations between somatic mutations and clinical outcomes by kernel regression. We have applied our method to analyze somatic mutation data of 17 cancer types and identified at least five cancer types in which somatic mutations are associated with overall survival, progression-free interval, or cytolytic activity.

18.
A study by Vowles and Amos (2004) identified atypical patterns of base composition around human microsatellites and argued that microsatellites generate mutational biases in their flanking regions. Here, we perform simulations of molecular evolution using a simple model that suggest similar patterns can be produced without any such biases in genome evolution.

19.
A stable RNA helix requires at least three base pairs. Surprisingly, a tertiary kissing complex formed between two GACG hairpin loops contains only two GC pairs. In the NMR structure of this complex, the two flanking adenosines stack on the kissing GC pair. This observation raised the possibility that the 5’-dangling adenines contribute to the formation and stability of the kissing interaction. To test this hypothesis, we took a two-pronged approach to examine the effects of various mutational and chemical modifications of the flanking adenosines on the folding of the kissing complex. Using mass spectrometry, we studied formation of kissing dimers formed by different hairpins. Using optical tweezers, we monitored mechanical unfolding of an intramolecular kissing complex at the single-molecule level. In both experiments, replacing adenine with uridine abolished the kissing interaction, suggesting that a minimal kissing complex must contain two GC pairs flanked by inter-strand stacking adenines. The stabilizing effect of the adenines can be explained by the fact that the stacking purine nucleobases shield the hydrogen bonds of the adjacent GC pairs, preventing them from fraying. Unlike in the context of secondary structure, the 5’-unpaired adenines in the tertiary structure are structurally constrained in a way that allows for effective stacking onto the adjacent base pairs.

20.