Similar articles
20 similar articles found.
1.
Compositional evolution of noncoding DNA in the human and chimpanzee genomes   Cited by: 11 (self-citations: 0, citations by others: 11)
We have examined the compositional evolution of noncoding DNA in the primate genome by comparison of lineage-specific substitutions observed in 1.8 Mb of genomic alignments of human, chimpanzee, and baboon with 6542 human single-nucleotide polymorphisms (SNPs) rooted using chimpanzee sequence. The pattern of compositional evolution, measured in terms of the numbers of GC→AT and AT→GC changes, differs significantly between fixed and polymorphic sites, and indicates that there is a bias toward fixation of AT→GC mutations, which could result from weak directional selection or biased gene conversion in favor of high GC content. Comparison of the frequency distributions of a subset of the SNPs revealed no significant difference between GC→AT and AT→GC polymorphisms, although AT→GC polymorphisms in regions of high GC segregate at slightly higher frequencies on average than GC→AT polymorphisms, which is consistent with a fixation bias favoring high GC in these regions. However, the substitution data suggest that this fixation bias is relatively weak, because the compositional structure of the human and chimpanzee genomes is becoming homogenized, with regions of high GC decreasing in GC content and regions of low GC increasing in GC content. The rate and pattern of nucleotide substitution in 333 Alu repeats within the human-chimpanzee-baboon alignments are not significantly affected by the GC content of the region in which they are inserted, providing further evidence that, since the time of the human-chimpanzee ancestor, there has been little or no regional variation in mutation bias.
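As a rough illustration of why counts of GC→AT and AT→GC changes say something about where GC content is heading, the sketch below computes the equilibrium GC content implied by two per-site substitution rates under a simple two-state model. The function name, its arguments, and the two-state model itself are assumptions made for illustration; this is not the method used in the study above.

```python
def equilibrium_gc(n_at_to_gc: int, n_gc_to_at: int,
                   n_at_sites: int, n_gc_sites: int) -> float:
    """Equilibrium GC fraction under a two-state (AT <-> GC) model.

    Hypothetical helper: rates are estimated as substitutions per site,
    and the equilibrium is u / (u + v), where u is the AT->GC rate and
    v is the GC->AT rate.
    """
    u = n_at_to_gc / n_at_sites   # AT->GC substitutions per AT site
    v = n_gc_to_at / n_gc_sites   # GC->AT substitutions per GC site
    return u / (u + v)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(equilibrium_gc(10, 20, 1000, 1000))
```

If the value returned for a region is lower than its current GC content, the region is losing GC under this model, which is the sense in which GC-rich regions can be described as "decreasing in GC content".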

2.
According to population genetics models, genomic regions with lower crossing-over rates are expected to experience less effective selection because of Hill-Robertson interference (HRi). The effect of genetic linkage is thought to be particularly important for selection of weak intensity, such as selection affecting codon usage. Consistent with this model, codon bias correlates positively with recombination rate in Drosophila melanogaster and Caenorhabditis elegans. However, in these species, the G+C content of both noncoding DNA and synonymous sites correlates positively with recombination, which suggests that mutation patterns and recombination are associated. To remove this effect of mutation patterns on codon bias, we used the synonymous sites of lowly expressed genes that are expected to be effectively neutral sites. We measured the differences between codon biases of highly expressed genes and their lowly expressed neighbors. In D. melanogaster we find that HRi weakly reduces selection on codon usage of genes located in regions of very low recombination; but these genes only comprise 4% of the total. In C. elegans we do not find any evidence for the effect of recombination on selection for codon bias. Computer simulations indicate that HRi poorly enhances codon bias if the local recombination rate is greater than the mutation rate. This prediction of the model is consistent with our data and with the current estimate of the mutation rate in D. melanogaster. The case of C. elegans, which is highly self-fertilizing, is discussed. Our results suggest that HRi is a minor determinant of variations in codon bias across the genome.

3.
Lercher MJ  Hurst LD 《Gene》2002,300(1-2):53-58
One of the most abiding controversies in evolutionary biology concerns the role of neutral processes in molecular evolution. A main focus of the debate has been the evolution of isochores, the strong and systematic variation of base composition in mammalian genomes. One set of hypotheses argues that regions of similar GC are owing to localised mutational biases coupled with neutral evolution. The alternatives point to either selection or biased gene conversion as mechanisms to preferentially remove A or T bases, favouring G and C instead. Using a novel method, we compare models including such fixation biases to models based on mutation bias alone, under the assumption that non-coding, non-repetitive human DNA is at compositional equilibrium. While failing to fully explain the allele frequency distributions of recent single nucleotide polymorphism data, we show that the data are best fitted if the mutation bias is assumed to be constant across the genome, while fixation bias varies with GC content. We also attempt to estimate the strength of fixation bias, which increases linearly with increasing GC. Our approximation suggests that this force exists within the necessary parameter range: it is not so weak as to be drowned by random drift, but not so strong as to lead to exclusive use of G and C alone. Together these results demonstrate that mutation bias fails to explain the evolution of isochores, and suggest that either selection or biased gene conversion is involved.

4.
Under neutrality all classes of mutation have an equal probability of becoming fixed in a population. In this article, we describe our analysis of the frequency distributions of >5000 human SNPs and provide evidence of biases in the process of fixation of certain classes of point mutation that are most likely to be attributable to biased gene conversion. The results indicate an increased fixation probability of mutations that result in the incorporation of a GC base pair. Furthermore, in transcribed regions this process exhibits strand asymmetry, and is biased towards preserving a G base on the coding strand. Biased gene conversion has the potential to explain both the existence of isochores and the compositional asymmetry in mammalian transcribed regions.

5.
Patterns of non-uniform usage of synonymous codons vary across genes in an organism and between species across all domains of life. This codon usage bias (CUB) is due to a combination of non-adaptive (e.g. mutation biases) and adaptive (e.g. natural selection for translation efficiency/accuracy) evolutionary forces. Most models quantify the effects of mutation bias and selection on CUB assuming uniform mutational and other non-adaptive forces across the genome. However, non-adaptive nucleotide biases can vary within a genome due to processes such as biased gene conversion (BGC), potentially obfuscating signals of selection on codon usage. Moreover, genome-wide estimates of non-adaptive nucleotide biases are lacking for non-model organisms. We combine an unsupervised learning method with a population genetics model of synonymous coding sequence evolution to assess the impact of intragenomic variation in non-adaptive nucleotide bias on quantification of natural selection on synonymous codon usage across 49 Saccharomycotina yeasts. We find that in the absence of a priori information, unsupervised learning can be used to identify genes evolving under different non-adaptive nucleotide biases. We find that the impact of intragenomic variation in non-adaptive nucleotide bias varies widely, even among closely-related species. We show that the overall strength and direction of translational selection can be underestimated by failing to account for intragenomic variation in non-adaptive nucleotide biases. Interestingly, genes falling into clusters identified by machine learning are also physically clustered across chromosomes. Our results indicate the need for more nuanced models of sequence evolution that systematically incorporate the effects of variable non-adaptive nucleotide biases on codon frequencies.

6.
While genome-era technologies focused on complete genome sequencing in various organisms, post-genome technologies aim at the understanding of the mechanisms of genetic information processing and elucidation of within-species variation. Single nucleotide polymorphisms (SNPs) are the most common source of genome variation in the human population. Nonsynonymous SNPs that occur in coding gene regions and result in amino acid substitutions are of particular interest. It is thought that such SNPs are responsible for phenotypic variation, quantitative traits, and the etiology of common diseases. PolyPhen is a computational tool for the prediction of putatively functional nonsynonymous SNPs by combining information of various types. The application areas of PolyPhen and similar methods include the genetics of complex diseases and congenital defects, the identification of functional mutations in model organisms, and evolutionary genetics.

7.
Singh ND  Arndt PF  Petrov DA 《Genetics》2005,169(2):709-722
Mutation is the underlying force that provides the variation upon which evolutionary forces can act. It is important to understand how mutation rates vary within genomes and how the probabilities of fixation of new mutations vary as well. If substitutional processes across the genome are heterogeneous, then examining patterns of coding sequence evolution without taking these underlying variations into account may be misleading. Here we present the first rigorous test of substitution rate heterogeneity in the Drosophila melanogaster genome using almost 1500 nonfunctional fragments of the transposable element DNAREP1_DM. Not only do our analyses suggest that substitutional patterns in heterochromatic and euchromatic sequences are different, but also they provide support in favor of a recombination-associated substitutional bias toward G and C in this species. The magnitude of this bias is entirely sufficient to explain recombination-associated patterns of codon usage on the autosomes of the D. melanogaster genome. We also document a bias toward lower GC content in the pattern of small insertions and deletions (indels). In addition, the GC content of noncoding DNA in Drosophila is higher than would be predicted on the basis of the pattern of nucleotide substitutions and small indels. However, we argue that the fast turnover of noncoding sequences in Drosophila makes it difficult to assess the importance of the GC biases in nucleotide substitutions and small indels in shaping the base composition of noncoding sequences.

8.
This study was designed to address issues regarding sample size and marker location that have arisen from the discovery of SNPs in the genomes of poorly characterized primate species and the application of these markers to the study of primate population genetics. We predict the effect of discovery sample size on the probability of discovering both rare and common SNPs and then compare this prediction with the proportion of common and rare SNPs discovered when different numbers of individuals are sequenced. Second, we examine the effect of genomic region on estimates of common population genetic data, comparing markers from both coding and non-coding regions of the rhesus macaque genome and the population genetic data calculated from these markers, to measure the degree and direction of bias introduced by SNPs located in coding versus non-coding regions of the genome. We found that both discovery sample size and genomic region surveyed affect SNP marker attributes and population genetic estimates, even when these are calculated from an expanded data set containing more individuals than the original discovery data set. Although none of the SNP detection methods or genomic regions tested in this study was completely uninformative, these results show that each has a different kind of genetic variation that is suitable for different purposes, and each introduces specific types of bias. Given that each SNP marker has an individual evolutionary history, we calculated that the most complete and unbiased representation of the genetic diversity present in the individual can be obtained by incorporating at least 10 individuals into the discovery sample set, to ensure the discovery of both common and rare polymorphisms.
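The dependence of SNP discovery on sample size can be sketched with a simple binomial argument. The snippet below is a minimal illustration, assuming diploid individuals, random sampling of chromosomes, and a known population minor allele frequency; the function name is hypothetical and the formula is a textbook approximation, not necessarily the prediction used in the study above.

```python
def discovery_probability(maf: float, n_individuals: int) -> float:
    """Probability that a biallelic SNP is seen as polymorphic in the sample.

    Assumes 2 * n_individuals chromosomes drawn independently; the SNP is
    discovered only if both alleles appear at least once in the sample.
    """
    chromosomes = 2 * n_individuals
    only_major = (1.0 - maf) ** chromosomes   # every sampled copy is the major allele
    only_minor = maf ** chromosomes           # every sampled copy is the minor allele
    return 1.0 - only_major - only_minor


if __name__ == "__main__":
    # Compare a rare (1%) and a common (20%) SNP across discovery sample sizes.
    for n in (2, 5, 10, 20):
        print(n, round(discovery_probability(0.01, n), 3),
              round(discovery_probability(0.20, n), 3))
```

Under these assumptions, common SNPs are found almost surely with a handful of individuals, while rare SNPs remain likely to be missed until the discovery panel grows considerably.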

9.
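A new generation of molecular markers: SNPs and their applications (新一代分子标记--SNPs及其应用)   Cited by: 31 (self-citations: 0, citations by others: 31)
邹喻苹 (Zou Yuping)  葛颂 (Ge Song) 《生物多样性》 (Biodiversity Science) 2003,11(5):370-382
Single nucleotide polymorphisms (SNPs) are a class of DNA sequence variation that is widespread in genomes, occurring at a frequency of 1% or higher. They are point mutations caused by the transition or transversion of a single base, are stable and reliable, and usually occur in biallelic form. Detecting SNPs with biochips and DNA microarray technology facilitates large-scale, high-throughput analysis of genomes. As a new generation of molecular markers, SNPs therefore have broad application prospects in many fields of biology. This paper briefly reviews the history, current research, and underlying theory of SNP technology; introduces the basic terms, concepts, and characteristics associated with SNPs; and outlines the principles and methods of the main techniques for discovering and detecting SNPs. Drawing on specific examples, it also describes applications of SNPs in constructing genetic maps of model animals and plants, cultivar identification, species origin and phylogenetic relationships, linkage disequilibrium and association analysis, and studies of population genetic structure and the mechanisms by which it changes. Finally, it discusses the prospects for SNPs in population genetics, molecular breeding, and evolutionary biology.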

10.
Several recent studies of genome evolution indicate that the rate of DNA loss exceeds that of DNA gain, leading to an underlying mutational pressure towards collapsing the length of noncoding DNA. That such a collapse is not observed suggests opposing mechanisms favoring longer noncoding regions. The presence of transposable elements alone also does not explain observed features of noncoding DNA. At present, a multidisciplinary approach, using population genetics techniques, large-scale genomic analyses, and in silico evolution, is beginning to provide new and valuable insights into the forces that shape the length of noncoding DNA and, ultimately, genome size. Recombination, in a broad sense, might be the missing key parameter for understanding the observed variation in length of noncoding DNA in eukaryotes.

11.
Ciliates are unicellular eukaryotes with separate germline and somatic genomes and diverse life cycles, which make them a unique model to improve our understanding of population genetics through the detection of genetic variations. However, traditional sequencing methods cannot be directly applied to ciliates because the majority are uncultivated. Single‐cell whole‐genome sequencing (WGS) is a powerful tool for studying genetic variation in microbes, but no studies have been performed in ciliates. We compared the use of single‐cell WGS and bulk DNA WGS to detect genetic variation, specifically single nucleotide polymorphisms (SNPs), in the model ciliate Tetrahymena thermophila. Our analyses showed that (i) single‐cell WGS has excellent performance regarding mapping rate and genome coverage but lower sequencing uniformity compared with bulk DNA WGS due to amplification bias (which was reproducible); (ii) false‐positive SNP sites detected by single‐cell WGS tend to occur in genomic regions with particularly high sequencing depth and high rate of C:G to T:A base changes; (iii) SNPs detected in three or more cells should be reliable (a detection efficiency of 83.4–97.4% was obtained for combined data from three cells). This analytical method could be adapted to measure genetic variation in other ciliates and broaden research into ciliate population genetics.

12.
We assess the similarity of base substitution processes, described by empirically derived 4 × 4 matrices, using chi-square homogeneity tests. Such significance analyses allow us to assess variation in sequence evolution across sites and we apply them to matrices derived from noncoding sites in different contexts in grass chloroplast DNA. We show that there is statistically significant variation in rates and patterns of mutation among noncoding sites in different contexts and then demonstrate a similar and significant influence of context on substitutions at fourfold degenerate sites of coding regions from grass chloroplast DNA. These results show that context has the same general effect on substitution bias in coding and noncoding DNA: the A+T content of flanking bases is correlated with rate of substitution, transition bias, and GC → AT pressure, while the number of flanking pyrimidines on a single strand is correlated with a mutational bias, or skew, toward pyrimidines. Despite the similarity in general trends, however, when we compare coding and noncoding matrices we find that there is a statistically significant difference between them even when we control for context. Most noticeably, fourfold degenerate sites in coding sequences are undergoing substitution at a higher rate and there are also significant differences in the relationship between pyrimidine skew and the number of flanking pyrimidines. Possible reasons for the differences between coding and noncoding sites are discussed. Furthermore, our analysis illustrates a simple statistical way of comparing substitution processes across sites, allowing us to better study variation in evolutionary processes across a genome. [Reviewing Editor: Dr. Martin Kreitman]

13.
The mean (G + C) composition (51.0%) and standard deviation (± 3.8%) of published DNA sequences accounting for 10% of the E. coli genome is in excellent agreement with the principal overall distribution determined by high resolution melting. While differences in base and neighbor characteristics are small and uniform throughout all regions of the genome, it is found that the (G + C) content of sequences varies in segmented fashion within boundaries corresponding to coding (53% G + C) and noncoding (46% G + C) regions; with variances in the latter being six-fold greater than in coding regions. The variance in different regions shows a strong negative dependence on (G + C) content of the region, reflecting the condition that A-T and G-C base pairs are preferred neighbors of A-T and C-G pairs, respectively; with the bias increasing with decreasing (G + C) content. Neighbor analysis indicates the most extreme positive biases occur in AA, TT, GC and CG throughout all regions, but particularly in noncoding regions. Extraordinary numbers of oligomeric strings of (A)n, etc., are the further consequence of this bias. These and other characteristics point to the existence of inherent biases in neighbor frequencies levied during replication or repair, and which reflect, in turn, neighbor influences during mutation. The bias in codon usage noted by Grantham and others is seen here as due, in part, to the adaptation of coding sequences to this microenvironment through selection among synonymous codons so as to preserve inherent neighbor biases.

14.
15.
Understanding the proximate and ultimate causes underlying the evolution of nucleotide composition in mammalian genomes is of fundamental interest to the study of molecular evolution. Comparative genomics studies have revealed that many more substitutions occur from G and C nucleotides to A and T nucleotides than the reverse, suggesting that mammalian genomes are not at equilibrium for base composition. Analysis of human polymorphism data suggests that mutations that increase GC-content tend to be at much higher frequencies than those that decrease or preserve GC-content when the ancestral allele is inferred via parsimony using the chimpanzee genome. These observations have been interpreted as evidence for a fixation bias in favor of G and C alleles due to either positive natural selection or biased gene conversion. Here, we test the robustness of this interpretation to violations of the parsimony assumption using a data set of 21,488 noncoding single nucleotide polymorphisms (SNPs) discovered by the National Institute of Environmental Health Sciences (NIEHS) SNPs project via direct resequencing of n = 95 individuals. Applying standard nonparametric and parametric population genetic approaches, we replicate the signatures of a fixation bias in favor of G and C alleles when the ancestral base is assumed to be the base found in the chimpanzee outgroup. However, upon taking into account the probability of misidentifying the ancestral state of each SNP using a context-dependent mutation model, the corrected distribution of SNP frequencies for GC-content increasing SNPs is nearly indistinguishable from the patterns observed for other types of mutations, suggesting that the signature of fixation bias is a spurious artifact of the parsimony assumption.

16.
The abundance and identity of functional variation segregating in natural populations is paramount to dissecting the molecular basis of quantitative traits as well as human genetic diseases. Genome sequencing of multiple organisms of the same species provides an efficient means of cataloging rearrangements, insertion, or deletion polymorphisms (InDels) and single-nucleotide polymorphisms (SNPs). While inbreeding depression and heterosis imply that a substantial amount of polymorphism is deleterious, distinguishing deleterious from neutral polymorphism remains a significant challenge. To identify deleterious and neutral DNA sequence variation within Saccharomyces cerevisiae, we sequenced the genomes of a vineyard strain and an oak tree strain and compared them to a reference genome. Among these three strains, 6% of the genome is variable, mostly attributable to variation in genome content that results from large InDels. Out of the 88,000 polymorphisms identified, 93% are SNPs and a small but significant fraction can be attributed to recent interspecific introgression and ectopic gene conversion. In comparison to the reference genome, there is substantial evidence for functional variation in gene content and structure that results from large InDels, frame-shifts, and polymorphic start and stop codons. Comparison of polymorphism to divergence reveals scant evidence for positive selection but an abundance of evidence for deleterious SNPs. We estimate that 12% of coding and 7% of noncoding SNPs are deleterious. Based on divergence among 11 yeast species, we identified 1,666 nonsynonymous SNPs that disrupt conserved amino acids and 1,863 noncoding SNPs that disrupt conserved noncoding motifs. The deleterious coding SNPs include those known to affect quantitative traits, and a subset of the deleterious noncoding SNPs occurs in the promoters of genes that show allele-specific expression, implying that some cis-regulatory SNPs are deleterious. Our results show that the genome sequences of both closely and distantly related species provide a means of identifying deleterious polymorphisms that disrupt functionally conserved coding and noncoding sequences.

17.
Drosophila subobscura presents a rich and complex chromosomal inversion polymorphism. It can thus be considered a model system (i) to study the mechanisms originating inversions and how inversions affect the levels and patterns of variation in the inverted regions and (ii) to study adaptation at both the single‐gene and chromosomal inversion levels. It is therefore important to infer its demographic history, as previous information indicated that its nucleotide variation is not at mutation–drift equilibrium. For that purpose, we sequenced 16 noncoding regions distributed across those parts of the J chromosome not affected by inversions in the studied population and possibly not affected by other selective events either. The pattern of variation detected in these 16 regions is similar to that previously reported within different chromosomal arrangements, suggesting that the latter results would, thus, mainly reflect recent demographic events rather than the partial selective sweep imposed by the origin and frequency increase of inversions. Among the simple demographic models considered in our Approximate Bayesian Computation analysis of variation at the 16 regions, the model best supported by the data implies a population size expansion soon after the penultimate glacial period. This model constitutes a better null model, and it is therefore an important resource for subsequent studies aiming, among other things, to uncover selective events across the species genome. Our results also highlight the importance of introducing the possibility of multiple hits in the coalescent simulations with an outgroup.

18.
19.

Background

Single nucleotide polymorphisms (SNPs) have been used extensively in genetics and epidemiology studies. Traditionally, SNPs that did not pass the Hardy-Weinberg equilibrium (HWE) test were excluded from these analyses. Many investigators have addressed possible causes for departure from HWE, including genotyping errors, population admixture and segmental duplication. Recent large-scale surveys have revealed abundant structural variations in the human genome, including copy number variations (CNVs). This suggests that a significant number of SNPs must be within these regions, which may cause deviation from HWE.

Results

We performed a Bayesian analysis on the potential effect of copy number variation, segmental duplication and genotyping errors on the behavior of SNPs. Our results suggest that copy number variation is a major cause of HWE violation for SNPs with a small minor allele frequency, when the sample size is large and the genotyping error rate is 0–1%.

Conclusions

Our study provides the posterior probability that a SNP falls in a CNV or a segmental duplication, given the observed allele frequency of the SNP, sample size and the significance level of HWE testing.
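For readers unfamiliar with the HWE test mentioned in this abstract, the following is a minimal sketch of the standard one-degree-of-freedom chi-square test on genotype counts for a biallelic SNP. The function name and the example counts are hypothetical, and this is the generic textbook test, not the Bayesian analysis the authors describe; it assumes at least one genotyped individual.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test for departure from Hardy-Weinberg equilibrium.

    Returns (statistic, p_value) for one biallelic SNP, using 1 degree
    of freedom. Assumes the total genotype count is greater than zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)        # frequency of allele A
    q = 1.0 - p                            # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Made-up genotype counts, for illustration only.
    print(hwe_chi_square(30, 40, 30))
```

A SNP inside a copy number variant can produce apparent heterozygote excess or deficit, so a small p-value from a test like this one is one of the signals the abstract relates to CNVs and segmental duplications.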

20.
Morton BR  Bi IV  McMullen MD  Gaut BS 《Genetics》2006,172(1):569-577
We examine variation in mutation dynamics across a single genome (Zea mays ssp. mays) in relation to regional and flanking base composition using a data set of 10,472 SNPs generated by resequencing 1776 transcribed regions. We report several relationships between flanking base composition and mutation pattern. The A + T content of the two sites immediately flanking the mutation site is correlated with rate, transition bias, and GC → AT pressure. We also observe a significant CpG effect, or increase in transition rate at CpG sites. At the regional level we find that the strength of the CpG effect is correlated with regional A + T content, ranging from a 1.7-fold increase in transition rate in relatively G + C-rich regions to a 2.6-fold increase in A + T-rich regions. We also observe a relationship between locus A + T content and GC → AT pressure. This regional effect is in opposition to the influence of the two immediate neighbors in that GC → AT pressure increases with increasing locus A + T content but decreases with increasing flanking base A + T content and may represent a relationship between genome location and mutation bias. The data indicate multiple context effects on mutations, resulting in significant variation in mutation dynamics across the genome.
