Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Unbiased estimation of the rates of synonymous and nonsynonymous substitution   Cited by: 39 (self-citations: 0; citations by others: 39)
Summary: The current convention in estimating the number of substitutions per synonymous site (K_S) and per nonsynonymous site (K_A) between two protein-coding genes is to count each twofold degenerate site as one-third synonymous and two-thirds nonsynonymous because one of the three possible changes at such a site is synonymous and the other two are nonsynonymous. This counting rule can considerably overestimate the K_S value because transitional mutations tend to occur more often than transversional mutations and because most transitional mutations at twofold degenerate sites are synonymous. A new method that gives unbiased estimates is proposed. An application of the new and the old method to 14 pairs of mouse and rat genes shows that the new method gives a K_S value very close to the number of substitutions per fourfold degenerate site whereas the old method gives a value 30% higher. Both methods give a K_A value close to the number of substitutions per nondegenerate site.
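
The conventional counting rule described in this abstract can be sketched in a few lines of Python. This is only an illustration of the old rule (each possible single-base change contributes one-third of a site), not the unbiased method the authors propose. The function names are hypothetical, the standard genetic code is assumed, and changes to stop codons are counted as nonsynonymous for simplicity.

```python
# Illustrative sketch of the conventional site-counting rule (assumptions:
# standard genetic code; changes to a stop codon count as nonsynonymous).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Map each codon to its amino acid ('*' marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon under the conventional rule."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Each of the three possible changes at a position weighs 1/3.
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1.0 / 3.0
    return total


def nonsynonymous_sites(codon):
    """The remaining sites of the three-base codon are nonsynonymous."""
    return 3.0 - synonymous_sites(codon)
```

For TTT (Phe), whose third position is twofold degenerate, the rule assigns one-third of a synonymous site; for GGG (Gly), whose third position is fourfold degenerate, it assigns one full synonymous site.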

2.
When a population size is reduced, genetic drift may fix slightly deleterious mutations, and an increase in nonsynonymous substitution is expected. It has been suggested that past aridity has seriously affected and decreased the populations of cichlid fishes in Lake Victoria, while geographical studies have shown that the water levels in Lake Tanganyika and Lake Malawi have remained fairly constant. The comparably stable environments in the latter two lakes might have kept the populations of cichlid fishes large enough to remove slightly deleterious mutations. The difference in the stability of cichlid fish population sizes between Lake Victoria and the Lakes Tanganyika and Malawi is expected to have caused differences in the nonsynonymous/synonymous ratio, ω (= dN/dS), of the evolutionary rate. Here, we estimated ω and compared it between the cichlids of the three lakes for 13 mitochondrial protein-coding genes using maximum likelihood methods. We found that the lineages of the cichlids in Lake Victoria had a significantly higher ω for several mitochondrial loci. Moreover, positive selection was indicated for several codons in the mtDNA of the Lake Victoria cichlid lineage. Our results indicate that both adaptive and slightly deleterious molecular evolution has taken place in the Lake Victoria cichlids' mtDNA genes, whose nonsynonymous sites are generally conserved.

3.
Summary: We consider selecting both fixed and random effects in a general class of mixed effects models using maximum penalized likelihood (MPL) estimation along with the smoothly clipped absolute deviation (SCAD) and adaptive least absolute shrinkage and selection operator (ALASSO) penalty functions. The MPL estimates are shown to possess consistency and sparsity properties and asymptotic normality. A model selection criterion, called the ICQ statistic, is proposed for selecting the penalty parameters (Ibrahim, Zhu, and Tang, 2008, Journal of the American Statistical Association 103, 1648–1658). The variable selection procedure based on ICQ is shown to consistently select important fixed and random effects. The methodology is very general and can be applied to numerous situations involving random effects, including generalized linear mixed models. Simulation studies and a real data set from a Yale infant growth study are used to illustrate the proposed methodology.
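
For readers unfamiliar with the SCAD penalty named in this abstract, the following is a minimal sketch of its standard piecewise form (linear near zero, quadratic in a transition zone, constant beyond). The function name and the default a = 3.7 are assumptions for illustration; the abstract does not say which tuning values the authors use.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for one coefficient.

    theta: coefficient value; lam: penalty parameter (> 0);
    a: shape parameter (> 2). The default 3.7 is a commonly used
    value and is an assumption here, not taken from the abstract.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("require lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Behaves like the lasso penalty for small coefficients.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that gradually flattens the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant for large coefficients, so they are not shrunk further.
    return lam * lam * (a + 1) / 2
```

The penalty is continuous at both breakpoints, and its flat tail is what lets large coefficients escape the shrinkage a lasso penalty would impose.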

4.
An information-based methodology for determining the quality of an alignment of two code sequences is presented. The assumptions involved in the procedure are as follows. (i) The information required to effect the alignment is separable into three categories: location, type and operation detail. The information basis of all three categories must be the same so that the information values obtained may be added together to produce a meaningful total for the entire alignment. (ii) All possible alignments may be expressed as composites of four mutation operations, UR, S, In and D. Two mutations are constrained from occurring at the same site to avoid ambiguity and to render the set of alignments finite. (iii) The character statistics and corresponding estimates of the probabilities of occurrence for mutations are available or at least estimable. In application, one needs to obtain estimates of the distribution of (a) the spacing between mutations, (b) the frequency of the four mutation operations, and (c) the inserted character frequencies and deletion lengths. Some of the constraints on these estimates are described and means, in each case, for obtaining reasonable values are suggested. These requirements are all extremely fundamental in nature and can, in principle, be satisfied biochemically. The greatest potential value of the method is that these physical quantities may be related in a non-arbitrary way to the complex problem of alignment. The method requires no arbitrary penalty factors and should help to guide geneticists in gathering the necessary data.

5.
The distribution of fitness effects (DFE) of new mutations is of fundamental importance in evolutionary genetics. Recently, methods have been developed for inferring the DFE that use information from the allele frequency distributions of putatively neutral and selected nucleotide polymorphic variants in a population sample. Here, we extend an existing maximum-likelihood method that estimates the DFE under the assumption that mutational effects are unconditionally deleterious, by including a fraction of positively selected mutations. We allow one or more classes of positive selection coefficients in the model and estimate both the fraction of mutations that are advantageous and the strength of selection acting on them. We show by simulations that the method is capable of recovering the parameters of the DFE under a range of conditions. We apply the method to two data sets on multiple protein-coding genes from African populations of Drosophila melanogaster. We use a probabilistic reconstruction of the ancestral states of the polymorphic sites to distinguish between derived and ancestral states at polymorphic nucleotide sites. In both data sets, we see a significant improvement in the fit when a category of positively selected amino acid mutations is included, but no further improvement if additional categories are added. We estimate that between 1% and 2% of new nonsynonymous mutations in D. melanogaster are positively selected, with a scaled selection coefficient representing the product of the effective population size, Ne, and the strength of selection on heterozygous carriers of ~2.5.

6.
Whole-genome duplication (polyploidization) is among the most dramatic mutational processes in nature, so understanding how natural selection differs in polyploids relative to diploids is an important goal. Population genetics theory predicts that recessive deleterious mutations accumulate faster in allopolyploids than diploids due to the masking effect of redundant gene copies, but this prediction is hitherto unconfirmed. Here, we use the cotton genus (Gossypium), which contains seven allopolyploids derived from a single polyploidization event 1–2 million years ago, to investigate deleterious mutation accumulation. We use two methods of identifying deleterious mutations at the nucleotide and amino acid level, along with whole-genome resequencing of 43 individuals spanning six allopolyploid species and their two diploid progenitors, to demonstrate that deleterious mutations accumulate faster in allopolyploids than in their diploid progenitors. We find that, unlike what would be expected under models of demographic changes alone, strongly deleterious mutations show the biggest difference between ploidy levels, and this effect diminishes for moderately and mildly deleterious mutations. We further show that the proportion of nonsynonymous mutations that are deleterious differs between the two coresident subgenomes in the allopolyploids, suggesting that homoeologous masking acts unequally between subgenomes. Our results provide a genome-wide perspective on classic notions of the significance of gene duplication that likely are broadly applicable to allopolyploids, with implications for our understanding of the evolutionary fate of deleterious mutations. Finally, we note that some measures of selection (e.g., dN/dS, πN/πS) may be biased when species of different ploidy levels are compared.

7.
Hans Ellegren. Molecular Ecology, 2008, 17(21): 4586-4596
Genomics profoundly affects most areas of biology, including ecology and evolutionary biology. By examining genome sequences from multiple species, comparative genomics offers new insight into genome evolution and the way natural selection moulds DNA sequence evolution. Functional divergence, as manifested in the accumulation of nonsynonymous substitutions in protein-coding genes, differs among lineages in a manner seemingly related to population size. For example, the ratio of nonsynonymous to synonymous substitution (dN/dS) is higher in apes than in rodents, compatible with Ohta's nearly neutral theory of molecular evolution, which suggests that the fixation of slightly deleterious mutations contributes to protein evolution to an extent negatively correlated with effective population size. While this supports the idea that functional evolution is not necessarily adaptive, comparative genomics is uncovering a role for positive Darwinian selection in 10–40% of all genes in different lineages, estimates that are likely to increase when the addition of more genomes gives increased power. Again, population size seems to matter also in this context, with a higher proportion of fixed amino acid changes representing advantageous mutations in large populations. Genes that are particularly prone to be driven by positive selection include those involved with reproduction, immune response, sensory perception and apoptosis. Genetic innovations are also frequently obtained by the gain or loss of complete gene sequences. Moreover, it is increasingly realized, from comparative genomics, that purifying selection conserves much more than just the protein-coding part of the genome, and this points at an important role for regulatory elements in trait evolution. Finally, genome sequencing using outbred or multiple individuals has provided a wealth of polymorphism data that gives information on population history, demography and marker evolution.

8.
9.

Background

Previous studies in Drosophila and mammals have revealed levels of long non-coding RNA (lncRNA) sequence conservation that are intermediate between neutrally evolving and protein-coding sequence. These analyses compared conservation between species that diverged up to 75 million years ago. However, analysis of sequence polymorphisms within a species' population can provide an understanding of essentially contemporaneous selective constraints that are acting on lncRNAs and can quantify the deleterious effect of mutations occurring within these loci.

Results

We took advantage of polymorphisms derived from the genome sequences of 163 Drosophila melanogaster strains and 174 human individuals to calculate the distribution of fitness effects of single nucleotide polymorphisms occurring within intergenic lncRNAs and compared this to distributions for SNPs present within putatively neutral or protein-coding sequences. Our observations show that in D. melanogaster there is a significant excess of rare frequency variants within intergenic lncRNAs relative to neutrally evolving sequences, whereas selection on human intergenic lncRNAs appears to be effectively neutral. Approximately 30% of mutations within these fruitfly lncRNAs are estimated as being weakly deleterious.

Conclusions

These contrasting results can be attributed to the large difference in effective population sizes between the two species. Our results suggest that while the sequences of lncRNAs will be well conserved across insect species, such loci in mammals will accumulate greater proportions of deleterious changes through genetic drift.

10.
The estimation of mutation rates and relative fitnesses in fluctuation analysis is based on the unrealistic hypothesis that the single-cell times to division are exponentially distributed. Using the classical Luria-Delbrück distribution outside its modelling hypotheses induces an important bias on the estimation of the relative fitness. The model is extended here to any division time distribution. Mutant counts follow a generalization of the Luria-Delbrück distribution, which depends on the mean number of mutations, the relative fitness of normal cells compared to mutants, and the division time distribution of mutant cells. Empirical probability generating function techniques yield precise estimates both of the mean number of mutations and the relative fitness of normal cells compared to mutants. In the case where no information is available on the division time distribution, it is shown that the estimation procedure using constant division times yields more reliable results. Numerical results both on observed and simulated data are reported.
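
As background for the classical Luria-Delbrück distribution mentioned in this abstract, the sketch below computes its probabilities with the well-known recursion for the Lea-Coulson case. It assumes exponentially distributed division times and equal fitness of mutant and normal cells, which are exactly the hypotheses the abstract calls unrealistic; it is not the generalized model the authors develop. The function name is hypothetical.

```python
import math


def luria_delbruck_pmf(m, n_max):
    """Probabilities p[0..n_max] of observing n mutants.

    m is the mean number of mutations per culture. Assumes the classical
    Lea-Coulson setting (exponential division times, relative fitness 1).
    Recursion: p[0] = exp(-m),
               p[n] = (m / n) * sum_{j<n} p[j] / (n - j + 1).
    """
    p = [math.exp(-m)]
    for n in range(1, n_max + 1):
        s = sum(p[j] / (n - j + 1) for j in range(n))
        p.append(m / n * s)
    return p
```

The distribution is heavy-tailed, so the probabilities returned for a finite n_max sum to less than one; a noticeable share of the mass sits in rare "jackpot" cultures with very many mutants.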

11.
Following cessation of recombination during sex chromosome evolution, the nonrecombining sex chromosome is affected by a number of degenerative forces, possibly resulting in the fixation of deleterious mutations. This might take place because of weak selection against recessive or partly recessive deleterious mutations due to permanent heterozygosity of nonrecombining chromosomes. Furthermore, population genetic processes, such as selective sweeps, background selection, and Muller’s ratchet, result in a reduction in Ne, which increases the likelihood of fixation of deleterious mutations. Theory thus predicts that nonrecombining genes should show increased levels of nonsynonymous (dN) to synonymous substitutions (dS). We tested this in an avian system by estimating the ratio between dN and dS in six gametologous gene pairs located on the Z chromosome and the nonrecombining, female-specific W chromosome. In comparisons, we found a significantly higher dN/dS ratio for the W-linked than the Z-linked copy in three of the investigated genes. In a concatenated alignment of all six genes, the dN/dS ratio was six times higher for W-linked than Z-linked genes. By using human and mouse as outgroups in maximum likelihood analyses, W-linked genes were found to evolve differently compared with their Z-linked gametologues and outgroup sequences. This seems not to be a consequence of functional diversification because dN/dS ratios between gametologous gene copies were consistently low. We conclude that deleterious mutations are accumulating at a high rate on the avian W chromosome, probably as a result of the lack of recombination in this female-specific chromosome. Electronic supplementary material is available for this article and accessible for authorised users. [Reviewing Editor: Dr. Deborah Charlesworth]

12.
Keightley PD, Halligan DL. Genetics, 2011, 188(4): 931-940
Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
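
The two assumptions named in this abstract (binomial sampling of reads in heterozygotes, random sequencing error) can be illustrated with a deliberately simplified read-count likelihood. This is a sketch under stated assumptions, not the authors' model: it treats a site as having only two alleles, uses a single symmetric error rate, and does not build the site frequency spectrum. The function name and the default error rate are hypothetical.

```python
from math import comb


def read_count_likelihood(k, n, genotype, error=0.01):
    """P(k alternate-allele reads out of n | genotype).

    genotype: number of alternate alleles carried (0, 1 or 2).
    error: assumed symmetric per-read error rate (illustrative value).
    Simplification: only two alleles exist, so an error on a reference
    read always yields the alternate allele and vice versa.
    """
    # Probability that a single read shows the alternate allele.
    p_alt = {0: error, 1: 0.5, 2: 1.0 - error}[genotype]
    # Reads are sampled independently, hence the binomial form.
    return comb(n, k) * p_alt ** k * (1.0 - p_alt) ** (n - k)
```

At low coverage the heterozygote likelihood is spread widely: with only two reads, a true heterozygote shows one read of each allele just half of the time, which is why genotype calls at such sites are uncertain.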

13.
Variable Selection for Semiparametric Mixed Models in Longitudinal Studies   Cited by: 2 (self-citations: 0; citations by others: 2)
Summary: We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: the roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on linear coefficients to achieve model sparsity. Compared to existing estimation equation based approaches, our procedure provides valid inference for data missing at random, and will be more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimation for estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method to the existing ones. We then apply the new method to a real data set from a lactation study.

14.
Despite intensive efforts using linkage and candidate gene approaches, the genetic etiology for the majority of families with a multi-generational breast cancer predisposition is unknown. In this study, we used whole-exome sequencing of thirty-three individuals from 15 breast cancer families to identify potential predisposing genes. Our analysis identified families with heterozygous, deleterious mutations in the DNA repair genes FANCC and BLM, which are responsible for the autosomal recessive disorders Fanconi Anemia and Bloom syndrome. In total, screening of all exons in these genes in 438 breast cancer families identified three with truncating mutations in FANCC and two with truncating mutations in BLM. Additional screening of FANCC mutation hotspot exons identified one pathogenic mutation among an additional 957 breast cancer families. Importantly, none of the deleterious mutations were identified among 464 healthy controls and are not reported in the 1,000 Genomes data. Given the rarity of Fanconi Anemia and Bloom syndrome disorders among Caucasian populations, the finding of multiple deleterious mutations in these critical DNA repair genes among high-risk breast cancer families is intriguing and suggestive of a predisposing role. Our data demonstrate the utility of intra-family exome-sequencing approaches to uncover cancer predisposition genes, but highlight the major challenge of definitively validating candidates where the incidence of sporadic disease is high, germline mutations are not fully penetrant, and individual predisposition genes may only account for a tiny proportion of breast cancer families.

15.
The transition from outcrossing to selfing is predicted to reduce the genome-wide efficacy of selection because of the lower effective population size (Ne) that accompanies this change in mating system. However, strongly recessive deleterious mutations exposed in the homozygous backgrounds of selfers should be under strong purifying selection. Here, we examine estimates of the distribution of fitness effects (DFE) and changes in the magnitude of effective selection coefficients (Nes) acting on mutations during the transition from outcrossing to selfing. Using forward simulations, we investigated the ability of a DFE inference approach to detect the joint influence of mating system and the dominance of deleterious mutations on selection efficacy. We investigated predictions from our simulations in the annual plant Eichhornia paniculata, in which selfing has evolved from outcrossing on multiple occasions. We used range-wide sampling to generate population genomic datasets and identified nonsynonymous and synonymous polymorphisms segregating in outcrossing and selfing populations. We found that the transition to selfing was accompanied by a change in the DFE, with a larger fraction of effectively neutral sites (Nes < 1), a result consistent with the effects of reduced Ne in selfers. Moreover, an increased proportion of sites in selfers were under strong purifying selection (Nes > 100), and simulations suggest that this is due to the exposure of recessive deleterious mutations. We conclude that the transition to selfing has been accompanied by the genome-wide influences of reduced Ne and strong purifying selection against deleterious recessive mutations, an example of purging at the molecular level.

16.
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10–20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.

17.
Quantifying the slightly deleterious mutation model of molecular evolution   Cited by: 14 (self-citations: 0; citations by others: 14)
We have attempted to quantify the frequency and effects of slightly deleterious mutations (SDMs), those that have selective effects close to the reciprocal of the effective population size of a species, by comparing the level of selective constraint in protein-coding genes of related species that have different present-day effective population sizes. In our two comparisons, the species with the smaller effective population size showed lower constraint, implying that SDMs had become fixed. The fixation of SDMs was supported by the observation of a higher fraction of radical to conservative amino acid substitutions in species with smaller effective population sizes. The fraction of strongly deleterious mutations (which rarely become fixed) is >70% in most species. Only approximately 10% or fewer of mutations seem to behave as SDMs, but SDMs could comprise a substantial fraction of mutations in protein-coding genes that have a chance of becoming fixed between species.

18.
Knowing the distribution of fitness effects (DFE) of new mutations is important for several topics in evolutionary genetics. Existing computational methods with which to infer the DFE based on DNA polymorphism data have frequently assumed that the DFE can be approximated by a unimodal distribution, such as a lognormal or a gamma distribution. However, if the true DFE departs substantially from the assumed distribution (e.g., if the DFE is multimodal), this could lead to misleading inferences about its properties. We conducted simulations to test the performance of parametric and nonparametric discretized distribution models to infer the properties of the DFE for cases in which the true DFE is unimodal, bimodal, or multimodal. We found that lognormal and gamma distribution models can perform poorly in recovering the properties of the distribution if the true DFE is bimodal or multimodal, whereas discretized distribution models perform better. If there is a sufficient amount of data, the discretized models can detect a multimodal DFE and can accurately infer the mean effect and the average fixation probability of a new deleterious mutation. We fitted several models for the DFE of amino acid-changing mutations using whole-genome polymorphism data from Drosophila melanogaster and the house mouse subspecies Mus musculus castaneus. A lognormal DFE best explains the data for D. melanogaster, whereas we find evidence for a bimodal DFE in M. m. castaneus.

19.
Signaling by the glial cell line-derived neurotrophic factor (GDNF)-RET receptor tyrosine kinase and SPRY1, a RET repressor, is essential for early urinary tract development. Individual or a combination of GDNF, RET and SPRY1 mutant alleles in mice cause renal malformations reminiscent of congenital anomalies of the kidney or urinary tract (CAKUT) in humans and distinct from renal agenesis phenotype in complete GDNF or RET-null mice. We sequenced GDNF, SPRY1 and RET in 122 unrelated living CAKUT patients to discover deleterious mutations that cause CAKUT. Novel or rare deleterious mutations in GDNF or RET were found in six unrelated patients. A family with duplicated collecting system had a novel mutation, RET-R831Q, which showed markedly decreased GDNF-dependent MAPK activity. Two patients with RET-G691S polymorphism harbored additional rare non-synonymous variants GDNF-R93W and RET-R982C. The patient with double RET-G691S/R982C genotype had multiple defects including renal dysplasia, megaureters and cryptorchidism. Presence of both mutations was necessary to affect RET activity. Targeted whole-exome and next-generation sequencing revealed a novel deleterious mutation G443D in GFRα1, the co-receptor for RET, in this patient. Pedigree analysis indicated that the GFRα1 mutation was inherited from the unaffected mother and the RET mutations from the unaffected father. Our studies indicate that 5% of living CAKUT patients harbor deleterious rare variants or novel mutations in GDNF-GFRα1-RET pathway. We provide evidence for the coexistence of deleterious rare and common variants in genes in the same pathway as a cause of CAKUT and discovered novel phenotypes associated with the RET pathway.

20.
Mitochondrial DNA (mtDNA) variants are widely used in evolutionary genetics as markers for population history and to estimate divergence times among taxa. Inferences of species history are generally based on phylogenetic comparisons, which assume that molecular evolution is clock-like. Between-species comparisons have also been used to estimate the mutation rate, using sites that are thought to evolve neutrally. We directly estimated the mtDNA mutation rate by scanning the mitochondrial genome of Drosophila melanogaster lines that had undergone approximately 200 generations of spontaneous mutation accumulation (MA). We detected a total of 28 point mutations and eight insertion-deletion (indel) mutations, yielding an estimate for the single-nucleotide mutation rate of 6.2 × 10^-8 per site per fly generation. Most mutations were heteroplasmic within a line, and their frequency distribution suggests that the effective number of mitochondrial genomes transmitted per female per generation is about 30. We observed repeated occurrences of some indel mutations, suggesting that indel mutational hotspots are common. Among the point mutations, there is a large excess of G→A mutations on the major strand (the sense strand for the majority of mitochondrial genes). These mutations tend to occur at nonsynonymous sites of protein-coding genes, and they are expected to be deleterious, so do not become fixed between species. The overall mtDNA mutation rate per base pair per fly generation in Drosophila is estimated to be about 10× higher than the nuclear mutation rate, but the mitochondrial major strand G→A mutation rate is about 70× higher than the nuclear rate. Silent sites are substantially more strongly biased towards A and T than nonsynonymous sites, consistent with the extreme mutation bias towards A+T. Strand-asymmetric mutation bias, coupled with selection to maintain specific nonsynonymous bases, therefore provides an explanation for the extreme base composition of the mitochondrial genome of Drosophila.
