Similar Articles
20 similar articles found
1.
Mitochondrial D-loop hypervariable region I (HVI) sequences are widely used in human molecular evolutionary studies, and therefore accurate assessment of rate heterogeneity among sites is essential. We used the maximum-likelihood method to estimate the gamma shape parameter alpha for variable substitution rates among sites for HVI from humans and chimpanzees to provide estimates for future studies. The complete data sets of 839 humans and 224 chimpanzees, as well as many subsets of these data, were analyzed to examine the effect of sequence sampling. The effects of the genealogical tree and the nucleotide substitution model were also examined. The transition/transversion rate ratio (kappa) is estimated to be about 25, although much larger and biased estimates were also obtained from small data sets at low divergences. Estimates of alpha were 0.28-0.39 for human data sets of different sizes and 0.20-0.39 for data sets including different chimpanzee subspecies. The combined data set of both species gave estimates of 0.42-0.45. While all those estimates suggest highly variable substitution rates among sites, smaller samples tend to give smaller estimates of alpha. Possible causes for this pattern were examined, such as biases in the estimation procedure and shifts in the rate distribution along certain lineages. Computer simulations suggest that the estimation procedure is quite reliable for large trees but can be biased for small samples at low divergences. Thus, an alpha of 0.4 appears suitable for both humans and chimpanzees. Estimates of alpha can be affected by the nucleotide sites included in the data, the overall tree length (the amount of sequence divergence), the number of rate classes used for the estimation, and, to a lesser extent, the included sequences. The genealogical tree, the substitution model, and demographic processes such as population expansion do not have much effect.
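The abstract recommends a gamma shape parameter of about 0.4 for HVI. As a minimal sketch of why alpha matters, the snippet below compares a pairwise Kimura two-parameter (K80) distance with and without a gamma correction for rate variation among sites. It uses the standard closed-form gamma-K80 distance, not the maximum-likelihood tree-based procedure described in the abstract, and the proportions P and Q are hypothetical.

```python
import math


def k80_distance(P, Q):
    """Kimura (1980) two-parameter distance assuming uniform rates across sites.
    P: proportion of sites differing by a transition; Q: by a transversion."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


def k80_gamma_distance(P, Q, alpha):
    """K80 distance with gamma-distributed rates among sites (shape alpha).
    Smaller alpha means stronger rate heterogeneity and a larger correction."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    return (alpha / 2) * (a ** (-1 / alpha) + 0.5 * b ** (-1 / alpha) - 1.5)


# Hypothetical HVI-like comparison: 8% transitional, 0.5% transversional differences.
P, Q = 0.08, 0.005
print(round(k80_distance(P, Q), 4))
print(round(k80_gamma_distance(P, Q, alpha=0.4), 4))
```

With alpha = 0.4 the corrected distance is noticeably larger than the uniform-rate K80 value for the same data; as alpha grows large, the two converge.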

2.
It is understood that DNA and amino acid substitution rates are highly sequence context-dependent; e.g., C → T substitutions in vertebrates may occur much more frequently at CpG sites, and cysteine substitution rates may depend on whether the context supports participation in a disulfide bond. Furthermore, many applications rely on quantitative models of nucleotide or amino acid substitution, including phylogenetic inference and identification of amino acid sequence positions involved in functional specificity. We describe quantification of the context dependence of nucleotide substitution rates using baboon, chimpanzee, and human genomic sequence data generated by the NISC Comparative Sequencing Program. Relative mutation rates, based on maximum likelihood calculations, are reported for the 96 classes of mutations of the form 5'-αβγ-3' → 5'-αδγ-3', where α, β, γ, and δ are nucleotides and β ≠ δ. Our results confirm that C → T substitutions are enhanced at CpG sites compared with other transitions, relatively independent of the identity of the preceding nucleotide. While, as expected, transitions generally occur more frequently than transversions, we find that the most frequent transversions involve the C at CpG sites (CpG transversions) and that their rate is comparable to the rate of transitions at non-CpG sites. A four-class model of the rates of context-dependent evolution of primate DNA sequences, CpG transitions > non-CpG transitions ≈ CpG transversions > non-CpG transversions, captures qualitative features of the mutation spectrum. We find that despite qualitative similarity of mutation rates among different genomic regions, there are statistically significant differences.
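The 96 classes and the four-class model can be made concrete. One way to arrive at 96 (an assumption, since the abstract does not say how classes are collapsed) is to start from the 4 × 4 × 3 × 4 = 192 possible changes 5'-αβγ-3' → 5'-αδγ-3' and merge each change with its reverse complement. The minimal sketch below does that and assigns each change to one of the four classes, treating a site as CpG if the mutated base is the C of a CpG or the G of a CpG on the same strand.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def revcomp_class(alpha, beta, gamma, delta):
    """Reverse-complement the change 5'-alpha beta gamma-3' -> 5'-alpha delta gamma-3'."""
    return (COMPLEMENT[gamma], COMPLEMENT[beta], COMPLEMENT[alpha], COMPLEMENT[delta])


def canonical(alpha, beta, gamma, delta):
    """Collapse strand-equivalent changes: keep the version whose central base is C or T."""
    if beta in ("C", "T"):
        return (alpha, beta, gamma, delta)
    return revcomp_class(alpha, beta, gamma, delta)


def four_class(alpha, beta, gamma, delta):
    """Assign a change to one of the four classes of the abstract's model."""
    transition = (beta in PURINES) == (delta in PURINES)
    cpg = (beta == "C" and gamma == "G") or (beta == "G" and alpha == "C")
    kind = "transition" if transition else "transversion"
    return ("CpG " if cpg else "non-CpG ") + kind


# All 192 single-base changes in a trinucleotide context, merged by strand.
classes = {
    canonical(a, b, g, d)
    for a, b, g, d in product(BASES, repeat=4)
    if b != d
}
print(len(classes))
```

The class label is unchanged by reverse complementing (a C → T at a CpG reads as G → A after a C on the other strand), so it can be assigned before or after collapsing.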

3.
Previous research has established a discrepancy of nearly an order of magnitude between pedigree-based and phylogeny-based (human vs. chimpanzee) estimates of the mitochondrial DNA (mtDNA) control region mutation rate. We characterize the time dependency of the human mitochondrial hypervariable region one mutation rate by generating 14 new phylogeny-based mutation rate estimates using within-human comparisons and archaeological dates. Rate estimates based on population events between 15,000 and 50,000 years ago are at least 2-fold lower than pedigree-based estimates. These within-human estimates are also higher than estimates generated from phylogeny-based human–chimpanzee comparisons. Our new estimates establish a rapid decay in evolutionary mutation rate between approximately 2,500 and 50,000 years ago and a slow decay from 50,000 to 6 Ma. We then extend this analysis to the mtDNA coding region. Our within-human coding region mutation rate estimates display a similar, though less rapid, time-dependent decay. We explore the possibility that multiple hits explain the discrepancy between pedigree-based and phylogeny-based mutation rates. We conclude that whereas nucleotide substitution models incorporating multiple hits do provide a possible explanation for the discrepancy between pedigree-based and human–chimpanzee mutation rate estimates, they do not explain the rapid decline of within-human rate estimates. We propose that demographic processes such as serial bottlenecks prior to the Holocene could explain the difference between rates estimated before and after 15,000 years ago. Our findings suggest that human mtDNA estimates of dates of population and phylogenetic events should be adjusted in light of this time dependency of the mutation rate estimates.
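The multiple-hits argument can be illustrated with a small sketch: if raw pairwise differences are divided by elapsed time, saturation makes the apparent rate fall as divergence time grows. The snippet assumes a Jukes-Cantor model with gamma rate variation (shape 0.3) and a hypothetical constant true rate; none of these values come from the abstract.

```python
import math


def expected_p_distance(d, alpha=None):
    """Expected proportion of differing sites under Jukes-Cantor given d substitutions/site.
    If alpha is given, rates across sites follow a gamma distribution with that shape."""
    if alpha is None:
        return 0.75 * (1 - math.exp(-4 * d / 3))
    return 0.75 * (1 - (1 + 4 * d / (3 * alpha)) ** (-alpha))


def apparent_rate(rate, years, alpha=None):
    """Rate inferred naively from raw differences: p / (2 * t)."""
    d = 2 * rate * years  # both lineages accumulate substitutions since the split
    return expected_p_distance(d, alpha) / (2 * years)


RATE = 1.0e-6  # hypothetical true rate, substitutions per site per year
for t in (2_500, 15_000, 50_000, 6_000_000):
    print(t, apparent_rate(RATE, t, alpha=0.3))
```

Under these hypothetical settings the apparent rate drops only modestly between 2,500 and 50,000 years but falls steeply toward 6 Ma, in line with the abstract's conclusion that multiple hits can account for the human–chimpanzee discrepancy but not for the rapid within-human decline.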

4.
We present a stochastic sequence evolution model to obtain alignments and estimate mutation rates between two homologous sequences. The model allows two possible evolutionary behaviors along a DNA sequence in order to determine conserved regions and take its heterogeneity into account. In our model, the sequence is divided into slow and fast evolution regions. The boundaries between these regions are not known, and our aim is to detect them. The evolution model is based on a fragment insertion and deletion process acting on fast regions only and a substitution process acting on both fast and slow regions with different rates. This model induces a pair hidden Markov structure at the level of alignments, thus making efficient statistical alignment algorithms possible. We propose two complementary estimation methods, namely, a Gibbs sampler for Bayesian estimation and a stochastic version of the EM algorithm for maximum likelihood estimation. Both algorithms involve the sampling of alignments. We propose a partial alignment sampler, which is computationally less expensive than the typical whole alignment sampler. We show the convergence of the two estimation algorithms when used with this partial sampler. Our algorithms provide consistent estimates for the mutation rates and plausible alignments and sequence segmentations on both simulated and real data.
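As a much simpler illustration of the slow/fast segmentation idea only, the sketch below treats a fixed, gap-free alignment as a string of match (0) and mismatch (1) columns and decodes slow and fast states with the Viterbi algorithm of a two-state hidden Markov model. The mismatch probabilities and the stay probability are hypothetical; indels, the pair-HMM over alignments, the Gibbs sampler, and the stochastic EM algorithm from the abstract are not shown.

```python
import math


def viterbi_segment(diffs, p_slow=0.01, p_fast=0.3, stay=0.99):
    """Most probable slow/fast labelling of alignment columns (1 = mismatch).
    All parameters are hypothetical and held fixed rather than estimated."""
    states = ("slow", "fast")
    emit = {"slow": p_slow, "fast": p_fast}

    def log_emit(s, x):
        p = emit[s]
        return math.log(p if x else 1.0 - p)

    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform initial state distribution.
    score = {s: math.log(0.5) + log_emit(s, diffs[0]) for s in states}
    back = []
    for x in diffs[1:]:
        new_score, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this column.
            prev = max(states, key=lambda r: score[r] + (log_stay if r == s else log_switch))
            step = log_stay if prev == s else log_switch
            new_score[s] = score[prev] + step + log_emit(s, x)
            ptr[s] = prev
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy alignment: a conserved stretch followed by a divergent stretch.
diffs = [0] * 50 + [1, 0, 1, 1, 0] * 10
print(viterbi_segment(diffs))
```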

5.
6.
Jiang C, Zhao Z. Genomics, 2006, 88(5): 527-534
To date, there has been no genome-wide estimate of the mutational spectrum in humans. In this study, we systematically examined the directionality of point mutations and the maintenance of GC content in the human genome using approximately 1.8 million high-quality human single nucleotide polymorphisms and their ancestral sequences in chimpanzees. The frequency of C → T (G → A) changes was the highest among all mutation types, and the frequency of each type of transition was approximately fourfold that of each type of transversion. In intergenic regions, the frequency of changes from G or C increased with GC content. In exons, the frequency of G:C → A:T changes was the highest among the genomic categories, owing mainly to frequent mutations at CpG sites. In contrast, mutations at CpG sites (CpG → TpG/CpA) occurred less frequently in CpG islands than in intergenic regions of similar GC content. Our results suggest that GC content is overall not at equilibrium in the human genome, with a trend toward making the genome more AT-rich and shifting the GC content of each region toward the genome average. Our results, which differ from previous estimates based on limited loci or on the rodent lineage, provide the first representative and reliable mutational spectrum for the recent human genome and for categorized genomic regions.
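The spectrum described above is built from SNPs whose ancestral allele is taken from the chimpanzee sequence. A minimal sketch of the bookkeeping, assuming input as (ancestral, derived) base pairs: complementary changes are folded together (so G → A is counted as C → T), and the per-type transition/transversion ratio is computed over the 2 transition and 4 transversion types that remain. The example pairs are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# After folding, the ancestral base is always C or T, so these are the only transitions.
TRANSITIONS = {("C", "T"), ("T", "C")}


def collapse(ancestral, derived):
    """Fold a change onto the strand where the ancestral base is C or T."""
    if ancestral in ("C", "T"):
        return ancestral, derived
    return COMPLEMENT[ancestral], COMPLEMENT[derived]


def spectrum(changes):
    """Count the six strand-collapsed mutation types from (ancestral, derived) pairs."""
    return Counter(collapse(a, d) for a, d in changes if a != d)


def ts_per_type_over_tv_per_type(counts):
    """Mean count per transition type (2 types) over mean count per transversion type (4)."""
    ts = sum(n for k, n in counts.items() if k in TRANSITIONS)
    tv = sum(n for k, n in counts.items() if k not in TRANSITIONS)
    return (ts / 2) / (tv / 4)


# Hypothetical (chimpanzee ancestral, human derived) alleles.
example = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A"), ("T", "G")]
counts = spectrum(example)
print(counts, ts_per_type_over_tv_per_type(counts))
```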

7.
Since plant mitochondrial genomes exhibit some of the slowest known synonymous substitution rates, it is generally believed that they experience exceptionally low mutation rates. However, the use of synonymous substitution rates to infer mutation rates depends on the implicit assumption that synonymous sites are evolving neutrally (or nearly so). To assess the validity of this assumption in plant mitochondrial genomes, we examined coding sequences for footprints of selection acting at synonymous sites. We found that synonymous sites exhibit an AT-rich and pyrimidine-skewed nucleotide composition compared with both nonsynonymous sites and noncoding regions. We also found some evidence for selection associated with both biased codon usage and conservation of regulatory sequences involved in mRNA processing, although some of these findings are subject to alternative nonadaptive interpretations. Regardless, the inferred strength of selection appears too weak to account for the variation in substitution rates between the mitochondrial genomes of plants and other multicellular eukaryotes. Therefore, these results are consistent with the interpretation that plant mitochondrial genomes experience a substantially lower mutation rate rather than increased functional constraints acting on synonymous sites. Nevertheless, there are important nucleotide composition patterns (particularly the differences between synonymous sites and noncoding DNA) that remain largely unexplained.
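The abstract contrasts nucleotide composition at synonymous sites with that of nonsynonymous sites and noncoding DNA. A minimal sketch of one such measurement, assuming fourfold-degenerate third codon positions under the standard genetic code as a stand-in for synonymous sites, reports the AT and pyrimidine fractions at those positions for an in-frame coding sequence.

```python
# Codon prefixes whose third position is fourfold degenerate in the standard code:
# Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_composition(cds):
    """Return (AT fraction, pyrimidine fraction) at fourfold-degenerate third positions."""
    cds = cds.upper()
    thirds = [
        cds[i + 2]
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 2] in FOURFOLD_PREFIXES
    ]
    if not thirds:
        raise ValueError("no fourfold-degenerate codons found")
    n = len(thirds)
    at = sum(b in "AT" for b in thirds) / n
    pyrimidine = sum(b in "CT" for b in thirds) / n
    return at, pyrimidine


print(fourfold_composition("CTAGCTGGCATG"))  # ATG (Met) is not fourfold degenerate
```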

8.
Phylogenetic network for European mtDNA
The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms.

9.
Most of the sophisticated methods to estimate evolutionary divergence between DNA sequences assume that the two sequences have evolved with the same pattern of nucleotide substitution after their divergence from their most recent common ancestor (homogeneity assumption). If this assumption is violated, the estimated evolutionary distance will be biased, which may result in biased estimates of divergence times and substitution rates, and may lead to erroneous branching patterns in the inferred phylogenies. Here we present a simple modification to existing distance estimation methods that relaxes the assumption of substitution pattern homogeneity among lineages when analyzing DNA and protein sequences. Results from computer simulations and empirical data analyses for human and mouse genes are presented to demonstrate that the proposed modification reduces the estimation bias considerably and that the modified method performs much better than the LogDet methods, which do not require the homogeneity assumption in estimating the number of substitutions per site. We also discuss the relationship between the substitution and mutation rate estimates when the substitution pattern is not the same in the lineages leading to the two sequences compared.
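The abstract benchmarks its modification against LogDet methods, which drop the homogeneity assumption. For reference, here is a minimal sketch of the paralinear form of the LogDet distance between two aligned DNA sequences; it is the comparison baseline, not the authors' modified estimator. It assumes gap-free sequences over ACGT in which every base occurs in both sequences (otherwise the logarithms are undefined).

```python
import math

BASES = "ACGT"


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def paralinear_distance(x, y):
    """-1/4 * ln(det J / sqrt(det Dx * det Dy)), where J is the 4x4 divergence
    matrix of site-pattern proportions and Dx, Dy hold each sequence's base composition."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be aligned and non-empty")
    n = len(x)
    idx = {b: i for i, b in enumerate(BASES)}
    J = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(x, y):
        J[idx[a]][idx[b]] += 1.0 / n
    fx = [sum(row) for row in J]                              # composition of x
    fy = [sum(J[i][j] for i in range(4)) for j in range(4)]   # composition of y
    return -0.25 * (math.log(det(J)) - 0.5 * math.log(math.prod(fx) * math.prod(fy)))


print(paralinear_distance("ACGTACGTAC", "ACGTACGTTC"))
```

For this toy pair, differing at one of ten sites, the distance works out to 0.25 * ln(1.5), about 0.101; identical sequences give 0.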

10.
Makova KD, Ramsay M, Jenkins T, Li WH. Genetics, 2001, 158(3): 1253-1268
An approximately 6.6-kb region located upstream of the melanocortin 1 receptor (MC1R) gene and containing its promoter was sequenced in 54 humans (18 Africans, 18 Asians, and 18 Europeans) and in one chimpanzee, one gorilla, and one orangutan. Seventy-six polymorphic sites were found among the human sequences, and the average nucleotide diversity (π) was 0.141%, one of the highest values among all studies of nuclear sequence variation in humans. In contrast to the pattern observed in the MC1R coding region, π in this region is highest in Africans (0.136%) compared with Asians (0.116%) and Europeans (0.122%). The distributions of π, θ, and Fu and Li's F statistic are nonuniform along the sequence and among continents. The pattern of genetic variation is consistent with a population expansion in Africans. We also suggest a possible phase of population size reduction in non-Africans, and purifying selection acting in the middle subregion and parts of the 5' subregion in Africans. We hypothesize diversifying selection acting on some sites in the 5' and 3' subregions or in the MC1R coding region in Asians and Europeans, though we cannot reject the possibility of relaxed functional constraints on the MC1R gene in Asians and Europeans. The mutation rate in the sequenced region is 1.65 × 10⁻⁹ per site per year. The age of the most recent common ancestor for this region is similar to that for the other long noncoding regions studied to date, providing evidence for ancient gene genealogies. Our population screening and phylogenetic footprinting suggest potentially important sites for MC1R promoter function.
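The summary statistics quoted above can be computed as in the following minimal sketch, assuming aligned, gap-free sequences of equal length: π is the mean proportion of differing sites over all pairs, and Watterson's θ is the number of segregating sites divided by a_n = 1 + 1/2 + ... + 1/(n-1) and by the sequence length. Fu and Li's F is not shown.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: mean pairwise proportion of differing sites (needs at least 2 sequences)."""
    L = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * L)


def watterson_theta(seqs):
    """Watterson's theta per site: segregating sites / a_n / sequence length."""
    n, L = len(seqs), len(seqs[0])
    S = sum(len(set(col)) > 1 for col in zip(*seqs))
    a_n = sum(1 / i for i in range(1, n))
    return S / a_n / L


sample = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(sample), watterson_theta(sample))
```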

11.
Spatial partitioning methods correct for nonstationarity in spatially related data by partitioning the space into regions of local stationarity. Existing spatial partitioning methods can only estimate linear partitioning boundaries. This is inadequate for detecting an arbitrarily shaped anomalous spatial region within a larger area. We propose a novel Bayesian functional spatial partitioning (BFSP) algorithm, which estimates closed curves that act as partitioning boundaries around anomalous regions of data with a distinct distribution or spatial process. Our method utilizes transitions between a fixed Cartesian and a moving polar coordinate system to model the smooth boundary curves using functional estimation tools. Using adaptive Metropolis-Hastings, the BFSP algorithm simultaneously estimates the partitioning boundary and the parameters of the spatial distributions within each region. Through simulation we show that our method is robust to the shape of the target zone and to region-specific spatial processes. We illustrate our method through the detection of prostate cancer lesions using magnetic resonance imaging.
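The abstract describes boundary curves expressed in polar coordinates around a moving center. As a minimal sketch, assuming the curve is a finite Fourier series r(theta) in the angle (the abstract does not name the functional basis, so this parameterization is hypothetical), the snippet tests whether a Cartesian point falls inside the region. The adaptive Metropolis-Hastings updates are not shown, and the coefficients must keep r(theta) positive.

```python
import math


def boundary_radius(theta, coeffs):
    """Radius of a closed curve r(theta) = a0 + sum_k (a_k cos k*theta + b_k sin k*theta)."""
    a0, harmonics = coeffs
    r = a0
    for k, (a_k, b_k) in enumerate(harmonics, start=1):
        r += a_k * math.cos(k * theta) + b_k * math.sin(k * theta)
    return r


def inside(point, center, coeffs):
    """Convert a Cartesian point to polar coordinates about the center and compare radii."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    rho = math.hypot(dx, dy)
    theta = math.atan2(dy, dx)
    return rho < boundary_radius(theta, coeffs)


# Hypothetical curve: r = 1 + 0.5 cos(2 theta), elongated along the x axis.
oval = (1.0, [(0.0, 0.0), (0.5, 0.0)])
print(inside((1.2, 0.0), (0.0, 0.0), oval), inside((0.0, 1.2), (0.0, 0.0), oval))
```

Moving the center shifts the whole region without changing its shape, which is one reason a polar description around a movable center is convenient for closed boundaries.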

12.
Recent work has suggested that there are many more selectively constrained, functional noncoding than coding sites in mammalian genomes. However, little is known about how selective constraint varies among different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5' end of murid genes. Our analysis also reveals that the mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. We find that, among others, developmental and neuronal genes are associated with the greatest numbers of putatively functional noncoding sites compared with genes involved in electron transport and a variety of metabolic processes. Combining our estimates of the total number of constrained coding and noncoding bases, we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence and that the total genomic deleterious point mutation rate is 0.91 per diploid genome per generation. This estimated rate is over twice as large as a previous estimate in murids.

13.
Estimate of the mutation rate per nucleotide in humans
Nachman MW, Crowell SL. Genetics, 2000, 156(1): 297-304
Many previous estimates of the mutation rate in humans have relied on screens of visible mutants. We investigated the rate and pattern of mutations at the nucleotide level by comparing pseudogenes in humans and chimpanzees to (i) provide an estimate of the average mutation rate per nucleotide, (ii) assess heterogeneity of mutation rate at different sites and for different types of mutations, (iii) test the hypothesis that the X chromosome has a lower mutation rate than autosomes, and (iv) estimate the deleterious mutation rate. Eighteen processed pseudogenes were sequenced, including 12 on autosomes and 6 on the X chromosome. The average mutation rate was estimated to be approximately 2.5 × 10⁻⁸ mutations per nucleotide site, or 175 mutations per diploid genome per generation. Rates of mutation for both transitions and transversions at CpG dinucleotides are one order of magnitude higher than mutation rates at other sites. Single nucleotide substitutions are 10 times more frequent than length mutations. Comparison of rates of evolution for X-linked and autosomal pseudogenes suggests that the male mutation rate is 4 times the female mutation rate, but provides no evidence for a reduction in mutation rate that is specific to the X chromosome. Using conservative calculations of the proportion of the genome subject to purifying selection, we estimate that the genomic deleterious mutation rate (U) is at least 3. This high rate is difficult to reconcile with multiplicative fitness effects of individual mutations and suggests that synergistic epistasis among harmful mutations may be common.
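A quick arithmetic check using only the figures quoted above: the per-site rate and the per-genome count together imply a diploid genome of about 7 × 10⁹ sites, and U ≥ 3 out of 175 new mutations per diploid genome corresponds to at least about 1.7% of new mutations being deleterious.

```python
# Figures quoted in the abstract.
mu_per_site = 2.5e-8        # mutations per nucleotide site per generation
per_diploid_genome = 175    # mutations per diploid genome per generation
U_lower_bound = 3           # deleterious mutations per diploid genome per generation

# Diploid genome size implied by the two rate figures.
implied_diploid_sites = per_diploid_genome / mu_per_site
# Share of new mutations that must be deleterious for U >= 3.
min_deleterious_fraction = U_lower_bound / per_diploid_genome

print(f"{implied_diploid_sites:.2e} sites")   # 7.00e+09 sites
print(f"{min_deleterious_fraction:.3f}")      # 0.017
```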

14.
Smith NG, Hurst LD. Genetics, 1999, 152(2): 661-673
Miyata et al. have suggested that the male-to-female mutation rate ratio (alpha) can be estimated by comparing the neutral substitution rates of X-linked (X), Y-linked (Y), and autosomal (A) genes. Rodent silent site X/A comparisons provide very different estimates from X/Y comparisons. We examine three explanations for this discrepancy: (1) statistical biases and artifacts, (2) nonneutral evolution, and (3) differences in mutation rate per germline replication. By estimating errors and using a variety of methodologies, we tentatively reject explanation 1. Our analyses of patterns of codon usage, synonymous rates, and nonsynonymous rates suggest that silent sites in rodents are evolving neutrally, and we can therefore reject explanation 2. We find both base composition and methylation differences between the different sets of chromosomes, a result consistent with explanation 3, but these differences do not appear to explain the observed discrepancies in estimates of alpha. Our finding of significantly low synonymous substitution rates in genomically imprinted genes suggests a link between hemizygous expression and an adaptive reduction in the mutation rate, which is consistent with explanation 3. Therefore our results provide circumstantial evidence in favor of the hypothesis that the discrepancies in estimates of alpha are due to differences in the mutation rate per germline replication between different parts of the genome. This explanation violates a critical assumption of the method of Miyata et al., and hence we suggest that estimates of alpha, obtained using this method, need to be treated with caution.
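The method discussed above rests on the assumption that a chromosome's neutral substitution rate is proportional to the time it spends in each sex's germline: autosomes half the time in males, the X one third, the Y always. Under that assumption (the one this abstract argues may be violated), alpha follows from the X/A or Y/X ratio. A minimal sketch of the expected ratios and their inverses:

```python
def xa_ratio(alpha):
    """Expected X/A neutral rate ratio for male:female mutation ratio alpha."""
    return 2 * (2 + alpha) / (3 * (1 + alpha))


def ya_ratio(alpha):
    """Expected Y/A neutral rate ratio."""
    return 2 * alpha / (1 + alpha)


def yx_ratio(alpha):
    """Expected Y/X neutral rate ratio."""
    return 3 * alpha / (2 + alpha)


def alpha_from_xa(r):
    """Invert X/A = 2(2 + a) / (3(1 + a)); meaningful for 2/3 < r < 4/3."""
    return (4 - 3 * r) / (3 * r - 2)


def alpha_from_yx(r):
    """Invert Y/X = 3a / (2 + a); meaningful for 0 < r < 3."""
    return 2 * r / (3 - r)


print(xa_ratio(4), yx_ratio(4))
```

For alpha = 4, X/A is 0.8 and Y/X is 2.0. If observed X/A and Y/X ratios imply different alpha values, the proportionality assumption is in doubt, which is the discrepancy this abstract investigates.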

15.
We develop an approximate maximum likelihood method to estimate flanking nucleotide context-dependent mutation rates and amino acid exchange-dependent selection in orthologous protein-coding sequences and use it to analyze genome-wide coding sequence alignments from mammals and yeast. Allowing context-dependent mutation provides a better fit to coding sequence data than simpler (context-independent or CpG "hotspot") models and significantly affects selection parameter estimates. Allowing asymmetric (nonreciprocal) selection on amino acid exchanges gives a better fit than simple dN/dS or symmetric selection models. Relative selection strength estimates from our models show good agreement with independent estimates derived from human disease-causing and engineered mutations. Selection strengths depend on local protein structure, showing expected biophysical trends in helical versus nonhelical regions and increased asymmetry on polar-hydrophobic exchanges with increased burial. The more stringent selection that has previously been observed for highly expressed proteins is primarily concentrated in buried regions, supporting the notion that such proteins are under stronger than average selection for stability. Our analyses indicate that a highly parameterized model of mutation and selection is computationally tractable and is a useful tool for exploring a variety of biological questions concerning protein and coding sequence evolution.

16.
Some of the assumptions underlying estimates of DNA and protein sequence divergence are examined. A solution for the variance of these estimates that allows for different mutation rates and different population sizes in each species and for an arbitrary structure in the initial population is obtained. It is shown that these conditions do not strongly affect estimates of divergence. In general, they cause the variance of divergence to be smaller than a binomial variance. Thus, the binomial variance that is usually assumed for these estimates is safely conservative. It is shown that variability in the mutation rate among sites can have an effect as large as or larger than variability in the mutation rate among bases. Variability in the mutation rate among bases and among sites causes the number of substitutions between two sequences to be underestimated. Protein and DNA sequences from several species are collected to estimate the variability in mutation rates among sites. When many homologous sequences are known, standard methods to estimate this variability can be used. The estimates of this variability show that this factor is important when considering the spectrum of spontaneous mutations and is strongly reflected in the divergence of sequences. Smaller variability is found for the third position of codons than for the first and second codon positions. This may be because of less selective constraints on this position or because the third position has been saturated with mutations for the sequences examined.
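The abstract notes that a binomial variance is usually assumed for divergence estimates. As a minimal sketch, assuming the Jukes-Cantor model (the abstract does not specify one), the binomial variance of the observed proportion p of differing sites over n sites can be carried through to the corrected distance with the delta method, since dd/dp = 1/(1 - 4p/3).

```python
import math


def p_distance_variance(p, n):
    """Binomial sampling variance of the proportion of differing sites among n sites."""
    return p * (1 - p) / n


def jc_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_distance_variance(p, n):
    """Delta-method variance: binomial variance scaled by (dd/dp)^2."""
    return p * (1 - p) / (n * (1 - 4 * p / 3) ** 2)


print(jc_distance(0.1), p_distance_variance(0.1, 1000), jc_distance_variance(0.1, 1000))
```

The correction inflates both the distance and its variance, and the inflation grows as p approaches saturation.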

17.
We have analyzed nucleotide sequence variation in an approximately 900-base pair region of the human mitochondrial DNA molecule encompassing the heavy strand origin of replication and the D-loop. Our analysis has focused on nucleotide sequences available from seven humans. Average nucleotide diversity among the sequences is 1.7%, several-fold higher than estimates from restriction endonuclease site variation in mtDNA from these individuals and previously reported for other humans. This disparity is consistent with the rapidly evolving nature of this noncoding region. However, several instances of convergent or parallel gain and loss of restriction sites due to multiple substitutions were observed. In addition, other results suggest that restriction site (as well as pairwise sequence) comparisons may underestimate the total number of substitutions that have occurred since the divergence of two mtDNA sequences from a common ancestral sequence, even at low levels of divergence. This emphasizes the importance of recognizing the large standard errors associated with estimates of sequence variability, particularly when constructing phylogenies among closely related sequences. Analysis of the observed number and direction of substitutions revealed several significant biases, most notably a strand dependence of substitution type and a 32-fold bias favoring transitions over transversions. The results also revealed a significantly nonrandom distribution of nucleotide substitutions and sequence length variation. Significantly more multiple substitutions were observed than expected for these closely related sequences under the assumption of uniform rates of substitution. The bias for transitions has resulted in predominantly convergent or parallel changes among the observed multiple substitutions. There is no convincing evidence that recombination has contributed to the mtDNA sequence diversity we have observed.
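The excess of multiple substitutions reported above can be put in numbers. Under uniform rates, the count of substitutions at a site is Poisson with mean d; if rates instead vary across sites according to a gamma distribution with shape alpha (an assumption not made in this abstract), counts follow a negative binomial, and sites hit two or more times become more common. A minimal sketch:

```python
import math


def prob_multiple_hits_uniform(d):
    """P(a site received >= 2 substitutions) when counts are Poisson with mean d."""
    return 1 - math.exp(-d) * (1 + d)


def prob_multiple_hits_gamma(d, alpha):
    """Same probability when site rates are gamma (shape alpha, mean 1):
    the gamma-Poisson mixture gives negative-binomial counts with mean d."""
    q = alpha / (alpha + d)
    p0 = q ** alpha
    p1 = alpha * p0 * (d / (alpha + d))
    return 1 - p0 - p1


print(prob_multiple_hits_uniform(0.05), prob_multiple_hits_gamma(0.05, 0.5))
```

With d = 0.05 and alpha = 0.5 the gamma model gives about 0.0032 versus 0.0012 under uniform rates, roughly 2.6 times as many multiply hit sites.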

18.
It is now widely accepted that sites in a protein do not undergo independent evolutionary processes. The underlying assumption is that proteins are composed of conserved and variable linear domains, and thus rates at neighboring sites are correlated. In this paper, we comprehensively examine the performance of an autocorrelation model of evolutionary rates in protein sequences. We further develop a model in which the level of correlation between rates at adjacent sites is not equal at all sites of the protein. High correlation is expected, for example, in linear functional domains. On the other hand, when we consider nonlinear functional regions (e.g., active sites), low correlation is expected because the interaction between distant sites imposes independence of rates in the linear sequence. Our model is based on a hidden Markov model, which accounts for autocorrelation at certain regions of the protein and rate independence at others. We study the differences between the novel model and models which assume either independence or a fixed level of dependence throughout the protein. Using a diverse set of protein data sets we show that the novel model better fits most data sets. We further analyze the potassium-channel protein family and illustrate the relationship between the dependence of rates at adjacent sites and the tertiary structure of the protein.

19.
Distribution patterns of postmortem damage in human mitochondrial DNA
The distribution of postmortem damage in mitochondrial DNA retrieved from 37 ancient human DNA samples was analyzed by cloning and was compared with a selection of published animal data. A relative rate of damage (rho(v)) was calculated for nucleotide positions within the human hypervariable region 1 (HVR1) and cytochrome oxidase subunit III genes. A comparison of damaged sites within and between the regions reveals that damage hotspots exist and that, in the HVR1, these correlate with sites known to have high in vivo mutation rates. Conversely, HVR1 subregions with known structural function, such as MT5, have lower in vivo mutation rates and lower postmortem-damage rates. The postmortem data also identify a possible functional subregion of the HVR1, termed "low-diversity 1," through the lack of sequence damage. The amount of postmortem damage observed in mitochondrial coding regions was significantly lower than in the HVR1, and, although hotspots were noted, these did not correlate with codon position. Finally, a simple method for the identification of incorrect archaeological haplogroup designations is introduced, on the basis of the observed spectrum of postmortem damage.

20.
Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the -style estimators.
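The bias described above arises when codon frequencies are built from observed nucleotide frequencies while stop codons are silently excluded. The sketch below contrasts a simple product-of-positional-frequencies estimator, renormalized over sense codons (an assumption about the exact form of the estimators the abstract criticizes), with one possible correction that rescales the nucleotide parameters by iterative proportional fitting until the implied positional frequencies match the observed ones. The correction is illustrative and not necessarily the authors' "corrected" estimator; the data are a toy uniform codon distribution, and the stop codons are those of the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Stop codons of the standard genetic code (an assumption; other codes differ).
STOPS = {"TAA", "TAG", "TGA"}
SENSE = ["".join(c) for c in product(BASES, repeat=3) if "".join(c) not in STOPS]


def positional_freqs(codon_freqs):
    """Nucleotide frequencies at codon positions 1-3 implied by a codon distribution."""
    freqs = [{b: 0.0 for b in BASES} for _ in range(3)]
    for codon, weight in codon_freqs.items():
        for pos, base in enumerate(codon):
            freqs[pos][base] += weight
    return freqs


def product_estimator(pos_freqs):
    """Codon frequency as a product of positional nucleotide frequencies,
    renormalized over sense codons (stop codons are simply dropped)."""
    raw = {c: pos_freqs[0][c[0]] * pos_freqs[1][c[1]] * pos_freqs[2][c[2]] for c in SENSE}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}


def corrected_estimator(target, iterations=200):
    """Rescale the positional parameters one codon position at a time
    (iterative proportional fitting) until the implied codon frequencies
    reproduce the observed positional nucleotide frequencies."""
    params = [dict(t) for t in target]
    for _ in range(iterations):
        for pos in range(3):
            implied = positional_freqs(product_estimator(params))
            for b in BASES:
                params[pos][b] *= target[pos][b] / implied[pos][b]
    return product_estimator(params)


# Toy data: every sense codon equally frequent.
uniform = {c: 1 / len(SENSE) for c in SENSE}
observed = positional_freqs(uniform)
naive = product_estimator(observed)
fixed = corrected_estimator(observed)
print(positional_freqs(naive)[0]["T"], observed[0]["T"])
```

With uniform sense-codon usage, the naive estimator understates the frequency of T at the first codon position (T begins all three stop codons), while the rescaled estimator recovers the uniform distribution.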
