Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The evolution of isochores: evidence from SNP frequency distributions   Total citations: 4, self-citations: 0, citations by others: 4
Lercher MJ, Smith NG, Eyre-Walker A, Hurst LD. Genetics, 2002, 162(4): 1805-1810
The large-scale systematic variation in nucleotide composition along mammalian and avian genomes has been a focus of the debate between neutralist and selectionist views of molecular evolution. Here we test whether the compositional variation is due to mutation bias using two new tests, which do not assume compositional equilibrium. In the first test we assume a standard population genetics model, but in the second we make no assumptions about the underlying population genetics. We apply the tests to single-nucleotide polymorphism data from noncoding regions of the human genome. Both models of neutral mutation bias fit the frequency distributions of SNPs segregating in low- and medium-GC-content regions of the genome adequately, although both suggest compositional nonequilibrium. However, neither model fits the frequency distribution of SNPs from the high-GC-content regions. In contrast, a simple population genetics model that incorporates selection or biased gene conversion cannot be rejected. The results suggest that mutation biases are not solely responsible for the compositional biases found in noncoding regions.

2.
MOTIVATION: Calculation of the information content of motifs in genomes highly biased in nucleotide composition is likely to lead to overestimates of the amount of useful information in the motif. Calculating relative information can compensate for biases; however, the resulting information content is the amount seen by an observer and not by a macromolecule binding to the motif. The latter is needed to calculate the discriminatory power of the motif and to compare motifs between species. RESULTS: By treating a biased genome as a discrete channel with noise, in accordance with Shannon Information Theory, we were able to remove both 'Distortion' and 'Noise' from the motif and recover a more instructive biological 'signal.' A Java application, LogoPaint, was developed to remove nucleotide bias distortion and triplet frequency noise from motifs, calculate information content and present the motif as a logo. We demonstrate how this technique can 'unmask' motifs in the translation initiation regions of bacteria that are obscured by strong sequence biases. AVAILABILITY: LogoPaint is available to all users from the authors as an executable JAR file. Source code is available by arrangement.
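
The "relative information" mentioned in the abstract above can be illustrated with a small sketch. This is not LogoPaint's channel-based algorithm; it only shows the baseline calculation the abstract contrasts itself with, namely the Kullback-Leibler divergence of one motif column against background base frequencies. The function name and the example frequencies are hypothetical.

```python
import math

def relative_information(column_counts, background):
    """Relative information (bits) of one motif column against background frequencies.

    column_counts: dict base -> observed count in the aligned motif column
    background:    dict base -> genome-wide frequency of that base (assumed > 0)
    """
    total = sum(column_counts.values())
    bits = 0.0
    for base, count in column_counts.items():
        if count == 0:
            continue  # a zero-frequency base contributes nothing to the sum
        p = count / total
        bits += p * math.log2(p / background[base])
    return bits

# Hypothetical backgrounds: an unbiased genome and an AT-rich genome.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}

# A column that is always "A" looks less informative in an AT-rich genome.
always_a = {"A": 10, "C": 0, "G": 0, "T": 0}
print(relative_information(always_a, uniform))
print(relative_information(always_a, at_rich))
```

Under these assumptions, a column whose composition equals the background scores zero bits, which is how relative information compensates for genome-wide bias.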

3.
Most prokaryotic genomes display strand compositional asymmetries, but the reasons for these biases remain unclear. When the distribution of gene orientation is biased, as it often is, this may induce a bias in composition, as codon frequencies are not identical. We show here that this effect can be estimated and removed, and that the residual base skews are the highest at third base codon positions and lower at first and second positions. This strongly suggests that compositional asymmetries result from 1) a replication-related mutational bias that is filtered through selective pressure and/or from 2) an uneven distribution of gene orientation. In most cases, the mutational bias alters the codon usage and amino acid frequencies of the leading and the lagging strand. However, these features are not ubiquitous amongst prokaryotes, and the biological reasons for them remain to be found.
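
As a rough illustration of a base skew measured separately at each codon position, the sketch below computes the commonly used GC skew, (G - C) / (G + C), for positions 1, 2 and 3 of an in-frame coding sequence. The abstract does not specify its skew statistic or its correction for gene orientation, so both the formula choice and the function name are assumptions.

```python
def gc_skew_by_codon_position(cds):
    """Return GC skew, (G - C) / (G + C), for codon positions 1, 2 and 3.

    cds is assumed to be an in-frame coding sequence read 5' to 3'.
    A position with no G or C at all is reported as 0.0.
    """
    cds = cds.upper()
    skews = []
    for pos in range(3):
        bases = cds[pos::3]  # every third base, starting at this codon position
        g = bases.count("G")
        c = bases.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews

# Toy sequence with codons GCG GCC GCG.
print(gc_skew_by_codon_position("GCGGCCGCG"))
```

Comparing such per-position values between leading-strand and lagging-strand genes would be one way to look at the pattern the abstract describes.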

4.
Patterns of non-uniform usage of synonymous codons vary across genes in an organism and between species across all domains of life. This codon usage bias (CUB) is due to a combination of non-adaptive (e.g. mutation biases) and adaptive (e.g. natural selection for translation efficiency/accuracy) evolutionary forces. Most models quantify the effects of mutation bias and selection on CUB assuming uniform mutational and other non-adaptive forces across the genome. However, non-adaptive nucleotide biases can vary within a genome due to processes such as biased gene conversion (BGC), potentially obfuscating signals of selection on codon usage. Moreover, genome-wide estimates of non-adaptive nucleotide biases are lacking for non-model organisms. We combine an unsupervised learning method with a population genetics model of synonymous coding sequence evolution to assess the impact of intragenomic variation in non-adaptive nucleotide bias on quantification of natural selection on synonymous codon usage across 49 Saccharomycotina yeasts. We find that in the absence of a priori information, unsupervised learning can be used to identify genes evolving under different non-adaptive nucleotide biases. We find that the impact of intragenomic variation in non-adaptive nucleotide bias varies widely, even among closely related species. We show that the overall strength and direction of translational selection can be underestimated by failing to account for intragenomic variation in non-adaptive nucleotide biases. Interestingly, genes falling into clusters identified by machine learning are also physically clustered across chromosomes. Our results indicate the need for more nuanced models of sequence evolution that systematically incorporate the effects of variable non-adaptive nucleotide biases on codon frequencies.

5.
We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.

6.
Lercher MJ, Hurst LD. Gene, 2002, 300(1-2): 53-58
One of the most abiding controversies in evolutionary biology concerns the role of neutral processes in molecular evolution. A main focus of the debate has been the evolution of isochores, the strong and systematic variation of base composition in mammalian genomes. One set of hypotheses argues that regions of similar GC are owing to localised mutational biases coupled with neutral evolution. The alternatives point to either selection or biased gene conversion as mechanisms to preferentially remove A or T bases, favouring G and C instead. Using a novel method, we compare models including such fixation biases to models based on mutation bias alone, under the assumption that non-coding, non-repetitive human DNA is at compositional equilibrium. Although the allele frequency distributions of recent single nucleotide polymorphism data are not fully explained, we show that the data are best fitted if the mutation bias is assumed to be constant across the genome, while fixation bias varies with GC content. We also attempt to estimate the strength of fixation bias, which increases linearly with increasing GC. Our approximation suggests that this force exists within the necessary parameter range: it is not so weak as to be drowned by random drift, but not so strong as to lead to exclusive use of G and C alone. Together these results demonstrate that mutation bias fails to explain the evolution of isochores, and suggest that either selection or biased gene conversion is involved.

7.
Genome-scale phylogeny and the detection of systematic biases   Total citations: 17, self-citations: 0, citations by others: 17
Phylogenetic inference from sequences can be misled by both sampling (stochastic) error and systematic error (nonhistorical signals where reality differs from our simplified models). A recent study of eight yeast species using 106 concatenated genes from complete genomes showed that even small internal edges of a tree received 100% bootstrap support. This effective negation of stochastic error from large data sets is important, but longer sequences exacerbate the potential for biases (systematic error) to be positively misleading. Indeed, when we analyzed the same data set using minimum evolution optimality criteria, an alternative tree received 100% bootstrap support. We identified a compositional bias as responsible for this inconsistency and showed that it is reduced effectively by coding the nucleotides as purines and pyrimidines (RY-coding), reinforcing the original tree. Thus, a comprehensive exploration of potential systematic biases is still required, even though genome-scale data sets greatly reduce sampling error.
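
RY-coding, as named in the abstract above, recodes purines (A, G) as R and pyrimidines (C, T, and U in RNA) as Y, following the IUPAC ambiguity codes. A minimal sketch is shown here; the function name is hypothetical, and leaving gaps and ambiguous characters unchanged is an assumption rather than something the abstract specifies.

```python
# Translation table: purines -> R, pyrimidines -> Y (upper- and lower-case input).
_RY_TABLE = str.maketrans("AGCTUagctu", "RRYYYRRYYY")

def ry_code(sequence):
    """Recode a nucleotide sequence into purines (R) and pyrimidines (Y).

    Characters other than A, C, G, T and U (gaps, N, other ambiguity codes)
    are passed through unchanged.
    """
    return sequence.translate(_RY_TABLE)

print(ry_code("ACGTN-acgu"))
```

Because transitions (A<->G, C<->T) become invisible after recoding, only transversions remain as signal, which is why the recoding dampens base-compositional differences between taxa.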

8.
MOTIVATION: Most biological sequences contain compositionally biased segments in which one or more residue types are significantly overrepresented. The function and evolution of these segments are poorly understood. Usually, all types of compositionally biased segments are masked and ignored during sequence analysis. However, it has been shown for a number of proteins that biased segments that contain amino acids with similar chemical properties are involved in a variety of molecular functions and human diseases. A detailed large-scale analysis of the functional implications and evolutionary conservation of different compositionally biased segments requires a sensitive method capable of detecting user-specified types of compositional bias. RESULTS: We present BIAS, a novel sensitive method for the detection of compositionally biased segments composed of a user-specified set of residue types. BIAS uses discrete scan statistics, which provide a highly accurate correction for multiple tests, to compute analytical estimates of the significance of each compositionally biased segment. The method can take into account global compositional bias when computing analytical estimates of the significance of local clusters. BIAS is benchmarked against the SEG, SAPS and CAST programs. We also use BIAS to show that groups of proteins with the same biological function are significantly associated with particular types of compositionally biased segments.

9.
Gene conversion is the unidirectional transfer of genetic information between allelic (orthologous) or nonallelic (paralogous) DNA segments. Recently, there has been much interest in understanding how gene conversion shapes the nucleotide composition of the genomic landscape. A widely held hypothesis is that gene conversion is universally GC-biased. However, direct experimental evidence of this hypothesis is limited to a single study of meiotic crossovers in yeast. Although there have been a number of indirect studies of gene conversion, evidence of GC-biased replacements gathered from such studies can also be attributed to positive selection, which has the same evolutionary dynamics as biased gene conversion. Here, we apply a direct phylogenetic approach to examine nucleotide replacements produced by nonallelic gene conversion in Drosophila and primate genomes. We find no evidence for GC-biased gene conversion in either lineage, suggesting that previously observed GC biases may be due to positive selection rather than to biased gene conversion.

10.

Background

High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked.

Results

We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve use of three different measures: analysis of base calls; analysis of k-mers; and analysis of distributions of k-mers. We apply our methodology to a wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others.

Conclusions

The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification.
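
Two of the reference-free measures named in the Results above, base-call composition per read position and k-mer counts, can be sketched as follows. This is an illustrative assumption about how such tallies might be computed, not the authors' implementation; the function names and the toy reads are hypothetical.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every overlapping k-mer across all reads (no reference needed)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def per_position_base_fractions(reads):
    """For each read position, return the fraction of reads carrying each base.

    Only positions shared by all reads (up to the shortest read) are reported.
    """
    shortest = min(len(read) for read in reads)
    fractions = []
    for i in range(shortest):
        column = Counter(read[i] for read in reads)
        total = sum(column.values())
        fractions.append({base: n / total for base, n in column.items()})
    return fractions

# Toy reads: a poly-A k-mer dominates, and position 0 differs from the rest.
reads = ["AAAA", "AAAC", "GAAA"]
print(kmer_counts(reads, 3).most_common(1))
print(per_position_base_fractions(reads)[0])
```

In real data, a k-mer such as a poly-base run that is counted far more often than others, or a read position whose base fractions depart from the rest, would be a candidate sign of the kinds of bias the abstract lists.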

11.
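Wu Wei (武伟), Liu Hongbin (刘洪斌), Zhang Ze (张泽), Lu Cheng (鲁成). 《生物信息学》 (Bioinformatics), 2007, 5(3): 102-105
Using data from 93 arthropod mitochondrial genomes, we analyzed the base composition of the mitochondrial genome and its effect on amino acid composition. The results show that: (1) The GC content of arthropod mitochondrial genomes is low and falls within a narrow range (13.28% to 39.64%). The correlation between genomic GC content and GC content at the third codon position (r=0.9432, p<0.01) is stronger than the correlations at the first and second codon positions. (2) Reciprocal C<->T and A<->G substitutions can be observed at all three codon positions. (3) A comparison of the two sequences NC.004529 and NC.003979 shows that changes in base composition lead to changes in amino acid composition; these changes appear not only between species but also between different genes within the same genome, and the effects may be mutual. This indicates that base changes in arthropod mitochondrial genomes result from the combined action of multiple factors.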
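
The per-codon-position GC content discussed in this abstract can be computed as sketched below. The function name is hypothetical, and dropping an incomplete trailing codon is an assumption about how a partial coding sequence should be handled.

```python
def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3 of a coding sequence.

    An incomplete trailing codon, if any, is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # length rounded down to whole codons
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        gc = bases.count("G") + bases.count("C")
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions

# Toy sequence with codons ATG GCC TAA.
print(gc_by_codon_position("ATGGCCTAA"))
```

Collecting the third-position value (GC3) for each genome and correlating it with whole-genome GC content would give the kind of r value the abstract reports.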

12.
Kingsolver et al.'s review of phenotypic selection gradients from natural populations provided a glimpse of the form and strength of selection in nature and how selection on different organisms and traits varies. Because this review's underlying database could be a key tool for answering fundamental questions concerning natural selection, it has spawned discussion of potential biases inherent in the review process. Here, we explicitly test for two commonly discussed sources of bias: sampling error and publication bias. We model the relationship between variance among selection gradients and sample size that sampling error produces by subsampling large empirical data sets containing measurements of traits and fitness. We find that this relationship was not mimicked by the review data set and therefore conclude that sampling error does not bias estimations of the average strength of selection. Using graphical tests, we find evidence for bias against publishing weak estimates of selection only among very small studies (N<38). However, this evidence is counteracted by excess weak estimates in larger studies. Thus, estimates of average strength of selection from the review are less biased than is often assumed. Devising and conducting straightforward tests for different biases allows concern to be focused on the most troublesome factors.

13.
Species distribution models (SDMs) are often calibrated using presence-only datasets plagued with environmental sampling bias, which leads to a decrease of model accuracy. In order to compensate for this bias, it has been suggested that background data (or pseudoabsences) should represent the area that has been sampled. However, spatially-explicit knowledge of sampling effort is rarely available. In multi-species studies, sampling effort has been inferred following the target-group (TG) approach, where aggregated occurrence of TG species informs the selection of background data. However, little is known about the species-specific response to this type of bias correction. The present study aims at evaluating the impacts of sampling bias and bias correction on SDM performance. To this end, we designed a realistic system of sampling bias and virtual species based on 92 terrestrial mammal species occurring in the Mediterranean basin. We manipulated presence and background data selection to calibrate four SDM types. Unbiased (unbiased presence data) and biased (biased presence data) SDMs were calibrated using randomly distributed background data. We used real and TG-estimated sampling efforts in background selection to correct for sampling bias in presence data. Overall, environmental sampling bias had a deleterious effect on SDM performance. In addition, bias correction improved model accuracy, especially when based on spatially-explicit knowledge of sampling effort. However, our results highlight important species-specific variations in susceptibility to sampling bias, which were largely explained by range size: widely-distributed species were most vulnerable to sampling bias and bias correction was even detrimental for narrow-ranging species. Furthermore, spatial discrepancies in SDM predictions suggest that bias correction effectively replaces an underestimation bias with an overestimation bias, particularly in areas of low sampling intensity. Thus, our results call for a better estimation of sampling effort in multispecies systems, and caution against the uninformed and automatic application of TG bias correction.

14.
15.
Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeota, thirteen bacteria and eight eukaryotes). We investigate whether the mutational biases that may explain the deficit of the shortest absent words in vertebrates are also pervasive in other absent words, namely minimal absent words, and in other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform throughout different sets of minimal absent words. We further investigate the hypothesis of the inheritance of minimal absent words through common ancestry from the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.
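
For readers unfamiliar with the term, a minimal absent word is usually defined as a word that does not occur in a sequence although its longest proper prefix and longest proper suffix both do. The brute-force sketch below enumerates such words of one fixed length on a single strand. It is an illustration under that standard definition only: the abstract does not describe its algorithm, genome-scale studies use far more efficient methods and typically also consider the reverse complement, and the function name is hypothetical.

```python
from itertools import product

def minimal_absent_words(seq, k, alphabet="ACGT"):
    """Return minimal absent words of length k (k >= 2) for seq, single strand.

    A word w qualifies if w is absent from seq while both w[:-1] and w[1:]
    occur in seq. Enumerates all len(alphabet)**k words, so use small k only.
    """
    present_k = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    present_sub = {seq[i:i + k - 1] for i in range(len(seq) - k + 2)}
    words = []
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        if (word not in present_k
                and word[:-1] in present_sub
                and word[1:] in present_sub):
            words.append(word)
    return words

# Toy example on a two-letter alphabet.
print(minimal_absent_words("AAC", 2, alphabet="AC"))
```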

16.
Sensory trade-offs predict signal divergence in Surfperch   Total citations: 1, self-citations: 0, citations by others: 1
Unidirectional elaboration of male trait evolution (e.g., larger, brighter males) has been predicted by receiver bias models of sexual selection and empirically tested in a number of different taxa. This study identifies a bidirectional pattern of male trait evolution and suggests that a sensory constraint is driving this divergence. In this system, the inherent trade-off in dichromatic visual detection places limits on the direction that sensory biases may take and thus provides a quantitative test of the sensory drive model. Here I show that sensory systems with trade-offs in detection abilities produce bidirectional biases and that signal design properties match these biases. I combine species-specific measurements and ancestral estimates with visual detection modeling to examine biases in sensory and signaling traits across five fish species occupying optically diverse habitats in the Californian kelp forest. Species-specific divergence in visual pigments correlates with changes in environment and produces different sensory biases, favoring luminance (brightness) detection for some species and chromatic (color) detection for others. Divergence in male signals (spectral reflectance of orange, blue, and silver color elements) is predicted by each species' sensory bias: color divergence favors chromatic detection for species with chromatically biased visual systems, whereas species with luminance sensory biases have signals favoring luminance detection. This quantitative example of coevolution of communication traits varying in a bidirectional pattern governed by the environment is the first demonstration of sensory trade-offs driving signal evolution.

17.

Introduction

Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding, even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates.

Results

We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB.

Conclusion

Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study.

18.
Codon usages in different gene classes of the Escherichia coli genome   Total citations: 3, self-citations: 0, citations by others: 3
A new measure for assessing codon bias of one group of genes with respect to a second group of genes is introduced. In this formulation, codon bias correlations for Escherichia coli genes are evaluated for level of expression, for contrasts along genes, for genes in different 200 kb (or longer) contigs around the genome, for effects of gene size, for variation over different function classes, for codon bias in relation to possible lateral transfer and for dicodon bias for some gene classes. Among the function classes, codon biases of ribosomal proteins are the most deviant from the codon frequencies of the average E. coli gene. Other classes of ‘highly expressed genes’ (e.g. amino acyl tRNA synthetases, chaperonins, modification genes essential to translation activities) show less extreme codon biases. Consistently for genes with experimentally determined expression rates in the exponential growth phase, those of highest molar abundances are more deviant from the average gene codon frequencies and are more similar in codon frequencies to the average ribosomal protein gene. Independent of gene size, the codon biases in the 5′ third of genes deviate by more than a factor of two from those in the middle and 3′ thirds. In this context, there appear to be conflicting selection pressures imposed by the constraints of ribosomal binding, or more generally the early phase of protein synthesis (about the first 50 codons) may be more biased than the complete nascent polypeptide. In partitioning the E. coli genome into 10 equal lengths, pronounced differences in codon site 3 G+C frequencies accumulate. Genes near to oriC have 5% greater codon site 3 G+C frequencies than do genes from the ter region. This difference also is observed between small (100–300 codons) and large (>800 codons) genes. This result contrasts with that for eukaryotic genomes (including human, Caenorhabditis elegans and yeast) where long genes tend to have site 3 more AT rich than short genes. 
Many of the above results are special for E. coli genes and do not apply to genes of most bacterial genomes. A gene is defined as alien (possibly horizontally transferred) if its codon bias relative to the average gene exceeds a high threshold and the codon bias relative to ribosomal proteins is also appropriately high. These are identified, including four clusters (operons). The bulk of these genes have no known function.

19.
It is now well-established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered to be more reliable than those derived from the corresponding DNA sequences because it is believed that the use of encoded protein sequences circumvents the problems caused by nucleotide compositional biases in the DNA sequences. There exists, however, a correlation between AT/GC bias at the nucleotide level and content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias not only may affect phylogenetic analysis based on DNA sequences, but also drives a protein bias which may affect analyses based on protein sequences. We present a striking example where common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the use of the LogDet/paralinear transform nor removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias continually group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased. Received: 19 February 1998 / Accepted: 28 August 1998

20.