Similar literature
Found 20 similar articles (search time: 15 ms)
1.
A number of statistical tests have been proposed to detect positive Darwinian selection affecting a few amino acid sites in a protein, exemplified by an excess of nonsynonymous nucleotide substitutions. These tests are often more powerful than pairwise sequence comparison, which averages synonymous (d(S)) and nonsynonymous (d(N)) rates over the whole gene. In a recent study, however, Hughes AL and Friedman R (2005. Variation in the pattern of synonymous and nonsynonymous difference between two fungal genomes. Mol Biol Evol. 22: 1320-1324) argue that d(S) and d(N) are expected to fluctuate along the sequence by chance and that an excess of nonsynonymous differences in individual codons is no evidence for positive selection. The authors compared codons in protein-coding genes from the genomes of 2 yeast species, Saccharomyces cerevisiae and Saccharomyces paradoxus. They calculated the proportions of synonymous and nonsynonymous differences per site (p(S) and p(N)) in every codon and discovered that p(N) is often greater than p(S) and that among some codons p(S) and p(N) are negatively correlated. The authors argued that these results invalidate previous tests of codons under positive selection. Here I discuss several errors of statistics in the analysis of Hughes and Friedman, including confusion of statistics with parameters, arbitrary data filtering, and derivation of hypotheses from data. I also apply likelihood ratio tests of positive selection to the yeast data and illustrate empirically that Hughes and Friedman's criticisms of such tests are not valid.

2.
The proportion of synonymous nucleotide differences per synonymous site (p(S)) and the proportion of nonsynonymous differences per nonsynonymous site (p(N)) were computed at 1,993,217 individual codons in 4,133 protein-coding genes between the two yeast species Saccharomyces cerevisiae and Saccharomyces paradoxus. When the modified Nei-Gojobori method was used, significantly more codons with p(N) > p(S) were observed than expected, based on random pairing of observed p(S) and p(N) values. However, this finding was most likely explained by the presence of a strong negative correlation between the number of synonymous differences and the number of nonsynonymous differences at codons with at least one difference. As a result of this correlation, codons with p(N) > p(S) were characterized not only by unusually high p(N) but also by unusually low p(S). On the other hand, the number of codons with p(N) > p̄(S) (where p̄(S) is the mean p(S) for all codons) was very similar to the random expectation, and the observed number of 30-codon windows with p(N) > p(S) was significantly lower than the random expectation. These results imply that the occurrence of a certain number of codons or codon windows with p(N) > p(S) is expected given the nature of nucleotide substitution and need not imply the action of positive Darwinian selection.
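To make the per-codon quantities p(S) and p(N) concrete, the following is a minimal sketch of the unmodified Nei-Gojobori counting scheme for a single pair of codons. It is not the modified method used in the abstract (which also weights transitions and transversions), and the function names `syn_sites`, `count_diffs` and `p_s_p_n` are hypothetical. It also assumes that changes to a stop codon count as nonsynonymous when sites are counted; implementations differ on that point.

```python
from itertools import permutations, product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a sense codon (between 0 and 3).

    Assumption: all three possible changes at a position are in the
    denominator, so a change to a stop codon counts as nonsynonymous.
    """
    total = 0.0
    for i in range(3):
        alts = (codon[:i] + b + codon[i + 1:] for b in BASES if b != codon[i])
        total += sum(CODE[a] == CODE[codon] for a in alts) / 3
    return total


def count_diffs(c1, c2):
    """Synonymous and nonsynonymous differences between two sense codons,
    averaged over all mutation orders that avoid stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    paths = []
    for order in permutations(diff):
        cur, sd, nd = c1, 0, 0
        for i in order:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":
                break  # discard pathways that pass through a stop codon
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        else:
            paths.append((sd, nd))
    if not paths:
        raise ValueError("every pathway passes through a stop codon")
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def p_s_p_n(c1, c2):
    """Per-codon p(S) and p(N); p(S) is NaN when the pair has no synonymous sites."""
    s = (syn_sites(c1) + syn_sites(c2)) / 2
    n = 3 - s
    sd, nd = count_diffs(c1, c2)
    return (sd / s if s > 0 else float("nan")), nd / n
```

Because a single codon has at most three sites, the per-codon values are very coarse, which is part of why the abstract compares the observed counts against a random expectation rather than reading individual codons directly.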

3.
Natural selection operating at the amino acid sequence level can be detected by comparing the rates of synonymous (r(S)) and nonsynonymous (r(N)) nucleotide substitutions, where r(N)/r(S) (omega) > 1 and omega < 1 suggest positive and negative selection, respectively. The branch-site test has been developed for detecting positive selection operating at a group of amino acid sites for a pre-specified (foreground) branch of a phylogenetic tree by taking into account the heterogeneity of omega among sites and branches. Here the performance of the branch-site test was examined by computer simulation, with special reference to the false-positive rate when the divergence of the sequences analyzed was small. The false-positive rate was found to inflate when the assumptions made on the omega values for the foreground and other (background) branches in the branch-site test were violated. In addition, under a similar condition, false-positive results were often obtained even when Bonferroni correction was conducted and the false-discovery rate was controlled in a large-scale analysis. False-positive results were also obtained even when the number of nonsynonymous substitutions for the foreground branch was smaller than the minimum value required for detecting positive selection. The existence of a codon site with a possibility of occurrence of multiple nonsynonymous substitutions for the foreground branch often caused the branch-site test to falsely identify positive selection. In the re-analysis of orthologous trios of protein-coding genes from humans, chimpanzees, and macaques, most of the genes previously identified to be positively selected for the human or chimpanzee branch by the branch-site test contained such a codon site, suggesting a possibility that a significant fraction of these genes are false-positives.

4.
In order to understand the impact of overlapping reading frames on natural selection by host CD8+ T lymphocytes (CD8(+)-TL), we analyzed the pattern of nucleotide substitution in simian immunodeficiency virus (SIV) genomes sampled from populations at time of death in 35 rhesus monkeys. Both the mean number of nonsynonymous nucleotide substitutions per nonsynonymous site (d(N)) and the mean number of synonymous nucleotide substitutions per synonymous site (d(S)) were elevated in overlap regions in comparison to non-overlap regions. Mean d(N) exceeded mean d(S) in CD8(+)-TL epitopes restricted by the host's class I major histocompatibility complex molecules. This pattern, which is indicative of positive Darwinian selection favoring amino acid changes in these epitopes, was seen in both overlap and non-overlap regions; but mean d(N) was particularly elevated in restricted CD8(+)-TL epitopes encoded in overlap regions. Amino acid changes from the inoculum were defined as parallel if the same amino acid change occurred at the same site independently in two or more monkeys, and a surprisingly high proportion (71.9%) of observed amino acid changes throughout the SIV genome occurred in parallel in different monkeys. The proportion of parallel changes in restricted epitopes encoded by overlapping reading frames was still higher (80%), supporting the hypothesis that the interaction of positive selection and overlapping reading frames enhances the probability of convergent or parallel amino acid change.

5.
The sporozoite threonine-asparagine-rich protein (STARP) of Plasmodium falciparum is an attractive target for a pre-erythrocytic stage malaria vaccine because both naturally acquired and experimentally induced anti-STARP antibodies can block sporozoite invasion of hepatocytes. To explore the extent of sequence variation, we surveyed nucleotide polymorphism across the entire gene, encompassing 2 exons and an intron, of 124 P. falciparum-infected blood samples from Thailand and 10 from 4 other endemic areas. In total 24 haplotypes were identified despite low-level nucleotide diversity at this locus. The mean number of nonsynonymous substitutions per nonsynonymous site (d(N)) significantly exceeded that of synonymous substitutions per synonymous site (d(S)), suggesting that the STARP gene has evolved under positive selection, probably from host immune pressure. The preponderance of conservative amino acid exchanges and a strong bias toward T at the third codon position in repeat arrays reflect simultaneous constraints on this molecule, probably from its unknown function and its nucleotide composition, respectively. Sequence conservation in the STARP locus among clinical isolates from different disease endemic areas would not compromise vaccine incorporation.

6.
Infection with hepatitis C virus (HCV) is one of the leading causes of chronic hepatitis, liver cirrhosis and end-stage liver disease worldwide. The genetics of HCV infection in humans and the disease course of chronic hepatitis C are both remarkably variable. Although the response to interferon treatment is largely dependent on HCV genotypes, whether or not a relationship exists between HCV genome variability and clinical course of hepatitis C disease still remains unknown. To more thoroughly understand HCV genome evolution over time in association with disease course, near genome-wide HCV genomes present in 9 chronically infected participants over 83 total study years were sequenced. Overall, within HCV genomes, the number of synonymous substitutions per synonymous site (d(S)) significantly exceeded the number of non-synonymous substitutions per non-synonymous site (d(N)). Although both d(S) and d(N) significantly increased with duration of chronic infection, there was a highly significant decrease in d(N)/d(S) ratio in HCV genomes over time. These results indicate that purifying selection acted to conserve viral protein structure despite the persistence of a high level of nucleotide mutagenesis inherent to HCV replication. Based on liver biopsy fibrosis scores, HCV genomes from participants with advanced fibrosis had significantly greater d(S) values and lower d(N)/d(S) ratios compared to participants with mild liver disease. Over time, viral genomes from participants with mild disease had significantly greater annual changes in d(N), along with higher d(N)/d(S) ratios, compared to participants with advanced fibrosis. Yearly amino acid variations in the HCV p7, NS2, NS3 and NS5B genes were all significantly lower in participants with severe versus mild disease, suggesting possible pathogenic importance of protein structural conservation for these viral gene products.

7.
In the analysis of protein-coding nucleotide sequences, the ratio of the number of nonsynonymous substitutions to that of synonymous substitutions (d(N)/d(S)) is used as an indicator for the direction and magnitude of natural selection operating at the amino acid sequence level. The d(S) and d(N) values are estimated based on the comparison of homologous codons, which are often identified by converting (reverse-translating) aligned amino acid sequences into codon sequences. In this method, however, homologous codons may be mis-identified when frame-shifts have occurred or amino acid sequences have been mis-aligned, which may lead to overestimation of the d(N)/d(S) ratio. Here the effect of reverse-translating aligned amino acid sequences on the estimation of d(N)/d(S) ratio was examined through a large-scale analysis of protein-coding nucleotide sequences from vertebrate species. Apparently, 1-9% of codon sites that were identified as homologous with reverse-translation contained non-homologous codons, where the d(N)/d(S) ratio was unduly high. By correcting the d(N)/d(S) ratio for these codon sites, it was inferred that the ratio was 5-43% overestimated with reverse-translation. These results suggest that caution should be exercised in the study of natural selection using the d(N)/d(S) ratio by reverse-translating aligned amino acid sequences.
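The reverse-translation step described above can be sketched as follows: an aligned protein sequence (with `-` for gaps) is used as a template, and the corresponding unaligned coding sequence is threaded onto it three nucleotides at a time. This is a minimal illustration only; the function name `back_translate` is hypothetical and the sketch does not check that each codon actually encodes the residue it is paired with.

```python
def back_translate(aligned_protein, cds):
    """Thread an unaligned coding sequence onto an aligned protein.

    aligned_protein: amino acid string with '-' marking alignment gaps.
    cds: coding nucleotide sequence, without a trailing stop codon,
         whose length is three times the number of residues.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(ch != "-" for ch in aligned_protein)
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")
    it = iter(codons)
    # Each residue consumes one codon; each gap column becomes three gap characters.
    return "".join("---" if ch == "-" else next(it) for ch in aligned_protein)
```

Because codons are paired with residues purely by position, a frame-shift in the coding sequence or a mis-aligned residue places non-homologous codons in the same column without any error being raised, which is the failure mode the abstract quantifies.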

8.
Statistical properties of the branch-site test of positive selection   Cited by: 1 (self-citations: 0, citations by others: 1)
The branch-site test is a likelihood ratio test to detect positive selection along prespecified lineages on a phylogeny that affects only a subset of codons in a protein-coding gene, with positive selection indicated by accelerated nonsynonymous substitutions (with ω = d(N)/d(S) > 1). This test may have more power than earlier methods, which average nucleotide substitution rates over sites in the protein and/or over branches on the tree. However, a few recent studies questioned the statistical basis of the test and claimed that the test generated too many false positives. In this paper, we examine the null distribution of the test and conduct a computer simulation to examine the false-positive rate and the power of the test. The results suggest that the asymptotic theory is reliable for typical data sets, and indeed in our simulations, the large-sample null distribution was reliable with as few as 20-50 codons in the alignment. We examined the impact of sequence length, the strength of positive selection, and the proportion of sites under positive selection on the power of the branch-site test. We found that the test was far more powerful in detecting episodic positive selection than branch-based tests, which average substitution rates over all codons in the gene and thus miss the signal when most codons are under strong selective constraint. Recent claims of statistical problems with the branch-site test are due to misinterpretations of simulation results. Our results, as well as previous simulation studies that have demonstrated the robustness of the test, suggest that the branch-site test may be a useful tool for detecting episodic positive selection and for generating biological hypotheses for mutation studies and functional analyses. The test is sensitive to sequence and alignment errors and caution should be exercised concerning its use when data quality is in doubt.
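As an illustration of how a likelihood ratio test of this kind turns two log-likelihoods into a p-value, here is a minimal sketch using only the standard library. The function name `lrt_pvalue` is hypothetical, and the choice of reference distribution is an assumption not stated in the abstract: by default the statistic is compared with a chi-square distribution with one degree of freedom, and `mixture=True` instead uses a 50:50 mixture of a point mass at zero and that chi-square distribution, which halves the p-value.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, mixture=False):
    """P-value for a likelihood ratio test with one extra free parameter.

    lnl_null, lnl_alt: maximized log-likelihoods under the null and
    alternative models. The statistic is 2 * (lnl_alt - lnl_null).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if stat == 0.0:
        return 1.0
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2 = math.erfc(math.sqrt(stat / 2.0))
    # Assumed alternative null: half point mass at 0, half chi-square(1).
    return 0.5 * p_chi2 if mixture else p_chi2
```

For example, a log-likelihood gain of 1.92 gives a statistic of 3.84, which sits close to the conventional 5% threshold under the plain chi-square reference.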

9.
ADAPTSITE: detecting natural selection at single amino acid sites.   Cited by: 12 (self-citations: 0, citations by others: 12)
ADAPTSITE is a program package for detecting natural selection at single amino acid sites, using a multiple alignment of protein-coding sequences for a given phylogenetic tree. The program infers ancestral codons at all interior nodes, and computes the total numbers of synonymous (c(S)) and nonsynonymous (c(N)) substitutions as well as the average numbers of synonymous (s(S)) and nonsynonymous (s(N)) sites for each codon site. The probabilities of occurrence of synonymous and nonsynonymous substitutions are approximated by s(S) / (s(S) + s(N)) and s(N) / (s(S) + s(N)), respectively. The null hypothesis of selective neutrality is tested for each codon site, assuming a binomial distribution for the probability of obtaining c(S) and c(N). AVAILABILITY: ADAPTSITE is available free of charge at the World-Wide Web sites http://mep.bio.psu.edu/adaptivevol.html and http://www.cib.nig.ac.jp/dda/yossuzuk/welcome.html. The package includes the source code written in C, binary files for UNIX operating systems, manual, and example files.
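The per-site binomial test described above can be sketched in a few lines. This is not ADAPTSITE's own code (which is written in C); the function names are hypothetical, and the sketch assumes that c(S) and c(N) are whole numbers and that one-sided tail probabilities are wanted.

```python
from math import comb


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def site_neutrality_test(c_s, c_n, s_s, s_n):
    """One-sided binomial tests of neutrality at a single codon site.

    c_s, c_n: observed synonymous and nonsynonymous substitution counts
              (assumed to be integers here).
    s_s, s_n: average numbers of synonymous and nonsynonymous sites.
    Returns (p_positive, p_negative): the probability of at least c_n
    nonsynonymous changes, and of at most c_n, under neutrality.
    """
    p_n = s_n / (s_s + s_n)        # chance that a substitution is nonsynonymous
    n = c_s + c_n
    p_positive = binom_tail_ge(c_n, n, p_n)
    p_negative = 1.0 - binom_tail_ge(c_n + 1, n, p_n)
    return p_positive, p_negative
```

With s(S) = 1 and s(N) = 2, a site showing five nonsynonymous and no synonymous substitutions gives p_positive = (2/3)^5, which illustrates how many substitutions a single site needs before neutrality can be rejected.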

10.
11.
We analyzed 22 clinical isolates of Plasmodium vivax from Thailand and 17 from Brazil to investigate the extent of sequence variation in the thrombospondin-related adhesive protein of Plasmodium vivax (PvTRAP), a homologue of P. falciparum TRAP (PfTRAP) which has been considered to be a promising vaccine candidate. In total 54 haplotypes were identified from 73 distinct gene clones. Coexistence of different PvTRAP in circulation occurred in 10 and 13 isolates from Thailand and Brazil, respectively. Forty out of 48 substituted nucleotides are non-synonymous changes. Most of the substituted residues reside in the von Willebrand factor type A-domain (region II), a sulfated glycosaminoglycan-binding domain (region III) and a proline-rich region (region IV). All nucleotide substitutions are dimorphic. Two haplotypes from Thailand contain an inserted sequence encoding aspartic acid-serine-proline in the proline-rich region. Sequence analysis has revealed that nucleotide diversity in PvTRAP is low although Brazilian isolates display a higher degree of variation than those from Thailand. Phylogenetic construction using the neighbor-joining method has shown that most of the Thai and the Brazilian isolates appear to be mainly clustered into distinct groups. Significantly greater than expected values of the mean number of non-synonymous (d(n)) than synonymous (d(s)) nucleotide substitutions per site were observed in regions II and III of PvTRAP. Analysis of the published PfTRAP sequences has shown a similar finding in regions II and IV suggesting that positive selection operates on the regions. Hence, different regions in PvTRAP and PfTRAP could be under different pressures in terms of immune selection, structural and/or functional constraints.

12.
This is the first study to quantify genomic sequence variation of the major histocompatibility complex (MHC) in wild and ornamental guppies, Poecilia reticulata. We sequenced 196-219 bp of exon 2 MHC class IIB (DAB) in 56 wild Trinidadian guppies and 14 ornamental strain guppies. Each of two natural populations possessed high allelic richness (15-16 alleles), whereas only three or fewer DAB alleles were amplified from ornamental guppies. The disparity in allelic richness between wild and ornamental fish cannot be fully explained by fixation of alleles by inbreeding, nor by the presence of non-amplified sequences (i.e., null alleles). Rather, we suggest that the same allele is fixed at duplicated MHC DAB loci owing to gene conversion. Alternatively, the number of loci in the ornamental strains has contracted during >100 generations in captivity, a hypothesis consistent with the accordion model of MHC evolution. We furthermore analysed the substitution patterns by making pairwise comparisons of sequence variation at the putative peptide binding region (PBR). The rate of non-synonymous substitutions (dN) only marginally exceeded synonymous substitutions (dS) in PBR codons. Highly diverged sequences showed no evidence for diversifying selection, possibly because synonymous substitutions have accumulated since their divergence. Also, the substitution pattern of similar alleles did not show evidence for diversifying selection, plausibly because advantageous non-synonymous substitutions have not yet accumulated. Intermediately diverged sequences showed the highest relative rate of non-synonymous substitutions, with dN/dS > 14 in some pairwise comparisons. Consequently, a curvilinear relationship was observed between the dN/dS ratio and the level of sequence divergence.

13.
Pattern recognition proteins play an important role in the innate immune response of invertebrates. Herein we report the evolutionary relationships among Gram-negative bacteria binding proteins (GNBPs) that were previously identified and characterized from a wide array of invertebrates. Our results, together with those obtained in previous studies, indicate that decapod lipopolysaccharide- and beta-1,3-glucan binding protein (LGBP/BGBP) has retained the crucial components for glucanase activity, and shares a common ancestor with GNBPs, as well as with the glucanase proteins of a wide range of invertebrates, rather than with GNBPs of some arthropods. However, experimental evidence of earlier studies suggested a lack of glucanase activity by these proteins, thus implying that during evolutionary time these proteins might have lost their glucanase activity, but retained their glucan binding activity. The present results have also revealed that although a vast majority of the decapod LGBP/BGBP codons are constrained by purifying selection, certain codons are shown to have a higher rate of nonsynonymous substitutions per nonsynonymous site (dN) than synonymous substitutions per synonymous site (dS), indicating these codons have evolved adaptively (dN/dS > 1). Although purifying selection (dN/dS < 1) appears to be the major driving force in the evolution of a vast majority of LGBP/BGBP codons in decapods, the findings of several hotspots for nonsynonymous substitutions in this protein indicate host immune selection might play an important role in maintaining diversity among these ecologically diversified decapod species.

14.
Summary: Based on the rates of synonymous substitution in 42 protein-coding gene pairs from rat and human, a correlation is shown to exist between the frequency of the nucleotides in all positions of the codon and the synonymous substitution rate. The correlation coefficients were positive for A and T and negative for C and G. This means that AT-rich genes accumulate more synonymous substitutions than GC-rich genes. Biased patterns of mutation could not account for this phenomenon. Thus, the variation in synonymous substitution rates and the resulting unequal codon usage must be the consequence of selection against A and T in synonymous positions. Most of the variation in rates of synonymous substitution can be explained by the nucleotide composition in synonymous positions. Codon-anticodon interactions, dinucleotide frequencies, and contextual factors influence neither the rates of synonymous substitution nor codon usage. Interestingly, the nucleotide in the second position of codons (always a nonsynonymous position) was found to affect the rate of synonymous substitution. This finding links the rate of nonsynonymous substitution with the synonymous rate. Consequently, highly conservative proteins are expected to be encoded by genes that evolve slowly in terms of synonymous substitutions, and are consequently highly biased in their codon usage.

15.
A new method is proposed for estimating the number of synonymous and nonsynonymous nucleotide substitutions between homologous genes. In this method, a nucleotide site is classified as nondegenerate, twofold degenerate, or fourfold degenerate, depending on how often nucleotide substitutions will result in amino acid replacement; nucleotide changes are classified as either transitional or transversional, and changes between codons are assumed to occur with different probabilities, which are determined by their relative frequencies among more than 3,000 changes in mammalian genes. The method is applied to a large number of mammalian genes. The rate of nonsynonymous substitution is extremely variable among genes; it ranges from 0.004 × 10^-9 (histone H4) to 2.80 × 10^-9 (interferon gamma), with a mean of 0.88 × 10^-9 substitutions per nonsynonymous site per year. The rate of synonymous substitution is also variable among genes; the highest rate is three to four times higher than the lowest one, with a mean of 4.7 × 10^-9 substitutions per synonymous site per year. The rate of nucleotide substitution is lowest at nondegenerate sites (the average being 0.94 × 10^-9), intermediate at twofold degenerate sites (2.26 × 10^-9), and highest at fourfold degenerate sites (4.2 × 10^-9). The implication of our results for the mechanisms of DNA evolution and that of the relative likelihood of codon interchanges in parsimonious phylogenetic reconstruction are discussed.
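The site classification described above can be illustrated with a short sketch that counts, for one position of a codon, how many of the three possible nucleotide changes leave the amino acid unchanged. The function name `degeneracy` is hypothetical, the standard genetic code is assumed, and positions where one or two of the three changes are synonymous are both labeled twofold here, which is a simplifying assumption for this sketch.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degeneracy(codon, pos):
    """Classify position pos (0, 1 or 2) of a sense codon.

    Returns 0 for nondegenerate (every change alters the amino acid),
    4 for fourfold degenerate (no change alters it), and 2 otherwise.
    Assumption: sites with two synonymous changes out of three, such as
    the third position of isoleucine codons, are grouped with twofold sites.
    """
    aa = CODE[codon]
    n_syn = sum(
        CODE[codon[:pos] + b + codon[pos + 1:]] == aa
        for b in BASES
        if b != codon[pos]
    )
    if n_syn == 0:
        return 0
    if n_syn == 3:
        return 4
    return 2
```

Under this classification the second position of every sense codon is nondegenerate, which matches the ordering of rates reported in the abstract: slowest at nondegenerate sites and fastest at fourfold degenerate sites.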

16.
Microsporidia are a group of intracellular parasites with an extremely compact genome and there is no confirmed evidence that retroelements are present in these organisms. Using the dataset of 200,000 genomic shotgun reads of the silkworm pebrine Nosema bombycis, we have identified eight complete N. bombycis long-terminal repeat retrotransposon (Nbr) elements. All of the Nbr elements are Ty3/gypsy members and have close relationships to Saccharomycetes long-terminal repeat retrotransposons identified previously, providing further evidence of their relationship to fungi. To explore the effect of retrotransposons in microsporidian genome evolution, their distribution was characterised by comparisons between two N. bombycis contigs containing the Nbr elements with the completed genome of the human parasite Encephalitozoon cuniculi, which is closely related to N. bombycis. The Nbr elements locate between or beside syntenic blocks, which are often clustered with other transposable-like sequences, indicating that they are associated with genome size variation and syntenic discontinuities. The ratios of the number of non-synonymous substitutions per non-synonymous site to the number of synonymous substitutions per synonymous site of the open reading frames among members of each of the eight Nbr families were estimated, revealing that purifying selection has acted on the N. bombycis long-terminal repeat retrotransposons. These results strongly suggest that retrotransposons play a major role in reorganization of the microsporidian genome and they might be active. The present study presents an initial characterization of some transposable elements in the N. bombycis genome and provides some insight into the evolutionary mechanism of microsporidian genomes.

17.
Ramaiah Arunachalam. Genetica 2013, 141(4-6): 143-155
In the twenty-first century, the first outbreak of the pandemic novel human influenza A/H1N1 virus (NIV) was reported in Mexico and the USA in March and early April 2009, respectively. The outbreak occurred among human populations due to the presence of meager or no immune response against newly emerged viruses. The success of vaccines and drugs depends on their low susceptibility to the formation of escape mutants in the virus. Identification of an excess of non-synonymous substitutions over synonymous ones is a main indicator of positive Darwinian selection in protein-coding genes of NIVs. The positive Darwinian selection operating on each site of proteins was inferred by computing ω, the ratio of non-synonymous to synonymous substitutions (dN/dS, or Ka/Ks), which was calculated by three different methods: codon-based maximum likelihood, branch-site and empirical Bayesian methods under various models. In total, nine sites from PB2, PB1, HA, M2 and NS1 are inferred as positively selected. The functions of amino acid sites of NIV proteins under positive selection are inferred by comparing the sites with experimentally determined, functionally known amino acid sites. In all, 4 positively selected sites of PB1, HA and M2 are found to be involved in B-cell epitopes (BCEs). Interestingly, most of these sites are also involved in T-cell epitopes (TCEs). However, more sites under positive selection are involved in TCEs than in BCEs. Amino acid sites engaged in both BCEs and TCEs should be regarded as highly suitable targets, because these sites could induce strong humoral and cellular immune responses against targets.

18.
Phadwal K. Gene 2005, 345(1): 35-43
Phylogenetic analysis of carotenoid biosynthetic pathway genes and their evolutionary rate variations were studied among eubacterial taxa. The gene sequences for the enzymes involved in this pathway were obtained for major phylogenetic groups of eubacteria (green sulfur bacteria, green nonsulfur bacteria, Gram-positive bacteria, proteobacteria, flavobacteria, cyanobacteria) and archaebacteria. These gene datasets were distributed under five major steps of carotenoid biosynthesis in eubacteria: isoprenoid precursor biosynthesis, phytoene synthesis, dehydrogenation of phytoene, lycopene cyclization, formation of acyclic xanthophylls, formation of cyclic xanthophylls and carotenoid biosynthesis regulation. The NJ algorithm was used on protein coding DNA sequences to deduce the evolutionary relationship for the respective crt genes among different eubacterial lineages. The rate of nonsynonymous nucleotide substitutions per nonsynonymous site (d(N)) and synonymous nucleotide substitutions per synonymous site (d(S)) were calculated for different clades of the respective phylogenetic tree for specific crt genes. The phylogenetic analysis suggests that the evolutionary pattern of crt genes in eubacteria is characterized by lateral gene transfer and gene duplication events. The d(N) values indicate that carotenoid biosynthetic genes are more conserved in proteobacteria than in any other eubacterial phyla. Furthermore, of the genes involved in the carotenoid biosynthesis pathway, structural genes evolve more slowly than the regulatory genes in eubacteria.

19.
The extent of amino acid differences of major histocompatibility complex molecules within species is unusually high, consistent with the finding that some pairs of alleles have persisted for more than ten million years and the view that the polymorphism has been maintained by natural selection. The disparity between synonymous and non-synonymous substitutions in the antigen recognition site, however, suggests that some non-synonymous sites have undergone a number of substitutions whereas others have little or none. To describe statistically such an overdispersed underlying process, commonly used Poisson processes are inadequate. An alternative process leads to the surprising conclusion that each non-synonymous site has accumulated as many as 2.6 substitutions, on the average, in the two lineages leading to humans and mice. The standard deviation is also very large (6.6) and the dispersion index (the ratio of the variance to the mean) is at least 17. The substitution process thus inferred qualitatively agrees with the disposition (a boomerang pattern) of substitutions between HLA-A2 and Aw68 alleles, and quantitatively agrees well with that expected where the evolution of major histocompatibility complex molecules has long been driven mostly by balancing selection.
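As a rough check on the dispersion index quoted above, the rounded mean (2.6) and standard deviation (6.6) can be substituted into its definition. The symbols I, μ and σ are notation introduced here for illustration, not the authors'.

```latex
\[
  I \;=\; \frac{\sigma^{2}}{\mu}
    \;=\; \frac{(6.6)^{2}}{2.6}
    \;=\; \frac{43.56}{2.6}
    \;\approx\; 16.8
\]
```

Given that both inputs are rounded, this is in line with the reported value of about 17. For a Poisson process the variance equals the mean, so I = 1, which is why the abstract treats a Poisson description as inadequate.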

20.
The pattern and extent of DNA sequence variability at the rplX locus (encoding ribosomal protein L24) has been investigated in nine strains of Bacillus subtilis. Overall, there is a very low level of nucleotide diversity, even at silent sites, which is probably due to selection among synonymous codons. By analogy with Escherichia coli, there may also be some effect of the relative proximity of rplX to the chromosomal origin of replication. The small number of nucleotide substitutions are non-randomly distributed: all of the synonymous changes are in valine codons. From the sequence differences the strains can be divided into two groups, which are not coincident with their previous classification; this observation is consistent with recombination among strains.
