Similar articles
1.
Tie trees generated by distance methods of phylogenetic reconstruction   Times cited: 2 (self-citations: 0; citations by others: 2)
In examining genetic data in recent publications, Backeljau et al. showed cases in which two or more different trees (tie trees) were constructed from a single data set by the neighbor-joining (NJ) method and the unweighted pair group method with arithmetic mean (UPGMA). However, it is still unclear how often and under what conditions tie trees are generated. I therefore examined these questions by computer simulation. The cases examined show that tie trees can appear when no substitutions occur along one or more interior branches of a tree. Even when some substitutions do occur along interior branches, tie trees can still appear by chance if parallel or backward substitutions occur at some sites. The simulations showed that tie trees occur relatively frequently for sequences with low divergence or with small numbers of sites. For such data, UPGMA sometimes produced tie trees quite frequently, whereas tie trees were generally rare for the NJ method. In the simulations, bootstrap values for clusters that differed among tie trees (tie clusters) were mostly low (< 60%), although relatively high values (at most 70%-80%) occasionally appeared. The bias in bootstrap values caused by the input order of the sequences can be avoided if, in each bootstrap replicate, one of the tied alternatives at each clustering cycle of the NJ or UPGMA algorithm is chosen at random.
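The random tie-breaking remedy described above can be sketched for UPGMA as follows. This is a minimal illustration, not the author's program: it assumes distances are supplied as a dict keyed by `frozenset` leaf pairs, detects ties by exact equality, and returns only the topology (no branch lengths). The function name `upgma` and the example distances are hypothetical. Passing `rng=None` reproduces the order-dependent behaviour, where the first tied pair in input order always wins; passing a fresh `random.Random` in each bootstrap replicate implements the random choice.

```python
import random
from itertools import combinations


def upgma(dist, labels, rng=None):
    """Return a UPGMA topology as nested tuples (branch lengths omitted).

    dist   -- dict mapping frozenset({x, y}) to the distance between leaves x, y
    labels -- leaf names in input order
    rng    -- random.Random used to break ties; if None, the first tied pair in
              input order is merged, so the result depends on sequence order
    """
    size = {name: 1 for name in labels}  # leaf count of each active cluster
    d = dict(dist)
    order = list(labels)                 # active clusters, in input order
    while len(order) > 1:
        pairs = list(combinations(order, 2))
        best = min(d[frozenset(p)] for p in pairs)
        tied = [p for p in pairs if d[frozenset(p)] == best]
        a, b = rng.choice(tied) if rng is not None else tied[0]
        merged = (a, b)
        order = [c for c in order if c not in (a, b)]
        for c in order:
            # Average linkage: size-weighted mean of the two old distances
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        order.append(merged)
    return order[0]


# Hypothetical example: A, B and C are equidistant, so the first merge is a tie.
labels = ["A", "B", "C", "D"]
raw = {("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 1,
       ("A", "D"): 4, ("B", "D"): 4, ("C", "D"): 4}
dist = {frozenset(k): v for k, v in raw.items()}

print(upgma(dist, labels))                    # ('D', ('C', ('A', 'B')))
print(upgma(dist, labels, random.Random(1)))  # any of the three tie trees
```

With deterministic tie-breaking, every bootstrap replicate that reproduces this tie yields the (A,B) cluster, which inflates its support; with random tie-breaking, the three tie clusters share that support.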

2.
Positive selection on the H3 hemagglutinin gene of human influenza virus A.   Times cited: 16 (self-citations: 0; citations by others: 16)
The hemagglutinin (HA) gene of influenza viruses encodes the major surface antigen against which neutralizing antibodies are produced during infection or vaccination. We examined temporal variation in the HA1 domain of HA genes of human influenza A (H3N2) viruses in order to identify positively selected codons. Positive selection is defined for our purposes as a significant excess of nonsilent over silent nucleotide substitutions. If past mutations at positively selected codons conferred a selective advantage on the virus, then additional changes at these positions may predict which emerging strains will predominate and cause epidemics. We previously reported that a 38% excess of mutations occurred on the tip or terminal branches of the phylogenetic tree of 254 HA genes of influenza A (H3N2) viruses. Possible explanations for this excess include processes other than viral evolution during replication in human hosts. Of particular concern are mutations that occur during adaptation of viruses for growth in embryonated chicken eggs in the laboratory. Because the present study includes 357 HA sequences (a 40% increase), we were able to separately analyze those mutations assigned to internal branches. This allowed us to determine whether mutations on terminal and internal branches exhibit different patterns of selection at the level of individual codons. Additional improvements over our previous analysis include correction for a skew in the distribution of amino acid replacements across codons and analysis of a population of phylogenetic trees rather than a single tree. The latter improvement allowed us to ascertain whether minor variation in tree structure had a significant effect on our estimate of the codons under positive selection. This method also estimates that 75.6% of the nonsilent mutations are deleterious and have been removed by selection prior to sampling. 
Using the larger data set and the modified methods, we confirmed a large (40%) excess of changes on the terminal branches. We also found an excess of changes on branches leading to egg-grown isolates. Furthermore, of the 18 codons identified as being under positive selection when only mutations assigned to internal branches were used, 9 were not under positive selection on the terminal branches. Thus, although the selected codons on terminal and internal branches overlap, the codons under positive selection on the terminal branches differ from those on the internal branches. We also observed an excess of positively selected codons associated with the receptor-binding site and with the antibody-combining sites. This association may explain why the positively selected codons are restricted in their distribution along the sequence. Our results suggest that future studies of positive selection should focus on changes assigned to the internal branches, as some of these changes may have predictive value for identifying future successful epidemic variants.
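The definition of positive selection used above (a significant excess of nonsilent over silent substitutions) can be illustrated, for a single codon, with an exact one-sided binomial test. This is a simplified sketch and not the authors' procedure: their analysis also corrected for the skewed distribution of replacements across codons and averaged over a population of trees. The function name and the `expected_fraction` input (the probability that a change is nonsilent under neutrality) are assumptions for illustration.

```python
from math import comb


def excess_nonsilent_pvalue(nonsilent, silent, expected_fraction):
    """One-sided exact binomial P-value for an excess of nonsilent changes.

    nonsilent, silent  -- substitution counts mapped to one codon
    expected_fraction  -- assumed probability that a change at this codon is
                          nonsilent under neutrality (a hypothetical input)
    Returns P(X >= nonsilent) for X ~ Binomial(nonsilent + silent, expected_fraction).
    """
    n = nonsilent + silent
    q = expected_fraction
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(nonsilent, n + 1))


# Hypothetical codon: 12 nonsilent and 0 silent changes, 75% expected nonsilent
print(round(excess_nonsilent_pvalue(12, 0, 0.75), 4))  # 0.0317
```

In a genome-wide scan, such per-codon P-values would also need a multiple-testing correction, which this sketch omits.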

3.
We present an original approach to identifying sequence variants in a mixed DNA population from sequence trace data. The method is based on parsimony: given a wild-type DNA sequence, a set of observed variations at each position collected from sequencing data, and a complete catalog of all possible mutations, determine the smallest set of mutations from the catalog that could fully explain the observed variations. The algorithmic complexity of the problem is analyzed for several classes of mutations, including block substitutions, single-range deletions, and single-range insertions. The reconstruction problem is shown to be NP-complete for single-range insertions and deletions, whereas polynomial-time algorithms are provided for block substitutions, single-character insertions, and single-character deletions. Once a minimum set of mutations compatible with the observed sequence is found, the relative frequencies of those mutations are recovered by solving a system of linear equations. Simulation results show that the algorithm successfully deconvolves mutations in p53 known to cause cancer. An extension of the algorithm is proposed as a new method of high-throughput screening for single nucleotide polymorphisms by multiplexing DNA.
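The final step above, recovering mutation frequencies by solving a linear system, can be sketched as follows once the minimum mutation set is known. This is an illustration under an assumed additive mixing model: each observed variant fraction at a (position, base) is taken to be the sum of the frequencies of the chosen mutations that produce that variant, and the system is solved by least squares through the normal equations. The parsimony step itself is not shown, and all names (`mutation_frequencies`, `solve`) are hypothetical.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: frequencies are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mutation_frequencies(mutations, observed):
    """Least-squares frequencies for an already chosen set of mutations.

    mutations -- list of dicts {position: variant_base}, one per mutation
    observed  -- dict {(position, variant_base): observed variant fraction}
    """
    rows = sorted(observed)
    # A[i][j] = 1 if mutation j produces the variant measured in row i
    A = [[1.0 if mut.get(pos) == base else 0.0 for mut in mutations] for pos, base in rows]
    y = [observed[r] for r in rows]
    k = len(mutations)
    # Normal equations: (A^T A) f = A^T y
    ata = [[sum(A[i][p] * A[i][q] for i in range(len(A))) for q in range(k)] for p in range(k)]
    aty = [sum(A[i][p] * y[i] for i in range(len(A))) for p in range(k)]
    return solve(ata, aty)


# Hypothetical data: a single-base mutation and an overlapping block substitution
print(mutation_frequencies([{5: "C"}, {5: "C", 6: "A"}], {(5, "C"): 0.5, (6, "A"): 0.3}))
```

Least squares is used here so that noisy traces with more observed variants than mutations still give an answer; with exact data and a square, non-singular system it reduces to solving the equations directly.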

4.
Shu LL, Bean WJ, Webster RG. Journal of Virology, 1993, 67(5): 2723-2729
This study examined the evolution and variation of the human influenza virus nucleoprotein gene from the earliest isolates to the present. Phylogenetic reconstruction of the most parsimonious evolutionary path connecting 49 nucleoprotein sequences yielded a single lineage. The average calculated rate of mutation was 3.6 nucleotide substitutions per year (2.3 × 10⁻³ substitutions per site per year). Thirty-two percent of these mutations resulted in amino acid substitutions; the remainder were silent. Analysis of virus isolates from China and elsewhere showed no significant differences in their rate of evolution, genetic diversity, or mean survival time. The nearly constant rate of change was maintained through the two antigenic shifts, and there were no obvious changes in the number or types of mutations associated with the changes in the surface proteins. A detailed comparison of the changes on the main evolutionary path with those on the side branches of the phylogenetic tree showed that while 35% of the mutations on the side branches resulted in amino acid changes, only 21% of those on the main path affected the protein sequence. These results suggest that although the rate of change of the human influenza virus nucleoprotein is much higher than that previously described for avian influenza viruses, there are measurable constraints on the evolution of the surviving virus lineage. Comparison of the nucleoproteins of isolates adapted to chicken embryos with those of isolates grown only in MDCK cells revealed no consistent differences between the virus pairs. Thus, although the nucleoprotein is known to be critical for host specificity, its adaptation to growth in eggs apparently involves no immediate selective pressures of the kind found with hemagglutinin.
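The two rates reported above are linked through the number of sites compared. As a quick consistency check, assuming both figures refer to the same sequence length and are rounded as printed, dividing one by the other recovers that length. Multiplying by the 32% replacement fraction gives the implied yearly number of amino acid changes.

```python
subs_per_year = 3.6          # nucleotide substitutions per year (from the abstract)
subs_per_site_year = 2.3e-3  # substitutions per site per year (from the abstract)
replacement_fraction = 0.32  # fraction of mutations changing an amino acid

# Number of nucleotide sites implied by the two reported rates
implied_sites = subs_per_year / subs_per_site_year
# Implied amino acid substitutions per year
aa_subs_per_year = replacement_fraction * subs_per_year

print(round(implied_sites))        # 1565
print(round(aa_subs_per_year, 2))  # 1.15
```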

5.
Large numbers of protein expression changes are usually observed in mouse models of neurodegenerative diseases, even when only a single gene is mutated in each case. To study the effect of gene dose alterations on the cellular proteome, we carried out a proteomic investigation of murine embryonic stem cells that either overexpressed individual genes or displayed aneuploidy over a genomic region encompassing 14 genes. The number of variant proteins detected per cell line ranged from 70 to 110 and did not correlate with the number of modified genes. In cell lines with single-gene mutations, up- and down-regulated proteins were always in balance relative to the parental cell lines, both in number and in the concentration of the differentially expressed proteins. In contrast, dose alteration of 14 genes resulted in unequal numbers of up- and down-regulated proteins, although the balance was maintained at the level of protein concentration. We propose that the observed protein changes might partially be explained by a proteomic network response. Hence, we hypothesize the existence of a class of "balancer" proteins within the proteomic network, defined as proteins that buffer or cushion a system and thus oppose multiple system disturbances. Through database queries and resilience analysis of the protein interaction network, we found that potential balancer proteins are highly abundant, have few direct interaction partners, and show great allelic variation. Moreover, balancer proteins contribute more heavily to the network entropy and are thus of high importance for system resilience. We propose that the "elasticity" of the proteomic regulatory network mediated by balancer proteins may compensate for changes that occur under diseased conditions.

7.
Prediction of contact maps with neural networks and correlated mutations.   Times cited: 1 (self-citations: 0; citations by others: 1)
Contact maps of proteins are predicted with neural network-based methods, using as input codings of increasing complexity, including evolutionary information, sequence conservation, correlated mutations, and predicted secondary structures. The neural networks are trained on a data set comprising the contact maps of 173 non-homologous proteins, computed from their well-resolved three-dimensional structures. Proteins are selected from the Protein Data Bank provided that they align with at least 15 similar sequences in the corresponding families. The predictors are trained with a standard back-propagation algorithm to learn the association rules between the covalent structure of each protein and its contact map, and are tested on the same protein set with a cross-validation procedure. Our results indicate that the method can assign protein contacts with an average accuracy of 0.21 and an improvement over a random predictor by a factor of >6, which is higher than that previously obtained with methods based only on neural networks or only on correlated mutations. Furthermore, when the network outputs are filtered with a procedure based on residue coordination numbers, the prediction accuracy increases to 0.25 for all the proteins, an 8-fold improvement over a random predictor. These scores are the highest reported so far for predicting protein contact maps.
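The two scores quoted above, accuracy and improvement over a random predictor, can be computed as sketched below. The definitions are assumed, not taken from the abstract: accuracy is the fraction of predicted contacts that are real, and the improvement is that accuracy divided by the accuracy expected from random guessing (the fraction of candidate residue pairs that are in contact). The `min_separation` parameter, which controls which residue pairs count as candidates, is a hypothetical choice; contact-prediction studies often exclude near-diagonal pairs.

```python
def contact_scores(predicted, observed, n_residues, min_separation=1):
    """Accuracy and improvement over random for a predicted contact map.

    predicted, observed -- sets of residue index pairs (i, j) with i < j
    n_residues          -- sequence length
    min_separation      -- candidate pairs satisfy j - i >= min_separation
    """
    candidates = sum(
        1 for i in range(n_residues) for j in range(i + min_separation, n_residues)
    )
    correct = len(predicted & observed)
    accuracy = correct / len(predicted)             # fraction of predictions that are real
    random_accuracy = len(observed) / candidates    # expected accuracy of random guessing
    return accuracy, accuracy / random_accuracy


# Hypothetical 10-residue protein with 3 true contacts and 4 predicted ones
observed = {(0, 5), (1, 6), (2, 7)}
predicted = {(0, 5), (1, 6), (3, 9), (4, 8)}
acc, improvement = contact_scores(predicted, observed, 10)
print(acc, round(improvement, 2))  # 0.5 7.5
```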

8.
There have been repeated observations that proteins are surprisingly robust to site mutations, enduring significant numbers of substitutions with little change in structure, stability, or function. These results are almost paradoxical in light of what is known about random heteropolymers and the sensitivity of their properties to seemingly trivial mutations. This discrepancy has been addressed by interpreting the preservation of biological protein properties in the presence of mutation as indicating the independence of selective pressure on such properties. Such results also lead to the prediction that de novo protein design should be relatively easy, in contrast to what is observed. Here, we use a computational model with lattice proteins to demonstrate how this robustness can result from population dynamics during the evolutionary process. As a result, sequence plasticity may be a characteristic of evolutionarily derived proteins and not necessarily a property of designed proteins. This suggests that this robustness must be re-interpreted in evolutionary terms, and has consequences for our understanding of both in vivo and in vitro protein evolution.

9.
Phylogenetic dating is one of the most powerful and commonly used methods for drawing epidemiological interpretations from pathogen genomic data. Building such dated trees requires a molecular clock model, which represents the rate at which substitutions accumulate on genomes. When the clock rate is constant throughout the tree, the clock is said to be strict, but this is often not an acceptable assumption. Relaxed clock models instead allow the clock rate to vary, often by assigning each branch a rate drawn from a distribution. However, we show here that the distributions of rates across branches in commonly used relaxed clock models are incompatible with the biological expectation that the sum of the numbers of substitutions on two neighboring branches should be distributed as the number of substitutions on a single branch of equivalent length. We call this expectation the additivity property. We further show how the assumptions of commonly used relaxed clock models can lead to estimates of evolutionary rates and dates with low precision and biased confidence intervals. We therefore propose a new additive relaxed clock model in which the additivity property is satisfied. We illustrate the use of this model on a range of simulated and real data sets, and we show that it leads to more accurate estimates of mean evolutionary rates and ancestral dates.
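Why independent per-branch rates break additivity can be seen with a short variance calculation. This is a simplified illustration, not the paper's own derivation: assume each branch of length $t$ receives a rate $r$ drawn independently from a distribution with mean $\mu$ and variance $\sigma^2$ that does not depend on $t$, and that, given $r$, the number of substitutions is Poisson. By the law of total variance, a single branch of length $t$ and two neighboring branches of length $t/2$ then give

```latex
\begin{aligned}
N \mid r &\sim \operatorname{Poisson}(r\,t), \qquad \mathbb{E}[r] = \mu,\ \operatorname{Var}(r) = \sigma^2,\\
\mathbb{E}[N] &= \mu t, \qquad
\operatorname{Var}(N) = \mathbb{E}\big[\operatorname{Var}(N \mid r)\big] + \operatorname{Var}\big(\mathbb{E}[N \mid r]\big) = \mu t + \sigma^2 t^2,\\
\mathbb{E}[N_1 + N_2] &= \mu t, \qquad
\operatorname{Var}(N_1 + N_2) = 2\left(\mu \frac{t}{2} + \sigma^2 \frac{t^2}{4}\right) = \mu t + \frac{\sigma^2 t^2}{2}.
\end{aligned}
```

The means agree but the variances differ whenever $\sigma^2 > 0$, so the two-branch sum is not distributed like a single branch of the same total length. Only the strict clock ($\sigma^2 = 0$) is additive under these assumptions, because the sum of independent $\operatorname{Poisson}(\mu t/2)$ variables is $\operatorname{Poisson}(\mu t)$.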

10.
The RecG protein of Escherichia coli is a structure-specific DNA helicase that targets strand exchange intermediates in genetic recombination and drives their branch migration along the DNA. Strains carrying null mutations in recG show reduced recombination and DNA repair. Suppressors of this phenotype, called srgA, were mapped close to metB and shown to be alleles of priA. Suppression depends on the RecA, RecBCD, RecF, RuvAB, and RuvC recombination proteins. Nine srgA mutations were sequenced and shown to specify mutant PriA proteins with single amino acid substitutions located in or close to one of the conserved helicase motifs. The mutant proteins retain the ability to catalyze primosome assembly, as judged by the viability of recG srgA and srgA strains and their ability to support replication of plasmids based on the ColE1 replicon. Multicopy priA+ plasmids substantially exacerbate the recombination- and repair-deficient phenotype of recG strains and confer similar phenotypes on recG srgA double mutants, but not on ruvAB or wild-type strains. The multicopy effect is eliminated by K230R, C446G, and C477G substitutions in PriA. It is concluded that the 3'-5' DNA helicase/translocase activity of PriA inhibits recombination and that this effect is normally countered by RecG.

11.
The base excision repair (BER) pathway, executed by a complex network of proteins, is the major system responsible for removing damaged DNA bases and repairing DNA single-strand breaks (SSBs) generated by environmental agents, such as certain cancer therapies, or arising spontaneously during cellular metabolism. Both modified DNA bases and SSBs with ends other than 3'-OH and 5'-P are repaired by replacement of either a single nucleotide or several nucleotides, in processes called short-patch BER (SP-BER) and long-patch BER (LP-BER), respectively. In contrast to Escherichia coli, human cells operate the two BER sub-pathways with different sets of proteins. This review considers how cells select between SP- and LP-BER, and how mutations in BER and end-processing genes contribute to bacterial mutagenesis and human diseases.

12.
Suppressors of the methyl methanesulfonate sensitivity of Saccharomyces cerevisiae diploids lacking the Srs2 helicase turned out to contain semidominant mutations in Rad51, a homolog of the bacterial RecA protein. The nature of these mutations was determined by direct sequencing. The 26 mutations characterized were single base substitutions leading to amino acid replacements at 18 different sites. The great majority of these sites (75%) are conserved in the family of RecA-like proteins, and 10 of them affect sites corresponding to amino acids in RecA that are probably directly involved in ATP reactions, binding, and/or hydrolysis. Six mutations are in domains thought to be involved in interaction between monomers; they may also affect ATP reactions. By themselves, all the alleles confer a rad51 null phenotype. When heterozygous, however, they are, to varying degrees, negative semidominant for radiation sensitivity; presumably the mutant proteins are coassembled with wild-type Rad51 and poison the resulting nucleofilaments or recombination complexes. This negative effect is partially suppressed by an SRS2 deletion, which supports the hypothesis that Srs2 reverses recombination structures that contain either mutated proteins or numerous DNA lesions.

14.
Bobula J, Tomala K, Jez E, Wloch DM, Borts RH, Korona R. Genetics, 2006, 174(2): 937-944
The malfunctioning of molecular chaperones may result in uncovering genetic variation. The molecular basis of this phenomenon remains largely unknown. Chaperones rescue proteins unfolded by environmental stresses and might therefore also help to stabilize mutated proteins and thus mask damage. To test this hypothesis, we carried out genome-wide mutagenesis followed by a screen for mutations that were synthetically harmful when the RAC-Ssb1/2 cytosolic chaperones were inactive. Mutants with such a phenotype were found and mapped to single nucleotide substitutions. However, neither the genes identified nor the nature of the genetic lesions implied that folding of the mutated proteins was being supported by the chaperones. In a second screen, we identified temperature-sensitive (ts) mutants, a phenotype indicative of structural instability of proteins. We tested these for an association with sensitivity to loss of chaperone activity but found no such correlation, as might have been expected if the chaperones assisted the folding of mutant proteins. Thus, molecular chaperones can mask the negative effects of mutations, but the mechanism of such buffering need not be direct. A plausible role of chaperones is to stabilize genetic networks, thus making them more tolerant to malfunctioning of their constituents.

15.
The analysis of extant sequences shows that molecular evolution has been heterogeneous through time and among lineages. However, for a given sequence alignment, it is often difficult to uncover what factors caused this heterogeneity. In fact, identifying and characterizing heterogeneous patterns of molecular evolution along a phylogenetic tree is very challenging, for lack of appropriate methods. Users must either define a priori groups of branches along which they believe molecular evolution has been similar, or allow each branch to have its own pattern of molecular evolution. The first approach assumes prior knowledge that is seldom available, and the second requires estimating an unreasonably large number of parameters. Here we propose a convenient and reliable approach in which branches are clustered by their pattern of molecular evolution alone, with no need for prior knowledge about the data set under study. Model selection is achieved in a statistical framework and therefore avoids overparameterization. We rely on substitution mapping for efficiency and present two clustering approaches, depending on whether or not we expect neighbouring branches to share more similar patterns of sequence evolution than distant branches. We validate our method on simulations and test it on four previously published data sets. We find that our method correctly groups branches sharing similar equilibrium GC contents in a data set of ribosomal RNAs and recovers expected footprints of selection through dN/dS. Importantly, it also uncovers a new pattern of relaxed selection in a phylogeny of Mantellid frogs, which we are able to correlate with life-history traits. This shows that our programs should be very useful for studying patterns of molecular evolution and revealing new correlations between sequence and species evolution.
Our programs can run on DNA, RNA, codon, or amino acid sequences with a large set of possible models of substitutions and are available at http://biopp.univ-montp2.fr/forge/testnh.

16.
Recent large-scale sequencing studies have revealed that cancer genomes contain variable numbers of somatic point mutations distributed across many genes. These somatic mutations most likely include passenger mutations that are not cancer causing and pathogenic driver mutations in cancer genes. Establishing a significant presence of driver mutations in such data sets is of biological interest. Whereas current techniques from phylogeny are applicable to large data sets composed of singly mutated samples, recently exemplified with a p53 mutation database, methods for smaller data sets containing individual samples with multiple mutations need to be developed. By constructing distinct models of both the mutation process and selection pressure upon the cancer samples, exact statistical tests to examine this problem are devised. Tests to examine the significance of selection toward missense, nonsense, and splice site mutations are derived, along with tests assessing variation in selection between functional domains. Maximum-likelihood methods facilitate parameter estimation, including levels of selection pressure and minimum numbers of pathogenic mutations. These methods are illustrated with 25 breast cancers screened across the coding sequences of 518 kinase genes, revealing 90 base substitutions in 71 genes. Significant selection pressure upon truncating mutations was established. Furthermore, an estimated minimum of 29.8 mutations were pathogenic.
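The idea of maximum-likelihood estimation of selection on truncating mutations can be illustrated with a deliberately simple binomial model. It is a sketch, not the authors' exact tests: the mutation process alone is assumed to make a fraction `p_neutral` of substitutions truncating (nonsense or splice site), and selection multiplies the odds of retaining a truncating change by a factor omega. The maximum-likelihood omega follows from the observed fraction, and a likelihood-ratio test compares it with omega = 1. All names and the example counts are hypothetical.

```python
from math import erfc, log, sqrt


def binom_loglik(k, n, q):
    """Binomial log-likelihood, omitting the constant binomial coefficient."""
    return k * log(q) + (n - k) * log(1 - q)


def truncating_selection(k, n, p_neutral):
    """ML selection odds ratio and likelihood-ratio test (requires 0 < k < n).

    k         -- observed truncating substitutions among n in total
    p_neutral -- assumed truncating fraction under the mutation process alone
    Returns (omega_hat, lrt_statistic, p_value); omega_hat > 1 suggests that
    truncating mutations are enriched by selection.
    """
    q_hat = k / n
    omega_hat = (q_hat / (1 - q_hat)) * ((1 - p_neutral) / p_neutral)
    stat = 2 * (binom_loglik(k, n, q_hat) - binom_loglik(k, n, p_neutral))
    # Chi-square survival function with 1 degree of freedom (two-sided test)
    p_value = erfc(sqrt(stat / 2))
    return omega_hat, stat, p_value


# Hypothetical counts: 15 truncating changes among 100, 5% expected without selection
omega, stat, pval = truncating_selection(15, 100, 0.05)
print(round(omega, 2), round(stat, 2), pval)
```

The test above is two-sided; to test only for positive selection, one can halve the P-value when omega_hat > 1.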

17.
Tamuri AU, dos Reis M, Goldstein RA. Genetics, 2012, 190(3): 1101-1115
Estimation of the distribution of selection coefficients of mutations is a long-standing issue in molecular evolution. In addition to population-based methods, the distribution can be estimated from DNA sequence data by phylogenetic-based models. Previous models have generally found unimodal distributions where the probability mass is concentrated between mildly deleterious and nearly neutral mutations. Here we use a sitewise mutation-selection phylogenetic model to estimate the distribution of selection coefficients among novel and fixed mutations (substitutions) in a data set of 244 mammalian mitochondrial genomes and a set of 401 PB2 proteins from influenza. We find a bimodal distribution of selection coefficients for novel mutations in both the mitochondrial data set and for the influenza protein evolving in its natural reservoir, birds. Most of the mutations are strongly deleterious with the rest of the probability mass concentrated around mildly deleterious to neutral mutations. The distribution of the coefficients among substitutions is unimodal and symmetrical around nearly neutral substitutions for both data sets at adaptive equilibrium. About 0.5% of the nonsynonymous mutations and 14% of the nonsynonymous substitutions in the mitochondrial proteins are advantageous, with 0.5% and 24% observed for the influenza protein. Following a host shift of influenza from birds to humans, however, we find among novel mutations in PB2 a trimodal distribution with a small mode of advantageous mutations.
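The contrast above between novel mutations and fixed substitutions can be illustrated with the standard relationship between the two distributions. This sketch does not reproduce the paper's model. It assumes the common large-population approximation in which a mutation with population-scaled selection coefficient S fixes at S / (1 - exp(-S)) times the neutral rate; the exact scaling of S, for example 2Ns or 4Ns, depends on the population model. Reweighting a mutation distribution by this factor yields the distribution among substitutions. The example distribution is made up for illustration.

```python
from math import expm1


def relative_fixation(S):
    """Fixation probability relative to a neutral mutation: S / (1 - exp(-S))."""
    if S == 0:
        return 1.0
    return S / -expm1(-S)  # -expm1(-S) == 1 - exp(-S), computed accurately near 0


def substitution_distribution(mutation_dist):
    """Reweight a distribution of S among new mutations into one among substitutions."""
    weights = {S: p * relative_fixation(S) for S, p in mutation_dist.items()}
    total = sum(weights.values())
    return {S: w / total for S, w in weights.items()}


# Hypothetical distribution of S among new mutations (probabilities sum to 1)
mutations = {-10.0: 0.6, -1.0: 0.3, 0.0: 0.09, 2.0: 0.01}
subs = substitution_distribution(mutations)
for S in sorted(subs):
    print(S, round(mutations[S], 3), round(subs[S], 4))
```

In this toy case, strongly deleterious mutations nearly vanish among substitutions, while the 1% of advantageous mutations become about 8% of substitutions, qualitatively matching the pattern reported above.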

18.
Ehlers-Danlos syndrome (EDS) type IV results from mutations in the COL3A1 gene, which encodes the constituent chains of type III procollagen. We have identified, in 33 unrelated individuals or families with EDS type IV, mutations that affect splicing, of which 30 are point mutations at splice junctions and 3 are small deletions that remove splice-junction sequences and partial exon sequences. Except for one point mutation at a donor site, which leads to partial intron inclusion, and a single base-pair substitution at an acceptor site, which gives rise to inclusion of the complete upstream intron into the mature mRNA, all mutations result in deletion of a single exon as the only splice alteration. Of the exon-skipping mutations that are due to single base substitutions, which we have identified in 28 separate individuals, only two affect the splice-acceptor site. The underrepresentation of splice acceptor-site mutations suggests that the favored consequence of 3' mutations is the use of an alternative acceptor site that creates a null allele with a premature-termination codon. The phenotypes of those mutations may differ, with respect to either their severity or their symptomatic range, from the usual presentation of EDS type IV and thus have been excluded from analysis.

19.
There are two tightly linked loci (D and CE) for the human Rh blood group. Their gene products are membrane proteins with 12 transmembrane domains that form a complex with the Rh50 glycoprotein on erythrocytes. We constructed phylogenetic networks of human and nonhuman primate Rh genes, and the network patterns suggested the occurrence of gene conversions. We therefore used a modified site-by-site reconstruction method with two assumed gene trees and detected 9 or 11 converted regions. After eliminating the effect of gene conversions, we estimated the numbers of nonsynonymous and synonymous substitutions for each branch of both trees. Whichever gene tree we selected, the branch connecting hominoids and Old World monkeys showed significantly more nonsynonymous than synonymous substitutions, an indication of positive selection. Many other branches also showed more nonsynonymous than synonymous substitutions, suggesting that the Rh genes have experienced some kind of positive selection. Received: 16 March 1999 / Accepted: 17 June 1999
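Comparing nonsynonymous and synonymous substitutions on a branch is usually done per site, with a correction for multiple hits. The sketch below shows a standard Nei-Gojobori-style comparison with the Jukes-Cantor correction. It assumes that the difference counts on a branch, for example from reconstructed ancestral sequences, and the numbers of nonsynonymous and synonymous sites are already known. It does not reproduce the study's modified site-by-site reconstruction or its handling of gene conversion, and the example counts are hypothetical.

```python
from math import log


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("proportion too high for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def dn_ds(n_diffs, n_sites, s_diffs, s_sites):
    """Per-site nonsynonymous (dN) and synonymous (dS) distances.

    n_diffs, s_diffs -- nonsynonymous / synonymous differences on a branch
    n_sites, s_sites -- numbers of nonsynonymous / synonymous sites
    """
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    return dn, ds


# Hypothetical branch: 30 nonsynonymous differences over 900 sites, 3 synonymous over 300
dn, ds = dn_ds(30, 900, 3, 300)
print(round(dn, 4), round(ds, 4))  # 0.0341 0.0101
```

A dN greater than dS, as in this example, is the signature interpreted above as positive selection; judging significance additionally requires the variances of dN and dS, which this sketch omits.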
