Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Statistical and biochemical studies of the genetic code have found evidence of nonrandom patterns in the distribution of codon assignments. It has, for example, been shown that the code minimizes the effects of point mutation or mistranslation: erroneous codons are either synonymous or code for an amino acid with chemical properties very similar to those of the one that would have been present had the error not occurred. This work has suggested that the second base of codons is less efficient in this respect, by about three orders of magnitude, than the first and third bases. These results are based on the assumption that all forms of error at all bases are equally likely. We extend this work to investigate (1) the effect of weighting transition errors differently from transversion errors and (2) the effect of weighting each base differently, depending on reported mistranslation biases. We find that if the bias affects all codon positions equally, as might be expected were the code adapted to a mutational environment with transition/transversion bias, then any reasonable transition/transversion bias increases the relative efficiency of the second base by an order of magnitude. In addition, if we employ weightings to allow for biases in translation, then only 1 in every million random alternative codes generated is more efficient than the natural code. We thus conclude not only that the natural genetic code is extremely efficient at minimizing the effects of errors, but also that its structure reflects biases in these errors, as might be expected were the code the product of selection. Received: 25 July 1997 / Accepted: 9 January 1998
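The kind of calculation described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the amino acid property (Kyte-Doolittle hydropathy), the mean-squared-difference cost, the decision to skip substitutions involving stop codons, and the way random codes are generated (relabelling the amino acids while keeping the synonymous block structure) are all assumptions made for illustration. The names `code_cost`, `random_code`, `fraction_better` and `ts_weight` are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AA)}

# Kyte-Doolittle hydropathy: an assumed stand-in for "chemical properties".
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def code_cost(code, ts_weight=1.0):
    """Weighted mean squared property change over all single-base errors."""
    total = 0.0
    weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue  # assumption: errors starting from stop codons are ignored
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # assumption: errors producing a stop are ignored
                w = ts_weight if (codon[pos], base) in TRANSITIONS else 1.0
                total += w * (HYDROPATHY[aa] - HYDROPATHY[mutant_aa]) ** 2
                weight_sum += w
    return total / weight_sum


def random_code(rng):
    """Relabel the 20 amino acids at random, keeping the codon block structure."""
    amino_acids = sorted(set(AA) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in STANDARD.items()}


def fraction_better(n_codes, ts_weight=1.0, seed=0):
    """Fraction of random codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    reference = code_cost(STANDARD, ts_weight)
    better = sum(code_cost(random_code(rng), ts_weight) < reference
                 for _ in range(n_codes))
    return better / n_codes


if __name__ == "__main__":
    for w in (1.0, 2.0, 5.0):
        print(w, code_cost(STANDARD, w), fraction_better(1000, w))
```

Raising `ts_weight` above 1 makes transitions count more heavily than transversions, which is one simple way to express a transition/transversion bias. Position-specific mistranslation weights could be added as a second multiplier on `w`.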

2.
How did the "universal" genetic code arise? Several hypotheses have been put forward, and the code has been analyzed extensively by authors looking for clues to selection pressures that might have acted during its evolution. But this approach has been ineffective. Although an impressive number of properties has been attributed to the universal code, it has been impossible to determine whether selection on any of these properties was important in the code's evolution or whether the observed properties arose as a consequence of selection on some other characteristic. Therefore we turned the question around and asked, what would a genetic code look like if it had evolved in response to various different selection pressures? To address this question, we constructed a genetic algorithm. We found first that selecting on a particular measure yields codes that are similar to each other. Second, we found that the universal code is far from minimized with respect to the effects of mutations (or translation errors) on the amino acid compositions of proteins. Finally, we found that the codes that most closely resembled real codes were those generated by selecting on aspects of the code's structure, not those generated by selecting to minimize the effects of amino acid substitutions on proteins. This suggests that the universal genetic code has been selected for a particular structure (one that confers an important flexibility on the evolution of genes and proteins) and that the particular assignments of amino acids to codons are secondary. Received: 29 December 1998 / Accepted: 8 July 1999

3.
Early fixation of an optimal genetic code   (Total citations: 19; self-citations: 0; citations by others: 19)
The evolutionary forces that produced the canonical genetic code before the last universal ancestor remain obscure. One hypothesis is that the arrangement of amino acid/codon assignments results from selection to minimize the effects of errors (e.g., mistranslation and mutation) on resulting proteins. If amino acid similarity is measured as polarity, the canonical code does indeed outperform most theoretical alternatives. However, this finding does not hold for other amino acid properties, ignores plausible restrictions on possible code structure, and does not address the naturally occurring nonstandard genetic codes. Finally, other analyses have shown that significantly better code structures are possible. Here, we show that if theoretically possible code structures are limited to reflect plausible biological constraints, and amino acid similarity is quantified using empirical data of substitution frequencies, the canonical code is at or very close to a global optimum for error minimization across plausible parameter space. This result is robust to variation in the methods and assumptions of the analysis. Although significantly better codes do exist under some assumptions, they are extremely rare and thus consistent with reports of an adaptive code: previous analyses which suggest otherwise derive from a misleading metric. However, all extant, naturally occurring, secondarily derived, nonstandard genetic codes do appear less adaptive. The arrangement of amino acid assignments to the codons of the standard genetic code appears to be a direct product of natural selection for a system that minimizes the phenotypic impact of genetic error. Potential criticisms of previous analyses appear to be without substance. That known variants of the standard genetic code appear less adaptive suggests that different evolutionary factors predominated before and after fixation of the canonical code. 
While the evidence for an adaptive code is clear, the process by which the code achieved this optimization requires further attention.

4.
MOTIVATION: We present an application of Bayesian variable selection to the novel detection of sequence elements that confer negative design to protein structure and function. As an illustration, we analyze the different dimer interfaces between the CXCL8 chemokine family and the CCL4 and CCL2 chemokine families to discover the changes that disfavor CXCL8 quaternary structure. RESULTS: In comparison with known experimental results, our method identifies evolutionarily conserved sequence changes in the CC families that inhibit CXCL8 quaternary structure. Therefore, we find positive selection of negative design elements. Furthermore, our approach predicts that a two-residue deletion conserved in the CCL4 chemokine family disfavors CXCL8 dimerization. AVAILABILITY: The Matlab code for the Bayesian variable selection is freely available at http://stat.tamu.edu/~mvannucci/webpages/codes.html

5.
The present paper will focus on the relation between the structure of the table of the genetic code and the evolution of primitive organisms: it will be shown that the organization of the code table according to an optimization principle based on the notion of resistance to errors can provide a criterion for selection. The ordered aspect of the genetic code table makes this result a plausible starting point for studies of the origin and evolution of the genetic code: these could include, besides a more refined optimization principle at the logical level, some effects more directly related to the physico-chemical context, and the construction of realistic models incorporating both aspects.

6.
Functional information means an encoded network of functions in living organisms from molecular signaling pathways to an organism’s behavior. It is represented by two components: code and an interpretation system, which together form a self-sustaining semantic closure. Semantic closure allows some freedom between components because small variations of the code are still interpretable. The interpretation system consists of inference rules that control the correspondence between the code and the function (phenotype) and determines the shape of the fitness landscape. The utility factor operates at multiple time scales: short-term selection drives evolution towards higher survival and reproduction rate within a given fitness landscape, and long-term selection favors those fitness landscapes that support adaptability and lead to evolutionary expansion of certain lineages. Inference rules make short-term selection possible by shaping the fitness landscape and defining possible directions of evolution, but they are under control of the long-term selection of lineages. Communication normally occurs within a set of agents with compatible interpretation systems, which I call communication system. Functional information cannot be directly transferred between communication systems with incompatible inference rules. Each biological species is a genetic communication system that carries unique functional information together with inference rules that determine evolutionary directions and constraints. This view of the relation between utility and inference can resolve the conflict between realism/positivism and pragmatism. Realism overemphasizes the role of inference in evolution of human knowledge because it assumes that logic is embedded in reality. Pragmatism substitutes usefulness for truth and therefore ignores the advantage of inference. 
The proposed concept of evolutionary pragmatism rejects the idea that logic is embedded in reality; instead, inference rules are constructed within each communication system to represent reality, and they evolve towards higher adaptability on a long time scale.

7.
SUMMARY: ESS++ is a C++ implementation of a fully Bayesian variable selection approach for single and multiple response linear regression. ESS++ works well both when the number of observations is larger than the number of predictors and in the 'large p, small n' case. In the current version, ESS++ can handle several hundred observations, thousands of predictors and a few responses simultaneously. The core engine of ESS++ for the selection of relevant predictors is based on Evolutionary Monte Carlo. Our implementation is open source, allowing community-based alterations and improvements. AVAILABILITY: C++ source code and documentation including compilation instructions are available under GNU licence at http://bgx.org.uk/software/ESS.html.

8.
MOTIVATION: Feature selection approaches, such as filter and wrapper, have been applied to address the gene selection problem in the literature of microarray data analysis. In wrapper methods, the classification error is usually used as the evaluation criterion of feature subsets. Due to the high dimensionality and small sample size of microarray data, however, counting-based error estimation may not necessarily be an ideal criterion for the gene selection problem. RESULTS: Our study reveals that evaluating genes in terms of counting-based error estimators such as resubstitution error, leave-one-out error, cross-validation error and bootstrap error may encounter a severe ties problem, i.e. two or more gene subsets score equally, and this in turn results in uncertainty in gene selection. Our analysis finds that the ties problem is caused by the discrete nature of counting-based error estimators and could be avoided by using continuous evaluation criteria instead. Experimental results show that continuous evaluation criteria such as the generalised |w|^2 measure for support vector machines and the modified Relief's measure for k-nearest neighbors produce improved gene selection compared with counting-based error estimators. AVAILABILITY: The companion website is at http://www.ntu.edu.sg/home5/pg02776030/wrappers/ The website contains (1) the source code of all the gene selection algorithms and (2) the complete set of tables and figures of experiments.
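The ties problem can be seen in a small Python sketch. It uses synthetic data and a deliberately simple single-threshold classifier per gene, neither of which comes from the paper. The continuous criterion here (absolute difference between class means) is only an assumed stand-in for the SVM and Relief measures named in the abstract, and all function names are hypothetical. With 12 samples, the error count of the better threshold orientation can take at most 7 distinct values (0 to 6), so 100 candidate genes are bound to tie heavily.

```python
import random
from collections import Counter


def resubstitution_errors(values, labels):
    """Fewest misclassifications of any single-threshold rule on one gene."""
    n = len(values)
    best = n
    for threshold in values:
        predicted = [1 if v >= threshold else 0 for v in values]
        wrong = sum(p != y for p, y in zip(predicted, labels))
        best = min(best, wrong, n - wrong)  # allow either class orientation
    return best


def mean_gap(values, labels):
    """Continuous criterion (assumed): absolute difference of class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def tie_report(n_samples=12, n_genes=100, seed=0):
    """Return (#distinct error counts, #distinct gaps, size of largest tie)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    # Synthetic expression values: class 1 is shifted slightly upwards.
    genes = [[rng.gauss(0.5 * y, 1.0) for y in labels] for _ in range(n_genes)]
    errors = [resubstitution_errors(g, labels) for g in genes]
    gaps = [mean_gap(g, labels) for g in genes]
    largest_tie = Counter(errors).most_common(1)[0][1]
    return len(set(errors)), len(set(gaps)), largest_tie


if __name__ == "__main__":
    print(tie_report())
```

Ranking genes by the discrete error count leaves large groups of equally scored candidates, whereas a real-valued criterion can usually order them.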

9.
The Standard Genetic Code is organized such that similar codons encode similar amino acids. One explanation suggested that the Standard Code is the result of natural selection to reduce the fitness "load" that derives from the mutation and mistranslation of protein-coding genes. We review the arguments against the mutational load-minimizing hypothesis and argue that they need to be reassessed. We review recent analyses of the organization of the Standard Code and conclude that under cautious interpretation they support the mutational load-minimizing hypothesis. We then present a deterministic asexual model with which we study the mode of selection for load minimization. In this model, individual fitness is determined by a protein phenotype resulting from the translation of a mutable set of protein-coding genes. We show that an equilibrium fitness may be associated with a population with the same genetic code and that genetic codes that assign similar codons to similar amino acids have a higher fitness. We also show that the number of mutant codons in each individual at equilibrium, which determines the strength of selection for load minimization, reflects a long-term evolutionary balance between mutations in messages and selection on proteins, rather than the number of mutations that occur in a single generation, as has been assumed by previous authors. We thereby establish that selection for mutational load minimization acts at the level of an individual in a single generation. We conclude with comments on the shortcomings and advantages of load minimization over other hypotheses for the origin of the Standard Code. Received: 4 April 2001 / Accepted: 22 October 2001

10.
A progene hypothesis has been proposed earlier to explain the mechanism of origin of the self-reproducing genetic system. Progenes (precursors of the genetic system) are mixed anhydrides of an amino acid and deoxyribotrinucleotide at the 3'-gamma-terminal phosphate (NpNpNppp-AA); they are produced from dinucleotides (NpNp) and 3'-gamma-aminoacylnucleotidylates (Nppp-AA) as a result of specific interaction between amino acid and dinucleotide. The postulated mechanism of progene formation accounts for the selection of substances, including chirality, the origin of the genetic code as well as for the mechanisms of formation, self-reproduction and evolution of the simplest genetic system ("gene-polypeptide"). A stereochemical analysis of the progene formation mechanism has allowed us to support the main statements of the hypothesis that relate to the origin of the genetic code and to selection of substances. Atomic groups that could be responsible for the specificity of interaction between dinucleotides and amino acids in progene formation have been revealed. Stereochemical evidence for the physicochemical basis of the origin of the existing genetic code has been produced: 1) a special role of the second nucleotide in the codon is demonstrated in amino acid coding by the progene hypothesis principle; 2) an advantage of T over U in such coding is demonstrated; 3) for 16 amino acids out of 20 an agreement has been obtained between the optimal dinucleotide as revealed by the stereochemical analysis and the codon dinucleotides; 4) an explanation for the third nucleotide selection mechanism is offered. A restoration of the prebiotic code, based on these results, has indicated that the code contains 32 codons, is statistical and group-wise.
It encodes 7 groups of isofunctional amino acids: 3 overlapping groups of non-polar amino acids: 1) medium-size hydrophobic amino acids (chiefly Val, n-Val and a-But), 2) small and medium-size non-polar amino acids (chiefly Ala, Val, n-Val, a-But and Gly), 3) small non-polar amino acids (Gly, Ala, a-But); and 4 groups of polar amino acids: 1) hydroxy + dicarbonic (Asp, Glu, Ser and Thr), 2) dicarbonic (Asp and Glu), 3) hydroxy (Ser and Thr) and 4) basic (Arg and Lys). The code includes about 20 amino acids among which are 15-17 canonical and a few common non-canonical. The prebiotic code explains many properties of the existing genetic code and is capable of evolving into the latter by way of a gradual replacement of the physicochemical coding mechanism by the enzymatic coding mechanism.

11.
DNA or RNA aptamers have gained attention as the next generation antibody-like molecules for medical or diagnostic use. In the design of improved aptamers selected by Systematic Evolution of Ligands by EXponential enrichment (SELEX), conventional secondary structure prediction tools for nucleic acids play an important role in truncating or minimizing a sequence, or introducing limited chemical modifications, without compromising or changing its binding affinity to targets. We describe a novel software package, ValFold, capable of predicting secondary structures with improved accuracy based on unique aptamer characteristics. ValFold predicts not only the canonical Watson-Crick pairs but also G-G pairs derived from G-quadruplex (known structure for many aptamers) using the stem candidate selection algorithm. AVAILABILITY: The database is available for free at http://code.google.com/p/valfold/

12.
Due to its cost effectiveness, next generation sequencing of pools of individuals (Pool-Seq) is becoming a popular strategy for genome-wide estimation of allele frequencies in population samples. As the allele frequency spectrum provides information about past episodes of selection, Pool-Seq is also a promising design for genomic scans for selection. However, no software tool has yet been developed for selection scans based on Pool-Seq data. We introduce Pool-hmm, a Python program for the estimation of allele frequencies and the detection of selective sweeps in a Pool-Seq sample. Pool-hmm includes several options that allow a flexible analysis of Pool-Seq data, and can be run in parallel on several processors. Source code and documentation for Pool-hmm are freely available at https://qgsp.jouy.inra.fr/ .

13.
The deluge of data emerging from high-throughput sequencing technologies poses large analytical challenges when testing for association to disease. We introduce a scalable framework for variable selection, implemented in C++ and OpenCL, that fits regularized regression across multiple Graphics Processing Units. Open source code and documentation can be found at a Google Code repository under the URL http://bioinformatics.oxfordjournals.org/content/early/2012/01/10/bioinformatics.bts015.abstract. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

14.
15.
CD45, encoded by PTPRC in humans, is the most abundantly expressed protein on the surface of many lymphocytes. We investigated whether the extracellular region of CD45 was under positive selection in Old World primates, and whether there was differential selection across this region, particularly on exons that were involved in alternative splicing and those that were not alternatively spliced. The results show extraordinarily strong and consistent positive Darwinian selection on the extracellular part of CD45 throughout the evolution of Old World monkeys, apes and humans. Positive selection is concentrated in exons 9 and 14, which code for the previously neglected linker and fibronectin III domains. These exons have a high rate of evolution at nonsynonymous sites that is roughly twice as high as that of the intronic rate in this gene. In contrast, alternatively spliced exons 4-6, which code for the variable domains, are under weaker positive selection and are evolving more slowly than the intronic rate. These data provide a striking example of positive selection in a well-known gene that should provide an impetus for further functional studies to elucidate its species-specific function.

16.
The Case for an Error Minimizing Standard Genetic Code   (Total citations: 1; self-citations: 1; citations by others: 0)
Since discovering the pattern by which amino acids are assigned to codons within the standard genetic code, investigators have explored the idea that natural selection placed biochemically similar amino acids near to one another in coding space so as to minimize the impact of mutations and/or mistranslations. The analytical evidence to support this theory has grown in sophistication and strength over the years, and counterclaims questioning its plausibility and quantitative support have yet to transcend some significant weaknesses in their approach. These weaknesses are illustrated here by means of a simple simulation model for adaptive genetic code evolution. There remain ill-explored facets of the "error minimizing" code hypothesis, however, including the mechanism and pathway by which an adaptive pattern of codon assignments emerged, the extent to which natural selection created synonym redundancy, its role in shaping the amino acid and nucleotide languages, and even the correct interpretation of the adaptive codon assignment pattern: these represent fertile areas for future research.

17.
I attempt to sketch a unified picture of the origin of living organisms in their genetic, bioenergetic, and structural aspects. Only selection at a higher level than for individual selfish genes could power the cooperative macromolecular coevolution required for evolving the genetic code. The protein synthesis machinery is too complex to have evolved before membranes. Therefore a symbiosis of membranes, replicators, and catalysts probably mediated the origin of the code and the transition from a nucleic acid world of independent molecular replicators to a nucleic acid/protein/lipid world of reproducing organisms. Membranes initially functioned as supramolecular structures to which different replicators attached and were selected as a higher-level reproductive unit: the proto-organism. I discuss the roles of stereochemistry, gene divergence, codon capture, and selection in the code's origin. I argue that proteins were primarily structural not enzymatic and that the first biological membranes consisted of amphipathic peptidyl-tRNAs and prebiotic mixed lipids. The peptidyl-tRNAs functioned as genetically-specified lipid analogues with hydrophobic tails (ancestral signal peptides) and hydrophilic polynucleotide heads. Protoribosomes arose from two cooperating RNAs: peptidyl transferase (large subunit) and mRNA-binder (small subunit). Early proteins had a second key role: coupling energy flow to the phosphorylation of gene and peptide precursors, probably by lithophosphorylation by membrane-anchored kinases scavenging geothermal polyphosphate stocks. These key evolutionary steps probably occurred on the outer surface of an "inside-out cell" or obcell, which evolved an unambiguous hydrophobic code with four prebiotic amino acids and proline, and initiation by isoleucine anticodon CAU; early proteins and nucleozymes were all membrane-attached.
To improve replication, translation, and lithophosphorylation, hydrophilic substrate-binding and catalytic domains were later added to signal peptides, yielding a ten-acid doublet code. A primitive proto-ecology of molecular scavenging, parasitism, and predation evolved among obcells. I propose a new theory for the origin of the first cell: fusion of two cup-shaped obcells, or hemicells, to make a protocell with double envelope, internal genome and ribosomes, protocytosol, and periplasm. Only then did water-soluble enzymes, amino acid biosynthesis, and intermediary metabolism evolve in a concentrated autocatalytic internal cytosolic soup, causing 12 new amino acid assignments, termination, and rapid freezing of the 22-acid code. Anticodons were recruited sequentially: GNN, CNN, INN, and *UNN. CO2 fixation, photoreduction, and lipid synthesis probably evolved in the protocell before photophosphorylation. Signal recognition particles, chaperones, compartmented proteases, and peptidoglycan arose prior to the last common ancestor of life, a complex autotrophic, anaerobic green bacterium. Received: 19 February 2001 / Accepted: 9 April 2001

18.
Selecting a small number of informative genes for microarray-based tumor classification is central to cancer prediction and treatment. Based on model population analysis, here we present a new approach, called Margin Influence Analysis (MIA), designed to work with support vector machines (SVM) for selecting informative genes. The rationale for performing margin influence analysis lies in the fact that the margin of support vector machines is an important factor which underlies the generalization performance of SVM models. Briefly, MIA could reveal genes which have statistically significant influence on the margin by using the Mann-Whitney U test. The reason for using the Mann-Whitney U test rather than the two-sample t test is that the Mann-Whitney U test is a robust nonparametric method without any distribution-related assumptions. Using two publicly available cancer microarray data sets, it is demonstrated that MIA could typically select a small number of margin-influencing genes and achieve classification accuracy comparable to that reported in the literature. The distinguished features and outstanding performance may make MIA a good alternative for gene selection of high dimensional microarray data. (The source code in MATLAB with GNU General Public License Version 2.0 is freely available at http://code.google.com/p/mia2009/).
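The statistical test named in this abstract can be sketched in a few lines of Python. This is not the MIA source code (which the abstract says is in MATLAB). It assumes, for illustration only, that the two samples being compared are SVM margins from sub-models that include a given gene and sub-models that exclude it. The margin values below are made up, and the p-value uses the large-sample normal approximation without tie correction, which is rough for samples this small.

```python
import math


def mann_whitney_u(x, y):
    """Return (U_x, U_y): pairwise wins of x over y and of y over x.

    Each tied pair contributes 0.5 to both statistics.
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    return u_x, len(x) * len(y) - u_x


def two_sided_p(x, y):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    u_x, _ = mann_whitney_u(x, y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical margins of sub-models with and without one candidate gene.
    margins_with_gene = [0.92, 1.05, 0.98, 1.10, 1.01]
    margins_without_gene = [0.81, 0.88, 0.95, 0.79, 0.90]
    print(mann_whitney_u(margins_with_gene, margins_without_gene))
    print(two_sided_p(margins_with_gene, margins_without_gene))
```

Because the statistic depends only on the ordering of the values, a single extreme margin cannot dominate the result, which is the robustness property the abstract gives as the reason for preferring this test to the t test.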

19.
YODA: selecting signature oligonucleotides   (Total citations: 3; self-citations: 0; citations by others: 3)
MOTIVATION: Selecting oligonucleotide probes for use in microarray design, and other applications requiring signature sequences, involves identifying sequences which will bind strongly to their intended target, while binding only weakly (or preferably, not at all) to non-target sequences which may be present in the hybridization reaction. While many tools to assist in selection of such sequences exist, all the ones we examined lack important oligo design and software features. RESULTS: YODA is an application for assisting biological researchers in selecting signature sequences. It incorporates a custom sequence similarity search to find potential cross-hybridizing non-target sequences. For this task, most oligo design tools rely on BLAST, which is ill suited for it due to an unacceptable risk of false negatives. YODA supports multiple probe design goals including single-genome, multiple-genome, pathogen-host and species/strain-identification. A graphical interface is provided as well as a command-line interface, both of which support many user-controlled parameters. YODA is easy to install and use and runs on Windows, Mac OS X and Linux platforms. AVAILABILITY: Freely available (LGPL) along with source code and additional documentation at http://pathport.vbi.vt.edu/YODA CONTACT: enordber@vbi.vt.edu.

20.
We simulate a deterministic population genetic model for the coevolution of genetic codes and protein-coding genes. We use very simple assumptions about translation, mutation, and protein fitness to calculate mutation-selection equilibria of codon frequencies and fitness in a large asexual population with a given genetic code. We then compute the fitnesses of altered genetic codes that compete to invade the population by translating its genes with higher fitness. Codes and genes coevolve in a succession of stages, alternating between genetic equilibration and code invasion, from an initial wholly ambiguous coding state to a diversified frozen coding state. Our simulations almost always resulted in partially redundant frozen genetic codes. Also, the range of simulated physicochemical properties among encoded amino acids in frozen codes was always less than maximal. These results did not require the assumption of historical constraints on the number and type of amino acids available to codes nor on the complexity of proteins, stereochemical constraints on the translational apparatus, nor mechanistic constraints on genetic code change. Both the extent and timing of amino-acid diversification in genetic codes were strongly affected by the message mutation rate and strength of missense selection. Our results suggest that various omnipresent phenomena that distribute codons over sites with different selective requirements (such as the persistence of nonsynonymous mutations at equilibrium, the positive selection of the same codon in different types of sites, and translational ambiguity) predispose the evolution of redundancy and of reduced amino acid diversity in genetic codes. Received: 21 December 2000 / Accepted: 12 March 2001
