Similar literature
 Found 20 similar articles; search took 31 ms
1.
Comma-free codes constitute a class of circular codes that has been widely studied, in particular by Golomb et al. (Biologiske Meddelelser, Kongelige Danske Videnskabernes Selskab 23:1–34, 1958a, Can J Math 10:202–209, 1958b), Michel et al. (Comput Math Appl 55:989–996, 2008a, Theor Comput Sci 401:17–26, 2008b, Inf Comput 212:55–63, 2012), Michel and Pirillo (Int J Comb 2011:659567, 2011), and Fimmel and Strüngmann (J Theor Biol 389:206–213, 2016). Based on a recent graph-theoretic approach to the study of circular codes (Fimmel et al., Philos Trans R Soc 374:20150058, 2016), a new class of circular codes, called strong comma-free codes, is identified. These codes detect a frameshift during the translation process immediately after a reading window of at most two nucleotides. We describe several combinatorial properties of strong comma-free codes: enumeration, maximality, self-complementarity and \(CF^3\)-property (comma-free property in all the three possible frames). These combinatorial results also highlight some new properties of the genetic code and its evolution. Each amino acid in the standard genetic code is coded by at least one strong comma-free code of size 1. There are 9 amino acids \(S=\{Asn,Asp,Gln,Gly,Lys,Met,Phe,Pro,Trp\}\) among 20 such that for each amino acid from S, its synonymous trinucleotide set (excluding the necessary periodic trinucleotides \(\{AAA,CCC,GGG,TTT\}\)) is a strong comma-free code. The primeval comma-free RNY code of Eigen and Schuster (Naturwissenschaften 65:341–369, 1978) is a self-complementary \(CF^3\)-code of size 16. Furthermore, it is the union of two strong comma-free codes of size 8 which are complementary to each other.
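
The ordinary comma-free property mentioned in this abstract can be checked mechanically. The sketch below is an illustration only, not code from the cited papers: it uses the standard definition (no codeword appears in either shifted frame of any concatenation of two codewords), and the function names are hypothetical. It does not test the stronger "strong comma-free" property, whose exact definition is not given in the abstract.

```python
from itertools import product

def is_comma_free(code):
    """Return True if no codeword occurs in a shifted frame of any pair xy.

    Standard definition for trinucleotide codes: for all x, y in the code,
    neither x2 x3 y1 nor x3 y1 y2 may itself be a codeword.
    """
    code = set(code)
    for x, y in product(code, repeat=2):
        pair = x + y
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True

def reverse_complement(trinucleotide):
    """Reverse complement over the DNA alphabet {A, C, G, T}."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(trinucleotide))

# RNY code: R = purine (A, G), N = any base, Y = pyrimidine (C, T).
RNY = {r + n + y for r in "AG" for n in "ACGT" for y in "CT"}

if __name__ == "__main__":
    print(len(RNY), is_comma_free(RNY))
    print({reverse_complement(t) for t in RNY} == RNY)
```

A periodic trinucleotide such as AAA fails the check on its own, since AAAAAA contains AAA in both shifted frames.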

2.
Yanofsky C 《Cell》2007,128(5):815-818
In 1961, Crick, Barnett, Brenner, and Watts-Tobin (Crick et al., 1961) designed an elegant experimental strategy to determine the nature of the genetic code. Remarkably, they reached the correct conclusion despite the absence of technology to analyze and compare DNA and protein sequences.

3.
Freeland et al. (Mol. Biol. Evol. 2000a, 17, 511–518) have recently used a transformation of the PAM 74-100 matrix to study the level of optimization reached during genetic code origin. Since the PAM matrix counts the amino acid substitutions that occurred in families of homologous proteins during molecular evolution, and as this process is mediated by the genetic code structure itself, it could be that the influence of the code on this matrix is such as to make any conclusion insignificant. As will be shown in the present paper, the transformation of the PAM matrix is affected in a non-marginal way by the organization of the genetic code and, thus, renders the analysis of Freeland et al. tautologous. Although, under the hypothesis of a highly optimized genetic code, some correlations may be expected between a measurement of similarity between amino acids and the genetic code structure, no certain conclusions can be drawn for the measurement used by Freeland et al.

4.
A new approach is presented to give evidence for the theories of Jukes and Crick (1-3) that at a more primitive stage the genetic code consisted of doublets separated by "comma-bases" rather than true triplets and that G and C or A and U are the exclusive bases used by the primordial code. This approach makes use of the conservation of the histone IV sequence over extremely long periods of time by comparing the amino acid composition of the average vertebrate protein with those of histone IV, a reconstructed ancestral polypeptide, and various nuclear proteins homologous or otherwise related to it. All protamines studied and the majority of histones show deviations from the average vertebrate protein which are statistically highly significant if the amino acids sufficiently coded for by the first two bases are compared. A similar result is obtained for those amino acids which are sufficiently coded for by the first two bases of the codon and have codons composed of G and C only.

5.
The primitive comma-free genetic code may have had 16 triplets of the form RNY, where R = purine, N = purine or pyrimidine, and Y = pyrimidine, specifying eight (present-day) amino acids. Calculations reveal that in this primitive code all transition changes (A↔G, C↔U) are either silent or missense, i.e., they result in the same or another one of these particular eight amino acids. There are no single transitions to non-RNY codons. Single transversions in the primitive codons can, individually, generate new (present-day) codons for four or eight amino acids. Present-day glutamine, tryptophan and stop (UGA, UAA, UAG) codons cannot be so derived, by single transversions, from any of the eight primitive codons. The modern initiation codons, AUG and GUG, can however be generated by both C → G and U → G single transversions in primitive codons. Overall, a total of 32 modern sense codons, not represented in the primitive RNY code, can be derived from this code by single transversions. Many modern codons, including all those not generated by single transversions in the primitive code, can also be produced by either of the two types of frameshift possible in runs of U- or C-rich primitive codons. Present-day stop codons are generated by +1 (-2) type frameshifts in U-rich primitive runs; AUG and GUG initiation codons are produced by the other type, +2 (-1), frameshifts in U-rich runs.
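
The counting claims in this abstract follow from the purine/pyrimidine pattern alone and can be reproduced by enumeration. The following is a minimal sketch written for illustration (it is not the authors' calculation, and the helper names are hypothetical): it applies every single transition and transversion to the 16 RNY codons and collects the products.

```python
BASES = "ACGU"
# Transitions swap purine with purine (A<->G) and pyrimidine with pyrimidine (C<->U).
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}

# The 16 RNY codons: R in {A, G}, N any base, Y in {C, U}.
RNY = {r + n + y for r in "AG" for n in BASES for y in "CU"}

def single_transitions(codon):
    """All codons reachable from `codon` by one transition."""
    return {codon[:i] + TRANSITION[b] + codon[i + 1:] for i, b in enumerate(codon)}

def single_transversions(codon):
    """All codons reachable from `codon` by one transversion (purine<->pyrimidine)."""
    products = set()
    for i, b in enumerate(codon):
        for new in BASES:
            if new != b and new != TRANSITION[b]:
                products.add(codon[:i] + new + codon[i + 1:])
    return products

transition_products = set().union(*(single_transitions(c) for c in RNY))
new_by_transversion = set().union(*(single_transversions(c) for c in RNY)) - RNY

if __name__ == "__main__":
    print(transition_products <= RNY)   # transitions never leave the RNY set
    print(len(new_by_transversion))     # non-RNY codons reachable by one transversion
    print(sorted(new_by_transversion))
```

Because a transversion at the first position gives YNY and one at the third position gives RNR, the YNR codons (which include UAA, UAG, UGA, the glutamine codons and UGG) are out of reach of any single change.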

6.
A review of the most significant contributions on the early phases of genetic code origin is presented. After stressing the importance of the key intermediary role played in protein synthesis by peptidyl-tRNA, which is attributed with a primary function in ancestral catalysis, the general lines leading to the codification of the first amino acids in the genetic code are discussed. This is achieved by means of a model of protoribosome evolution which sees the protoribosome as the central organiser of ancestral biosynthesis and the mediator of the encounter between compounds (metabolite-pre-tRNAs) and catalysts (peptidyl-pre-tRNAs). The encounter between peptidyl-pre-tRNA catalysts in the protoribosome is favoured by metabolic pre-mRNAs and later resulted (given the high temperature at which this evolution is supposed to have taken place) in the evolution of mRNAs with codons of the type GNS. These mRNAs codified only for those amino acids that the coevolution theory of genetic code origin sees as the precursors of all other amino acids. Some aspects of the model here discussed might be rendered real by the transfer-messenger RNA molecule (tmRNA) which is here considered a molecular fossil of ancestral protein synthesis.

7.
The fidelity of DNA synthesis catalyzed by the 180-kDa catalytic subunit (p180) of DNA polymerase alpha from Saccharomyces cerevisiae has been determined. Despite the presence of a 3'→5' exonuclease activity (Brooke et al., 1991, J. Biol. Chem., 266, 3005-3015), its accuracy is similar to several exonuclease-deficient DNA polymerases and much lower than other DNA polymerases that have associated exonucleolytic proofreading activity. Average error rates are 1/9900 and 1/12,000, respectively, for single base-substitution and minus-one nucleotide frameshift errors; the polymerase generates deletions as well. Similar error rates are observed with reactions containing the 180-kDa subunit plus an 86-kDa subunit (p86), or with these two polypeptides plus two additional subunits (p58 and p49) comprising the DNA primase activity required for DNA replication. Finally, addition of yeast replication factor-A (RF-A), a protein preparation that stimulates DNA synthesis and has single-stranded DNA-binding activity, yields a polymerization reaction with 7 polypeptides required for replication, yet fidelity remains low relative to error rates for semiconservative replication. The data suggest that neither exonucleolytic proofreading activity, the beta subunit, the DNA primase subunits nor RF-A contributes substantially to base substitution or frameshift error discrimination by the DNA polymerase alpha catalytic subunit.

8.
A widespread consensus holds that protein synthesis according to a genetic code was launched entirely by sophisticated RNA molecules that played both coding and functional roles. This belief persists, unsupported by phylogenetic evidence for ancestral ribozymes that catalyzed either amino acid activation or tRNA aminoacylation. By contrast, we have adduced strong experimental evidence that the most highly conserved portions of contemporary aminoacyl-tRNA synthetases (aaRS) accelerate both reactions well in excess of rates achieved by RNA aptamers derived from combinatorial libraries and of rates required for primordial protein synthesis. Such ancestral enzymes, or “Urzymes”, characterized for Class I (TrpRS (Pham et al., 2010, 2007) and LeuRS (Collier et al., 2013); 130 residues) and Class II (HisRS; 120–140 residues; (Li et al., 2011)) synthetases generally have promiscuous amino acid specificities, whereas ATP and cognate tRNA affinities are within an order of magnitude of those for contemporary enzymes. These characteristics match or exceed expectations for the primordial catalysts necessary to launch protein synthesis. Structural hierarchies in Class I and II aaRS also exhibit plateaus of increasing enzymatic activity, suggesting that catalysis by peptides similar to the Aleph motif identified by Trifonov (Sobolevsky et al.) may have been both necessary and sufficient to launch protein synthesis. Sense/antisense alignments of TrpRS and HisRS Urzyme coding sequences reveal unexpectedly high middle-base complementarity that increases in reconstructed ancestral nodes (Chandrasekaran et al.), consistent with the proposal of Rodin and Ohno (Rodin & Ohno, 1995). Thus, these ancestors were likely coded by opposite strands of the same gene, favoring simultaneous expression of aaRS activating both hydrophobic (core) and hydrophilic (surface) amino acids.
Our results support the view that aaRS coevolved with cognate tRNAs from a much earlier stage than that envisioned under the RNA World hypothesis, and that their descendants make up appreciable portions of the proteome.

9.
Statistical and biochemical studies have revealed non-random patterns in codon assignments. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations: when an amino acid is converted to another due to error, the biochemical properties of the resulting amino acid are usually very similar to those of the original one. In this study, using altered forms of the fitness functions used in prior studies, we have optimized the parameters involved in the calculation of the error-minimizing property of the genetic code so that the genetic code outscores the random codes as much as possible. This work also compares two prominent matrices, the Mutation Matrix and Point Accepted Mutations 74-100 (PAM(74-100)). We find that the hypothetical properties of the coevolution theory of the genetic code are already considered in PAM(74-100), giving more evidence for the existence of bias towards the genetic code in this matrix. Furthermore, our results indicate that PAM(74-100) is biased towards the single base mistranslation occurrences in the second codon position as well as the frequency of amino acids. Thus PAM(74-100) is not a suitable substitution matrix for studies conducted on the evolution of the genetic code.

10.
Li SX  Vaccaro JA  Sweasy JB 《Biochemistry》1999,38(15):4800-4808
DNA polymerase beta is a small monomeric polymerase that participates in base excision repair and meiosis [Sobol, R., et al. (1996) Nature 379, 183-186; Plug, A., et al. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 1327-1331]. A DNA polymerase beta mutator mutant, F272L, was identified by an in vivo genetic screen [Washington, S., et al. (1997) Proc. Natl. Acad. Sci. U.S.A. 94, 1321-1326]. Residue 272 is located within the deoxynucleoside triphosphate (dNTP) binding pocket of DNA polymerase beta according to the known DNA polymerase beta crystal structures [Pelletier, H., et al. (1994) Science 264, 1891-1893; Sawaya, M., et al. (1997) Biochemistry 36, 11205-11215]. The F272L mutant produces errors at a frequency 10-fold higher than that of wild type in vivo and in the in vitro HSV-tk gap-filling assay. F272L shows an increase in the frequency of both base substitution mutations and frameshift mutations. Single-enzyme turnover studies of misincorporation by wild type and F272L DNA polymerase beta demonstrate that there is a 4-fold decrease in fidelity of the mutant as compared to that of the wild type enzyme for a G:A mismatch. The decreased fidelity is due primarily to decreased discrimination between the correct and incorrect dNTP during ground-state binding. These results suggest that the phenylalanine 272 residue is critical for maintaining fidelity during the binding of the dNTP.

11.
During translation, some +1 frameshift mRNA sites are decoded by frameshift suppressor tRNAs that contain an extra base in their anticodon loops. Similarly engineered tRNAs have been used to insert nonnatural amino acids into proteins. Here, we report crystal structures of two anticodon stem-loops (ASLs) from tRNAs known to facilitate +1 frameshifting bound to the 30S ribosomal subunit with their cognate mRNAs. ASL(CCCG) and ASL(ACCC) (5'-3' nomenclature) form unpredicted anticodon-codon interactions where the anticodon base 34 at the wobble position contacts either the fourth codon base or the third and fourth codon bases. In addition, we report the structure of ASL(ACGA) bound to the 30S ribosomal subunit with its cognate mRNA. The tRNA containing this ASL was previously shown to be unable to facilitate +1 frameshifting in competition with normal tRNAs (Hohsaka et al. 2001), and interestingly, it displays a normal anticodon-codon interaction. These structures show that the expanded anticodon loops of +1 frameshift-promoting tRNAs are flexible enough to adopt conformations that allow three bases of the anticodon to span four bases of the mRNA. Therefore it appears that normal triplet pairing is not an absolute constraint of the decoding center.

12.
Summary: The leaky expression of the yeast mitochondrial gene oxi1, containing a frameshift mutation (+1), is caused by natural frameshift suppression, as shown previously (Fox and Weiss-Brummer 1980). A drastic decrease in the natural level of frameshifting is found in the presence of the par r-454 mutation, localized at the 3′ end of the 15 S rRNA gene. This mutation causes resistance to the antibiotic paromomycin in the yeast strains D273-10B and KL14-4A (Li et al. 1982; Tabak et al. 1982). The results of this study imply that in the yeast strain 777-3A this mutation alone is sufficient for restriction of the level of natural frameshifting but is insufficient to confer resistance to paromomycin. A second mutation, arising spontaneously with a frequency of 10⁻⁴, leads, in combination with the par r-454 mutation, to full paromomycin resistance in strain 777-3A.

13.
Periaxin mutations cause recessive Dejerine-Sottas neuropathy
The periaxin gene (PRX) encodes two PDZ-domain proteins, L- and S-periaxin, that are required for maintenance of peripheral nerve myelin. Prx(-/-) mice develop a severe demyelinating peripheral neuropathy, despite apparently normal initial formation of myelin sheaths. We hypothesized that mutations in PRX could cause human peripheral myelinopathies. In accordance with this, we identified three unrelated Dejerine-Sottas neuropathy patients with recessive PRX mutations: two with compound heterozygous nonsense and frameshift mutations, and one with a homozygous frameshift mutation. We mapped PRX to 19q13.13-13.2, a region recently associated with a severe autosomal recessive demyelinating neuropathy in a Lebanese family (Delague et al. 2000) and syntenic to the location of Prx on murine chromosome 7 (Gillespie et al. 1997).

14.
We address the question, related to the origin of the genetic code, of why there are three bases per codon in the process of translation to protein. As a follow-up to our previous work (Aldana et al., 1998, Martínez-Mekler et al., 1999a,b), we approach this problem by considering the translocation properties of primitive molecular machines, which capture basic features of ribosomal/messenger RNA interactions, while operating under prebiotic conditions. Our model consists of a short one-dimensional chain of charged particles (rRNA antecedent) interacting with a polymer (mRNA antecedent) via electrostatic forces. The chain is subject to external forcing that causes it to move along the polymer, which is fixed in a quasi-one-dimensional geometry. Our numerical and analytic studies of statistical properties of random chain/polymer potentials suggest that, under very general conditions, a dynamics is attained in which the chain moves along the polymer in steps of three monomers. By adjusting the model in order to consider present-day genetic sequences, we show that the above property is enhanced for coding regions. Intergenic sequences display a behavior closer to the random situation. We argue that this dynamical property could be one of the underlying causes for the three-base codon structure of the genetic code.

15.
In a previous study, the forward mutation spectrum induced by the chemical carcinogen N-acetoxy-N-2-acetylaminofluorene was determined (Koffel-Schwartz et al. 1984). It was found that 90% of the induced mutations are frameshift mutations located within specific sequences (mutation hot spots). Two classes of mutation hot spots were found: (i) -1 frameshift mutations occurring within runs of guanines (i.e., GGGG→GGG); (ii) -2 frameshift mutations occurring within the NarI recognition sequence (GGCGCC→GGCC). In the present work, we further investigate the genetic requirements of these frameshift events by using specific reversion assays. Like UV-induced mutagenesis, frameshift mutations occurring within runs of G's (also referred to as the "slippage pathway") require the activated form of the RecA protein (RecA*). On the other hand, frameshift mutations occurring at the NarI site (the "NarI mutation pathway") require a LexA-controlled function(s) that is not UmuDC. The LexA-controlled gene(s) that is (are) involved in this pathway remain to be identified. Moreover, this pathway does not require RecA* for the proteolytic processing of a protein other than LexA (like the cleavage of UmuD in UV-induced mutagenesis). An "additional" role of RecA can be defined as follows: (i) The non-activated form of the RecA protein acts as an inhibitor in the NarI mutation pathway. (ii) This inhibition is relieved upon activation of RecA by UV irradiation of the bacteria. (iii) A recA deletion mutant is totally proficient in the NarI mutation pathway provided the SOS system is derepressed [lexA (Def) allele]. Therefore, RecA does not actively participate in the fixation of the mutation. A molecular model for this "additional" role of RecA is proposed.

16.
17.
Alterations to the standard genetic code have been found in both prokaryotes and eukaryotes. This finding demolished the central dogma of molecular biology, postulated by Crick in 1968, of an immutable and universal genetic code and raised the question of how organisms survive genetic code alterations. Recent studies suggest that genetic code alterations are driven by selection using a mechanism that requires translational ambiguity. In C. albicans, the leucine CUG codon is decoded as serine through structural alterations of the translational machinery, in particular, of a Ser-tRNACAG which has dual identity and novel decoding properties. Here, we review the molecular mechanism of CUG reassignment, focusing on the structural change of the translational machinery and on the impact that such alteration had on the evolution of the Candida albicans genome.

18.
Bashford JD  Jarvis PD 《Bio Systems》2000,57(3):147-161
The systematics of indices of physico-chemical properties of codons and amino acids across the genetic code are examined. Using a simple numerical labelling scheme for nucleic acid bases, A=(-1,0), C=(0,-1), G=(0,1), U=(1,0), data can be fitted as low order polynomials of the six coordinates in the 64-dimensional codon weight space. The work confirms and extends the recent studies by Siemion et al. (1995. BioSystems 36, 231-238) of the conformational parameters. Fundamental patterns in the data such as codon periodicities, and related harmonics and reflection symmetries, are here associated with the structure of the set of basis monomials chosen for fitting. Results are plotted using the Siemion one-step mutation ring scheme, and variants thereof. The connections between the present work, and recent studies of the genetic code structure using dynamical symmetry algebras, are pointed out.
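
The labelling scheme quoted in this abstract maps each codon to six numbers, two per base. The sketch below is only an illustration of that mapping and of how a set of low-order basis monomials might be enumerated; the function names are hypothetical, the cutoff at degree 2 is an assumption made here, and the abstract does not say which monomials the authors actually used.

```python
from itertools import combinations_with_replacement, product

# Base labels as given in the abstract.
BASE_LABEL = {"A": (-1, 0), "C": (0, -1), "G": (0, 1), "U": (1, 0)}

def codon_coordinates(codon):
    """Concatenate the two-component labels of the three bases into six coordinates."""
    return tuple(x for base in codon for x in BASE_LABEL[base])

def monomials(n_vars=6, max_degree=2):
    """Enumerate monomials up to `max_degree` as tuples of variable indices.

    The empty tuple is the constant term; (0, 3) stands for x0 * x3.
    """
    terms = []
    for degree in range(max_degree + 1):
        terms.extend(combinations_with_replacement(range(n_vars), degree))
    return terms

def evaluate(monomial, coords):
    """Value of one monomial at a codon's coordinate vector."""
    value = 1
    for index in monomial:
        value *= coords[index]
    return value

ALL_CODONS = ["".join(p) for p in product("ACGU", repeat=3)]

if __name__ == "__main__":
    print(codon_coordinates("AUG"))
    print(len(monomials()))
    # One row of a design matrix that a least-squares fit could use:
    print([evaluate(m, codon_coordinates("AUG")) for m in monomials()])
```

Stacking such rows for all 64 codons gives a design matrix against which a measured codon or amino acid property could be fitted by least squares.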

19.
Traditional methods for analyzing population structure, such as the Structure program, ignore the effect of allele mutations between the ancestral and current alleles of genetic markers, which can dramatically influence the accuracy of the structural estimation of current populations. Studying these effects can also reveal additional information about population evolution such as the divergence time and migration history of admixed populations. We propose mStruct, an admixture of population-specific mixtures of inheritance models that addresses the task of structure inference and mutation estimation jointly through a hierarchical Bayesian framework, and a variational algorithm for inference. We validated our method on synthetic data and used it to analyze the Human Genome Diversity Project–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH) cell line panel of microsatellites and HGDP single-nucleotide polymorphism (SNP) data. A comparison of the structural maps of world populations estimated by mStruct and Structure is presented, and we also report potentially interesting mutation patterns in world populations estimated by mStruct.
The deluge of genomic polymorphism data, such as the genomewide multilocus genotype profiles of variable numbers of tandem repeats (i.e., microsatellites) and single-nucleotide polymorphisms (SNPs), has fueled the long-standing interest in analyzing patterns of genetic variations to reconstruct the ancestral structures of modern human populations. Genetic ancestral information can shed light on the evolutionary history and migrations of modern populations (Bowcock et al. 1994; Rosenberg et al. 2002; Conrad et al. 2006). It also provides guidelines for more accurate association studies (Roeder et al. 1998) and is useful for many other population genetics problems (Queller et al. 1993; Hammer et al. 1998; Templeton 2002).
Various methods have been proposed for stratifying population structures on the basis of multilocus genotype information from a set of individuals. For example, Pritchard et al. (2000) proposed a model-based approach implemented in the program Structure, which uses a statistical methodology known as the allele-frequency admixture model to stratify population structures. This model, and admixture models in general arising in genetic and other contexts (Blei et al. 2003), belongs to a more general class of hierarchical Bayesian models known as the mixed membership models (Erosheva et al. 2004). Such a model postulates that an empirical multiple-instance sample, such as the ensemble of genetic markers of an individual, is made up of either independently and identically distributed (iid) instantiations (Pritchard et al. 2000) or spatially coupled (Falush et al. 2003) instantiations, from multiple population-specific fixed-dimensional multinomial distributions of marker alleles [known as allele-frequency profiles, AP (Falush et al. 2003)]. Under this assumption, the admixture model identifies each ancestral population by a specific AP (that defines a unique vector of allele frequencies of each marker in each ancestral population) and displays the fraction of contributions from each AP in a modern individual genome as an admixing vector (also known as an ancestral proportion vector or structure vector) in a structural map over the population sample in question. Figure 1 shows an example of a structural map of four modern populations inferred from a portion of the HapMap multipopulation data set by Structure. 
In this population structural map, the admixing vector underlying each individual is represented as a thin vertical line of unit length and multiple colors, with the height of each color reflecting the fraction of the individual's genome originated from a certain ancestral population denoted by that color and formally represented by a unique AP. This method has been applied to the Human Genome Diversity Project–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH) Human Genome Diversity Cell Line Panel in Rosenberg et al. (2002) and many other studies, and has unraveled interesting patterns in the genetic structures of the world population. However, even though Structure was originally built on a genetic admixture model, in reality the structural patterns derived by Structure in various studies often turn out to be distinct clusters among the study populations (e.g., Figure 1), which has led many to think of it as a clustering program rather than a tool for uncovering genetic admixing as it was supposed to do. The design limitation of the Structure model behind this issue motivated us to develop a new approach in this article to analyze admixed genetic samples.
Figure 1. Population structural map inferred by Structure on HapMap data consisting of four populations.
A recent extension of Structure, known as Structurama (Pella and Masuda 2006; Huelsenbeck and Andolfatto 2007), relaxes the finite dimensional assumption on ancestral populations in the admixture model by employing a Dirichlet process prior over the ancestral allele-frequency profiles. This allows automatic estimation of the maximum a posteriori probable number of ancestral populations. This extension is a useful improvement since it eliminates the need for manual selection of the number of ancestral populations. 
Anderson and Thompson (2002) address the problem of classifying species hybrids into categories, using a model-based Bayesian clustering approach implemented in the NewHybrid program. While this problem is not exactly identical to the problem of stratifying the structure of highly admixed populations, it is useful for structural analysis of populations that were recently admixed. The BAPS program (Corander et al. 2003) also uses a Bayesian approach to find the best partition of a set of individuals into subpopulations on the basis of genotypes. Parallel to the aforementioned model-based approaches for genomic structural analysis, direct algebraic eigen-decomposition and dimensionality reduction methods, such as the Eigensoft program (Patterson et al. 2006) based on principal components analysis (PCA), offer an alternative approach to explore and visualize the ancestral composition of modern populations and facilitate formal statistical tests for significance of population differentiation. However, unlike the model-based methods such as Structure, where each inferred ancestral population bears a concrete genetic meaning as a population-specific allele-frequency profile, the eigenvectors computed by Eigensoft represent the mutually orthogonal directions in an abstract low-dimensional ancestral space, in which population samples can be embedded and visualized; these eigenvectors can be understood as mathematical surrogates of independent genetic sources underlying a population sample, but lack a concrete interpretation under a generative genetic inheritance model (from here on, we use the term “inheritance model” to describe the process by which a descendant allele is derived from an ancestral allele). 
Analyses based on Eigensoft are usually limited to two-dimensional ancestral spaces, offering limited power in stratifying highly admixed populations.
This progress notwithstanding, an important aspect of population admixing that is largely missing in the existing methods is the effect of allele mutations between the ancestral and current alleles of genetic markers, which can dramatically influence the accuracy of the structural estimation of current populations. It can also reveal additional information about population evolution, such as the relative divergence time and migration history of admixed populations.
Consider, for example, the Structure model. Since an AP merely represents the frequency of alleles in an ancestral population rather than the actual allelic content or haplotypes of the alleles themselves, the admixture models developed so far on the basis of APs do not model genetic changes due to mutations from the ancestral alleles. Indeed, a serious pitfall of the model underlying Structure, as pointed out in Excoffier and Hamilton (2003), is that there is no mutation model for modern individual alleles with respect to hypothetical common prototypes in the ancestral populations. That means every unique allele in the modern population is assumed to have a distinct ancestral proportion, rather than allowing the possibility of it just being a descendant of some common ancestral allele that can also give rise to other closely related alleles at the same locus of other individuals in the modern population. Thus, while Structure aims to provide ancestry information for each individual and each locus, there is no explicit representation of the “ancestors” as a physical set of “founding alleles.” Therefore, the inferred population structural map emphasizes revealing the contributions of abstract population-specific ancestral proportion profiles, which does not necessarily reflect individual diversity or the extent of genetic changes with respect to the founders. 
Due to this limitation, Structure does not enable inference of the founding genetic patterns, the age of the founding alleles, or the population divergence time (Excoffier and Hamilton 2003).
The lack of an appropriate allele mutation model in a structural inference program can also compromise our ability to reliably assess the amount or level of genetic admixing in different populations. The Structure model, like several other related models (Blei et al. 2003), is based on the fundamental assumption of the presence of genetic admixing among multiple founding populations. However, as we shall see later, on real population data such as the HGDP–CEPH panel, it produces results that favor clustering individuals into predominantly one allele-frequency profile or another, thus leading us to conclude that there was little or no admixing between the ancestral human populations. We believe that this occurs due to the absence of a mutation model in Structure. While a partitioning of individuals would be desirable for clustering them into groups, it does not offer enough biological insight into the intermixing of the populations.
In this article, we present mStruct (which stands for Structure under mutations), based on a new model: an admixture of population-specific mixtures of inheritance models (AdMim). Statistically, AdMim is an admixture of mixture models, which represents each ancestral population as a mixture of ancestral alleles each with its own inheritance process and each modern individual as an “ancestry vector” (or structure vector) that reflects membership proportions of the ancestral populations. As we explain shortly, mStruct facilitates estimation of both the structural map of populations and the mutation parameters of either SNP or microsatellite alleles under various contexts. 
A new variational inference algorithm, which is much faster than the MCMC algorithm used for Structure, was developed for estimating the structure vectors and other genetic parameters of interest. We compare our method with Structure on simulated genotype data and on the microsatellite and SNP genotype data of world populations (Rosenberg et al. 2002; Conrad et al. 2006). Our results using microsatellite data reveal the presence of significant levels of genetic admixing among the founding populations underlying the HGDP–CEPH cell line panel, as well as consequences of expansion of humans out of Africa. Our results suggest that the inability of Structure to model mutations during genetic admixing could have caused it to detect correct clustering but very low levels of genetic admixing in each modern population in the HGDP–CEPH data. We also report interesting visualizations of genetic divergence in world populations revealed by the mutation patterns estimated by mStruct. The mStruct software has been implemented in C++ and is available for download at http://www.sailing.cs.cmu.edu/mstruct.html.
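
The allele-frequency admixture model described in this excerpt (each marker of an individual drawn iid from one of several population-specific allele-frequency profiles, chosen according to the individual's admixing vector) can be written as a short generative sketch. This is an assumption-laden toy, not the Structure or mStruct implementation: it is haploid, it has no priors and no mutation step, and the function name and the toy profiles are hypothetical.

```python
import random

def sample_genotype(admixing_vector, allele_profiles, rng):
    """Draw one haploid multilocus genotype under a plain admixture model.

    admixing_vector: fraction of the genome attributed to each ancestral population.
    allele_profiles: allele_profiles[k][locus] maps allele -> frequency in population k.
    Returns a list of (ancestral population index, allele) pairs, one per locus.
    """
    populations = list(range(len(admixing_vector)))
    n_loci = len(allele_profiles[0])
    genotype = []
    for locus in range(n_loci):
        # Pick the ancestral population of origin for this locus.
        k = rng.choices(populations, weights=admixing_vector, k=1)[0]
        # Draw the allele from that population's allele-frequency profile.
        profile = allele_profiles[k][locus]
        alleles = list(profile)
        allele = rng.choices(alleles, weights=[profile[a] for a in alleles], k=1)[0]
        genotype.append((k, allele))
    return genotype

# Hypothetical two-population, two-locus profiles (illustrative numbers only).
toy_profiles = [
    [{100: 0.7, 102: 0.3}, {200: 1.0}],
    [{104: 1.0}, {204: 0.5, 206: 0.5}],
]

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_genotype([0.6, 0.4], toy_profiles, rng))
```

In this plain model an observed allele is always an exact copy of an allele in some profile. A mutation-aware variant in the spirit of the excerpt would add a step that perturbs the drawn ancestral allele before it is observed.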

20.