Similar literature
20 similar articles found.
1.
Comparison of polymorphism at synonymous and non-synonymous sites in protein-coding DNA can provide evidence for selective constraint. Non-coding DNA that forms part of the regulatory landscape presents more of a challenge since there is not such a clear-cut distinction between sites under stronger and weaker selective constraint. Here, we consider putative regulatory elements termed Conserved Non-coding Elements (CNEs) defined by their high level of sequence identity across all vertebrates. Some mutations in these regions have been implicated in developmental disorders; we analyse CNE polymorphism data to investigate whether such deleterious effects are widespread in humans. Single nucleotide variants from the HapMap and 1000 Genomes Projects were mapped across nearly 2000 CNEs. In the 1000 Genomes data we find a significant excess of rare derived alleles in CNEs relative to coding sequences; this pattern is absent in HapMap data, apparently obscured by ascertainment bias. The distribution of polymorphism within CNEs is not uniform; we could identify two categories of sites by exploiting deep vertebrate alignments: stretches that are non-variant, and those that have at least one substitution. The conserved category has fewer polymorphic sites and a greater excess of rare derived alleles, which can be explained by a large proportion of sites under strong purifying selection within humans – higher than that for non-synonymous sites in most protein coding regions, and comparable to that at the strongly conserved trans-dev genes. Conversely, the more evolutionarily labile CNE sites have an allele frequency distribution not significantly different from non-synonymous sites. Future studies should exploit genome-wide re-sequencing to obtain better coverage in selected non-coding regions, given the likelihood that mutations in evolutionarily conserved enhancer sequences are deleterious. 
Discovery pipelines should validate non-coding variants to aid in identifying causal and risk-enhancing variants in complex disorders, in contrast to the current focus on exome sequencing.

2.
In this study, we use the random principle to analyse the distributions of amino acids and amino acid pairs in human tumour necrosis factor precursor (TNF-α) and its eight mutations, to compare the measured distribution probability with the theoretical distribution probability, and to rank the measured distribution probability against the theoretical distribution probability. In this way, we can suggest that distributions with a high random rank should not be deliberately evolved and conserved, and those with a low random rank should be deliberately evolved and conserved, in human TNF-α. An increased distribution probability in a mutation means probabilistically that the mutation is more likely to occur spontaneously, whereas a decreased distribution probability means probabilistically that the mutation is less likely to occur spontaneously and is perhaps more related to a certain cause. The results show, for example, that the distributions of 30% of the amino acids are identical with their probabilistic simplest distributions, and the distributions of some of the remaining amino acids are very close to their probabilistic simplest distributions. With respect to probabilities of distributions of amino acids in mutations, the results show that mutations lead to an increase in eight probabilities, which are thus more likely to occur. Eight probabilities decrease and are thus less likely to occur. With respect to the random ranks against the theoretical probabilities of distributions of amino acids, the results show that mutations lead to an increase in seven and a decrease in seven probabilities, with two probabilities unchanged.

3.
This is the continuation of our studies using random approaches to analyse the p53 protein family. In this data-based theoretical analysis, we use the random approach to analyse the amino acid pairs in human p53 protein in order to determine which amino acid pairs are more sensitive to 190 human p53 mutations/variants. The rationale of this study is based on our hypothesis and findings that a harmful mutation is more likely to occur at randomly unpredictable amino acid pairs, and a harmless mutation is more likely to occur at randomly predictable amino acid pairs. This is because we argue that the randomly predictable amino acid pairs should not be deliberately evolved, whereas the randomly unpredictable amino acid pairs should be deliberately evolved with a connection to protein function. The results show, for example, that 93.16% of 190 mutations/variants occur at randomly unpredictable amino acid pairs. Thus, the randomly unpredictable amino acid pairs are more sensitive to mutations/variants in human p53 protein. The results also suggest that the human p53 protein has a tendency towards the occurrence of mutations/variants.

4.
Deleterious mutations associated with human diseases are predominantly found in conserved positions and positions that are essential for the structure and/or function of proteins. However, these mutations are purged from the human population over time and prevented from being fixed. Contrary to this belief, here I show that high proportions of deleterious amino acid changing mutations are fixed at positions critical for the structure and/or function of proteins. Similarly, a high rate of fixation of deleterious mutations was observed in slow-evolving amino acid positions of human proteins. The fraction of deleterious substitutions was found to be two times higher in relatively conserved amino acid positions than in highly variable positions. This study also found fixation of a much higher proportion of radical amino acid changes in primates compared with rodents and artiodactyls in slow-evolving positions. Previous studies observed a higher proportion of nonsynonymous substitutions in humans compared with other mammals, which was taken as indirect evidence for the fixation of deleterious mutations in humans. However, the results of this investigation provide direct evidence for this prediction by suggesting that the excess nonsynonymous mutations fixed in humans are indeed deleterious in nature. Furthermore, these results suggest that studies on disease-associated mutations should consider that a significant fraction of such deleterious mutations has already been fixed in the human genome, and thus, the effects of new mutations at those amino acid positions may not necessarily be deleterious and might even result in reversion to benign phenotypes.

5.
We report the molecular characterization of two novel galactosemia mutations that exhibit different molecular phenotypes. Both are of the missense type with low or no residual enzyme activity. The R148W mutation results in an unstable protein, although messenger RNA is still produced. In contrast, the L195P mutation produces stable but inactive immunoreactive protein. The R148W mutation alters an amino acid that is not evolutionarily conserved, while the L195P mutation affects a well-conserved residue nine amino acids downstream from the putative active site nucleophile. These mutations provide evidence that different mechanisms can result in galactosemia: destabilizing mutations in any given area of the protein and missense mutations in conserved domains of the enzyme resulting in low or no activity. These two mutant alleles represent the fifth and sixth galactosemia mutations and confirm the hypothesis that galactosemia results from a multiplicity of mutations at the molecular level.

6.
More than a hundred naturally occurring mutations of human glucose-6-phosphate dehydrogenase (G6PD) have been identified at the amino acid level. The abundance of distinct mutation sites and their clinical manifestations make this enzyme ideal for structure-function analysis studies. We present here a combined sequence and structure analysis by which the severity of clinical symptoms resulting from point mutations of this enzyme is correlated with quantified degrees of amino acid conservation within 23 G6PD sequences from different organisms. Our analysis verifies, on a quantitative basis, a widely held notion that clinically more severe mutations of G6PD usually occur at conserved amino acids. However, marked exceptions to this general trend exist, which are most notably revealed by a number of mutations associated with chronic nonspherocytic hemolytic anemia (class I variants). When mapped onto a homology-derived structural model of human G6PD, these class I mutational sites of low amino acid conservation appear to localize in two spatially distinct clusters, both of which are populated with mutations consisting mainly of clinically more severe variants (i.e. class I and class II). These results of computer-assisted analyses contribute to a further understanding of the structure-function relationships of human G6PD deficiency.

7.
The human mitochondrial genome, although small in size, shows a high level of variation that differs across nucleotide groups. In this work, mutation rates in mtDNA were compared in species of the Homo genus, including humans, Neanderthals, Denisova hominins, and other primate species. It was found that more than half (56.5%) of the polymorphisms in protein-coding genes of human mtDNA are actually reverse mutations to the pre-H. sapiens state of the mitochondrial genome. Among hypervariable nucleotide positions, only a small portion of mutations are specific to H. sapiens, while the majority of mutations (both nucleotide and amino acid substitutions) result in a loss of Homo-specific variants of polymorphisms. Most commonly, polymorphism variants specific to H. sapiens arise as a result of unique forward mutations and disappear mainly due to multiple reverse mutations, including those in mutational hot spots.

8.
The site-specific recombinases Flp and R from Saccharomyces cerevisiae and Zygosaccharomyces rouxii, respectively, are related proteins that share approximately 30% amino acid matches. They exhibit a common reaction mechanism that appears to be conserved within the larger Integrase family of site-specific recombinases. Two regions of the proteins, designated as Box I and Box II, harbor, in addition to amino acid conservation, a significantly high degree of nucleotide sequence homology within their coding segments. Box II also contains two amino acids, a histidine and an arginine, that are invariant throughout the Int family. We have performed functional analysis of Flp and R variants carrying point mutations within the Box II segment. Several positions within Box II can tolerate substitutions with no effect, or only modest effects on recombination. Alterations of the Int family residues, His305 and Arg308, in the R protein lead to the arrest of recombination at the strand cleavage or the strand exchange step. This is very similar to previously observed "step-arrest" phenotypes in Flp variants altered at these positions and has strong implications for the catalytic mechanism of recombination. Flp and R variants at His305 and His309 can be complemented in half-site strand transfer by a corresponding Tyr343 to phenylalanine variant. In contrast to Arg308 Flp variants, which are efficiently complemented in half-site strand transfer by Flp(Y343F), no strong complementation has been observed between Arg308 variants of R and R(Y343F).

9.
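Cloning and evolutionary analysis of a LEAFY homologous gene from shepherd's purse (total citations: 4; self-citations: 0; citations by others: 4)
LEAFY homologous genes are important regulators of floral meristem differentiation in higher plants. Primers were designed from conserved regions of published LEAFY homolog sequences and, using genomic DNA of shepherd's purse (Capsella bursa-pastoris (L.) Medic.) as the template, a 2866 bp LEAFY homologous gene was cloned. Sequence analysis showed that the gene contains three exons and two introns, and that the exons encode a polypeptide of 424 amino acids. The nucleotide sequence of each individual exon shares more than 90% homology with the LEAFY gene of Arabidopsis thaliana, and the amino acid sequence homology is 86%, whereas the amino acid sequence homology with Arabidopsis lyrata is as high as 90%. LEAFY homologous amino acid sequences from different plant species are highly conserved at the C terminus, while the N terminus shows considerable variation. The different evolutionary rates of the three exons may result from different selective pressures. An arginine mutation at position 346 of the shepherd's purse CapLFY gene may be the reason that the two ecotypes of shepherd's purse differ in flowering time.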

10.
A novel complex mutation with both a deletion and an insertion in very close proximity in the same region was detected in exon 8 of the LDL receptor gene from two apparently unrelated Japanese families with familial hypercholesterolemia (FH). In this mutant LDL receptor gene, the nine bases from nucleotide (nt) 1115 to nt 1123 (AGGGTGGCT) were replaced by six different bases (CACTGA), and consequently the four amino acids from codon 351 to 354, Glu-Gly-Gly-Tyr, were replaced by three amino acids, Ala-Leu-Asn, in the conserved amino acid region of the growth factor repeat B of the LDL receptor. The nature of the amino acid substitution and data on the families suggest that this mutation is very likely to affect the LDL receptor function and cause FH. The generation of this complex mutation can be explained by the simultaneous occurrence of deletion and insertion through the formation of a hairpin-loop structure mediated by inverted repeat sequences. Thus, this mutation supports the hypothesis that inverted repeat sequences influence the stability of a given gene and promote human gene mutations.
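The replacement reported in this abstract can be checked by hand. The following is a minimal sketch in Python, not the authors' method. It assumes that nt 1115 is the second base of codon 351 (which the reported amino acids imply, since both Glu codons start with G) and that Tyr354 is encoded by TAC (the abstract does not give the flanking bases; TAT would lead to the same residues). The helper names and the reduced codon table are illustrative only.

```python
# Reduced codon table: only the codons needed for this worked example.
CODONS = {
    "GAG": "Glu", "GGT": "Gly", "GGC": "Gly", "TAC": "Tyr",
    "GCA": "Ala", "CTG": "Leu", "AAC": "Asn",
}


def translate(seq):
    """Translate an in-frame DNA string into three-letter residue names."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3)]


def apply_delins(seq, start, deleted, inserted):
    """Replace `deleted` (found at 0-based offset `start`) with `inserted`."""
    if seq[start:start + len(deleted)] != deleted:
        raise ValueError("deleted bases do not match the reference")
    return seq[:start] + inserted + seq[start + len(deleted):]


# Codons 351-354. The leading G and trailing AC are assumed flanking bases;
# the middle nine bases are nt 1115-1123 as given in the abstract.
WILD_TYPE = "G" + "AGGGTGGCT" + "AC"
MUTANT = apply_delins(WILD_TYPE, 1, "AGGGTGGCT", "CACTGA")

print(translate(WILD_TYPE))  # Glu, Gly, Gly, Tyr
print(translate(MUTANT))     # Ala, Leu, Asn
```

Nine bases are removed and six are added, so the net loss is 3 bp. The reading frame is preserved, which is why four residues become three instead of producing a frameshift.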

11.
The cDNAs for four rabbit cytokine genes [interleukin 2 (IL-2), IL-4, IL-6 and IL-10] have been cloned from primary lymphocytes by polymerase chain reaction (PCR) methods. IL-2 and IL-10 are both highly conserved between rabbit and other species. IL-4 and IL-6 are less strongly conserved, at both nucleotide and amino acid levels, and exhibit structural differences. An extension of the coding region of rabbit IL-6 relative to all other reported IL-6 genes results from a mutation in the usual stop codon which allows translation to continue for a further 27 amino acids. Analysis of IL-6 from four other lagomorph species suggests that this mutation is specific to the European rabbit. Sequence and structural differences of IL-4 and IL-6, while presumably not altering function, may render them highly species-specific. Several alternatively spliced variants of IL-2 and IL-4 are also reported.

12.
We outline a method for estimating quantitatively the influence of point mutations and selection on the frequencies of codons and amino acids. We show how the mutation rate, i.e., the rate of amino acid replacement due to point mutation, can be affected by the codon usage as well as by the rates of the involved base exchanges. A comparison of the mutation rates calculated from reliable values of codon usage and base exchange probabilities with those that would be expected on the basis of chance reveals a notable suppression of replacements leading to tryptophan, glutamate, lysine, and methionine, and particularly of those leading to the termination codons. If selection constraints are neglected and only mutations are taken into account, the best agreement between expected and observed frequencies of both codons and amino acids is obtained for alpha = 1.13-1.15, where (Formula: see text). The "selection values" of codons and amino acids derived by our method show a pattern that partially deviates from others in the literature. For example, the selection pressure on methionine and cysteine turns out to be much more pronounced than expected if only the discrepancies between their observed and expected occurrences in proteins are considered. To estimate to what extent randomly occurring amino acid replacements are accepted by selection, we constructed an "acceptability matrix" from the well-established matrix of accepted point mutations. On the basis of this matrix "acceptability values" of the amino acids can be defined that correlate with their selection values. We also examine the significance of mutations and selection of amino acids with respect to their physicochemical properties and functions in proteins. 
The conservatism of amino acid replacements with respect to certain properties such as polarity can be brought about by the mutational process alone, whereas the conservatism with respect to other relevant properties--among them all measures of bulkiness--obviously is the result of additional selectional constraints on the evolution of protein structures.

13.
In the course of an electrophoretic mutation screening program of 32,000 dried blood samples from newborns, 17 genetic variants of apolipoprotein A-I (apoA-I) were found and structurally analyzed. The following defects were identified by the combined use of high performance liquid chromatography, time-of-flight secondary ion mass spectrometry, and sequence analysis: Pro3→Arg (1 x), Pro4→Arg (1 x), Asp89→Glu (1 x), Lys107→0 (4 x), Lys107→Met (2 x), Glu139→Gly (2 x), Glu147→Val (1 x), Pro165→Arg (4 x), and Glu198→Lys (1 x). The distribution of point mutations in the apoA-I gene leading to these 9 and 11 other variants of apoA-I reported previously was statistically analyzed. Substitutions are overrepresented in the 10 amino-terminal amino acids (p < 0.001, χ² test) and in residues 103-177 (p < 0.025, χ² test) or residues 103-198 (p < 0.05, χ² test), respectively. We further noted the following. (i) Prolines were substituted by arginine or histidine residues at a frequency much higher than expected on the basis of random nucleotide substitutions (5 out of 18 "electrically non-neutral" amino acid substitutions, p < 0.001, χ² test). These substitutions are the result of transversions of cytosines contained within stretches of at least 5 consecutive cytosines in the apoA-I gene. The observed hypervariability of the apoA-I amino terminus, therefore, might be caused by a hot spot for mutation formed by the 7 consecutive cytosines in codons 3, 4, and 5. (ii) CpG dinucleotides were disproportionately affected by C→T transitions (5 out of 18 electrically non-neutral amino acid substitutions, p < 0.001, χ² test). The hypervariability of the apoA-I alpha-helical domain might therefore be caused by CpG dinucleotides predominantly occurring in codons 120-208 of apoA-I (82 out of 125). (iii) Comparison of mutation sites in the human apoA-I gene with sites of nonsynonymous substitutions revealed that amino acid substitutions found in human apoA-I were predominantly localized in areas that were little conserved during mammalian evolution. These regions may therefore represent areas of less structural constraint for the function of apoA-I.

14.
The study of the evolution of compensatory mechanisms among amino acids is paramount to our understanding of intramolecular epistatic interactions. It has been addressed from different points of view; for example, much effort has been devoted to establishing the number of compensatory mutations required per deleterious mutation. However, we still do not know how the nature of the compensated mutation determines the existence of compensatory mutations. Within this context, recent studies have produced several instances of an interesting phenomenon: human disease-associated residues may sometimes appear as wild-type residues in non-human proteins. This can be explained in terms of compensatory mutations, present in the non-human protein, which would neutralize the damage caused by the disease-associated residue. Therefore, comparison between these compensated mutations and non-compensated pathological mutations provides a simple approach to understand how the nature of the compensated deleterious mutation determines the existence of compensatory mutations. To address this issue, we have obtained a large set of compensated mutations and characterised them with a series of different properties. When comparing the resulting distributions with those from pathological mutations, we find that in general compensated mutations are milder than pathological mutations. More precisely, we find that the probability that a compensatory mutation will evolve is directly related (i) to the location in the protein structure and (ii) to changes in physico-chemical properties (e.g. amino acid volume or hydrophobicity) of the compensated mutation.

15.
Mitochondrial DNA (mtDNA) variants have been traditionally used as markers to trace ancient population migrations. Although experiments relying on model organisms and cytoplasmic hybrids, as well as disease association studies, have served to underline the functionality of certain mtDNA SNPs, only little is known of the regulatory impact of ancient mtDNA variants, especially in terms of gene expression. By analyzing RNA-seq data of 454 lymphoblast cell lines from the 1000 Genomes Project, we found that mtDNA variants defining the most common African genetic background, the L haplogroup, exhibit a distinct overall mtDNA gene expression pattern, which was independent of mtDNA copy numbers. Secondly, intra-population analysis revealed subtle, yet significant, expression differences in four tRNA genes. Strikingly, the more prominent African mtDNA gene expression pattern best correlated with the expression of nuclear DNA-encoded RNA-binding proteins, and with SNPs within the mitochondrial RNA-binding proteins PTCD1 and MRPS7. Our results thus support the concept of an ancient regulatory transition of mtDNA-encoded genes as humans left Africa to populate the rest of the world.

16.
Almost 90% of nephrogenic diabetes insipidus (NDI) is due to mutations in the arginine-vasopressin receptor 2 gene (AVPR2). We retrospectively examined all the published mutations/variants in AVPR2. We planned to perform a comprehensive review of all the AVPR2 mutations/variants and to test whether any amino acid change causing a missense mutation is significantly more or less common than others. We performed a Medline search and collected detailed information regarding all AVPR2 mutations and variants. We performed a frequency comparison between mutated and wild-type amino acids and codons. We predicted the mutation effect or reported it based on published in vitro studies. We also reported the ethnicity of each mutation/variant carrier. In summary, we identified 211 AVPR2 mutations which cause NDI in 326 families and 21 variants which do not cause NDI in 71 NDI families. We described 15 different types of mutations including missense, frameshift, inframe deletion, deletion, insertion, nonsense, duplication, splicing and combined mutations. Missense mutations account for 55.83% of all the published NDI families. Arginine and tyrosine are significantly (P = 4.07E-08 and P = 3.27E-04, respectively) the most commonly mutated AVPR2 amino acids. Alanine and glutamate are significantly (P = 0.009 and P = 0.019, respectively) the least mutated AVPR2 amino acids. The spectrum of mutations varies from rare gene variants or polymorphisms not causing NDI to rare mutations causing NDI, among which arginine and tyrosine are the most common missense changes. The AVPR2 mutations are spread worldwide. Our study may serve as an updated review, comprehensive of all AVPR2 variants and specific gene locations. J. Cell. Physiol. 217: 605-617, 2008. (c) 2008 Wiley-Liss, Inc.

17.
A recent comparative genomic analysis revealed the presence of nucleotide sequences in mouse that are known to be disease-associated in humans, yet the mouse appears normal. In this article we formulate and test several hypotheses in an attempt to explain why these apparently deleterious mutations become fixed in mice. We find that except for one case, the fixations of the disease-associated mutations occurred before the separation of Mus musculus and Mus spretus at least 1 million years ago and that the fixations are not attributable to a founder effect during the recent history of mouse breeding. About 80% of the cases involve diseases that occur before reproductive age in humans and these substitutions are unlikely to have been fixed because of the inefficiency of natural selection against late-onset diseases. We conclude that the compensatory mutation hypothesis remains the most probable explanation for the majority of the fixations of disease mutations in mice.

18.
Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and the imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variant imputation, we imputed 153 individuals, each of whom had data from 3 different genotype arrays (317k, 610k and 1 million SNPs), against three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1), using IMPUTE version 2. These three releases of the 1000 Genomes data differ in sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed the 610K and 317K arrays. However, for very rare variants (MAF≤0.3%), only 0–1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.
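The frequency classes used in this abstract (MAF < 5% for low frequency and rare variants, MAF ≤ 0.3% for very rare variants) can be made concrete with a short sketch. This is an illustration in Python under stated assumptions, not part of the study's pipeline: it assumes diploid, biallelic genotypes, and the function names and class labels are hypothetical.

```python
def minor_allele_frequency(alt_allele_count, n_individuals):
    """MAF for a biallelic variant in diploid samples (assumed setting)."""
    total_alleles = 2 * n_individuals
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("allele count is out of range for this sample size")
    alt_frequency = alt_allele_count / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_frequency, 1.0 - alt_frequency)


def maf_class(maf):
    """Bin a MAF using the thresholds quoted in the abstract."""
    if maf <= 0.003:
        return "very rare"
    if maf < 0.05:
        return "low frequency or rare"
    return "common"


# 3 copies of the alternate allele among 153 individuals (306 alleles).
maf = minor_allele_frequency(3, 153)
print(round(maf, 4), maf_class(maf))
```

With 153 individuals, a single copy of an allele already corresponds to a MAF of 1/306, about 0.33%, so a cohort of this size cannot directly observe variants below that frequency.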

19.
A multi-domain molecular model of factor IXa was constructed by comparative methods. The quaternary structure of the protein was assembled by docking individual domains through consideration of their shape complementarity, polaric properties and the location of cross-reacting material positive/negative (CRM+/–) variants on domain surfaces. Some 217 different missense mutations in the factor IX (F9) gene were then selected for study. Using maximum likelihood analysis, missense mutations affecting highly conserved amino acid residues of factor IX were shown to be 15–20 times more likely to result in haemophilia B than those affecting non-conserved residues. However, about one quarter of this increase in likelihood of clinical observation could be attributed to the magnitude of the amino acid exchange. Missense mutations in structurally conserved residues were found to be 2.1-fold more likely to come to clinical attention than those in structurally variable residues. Missense mutations in residues whose side chains were inwardly pointing were 3.6-fold more likely to be observed than those in surface residues. These observations imply a complex hierarchy of sequence/structure conservation in the protein. The severity of the clinical phenotype correlated with both the extent of the evolutionary sequence conservation of the residue at the site of mutation and the magnitude of the amino acid exchange. Further, the substitution of residues exhibiting minimal side chain solvent accessibility was associated disproportionately with severe haemophilia compared with that of surface residues. Clusters of CRM+ mutations were observed at factor IX-specific residues on the surface of the molecule. These clusters may reflect factor IX-specific docking interactions. 
The likelihood that a given factor IX mutation will come to clinical attention is therefore a complex function of the sequence characteristics of the F9 gene, the nature of the amino acid substitution, its precise location and immediate environment within the protein molecule, and its resulting effects on the structure and function of the protein. This paper is dedicated to the memory of Andrew Wacey.

20.
Long dinucleotide repeats found in exons present a substantial mutational hazard: mutations at these loci occur often and generate frameshifts. Here, we provide clear and compelling evidence that exonic dinucleotides experience strong selective constraint. In humans, only 18 exonic dinucleotides have repeat lengths greater than six, which contrasts sharply with the genome-wide distribution of dinucleotides. We genotyped each of these dinucleotides in 200 humans from eight 1000 Genomes Project populations and found a near-absence of polymorphism. More remarkably, divergence data demonstrate that repeat lengths have been conserved across the primate phylogeny in spite of what is likely considerable mutational pressure. Coalescent simulations show that even a very low mutation rate at these loci fails to explain the anomalous patterns of polymorphism and divergence. Our data support two related selective constraints on the evolution of exonic dinucleotides: a short-term intolerance for any change to repeat length and a long-term prevention of increases to repeat length. In general, our results implicate purifying selection as the force that eliminates new, deleterious mutants at exonic dinucleotides. We briefly discuss the evolution of the longest exonic dinucleotide in the human genome, a 10 x CA repeat in fibroblast growth factor receptor-like 1 (FGFRL1), that should possess a considerably greater mutation rate than any other exonic dinucleotide and therefore generate a large number of deleterious variants.
