Similar Articles
20 similar articles found.
1.

Background  

Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments that incorporate the average amino acid frequencies of the data set under examination (e.g., JTT+F). Variation in the evolutionary process between sites is typically modelled by a rates-across-sites distribution such as the gamma (Γ) distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that is ignored by standard empirical substitution matrices. Here we examine the degree to which the pattern of evolution at sites differs from that expected based on empirical amino acid substitution models and evaluate the impact of these deviations on phylogenetic estimation.
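The "+F" convention can be illustrated with a small rate-matrix construction. This minimal sketch assumes a time-reversible parameterization in which fixed, symmetric exchangeabilities S are combined with equilibrium frequencies π taken from the data set under study, Q_ij = S_ij π_j, and Q is rescaled to one expected substitution per unit time. The 4-state values are hypothetical placeholders, not the 20 × 20 JTT or WAG numbers.

```python
# Sketch of an "empirical matrix + F" rate matrix (toy 4-state example).
# S and pi_data are hypothetical values, NOT real JTT/WAG parameters.

def build_rate_matrix(S, pi):
    """Return Q with Q[i][j] = S[i][j] * pi[j] for i != j, rows summing to 0,
    scaled so the expected rate -sum_i pi_i * Q_ii equals 1."""
    n = len(pi)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                Q[i][j] = S[i][j] * pi[j]
        # Diagonal makes each row sum to zero.
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    mu = -sum(pi[i] * Q[i][i] for i in range(n))
    return [[q / mu for q in row] for row in Q]

# Symmetric exchangeabilities (held fixed, as in an empirical matrix).
S = [[0.0, 1.0, 2.0, 0.5],
     [1.0, 0.0, 0.8, 1.5],
     [2.0, 0.8, 0.0, 1.2],
     [0.5, 1.5, 1.2, 0.0]]
# "+F": frequencies observed in the data set under examination (hypothetical).
pi_data = [0.4, 0.3, 0.2, 0.1]
Q = build_rate_matrix(S, pi_data)
```

Swapping `pi_data` for the database frequencies gives the plain empirical model. The exchangeabilities are untouched either way, so +F adjusts only global composition and still ignores site-to-site differences in which interchanges are favoured.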

2.
Internal protein dynamics is essential for biological function. During evolution, protein divergence is functionally constrained: properties more relevant to function vary more slowly than less important ones. Thus, if protein dynamics is relevant to function, it should be evolutionarily conserved. In contrast with the well-studied evolution of protein structure, the evolutionary divergence of protein dynamics has not previously been addressed systematically, apart from a few case studies. X-ray diffraction analysis gives information not only on protein structure but also on B-factors, which characterize the flexibility that results from protein dynamics. Here we study the evolutionary divergence of protein backbone dynamics by comparing Cα flexibility (B-factor) profiles for a large dataset of homologous proteins classified into families and superfamilies. We show that Cα flexibility profiles diverge slowly, so that they are conserved at the family and superfamily levels, even for pairs of proteins with nonsignificant sequence similarity. We also analyze and discuss the correlations among the divergences of flexibility, sequence, and structure. Electronic supplementary material is available for this article and is accessible to authorised users. [Reviewing Editor: Dr. David Pollock]
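The abstract does not state how two flexibility profiles are compared. A minimal sketch, assuming the profiles are already mapped onto aligned Cα positions (gaps removed) and compared by Pearson correlation of z-score-normalized B-factors. The normalization step is an assumption meant to remove overall B-factor scale differences between crystal structures.

```python
import math

def zscore(profile):
    """Normalize a B-factor profile to mean 0 and (population) SD 1."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:
        raise ValueError("flat profile: correlation undefined")
    return [(x - mean) / sd for x in profile]

def profile_correlation(b1, b2):
    """Pearson correlation of two B-factor profiles over aligned C-alpha positions."""
    if len(b1) != len(b2):
        raise ValueError("profiles must cover the same aligned positions")
    z1, z2 = zscore(b1), zscore(b2)
    # Mean product of z-scores equals Pearson's r when population SDs are used.
    return sum(a * b for a, b in zip(z1, z2)) / len(z1)
```

A divergence score such as `1 - profile_correlation(b1, b2)` could then be plotted against sequence or structural distance.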

3.
The retrieval of Neanderthal (Homo neanderthalensis) mitochondrial DNA is thought to be among the most significant ancient DNA contributions to date, allowing conflicting hypotheses on modern human (Homo sapiens) evolution to be tested directly. Recently, however, both the authenticity of the Neanderthal sequences and their phylogenetic position outside contemporary human diversity have been questioned. Using Bayesian inference and the largest dataset to date, we find strong support for a monophyletic Neanderthal clade outside the diversity of contemporary humans, in agreement with the expectations of the Out-of-Africa replacement model of modern human origin. From average pairwise sequence differences, we obtain support for claims that the first published Neanderthal sequence may include errors due to postmortem damage in the template molecules for PCR. In contrast, we find that recent results implying that the Neanderthal sequences are products of PCR artifacts are not well supported: they suffer from inadequate experimental design and a presumably high percentage (>68%) of chimeric sequences due to “jumping PCR” events. Electronic supplementary material is available for this article and is accessible to authorised users. [Reviewing Editor: Dr. Martin Kreitman]
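Average pairwise sequence differences can be computed directly from an alignment. This sketch assumes that columns with a gap or ambiguity code in either sequence are skipped; the abstract does not specify how such columns were handled.

```python
from itertools import combinations

def pairwise_differences(seq_a, seq_b, ignore="-N?"):
    """Count differing sites, skipping columns where either sequence has a gap/ambiguity."""
    diffs = 0
    for a, b in zip(seq_a, seq_b):
        if a in ignore or b in ignore:
            continue
        if a != b:
            diffs += 1
    return diffs

def mean_pairwise_differences(alignment):
    """Average number of differences over all unordered pairs of aligned sequences."""
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
```

Comparing each sequence's mean distance to the others is one way a single sequence carrying excess differences, such as damage-induced errors, might stand out.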

4.
To help develop an understanding of the genes that govern the developmental characteristics of the potato (Solanum tuberosum), as well as the genes associated with responses to specified pathogens and storage conditions, the Canadian Potato Genome Project (CPGP) carried out 5′-end sequencing of regular, normalized, and full-length cDNA libraries of the Shepody potato cultivar, generating over 66,600 expressed sequence tags (ESTs). The libraries sequenced represented tuber developmental stages, pathogen-challenged tubers, leaves, floral developmental stages, suspension-cultured cells, and roots. All libraries analysed to date have contributed unique sequences, with the normalized libraries ranking high on the list. In addition, a low-molecular-weight library has enhanced the 3′ ends of our sequence assemblies. Using the combined assembly dataset, unique tuber developmental, cold storage, and pathogen-challenged sequences have been identified. A comparison of the ESTs specific to the pathogen-challenged tuber and foliar libraries revealed minimal overlap between these libraries. Mixed assemblies using over 189,000 potato EST sequences from CPGP and The Institute for Genomic Research (TIGR) have revealed common sequences as well as CPGP- and TIGR-unique sequences. Electronic supplementary material is available for this article and is accessible to authorised users.
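Identifying library-specific sequences after assembly amounts to set operations over the libraries contributing to each contig. The sketch below assumes a hypothetical record format of (EST ID, library, contig ID) tuples; the library names and contigs are made up for illustration and do not come from the CPGP data.

```python
from collections import defaultdict

def contig_libraries(est_records):
    """Map each contig to the set of libraries whose ESTs it contains.

    est_records: iterable of (est_id, library, contig_id) tuples (hypothetical format).
    """
    members = defaultdict(set)
    for _est_id, library, contig in est_records:
        members[contig].add(library)
    return dict(members)

def library_specific(members, library):
    """Contigs assembled only from ESTs of the given library."""
    return sorted(c for c, libs in members.items() if libs == {library})

def shared(members, lib_a, lib_b):
    """Contigs containing ESTs from both libraries (their overlap)."""
    return sorted(c for c, libs in members.items() if {lib_a, lib_b} <= libs)

records = [
    ("e1", "tuber_pathogen", "c1"),
    ("e2", "tuber_pathogen", "c2"),
    ("e3", "leaf_pathogen", "c2"),
    ("e4", "leaf_pathogen", "c3"),
    ("e5", "cold_storage", "c4"),
    ("e6", "tuber_pathogen", "c4"),
]
members = contig_libraries(records)
```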

5.
A series of metallopeptides based on the amino-terminal copper/nickel (ATCUN) binding motif have been evaluated as classical inhibitors and catalytic inactivators of both rabbit and human angiotensin-converting enzyme (hACE) and human endothelin-converting enzyme 1 (hECE-1). The cobalt complex [KGHK–Co(NH₃)₂]²⁺, where KGHK is lysylglycylhistidyllysine, displayed Ki and IC₅₀ values similar to those found for [KGHK–Cu]⁺ in spite of its greater charge. Thus either the influence of charge is offset by the steric influence of the axially coordinated ammine ligands, or binding is dominated by contributions from the amino acid side chains, especially the C-terminal lysine, which mimics the binding pattern observed for lisinopril. Moreover, the inhibition observed for [KGHK–Co(NH₃)₂]²⁺ contrasts with the activation of hACE by Co²⁺(aq), which reflects the stimulation of enzyme activity when cobalt ion replaces the catalytic zinc cofactor at each of the two active sites. Quantitative analysis of the dose-dependent stimulation of activity by Co²⁺(aq) yielded apparent affinities of 1.3 ± 0.2 and 56 ± 8 μM for the two sites in the presence of saturating Zn²⁺ (10 μM). Catalytic inactivation of hACE by [KGHK–Cu]⁺ at subsaturating concentrations had previously been characterized, with kobs = (2.9 ± 0.5) × 10⁻² min⁻¹. Under similar conditions, the same complex is found to catalytically inactivate hECE-1, with kobs = (2.12 ± 0.16) × 10⁻² min⁻¹, demonstrating the potential for dual-action activity against two key drug targets in cardiovascular disease. Irreversible inactivation of a drug target represents a novel mechanism of drug action that complements the classical inhibitor strategies underlying current drug discovery efforts. Supplementary material is available to authorized users in the online version of this article.
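The kinetic constants above translate into simple quantities. This sketch assumes pseudo-first-order inactivation, so residual activity is exp(−kobs·t) and t½ = ln 2 / kobs. It also treats the two apparent Co²⁺ affinities as dissociation constants of independent, equally weighted hyperbolic sites; that equal weighting is a hypothetical simplification, not the authors' fitted model.

```python
import math

def residual_activity(k_obs, t_min):
    """Fraction of activity left after t minutes of pseudo-first-order inactivation."""
    return math.exp(-k_obs * t_min)

def half_life(k_obs):
    """Half-life (min) for a first-order rate constant in min^-1."""
    return math.log(2) / k_obs

def two_site_occupancy(conc_uM, kd1_uM=1.3, kd2_uM=56.0):
    """Mean fractional occupancy of two independent sites (hypothetical equal weighting)."""
    f1 = conc_uM / (kd1_uM + conc_uM)
    f2 = conc_uM / (kd2_uM + conc_uM)
    return 0.5 * (f1 + f2)

for name, k in [("hACE", 2.9e-2), ("hECE-1", 2.12e-2)]:
    print(f"{name}: t1/2 = {half_life(k):.1f} min")
```

With the reported kobs values this prints half-lives of about 23.9 min for hACE and 32.7 min for hECE-1.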

6.
Genes related to sex and reproduction are known to evolve rapidly; however, the mechanism for rapid evolutionary change is proving to be more complex than a simple relaxation of selective constraint. We compared the divergence between orthologous human and mouse fertility genes according to their degree of dispensability, as suggested by mouse knockout mutation phenotypes. The dataset consisted of 161 orthologous genes affecting fertility and 803 orthologous genes affecting viability. We find that essential fertility genes affecting both sexes evolve at a rate similar to that of essential viability genes, but that within each sex the degree of dispensability is not an important factor affecting the rate of fertility gene evolution. We also find no difference in the evolutionary rates of fertility genes that affect males versus females; however, more sterility genes affect males. Overall, significantly more fertility genes affect one sex rather than both, suggesting that fertility genes tend toward sex-specific functions, particularly in males. Our findings support the hypothesis that the rapid evolution of sex- and reproduction-related genes is facilitated by increased specialization of gene function and that dispensability is not a major factor determining their evolutionary rate. Electronic supplementary material is available for this article and is accessible to authorised users. [Reviewing Editor: Dr. Willie J. Swanson]
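The abstract does not name the statistical test used to compare evolutionary rates between gene classes. One assumption-light option is a permutation test on the difference in mean rate (for example, dN/dS) between two groups. The values below are hypothetical, not the study's data.

```python
import random

def permutation_test(group_a, group_b, n_perm=10000, seed=1):
    """Two-sided permutation test for a difference in mean evolutionary rate.

    Returns (observed absolute difference in means, permutation p-value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of genes into the two groups
        a, b = pooled[:n_a], pooled[n_a:]
        if abs(sum(a) / n_a - sum(b) / len(b)) >= observed:
            extreme += 1
    # +1 correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

male_rates = [0.10, 0.12, 0.11]     # hypothetical dN/dS values
female_rates = [0.30, 0.32, 0.31]   # hypothetical dN/dS values
obs, p = permutation_test(male_rates, female_rates, n_perm=2000)
```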

7.
Chadha P, Das RH. Planta. 2006;225(1):213–222.
A pathogenesis-related protein (AhPR10) is identified from a clone of a cDNA library of 6-day-old Arachis hypogaea L. (peanut). The clone was expressed in E. coli as a ∼20 kDa protein. The amino acid sequence deduced from the nucleotide sequence of the coding region shows homology with PR10 proteins having a Bet v 1 domain and a P-loop motif. Recombinant AhPR10 has ribonuclease activity and antifungal activity against the peanut pathogens Fusarium oxysporum and Rhizoctonia solani. The mutant protein AhPR10-K54N, in which Lys54 is mutated to Asn54, loses its ribonuclease and antifungal activities. FITC-labeled AhPR10 and AhPR10-K54N are both internalized by hyphae of F. oxysporum and R. solani, but the latter protein does not inhibit fungal growth. This suggests that the ribonuclease function of AhPR10 is essential for its antifungal activity. Energy- and temperature-dependent internalization of AhPR10 into sensitive fungal hyphae indicates that the protein is internalized through active uptake. Supplementary material is available to authorised users in the online version of this article. The nucleotide sequence of AhPR10 reported in this paper has been submitted to the NCBI Nucleotide Sequence Database under accession number AY726607.
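The mutant name AhPR10-K54N follows the usual one-letter convention: wild-type residue (K, Lys), 1-based position (54), replacement residue (N, Asn). A small helper shows how such a label maps onto a sequence and checks the wild-type residue. It uses a made-up toy sequence, not AhPR10.

```python
import re

def apply_point_mutation(protein, mutation):
    """Apply a substitution written as <wt><position><mut>, e.g. 'K54N' (1-based)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(protein):
        raise ValueError(f"position {pos} outside sequence of length {len(protein)}")
    if protein[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {protein[pos - 1]}")
    return protein[: pos - 1] + mut + protein[pos:]
```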

8.
Chromosomal deoxyribonucleic acid was isolated and purified from 10 strains of Flavobacterium breve originating from human or other animal sources. The mean and standard deviation for the species were 32.4 ± 0.6% G+C for base composition and (3.21 ± 0.37) × 10⁹ daltons for genome size. In vitro DNA reassociation showed that seven F. breve strains (mainly from human sources) had high levels of intraspecific base sequence similarity (>70%), as derived from reassociations done at the optimum temperature of reassociation (TOR) or at TOR − 10 °C (nonstringent conditions). The three other F. breve strains showed a high degree of base sequence divergence. All 10 strains of F. breve were readily distinguishable in their DNA characteristics from F. meningosepticum, F. odoratum, and allied Gram-negative bacteria.
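Two quantities in this abstract can be related to sequence-level numbers. The sketch makes two assumptions. First, G+C content is computed from sequence over unambiguous A/C/G/T bases, whereas the study measured it experimentally. Second, a genome mass in daltons converts to base pairs at about 650 Da per base pair of double-stranded DNA, which is an assumed average.

```python
def gc_percent(seq):
    """Percent G+C over unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * gc / acgt

def daltons_to_bp(mass_da, da_per_bp=650.0):
    """Approximate double-stranded genome length; 650 Da per bp is an assumed average."""
    return mass_da / da_per_bp

print(f"{daltons_to_bp(3.21e9) / 1e6:.2f} Mbp")
```

Under that assumption, 3.21 × 10⁹ Da corresponds to roughly 4.94 Mbp.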

9.
Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0688-z) contains supplementary material, which is available to authorized users.
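Here is a rough sketch of two ideas behind UPP-style alignment. The input is split into full-length "backbone" sequences and fragmentary "query" sequences, and each query is assigned to the best-scoring HMM in an ensemble. The median-length rule with a 25% tolerance and the precomputed score dictionary are assumptions for illustration. This is not UPP's code or API, and real scores would come from an HMM search tool.

```python
import statistics

def select_backbone(seqs, tolerance=0.25):
    """Split {id: seq} into backbone (length within +/- tolerance of the median) and queries."""
    median = statistics.median(len(s) for s in seqs.values())
    lo, hi = (1 - tolerance) * median, (1 + tolerance) * median
    backbone = {k: s for k, s in seqs.items() if lo <= len(s) <= hi}
    queries = {k: s for k, s in seqs.items() if k not in backbone}
    return backbone, queries

def best_hmm(scores):
    """scores: {hmm_id: score of one query against that HMM}; return the highest-scoring HMM."""
    return max(scores, key=scores.get)

seqs = {"a": "A" * 100, "b": "A" * 95, "c": "A" * 110, "d": "A" * 30}
backbone, queries = select_backbone(seqs)
```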

10.
The DNA-binding behavior and target sequences of two designed metallopeptides have been investigated with an iterative electrophoretic mobility shift assay followed by PCR amplification, and by circular dichroism spectroscopy. Peptides P3W and P5b were designed based on the structural similarity of the helix–turn–helix motif of homeodomains and the EF-hand motifs of calmodulin, as previously described for P3W. Like P3W, P5b binds both Eu(III) (Kd = 12.6 ± 1.9 μM) and Ca(II) (Kd = 70 ± 8 μM) with reasonable affinity. Binding selection from a library of randomized 8-mer DNA oligonucleotide sequences identified one target family for CaP5b [5′-pur-T-pur-G-(G/C)-3′] and two target sites for CaP3W [5′-(A/T)-G-G-G-(T/C)-3′ and 5′-A-T-(G/T)-T-G-3′]. Circular dichroism studies indicate that, unlike EuP3W, EuP5b is poorly folded in the absence of DNA. In the presence of DNA containing target-binding sites for both peptides, both EuP3W and EuP5b increase in helical content, the latter significantly. These results suggest that EuP5b binding to target DNA involves an induced-fit mechanism. These small chimeric metallopeptides have been found to bind selectively to DNA targets, analogous to natural protein–DNA interactions. This corroborates our earlier conclusion (J. Am. Chem. Soc. 125:6656, 2003) that sequence-preferential DNA cleavage by Ce(IV)P3W was due to sequence recognition. Supplementary material is available for this article and is accessible to authorized users.
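The selected binding sites are degenerate motifs, which map naturally onto regular expressions (pur = A or G). This sketch scans both strands of a DNA sequence for a motif such as CaP5b's 5′-pur-T-pur-G-(G/C)-3′. The reverse-strand handling and the overlapping-match convention are choices made for this example, not part of the study.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(dna, pattern):
    """Return (strand, 0-based forward start) for each match on either strand.

    A lookahead wrapper lets overlapping matches be reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    hits = [("+", m.start()) for m in regex.finditer(dna)]
    n = len(dna)
    for m in regex.finditer(revcomp(dna)):
        # Convert a reverse-strand hit to the forward coordinate of its left end.
        hits.append(("-", n - (m.start() + len(m.group(1)))))
    return hits

CAP5B = "[AG]T[AG]G[GC]"                  # 5'-pur-T-pur-G-(G/C)-3'
CAP3W_SITES = ["[AT]GGG[TC]", "AT[GT]TG"]  # the two CaP3W target sites
```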

11.
Miyazawa S. PLoS ONE. 2011;6(3):e17244.

Background

Empirical substitution matrices represent the average tendencies of substitutions over various protein families at the expense of gene-level resolution. We develop a codon-based model in which the mutational tendencies of codons, the genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices.

Results

Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition–transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pair rather than on the protein family, because the present model, in which selective constraints are approximated as a linear function of those estimated from the JTT/WAG/LG/KHG matrices, provides a good fit to other empirical substitution matrices, including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.

Conclusions/Significance

The present codon-based model, with ML estimates of selective constraints and adjustable nucleotide mutation rates, would be useful as a simple substitution model in ML and Bayesian inference of molecular phylogenetic trees, and it enables biologically meaningful information to be obtained at both the nucleotide and amino acid levels from codon and protein sequences.
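The model comparison by AIC mentioned in the Results uses AIC = 2k − 2 ln L, where k is the number of free parameters and the lower value is preferred. The log-likelihoods and parameter counts below are hypothetical and only illustrate the comparison.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits of one empirical matrix under two codon models: (ln L, k).
fits = {
    "single-nucleotide changes only": (-10450.0, 5),
    "multiple nucleotide changes allowed": (-10390.0, 7),
}
scores = {name: aic(lnL, k) for name, (lnL, k) in fits.items()}
best = min(scores, key=scores.get)
```

Here the extra two parameters cost 4 AIC units but the likelihood gain is worth 120, so the richer model is preferred.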

12.
The stress chaperone Hsp70 (DnaK) and its co-chaperones Hsp40 (DnaJ) and GrpE are universal in bacteria and eukaryotes but occur only in some archaea, where they are clustered in the order 5′-grpE-dnaK-dnaJ-3′ in a locus termed Locus I. Three structural varieties of Locus I, termed Types I, II, and III, were identified, respectively, in Methanosarcinales; in Thermoplasmatales and Methanothermobacter thermoautotrophicus; and in Halobacteriales. These Locus I types corresponded to three groups, comprising the same archaeal subdivisions, identified in phylogenetic trees of archaeal DnaK proteins. These archaeal DnaK groups were not significantly interrelated, clustering instead with DnaKs from three bacterial lineages: Methanosarcinales with Firmicutes, Thermoplasmatales and M. thermoautotrophicus with Thermotoga, and Halobacteriales with Actinobacteria. This suggested that the three archaeal types of Locus I were acquired by independent events of lateral gene transfer. These associations, however, lacked strong bootstrap support and were sensitive to the choice of dataset and tree-reconstruction method. Structural features of dnaK loci in bacteria revealed that Methanosarcinales and Firmicutes shared a similar structure, which is also common to most other bacterial groups. In contrast, structural differences were observed between Thermotoga and Thermoplasmatales/M. thermoautotrophicus, and between Actinobacteria and Halobacteriales. It was also found that the association between the DnaK sequences from Halobacteriales and Actinobacteria likely reflects shared biases in their amino acid compositions. Although the structural features of the loci and the DnaK trees suggested possible lateral gene transfer between Firmicutes and Methanosarcinales, the similarity between the archaeal and the ancestral bacterial loci favors the more parsimonious hypothesis that all archaeal sequences originated from a single prokaryotic ancestor.
Electronic supplementary material is available for this article and is accessible to authorised users. [Reviewing Editor: Dr. Stephen Freeland]
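The abstract attributes the Halobacteriales–Actinobacteria grouping partly to shared amino acid composition biases. One simple way to screen for such bias is to compare 20-dimensional composition vectors. The Euclidean distance used here is an illustrative assumption, not the authors' method.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Frequencies of the 20 standard amino acids (other characters ignored)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total for a in AMINO_ACIDS]

def composition_distance(seq_a, seq_b):
    """Euclidean distance between two amino acid composition vectors."""
    ca, cb = composition(seq_a), composition(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))
```

Sequences that are close in this space but distant in sequence identity are candidates for composition-driven, rather than phylogenetic, clustering.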

13.
Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of the tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2, or ribosomal binding sites in ycf2). RNA, photosynthesis, and ATP synthase genes are the least divergent, whereas the most divergent genes are clpP, cemA, ccsA, and matK. Repeat analyses identified 33–45 direct and inverted repeats ≥30 bp with a sequence identity of at least 90%; all but five of the repeats shared by all four Solanaceae genomes are located in the same genes or intergenic regions, suggesting a functional role. A comprehensive genome-wide analysis of all coding sequences and intergenic spacer regions was performed for the first time for chloroplast genomes. Only four spacer regions are fully conserved (100% sequence identity) among all genomes; deletions or insertions within some intergenic spacer regions result in less than 25% sequence identity. This underscores the importance of choosing appropriate intergenic spacers for plastid transformation and provides valuable new information on the phylogenetic utility of chloroplast intergenic spacer regions. Comparison of coding sequences with expressed sequence tags showed a considerable amount of variation resulting in amino acid changes; none of the C-to-U conversions observed in potato and tomato were conserved in tobacco and Atropa. It is possible that conserved editing sites have been lost in potato and tomato. Supplementary material is available for this article and is accessible to authorized users.
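Repeat detection of the kind summarized above (repeats ≥30 bp with ≥90% identity) can be sketched with a brute-force window comparison. This assumes ungapped identity between fixed-length windows and finds direct repeats only; inverted repeats would compare windows against the reverse complement. It is quadratic in genome length and meant as an illustration, not the software used in the study.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def direct_repeats(seq, length=30, min_identity=0.90):
    """Yield (i, j, identity) for non-overlapping windows seq[i:i+length] and
    seq[j:j+length] (j >= i + length) with ungapped identity >= min_identity."""
    n = len(seq)
    for i in range(n - length + 1):
        wi = seq[i:i + length]
        for j in range(i + length, n - length + 1):
            ident = identity(wi, seq[j:j + length])
            if ident >= min_identity:
                yield i, j, ident

unit = "ACGTTGCAAC"
toy = unit + "G" * 10 + unit
hits = list(direct_repeats(toy, length=10, min_identity=1.0))
```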

14.
15.
Gray-Mitsumune M, Matton DP. Planta. 2006;223(3):618–625.
The maize ZmEA1 protein was recently postulated to be involved in short-range pollen tube guidance from the embryo sac. Until now, EA1-like sequences had been identified only in monocot species. Using a more conserved C-terminal motif found in the monocot species, numerous ZmEA1-like sequences were retrieved from dicot species in EST databases, as well as from unannotated genomic sequences of Arabidopsis thaliana. RT-PCR analyses performed on these unannotated genes showed that they are indeed expressed. Further structural and phylogenetic analyses revealed that all members of the EA1-like (EAL) gene family share a conserved 27–29 amino acid motif near the C-terminal end, termed the EA box, and appear to be secretory proteins. The EA box proteins therefore define a new class of small secretory proteins, some of which may be involved in pollen tube guidance. Supplementary material is available for this article and is accessible to authorized users.
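Retrieving EA1-like sequences through a conserved C-terminal motif can be illustrated with a regular-expression search restricted to each protein's C-terminal window. The motif string and window size below are hypothetical placeholders, since the abstract does not give the EA box sequence. A profile or HMM search would be more sensitive in practice.

```python
import re

def c_terminal_hits(proteins, motif, window=60):
    """Return IDs of proteins whose last `window` residues contain `motif` (a regex)."""
    pattern = re.compile(motif)
    return sorted(pid for pid, seq in proteins.items() if pattern.search(seq[-window:]))

HYPOTHETICAL_MOTIF = "C..G.W"  # placeholder, NOT the real EA box
proteins = {
    "p1": "M" + "A" * 80 + "CKLGAW",   # motif at the C-terminus
    "p2": "CKLGAW" + "A" * 80,         # motif only at the N-terminus
}
```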

16.
Connecting protein sequence to function is becoming increasingly relevant as high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between the ancestry and the functional signal is decoupled. The amino acids at these sites are masked from recognition by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that went through inversion after gene duplication, mostly located in the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications for protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.

17.
Bacterial lipoproteins are a diverse and functionally important group of proteins that are amenable to bioinformatic analyses because of their unique signal peptide features. Here we have used a dataset of sequences of experimentally verified lipoproteins of Gram-positive bacteria to refine our previously described lipoprotein recognition pattern (G+LPP). Sequenced bacterial genomes can be screened for putative lipoproteins using the G+LPP pattern, and the sequences identified can then be validated using online tools for lipoprotein sequence identification. We have used our protein sequence datasets to evaluate the efficacy of six online tools for lipoprotein sequence identification. Our analyses demonstrate that LipoP performs best individually, but that a consensus approach incorporating outputs from predictors of general signal peptide properties is most informative. Electronic supplementary material: the online version of this article (doi:) contains supplementary material, which is available to authorized users.
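The screening-plus-consensus workflow can be sketched in two parts: a pattern scan of the N-terminal region and a majority vote over predictor outputs. The regular expression is a commonly cited simplified lipobox, [LVI][ASTVI][GAS]C, used here as an assumption. It is not the refined G+LPP pattern, and the predictor names other than LipoP are hypothetical.

```python
import re

# Simplified lipobox consensus (assumption), NOT the authors' G+LPP pattern.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")

def has_lipobox(protein, search_len=35):
    """True if a lipobox-like motif occurs within the first `search_len` residues."""
    return LIPOBOX.search(protein[:search_len]) is not None

def consensus_call(votes, min_agree=None):
    """Majority vote over per-tool boolean predictions ({tool_name: bool})."""
    if min_agree is None:
        min_agree = len(votes) // 2 + 1
    return sum(bool(v) for v in votes.values()) >= min_agree
```

Requiring a strict majority by default means ties count as negative, which favours precision over recall; lowering `min_agree` trades that the other way.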

18.
Variation in the internal transcribed spacer (ITS) of the rRNA (rrn) operon is increasingly used to infer population-level diversity in bacterial communities. However, intragenomic ITS variation may skew diversity estimates that do not correct for multiple rrn operons within a genome. This study characterizes variation in ITS length, tRNA composition, and intragenomic nucleotide divergence across 155 bacterial genomes. On average, these genomes encode 4.8 rrn operons (range: 2–15) and contain 2.4 unique ITS length variants (range: 1–12) and 2.8 unique sequence variants (range: 1–12). ITS variation stems primarily from differences in tRNA gene composition, with ITS regions containing tRNA-Ala + tRNA-Ile (48% of sequences), tRNA-Ala or tRNA-Ile (10%), tRNA-Glu (11%), other tRNAs (3%), or no tRNA genes (27%). Intragenomic divergence among paralogous ITS sequences grouped by tRNA composition ranges from 0% to 12.11% (mean: 0.94%). Low divergence values indicate extensive homogenization among ITS copies: in 78% of alignments divergence is <1%, with 54% showing zero variation and 81% containing at least two identical sequences. ITS homogenization occurs over relatively long sequence tracts, frequently spanning the entire ITS, and is largely independent of the distance (in base pairs) between operons. This study underscores the potential contribution of interoperon ITS variation to bacterial microdiversity studies and unequivocally demonstrates the pervasiveness of concerted evolution in the rrn gene family. Electronic supplementary material: the online version of this article (doi:) contains supplementary material, which is available to authorized users. Reviewing Editor: Dr. Margaret Riley
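The per-genome statistics above (operon count, unique ITS length and sequence variants, intragenomic divergence) can be computed from aligned ITS copies. This sketch assumes the copies are already aligned to equal length with '-' for gaps and uses uncorrected p-distance, skipping gapped sites. The study's exact distance measure is not stated in the abstract.

```python
from itertools import combinations

def p_distance(a, b):
    """Uncorrected p-distance over sites where neither aligned sequence has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)

def its_summary(its_copies):
    """Summarize one genome's aligned ITS copies (one per rrn operon)."""
    lengths = {len(s.replace("-", "")) for s in its_copies}
    dists = [p_distance(a, b) for a, b in combinations(its_copies, 2)]
    return {
        "operons": len(its_copies),
        "length_variants": len(lengths),
        "sequence_variants": len(set(its_copies)),
        "mean_divergence_pct": 100 * sum(dists) / len(dists) if dists else 0.0,
    }

summary = its_summary(["ACGT-A", "ACGT-A", "ACGTTA", "ACCTTA"])
```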

19.
Germline mutation rates have been found to be higher in males than in females in many organisms, a likely consequence of cell division being more frequent in spermatogenesis than in oogenesis. If the majority of mutations are due to DNA replication errors, the male-to-female mutation rate ratio (αm) is expected to be similar to the ratio of the number of germline cell divisions in males and females (c), an assumption that can be tested with proper estimates of αm and c. αm is usually estimated by comparing substitution rates in putatively neutral sequences on the sex chromosomes. However, substantial regional variation in substitution rates across chromosomes may bias estimates of αm based on the substitution rates of short sequences. To investigate regional substitution rate variation, we estimated sequence divergence in 16 gametologous introns located on the Z and W chromosomes of five bird species of the order Galliformes. Intron ends and potentially conserved blocks were excluded to reduce the effect of using sequences subject to negative selection. We found significant substitution rate variation within Z chromosome introns (G(15) = 37.6, p = 0.0010) as well as within W chromosome introns (G(15) = 44.0, p = 0.0001). This heterogeneity also affected the estimates of αm, which varied significantly among the introns, from 1.53 to 3.51 (ANOVA: F(13,14) = 2.68, p = 0.04). Our results highlight the importance of using extensive data sets from several genomic regions to avoid the effects of regional mutation rate variation and to ensure accurate estimates of αm. Electronic supplementary material is available for this article and is accessible to authorised users. [Reviewing Editor: Mr. Martin Kreitman] Nick G.C. Smith: deceased.
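For a ZW system, a commonly used relation links the Z/W substitution rate ratio to αm. A Z chromosome spends two-thirds of its time in males and one-third in females, whereas W is female-limited, so Z/W = (2αm + 1)/3. The sketch assumes neutral sites and equal generation times for both sexes; the abstract does not state the exact estimator used.

```python
def alpha_m_from_zw(z_rate, w_rate):
    """Male-to-female mutation rate ratio from Z and W substitution rates (ZW system).

    Assumes Z spends 2/3 of its time in males and W is female-limited:
        Z/W = (2 * alpha_m + 1) / 3   =>   alpha_m = (3 * Z/W - 1) / 2
    """
    ratio = z_rate / w_rate
    return (3 * ratio - 1) / 2
```

Because αm depends linearly on Z/W, intron-to-intron heterogeneity in either rate passes directly into the αm estimate.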

20.
Hahn Y, Lee B. Human Genetics. 2006;119(1–2):169–178.
The comparative study of the human and chimpanzee genomes may shed light on the genetic ingredients for the evolution of the unique traits of humans. Here, we present a simple procedure to identify human-specific nonsense mutations that might have arisen since the human–chimpanzee divergence. The procedure involves collecting orthologous sequences in which a stop codon of the human sequence is aligned to a non-stop codon in the chimpanzee sequence, and verifying that the latter is ancestral by finding homologs in other species without a stop codon. Using this procedure, we identify nine genes (CML2, FLJ14640, MT1L, NPPA, PDE3B, SERPINA13, TAP2, UIP1, and ZNF277) that would produce human-specific truncated proteins, resulting in a loss or modification of function. The premature terminations of the CML2, MT1L, and SERPINA13 genes appear to abolish the original function of the encoded protein because in each case the mutation removes a major part of the known active site. The other six mutated genes are either known or presumed to produce functionally modified proteins. The mutations of five genes (CML2, FLJ14640, MT1L, NPPA, TAP2) are known or predicted to be polymorphic in humans. In these cases, the stop codon alleles are more prevalent than the ancestral allele, suggesting that the mutant alleles have been approaching fixation since their emergence during human evolution. The findings support the notion that functional modification or inactivation of genes by nonsense mutation is a part of the process of adaptive evolution and acquisition of species-specific features. Supplementary material is available for this article and is accessible to authorized users.
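The filtering step described above can be sketched on codon-aligned sequences. This assumes in-frame, gap-free alignments of equal length and requires every outgroup to lack a stop at the aligned codon. The sequences are toy examples, not data from the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq):
    """Split an in-frame sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def human_specific_stops(human, chimp, outgroups):
    """Return 1-based codon indices where human has a stop, chimp does not,
    and every outgroup also lacks a stop (non-stop state taken as ancestral).

    Assumes in-frame, gap-free codon alignments of equal length.
    """
    h, c = codons(human), codons(chimp)
    o = [codons(s) for s in outgroups]
    hits = []
    for k, (hc, cc) in enumerate(zip(h, c)):
        if hc in STOP_CODONS and cc not in STOP_CODONS:
            if all(og[k] not in STOP_CODONS for og in o):
                hits.append(k + 1)
    return hits
```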
