Similar literature
 Found 20 similar articles (search time: 15 ms)
1.

Background

The number of available structures of large multi-protein assemblies is quite small. Such structures provide phenomenal insights into the organization, mechanism of formation and functional properties of the assembly, so detailed analysis of such structures is highly rewarding. However, a common problem in such analyses is the low resolution of these structures. In recent times, a number of attempts have been reported that combine low-resolution cryo-EM data with higher-resolution structures determined using X-ray analysis or NMR, or generated using comparative modeling. Even in such attempts, the best result one arrives at is a very coarse idea of the assembly structure in terms of a trace of the Cα atoms, which are modeled with modest accuracy.

Methodology/Principal Findings

In this paper, we first present an objective approach to identify potentially solvent-exposed and buried residues solely from the positions of Cα atoms and the amino acid sequence, using residue type-dependent thresholds for the accessible surface areas of Cα atoms. We then extend the method to recognize potential protein-protein interface residues.

Conclusion/Significance

Our approach to identifying buried and exposed residues solely from the positions of Cα atoms resulted in an accuracy of 84%, sensitivity of 83–89% and specificity of 67–94%, while recognition of interfacial residues corresponded to an accuracy of 94%, sensitivity of 70–96% and specificity of 58–94%. Interestingly, detailed analysis of cases of mismatch between recognition of interface residues from Cα positions and from all-atom models suggested that recognition of interfacial residues using Cα atoms only corresponds better with the intuitive notion of what an interfacial residue is. Our method should be useful in the objective analysis of structures of protein assemblies when only Cα positions are available, as, for example, when cryo-EM data are integrated with high-resolution structures of the components of the assembly.
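The classification and scoring described in this abstract can be sketched in a few lines. This is a minimal illustration only: the threshold table, the fallback value, the function names and the example residues are all hypothetical, and the thresholds actually derived in the paper are not given in the abstract.

```python
# Illustrative thresholds (in Å^2) on the Cα-only accessible surface area.
# These numbers are placeholders, NOT the thresholds derived in the paper.
CA_ASA_THRESHOLD = {"GLY": 30.0, "ALA": 25.0, "LEU": 15.0, "LYS": 20.0}
DEFAULT_THRESHOLD = 20.0  # hypothetical fallback for residue types not listed


def classify(residue_type, ca_asa):
    """Label a residue 'exposed' if its Cα area reaches the type-specific threshold."""
    threshold = CA_ASA_THRESHOLD.get(residue_type, DEFAULT_THRESHOLD)
    return "exposed" if ca_asa >= threshold else "buried"


def evaluate(predicted, reference, positive="exposed"):
    """Return (accuracy, sensitivity, specificity) against reference labels.

    Assumes both classes occur in the reference, so no denominator is zero.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(p == positive and r == positive for p, r in pairs)
    tn = sum(p != positive and r != positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


# Made-up example: (residue type, Cα accessible area) and all-atom reference labels.
residues = [("GLY", 35.0), ("ALA", 10.0), ("LEU", 16.0), ("LYS", 5.0)]
reference = ["exposed", "buried", "buried", "buried"]
predicted = [classify(name, asa) for name, asa in residues]
print(evaluate(predicted, reference))
```

Keeping the thresholds in a per-residue-type table mirrors the "residue type-dependent" wording of the abstract; a single global cutoff would be simpler but would ignore differences in side-chain size.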

2.
The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as low as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criterion had RMSD < 5 Å, the accuracy suitable for low-resolution template-free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å < RMSD < 10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, ∼50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes.

3.
Computational biology is replete with high-dimensional (high-D) discrete prediction and inference problems, including sequence alignment, RNA structure prediction, phylogenetic inference, motif finding, prediction of pathways, and model selection problems in statistical genetics. Even though prediction and inference in these settings are uncertain, little attention has been focused on the development of global measures of uncertainty. Regardless of the procedure employed to produce a prediction, when a procedure delivers a single answer, that answer is a point estimate selected from the solution ensemble, the set of all possible solutions. For high-D discrete space, these ensembles are immense, and thus there is considerable uncertainty. We recommend the use of Bayesian credibility limits to describe this uncertainty, where a 100(1−α)%, 0≤α≤1, credibility limit is the minimum Hamming distance radius of a hyper-sphere containing 100(1−α)% of the posterior distribution. Because sequence alignment is arguably the most extensively used procedure in computational biology, we employ it here to make these general concepts more concrete. The maximum similarity estimator (i.e., the alignment that maximizes the likelihood) and the centroid estimator (i.e., the alignment that minimizes the mean Hamming distance from the posterior weighted ensemble of alignments) are used to demonstrate the application of Bayesian credibility limits to alignment estimators. Application of Bayesian credibility limits to the alignment of 20 human/rodent orthologous sequence pairs and 125 orthologous sequence pairs from six Shewanella species shows that credibility limits of the alignments of promoter sequences of these species vary widely, and that centroid alignments dependably have tighter credibility limits than traditional maximum similarity alignments.
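The definition of a credibility limit given in this abstract can be made concrete with a small sketch. It assumes the posterior is represented by a weighted sample of solutions, each encoded as an equal-length tuple (for example, the partner position of every residue, with -1 for a gap); the encoding, the function names and the toy ensemble are assumptions for illustration, not the authors' implementation.

```python
def hamming(a, b):
    """Number of positions at which two equal-length encodings differ."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def credibility_limit(estimate, ensemble, weights, alpha):
    """Smallest Hamming radius around `estimate` that holds a fraction
    (1 - alpha) of the posterior weight carried by `ensemble`."""
    total = sum(weights)
    by_distance = sorted(
        (hamming(estimate, solution), weight)
        for solution, weight in zip(ensemble, weights)
    )
    mass = 0.0
    for distance, weight in by_distance:
        mass += weight
        # Small tolerance guards against floating-point round-off.
        if mass >= (1.0 - alpha) * total - 1e-12:
            return distance
    return by_distance[-1][0]


# Toy ensemble: three sampled alignments with posterior weights 5 : 3 : 2.
estimate = (0, 1, 2, 3)
ensemble = [(0, 1, 2, 3), (0, 1, 2, -1), (0, -1, -1, -1)]
weights = [5, 3, 2]
print(credibility_limit(estimate, ensemble, weights, alpha=0.2))
```

A tighter limit for one estimator than for another, computed on the same ensemble, is the sense in which the abstract says centroid alignments have tighter credibility limits.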

4.
The accuracy of a homology model based on the structure of a distant relative or other topologically equivalent protein is primarily limited by the quality of the alignment. Here we describe a systematic approach for sequence-to-structure alignment, called ‘K*Sync’, in which alignments are generated by dynamic programming using a scoring function that combines information on many protein features, including a novel measure of how obligate a sequence region is to the protein fold. By systematically varying the weights on the different features that contribute to the alignment score, we generate very large ensembles of diverse alignments, each optimal under a particular constellation of weights. We investigate a variety of approaches to select the best models from the ensemble, including consensus of the alignments, a hydrophobic burial measure, low- and high-resolution energy functions, and combinations of these evaluation methods. The effect on model quality and selection resulting from loop modeling and backbone optimization is also studied. The performance of the method on a benchmark set is reported and shows the approach to be effective at both generating and selecting accurate alignments. The method serves as the foundation of the homology modeling module in the Robetta server.

5.
Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these “bent” alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, and it outperforms them on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of α-helices and β-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.

6.
Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ=log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.
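How a fixed λ = log 2 turns bit scores into P-values and E-values can be sketched as follows. The Gumbel and exponential survival functions are standard; the location parameters `mu` and `tau`, the function names and the example numbers are assumptions for illustration (the abstract fixes only λ, so the locations would still have to be calibrated for each model).

```python
import math

LAMBDA = math.log(2)  # the constant slope conjectured in the abstract


def gumbel_pvalue(score, mu, lam=LAMBDA):
    """P(S >= score) for a Gumbel distribution with location mu and slope lam.

    Written with expm1 so that very small P-values keep their precision.
    """
    return -math.expm1(-math.exp(-lam * (score - mu)))


def exponential_tail_pvalue(score, tau, lam=LAMBDA):
    """P(S >= score) for an exponential high-scoring tail that starts at tau."""
    return min(1.0, math.exp(-lam * (score - tau)))


def evalue(pvalue, n_comparisons):
    """Expected number of hits at least this good among n_comparisons."""
    return pvalue * n_comparisons


# Hypothetical numbers: a score of 30 bits, location -5 bits, 1e6 comparisons.
p = gumbel_pvalue(30.0, mu=-5.0)
print(p, evalue(p, 1_000_000))
```

With λ = log 2, each additional bit of score roughly halves the tail probability, which is why a constant slope makes E-value estimation cheap once the location is known.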

7.
8.
We developed a method for structure characterization of assembly components by iterative comparative protein structure modeling and fitting into cryo-electron microscopy (cryoEM) density maps. Specifically, we calculate a comparative model of a given component by considering many alternative alignments between the target sequence and a related template structure while optimizing the fit of a model into the corresponding density map. The method relies on the previously developed Moulder protocol that iterates over alignment, model building, and model assessment. The protocol was benchmarked using 20 varied target-template pairs of known structures with less than 30% sequence identity and corresponding simulated density maps at resolutions from 5 Å to 25 Å. Relative to the models based on the best existing sequence profile alignment methods, the percentage of Cα atoms that are within 5 Å of the corresponding Cα atoms in the superposed native structure increases on average from 52% to 66%, which is half-way between the starting models and the models from the best possible alignments (82%). The test also reveals that despite the improvements in the accuracy of the fitness function, this function is still the bottleneck in reducing the remaining errors. To demonstrate the usefulness of the protocol, we applied it to the upper domain of the P8 capsid protein of rice dwarf virus that has been studied by cryoEM at 6.8 Å. The Cα root-mean-square deviation of the model based on the remotely related template, bluetongue virus VP7, improved from 8.7 Å to 6.0 Å, while the best possible model has a Cα RMSD value of 5.3 Å. Moreover, the resulting model fits better into the cryoEM density map than the initial template structure. The method is being implemented in our program MODELLER for protein structure modeling by satisfaction of spatial restraints and will be applicable to the rapidly increasing number of cryoEM density maps of macromolecular assemblies.
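The two accuracy measures quoted in the preceding abstract (the fraction of Cα atoms within 5 Å of their native positions, and the Cα RMSD) can be sketched as below. The sketch assumes the model has already been superposed on the native structure and that the two coordinate lists are in the same residue order; the function names and the toy coordinates are hypothetical.

```python
import math


def ca_rmsd(model, native):
    """Root-mean-square deviation between paired Cα coordinates (in Å)."""
    if len(model) != len(native):
        raise ValueError("coordinate lists must have the same length")
    squared = [math.dist(a, b) ** 2 for a, b in zip(model, native)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(model, native, cutoff=5.0):
    """Fraction of Cα atoms lying within `cutoff` Å of their native position."""
    if len(model) != len(native):
        raise ValueError("coordinate lists must have the same length")
    close = sum(math.dist(a, b) <= cutoff for a, b in zip(model, native))
    return close / len(model)


# Toy coordinates: deviations of 0 Å, 3 Å and 8 Å from the native positions.
native = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 8.0, 0.0)]
print(fraction_within(model, native), ca_rmsd(model, native))
```

The two numbers are complementary: a single badly placed region inflates the RMSD, while the within-cutoff fraction reports how much of the chain is placed approximately correctly.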

9.
The three-dimensional structure of Aspergillus niger pectin lyase B (PLB) has been determined by crystallographic techniques at a resolution of 1.7 Å. The model, with all 359 amino acids and 339 water molecules, refines to a final crystallographic R factor of 16.5%. The polypeptide backbone folds into a large right-handed cylinder, termed a parallel β helix. Loops of various sizes and conformations protrude from the central helix and probably confer function. The largest loop of 53 residues folds into a small domain consisting of three antiparallel β strands, one turn of an α helix, and one turn of a 3₁₀ helix. By comparison with the structure of Erwinia chrysanthemi pectate lyase C (PelC), the primary sequence alignment between the pectate and pectin lyase subfamilies has been corrected and the active site region for the pectin lyases deduced. The substrate-binding site in PLB is considerably less hydrophilic than the comparable PelC region and consists of an extensive network of highly conserved Trp and His residues. The PLB structure provides an atomic explanation for the lack of a catalytic requirement for Ca²⁺ in the pectin lyase family, in contrast to that found in the pectate lyase enzymes. Surprisingly, however, the PLB site analogous to the Ca²⁺ site in PelC is filled with a positive charge provided by a conserved Arg in the pectin lyases. The significance of the finding with regard to the enzymatic mechanism is discussed.

10.
We have determined the crystal structure of the RNA octamer duplex r(guguuuac)/r(guaggcac) with a tandem wobble pair, G·G/U·U (motif III), to compare it with U·G/G·U (motif I) and G·U/U·G (motif II) and to better understand their relative stabilities. The crystal belongs to the rhombohedral space group R3. The hexagonal unit cell dimensions are a = b = 41.92 Å, c = 56.41 Å, and γ = 120°, with one duplex in the asymmetric unit. The structure was solved by the molecular replacement method at 1.9 Å resolution and refined to a final R factor of 19.9% and Rfree of 23.3% for 2862 reflections in the resolution range 10.0–1.9 Å with F ≥ 2σ(F). The final model contains 335 atoms for the RNA duplex and 30 water molecules. The A-RNA stacks in the familiar head-to-tail fashion forming a pseudo-continuous helix. The uridine bases of the tandem U·G pairs have slipped towards the minor groove relative to the guanine bases and the uridine O2 atoms form bifurcated hydrogen bonds with the N1 and N2 of guanines. The N2 of guanine and O2 of uridine do not bridge the ‘locked’ water molecule in the minor groove, as in motifs I and II, but are bridged by water molecules in the major groove. A comparison of base stacking stabilities of motif III with motifs I and II confirms the result of thermodynamic studies, motif I > motif III > motif II.

11.
The gene for the Campylobacter ferric receptor (CfrA), a putative iron-siderophore transporter in the enteric food-borne pathogen Campylobacter jejuni, was cloned, and the membrane protein was expressed in Escherichia coli, affinity purified, and then reconstituted into model lipid membranes. Fourier transform infrared spectra recorded from the membrane-reconstituted CfrA are similar to spectra that have been recorded from other iron-siderophore transporters and are highly characteristic of a β-sheet protein (~44% β-sheet and ~10% α-helix). CfrA undergoes relatively extensive peptide hydrogen-deuterium exchange upon exposure to ²H₂O and yet is resistant to thermal denaturation at temperatures up to 95°C. The secondary structure, relatively high aqueous solvent exposure, and high thermal stability are all consistent with a transmembrane β-barrel structure containing a plug domain. Sequence alignments indicate that CfrA contains many of the structural motifs conserved in other iron-siderophore transporters, including the Ton box, PGV, IRG, RP, and LIDG motifs of the plug domain. Surprisingly, a homology model reveals that regions of CfrA that are expected to play a role in enterobactin binding exhibit sequences that differ substantially from the sequences of the corresponding regions that play an essential role in binding/transport by the E. coli enterobactin transporter, FepA. The sequence variations suggest that there are differences in the mechanisms used by CfrA and FepA to interact with bacterial siderophores. It may be possible to exploit these structural differences to develop CfrA-specific therapeutics.

12.
We have developed MUMMALS, a program to construct multiple protein sequence alignments using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without exploiting explicit structure predictions. Parameters for such models have been estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves statistically the best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, although the average improvement is small, on the order of several percent; (ii) a large collection (>10,000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimation of alignment parameters and performance tests; (iii) reference-independent evaluation of alignment quality using sequence alignment-dependent structure superpositions correlates well with reference-dependent evaluation that compares sequence-based alignments to structure-based reference alignments.

13.
Correcting errors in shotgun sequences   (Cited 4 times in total: 1 self-citation, 3 citations by others)
Sequencing errors in combination with repeated regions cause major problems in shotgun sequencing, mainly due to the failure of assembly programs to distinguish single base differences between repeat copies from erroneous base calls. In this paper, a new strategy designed to correct errors in shotgun sequence data using defined nucleotide positions, DNPs, is presented. The method distinguishes single base differences from sequencing errors by analyzing multiple alignments consisting of a read and all its overlaps with other reads. The construction of multiple alignments is performed using a novel pattern matching algorithm, which takes advantage of the symmetry between indices that can be computed for similar words of the same length. This allows for rapid construction of multiple alignments, with no previous pair-wise matching of sequence reads required. Results from a C++ implementation of this method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single base differences remain and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.

14.
15.
PASS2 is a nearly automated version of CAMPASS and contains sequence alignments of proteins grouped at the level of superfamilies. This database has been created to correspond with the SCOP database (1.53 release) and currently consists of 110 multi-member superfamilies and 613 superfamilies corresponding to single members. In multi-member superfamilies, protein chains with no more than 25% sequence identity have been considered for the alignment, and hence the database aims to address sequence alignments which represent 26 219 protein domains under the SCOP 1.53 release. Structure-based sequence alignments have been obtained by COMPARER; the initial equivalences are provided automatically from a MALIGN alignment and subsequently augmented using STAMP4.0. The final sequence alignments have been annotated for the structural features using JOY4.0. Several interesting links are provided to other related databases and genome sequence relatives. Availability of reliable sequence alignments of distantly related proteins, despite poor sequence identity and single-member superfamilies, permits better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure–function relationships of individual superfamilies. The database can be queried by keywords and also by sequence search, interfaced by PSI-BLAST methods. Structure-annotated sequence alignments and several structural accessory files can be retrieved for all the superfamilies including the user-input sequence. The database can be accessed from http://www.ncbs.res.in/%7Efaculty/mini/campass/pass.html.

16.
The gene GAD2 encoding the glutamic acid decarboxylase enzyme (GAD65) is a positional candidate gene for obesity on Chromosome 10p11–12, a susceptibility locus for morbid obesity in four independent ethnic populations. GAD65 catalyzes the formation of γ-aminobutyric acid (GABA), which interacts with neuropeptide Y in the paraventricular nucleus to help stimulate food intake. A case-control study (575 morbidly obese and 646 control subjects) analyzing GAD2 variants identified both a protective haplotype, including the most frequent alleles of single nucleotide polymorphisms (SNPs) +61450 C>A and +83897 T>A (OR = 0.81, 95% CI [0.681–0.972], p = 0.0049) and an at-risk SNP (−243 A>G) for morbid obesity (OR = 1.3, 95% CI [1.053–1.585], p = 0.014). Furthermore, familial-based analyses confirmed the association of the SNP +61450 C>A and +83897 T>A haplotype with obesity (χ² = 7.637, p = 0.02). In the murine insulinoma cell line βTC3, the G at-risk allele of SNP −243 A>G increased GAD2 promoter activity sixfold (p < 0.0001) and induced a 6-fold higher affinity for nuclear extracts. The −243 A>G SNP was associated with higher hunger scores (p = 0.007) and disinhibition scores (p = 0.028), as assessed by the Stunkard Three-Factor Eating Questionnaire. As GAD2 is highly expressed in pancreatic β cells, we analyzed GAD65 antibody level as a marker of β-cell activity and of insulin secretion. In the control group, −243 A>G, +61450 C>A, and +83897 T>A SNPs were associated with lower GAD65 autoantibody levels (p values of 0.003, 0.047, and 0.006, respectively). SNP +83897 T>A was associated with lower fasting insulin and insulin secretion, as assessed by the HOMA-B% homeostasis model of β-cell function (p = 0.009 and 0.01, respectively). These data support the hypothesis of an orexigenic effect of GABA in humans and of a contribution of genes involved in GABA metabolism to the modulation of food intake and the development of morbid obesity.

17.
Degradation of glucose has been implicated in acetate production in rice field soil, but the abundance of glucose, the temporal change of glucose turnover, and the relationship between glucose and acetate catabolism are not well understood. We therefore measured the pool sizes of glucose and acetate in rice field soil and investigated the turnover of [U-¹⁴C]glucose and [2-¹⁴C]acetate. Acetate accumulated up to about 2 mM during days 5 to 10 after flooding of the soil. Subsequently, methanogenesis started and the acetate concentration decreased to about 100 to 200 μM. Glucose always made up >50% of the total monosaccharides detected. Glucose concentrations decreased during the first 10 days from 90 μM initially to about 3 μM after 40 days of incubation. With the exception of day 0, when glucose consumption was slow, the glucose turnover time was in the range of minutes, while the acetate turnover time was in the range of hours. Anaerobic degradation of [U-¹⁴C]glucose released [¹⁴C]acetate and ¹⁴CO₂ as the main products, with [¹⁴C]acetate being released faster than ¹⁴CO₂. The products of [2-¹⁴C]acetate metabolism, on the other hand, were ¹⁴CO₂ during the reduction phase of soil incubation (days 0 to 15) and ¹⁴CH₄ during the methanogenic phase (after day 15). Except during the accumulation period of acetate (days 5 to 10), approximately 50 to 80% of the acetate consumed was produced from glucose catabolism. However, during the accumulation period of acetate, the rate of acetate production from glucose greatly exceeded that of acetate consumption. Under steady-state conditions, up to 67% of the CH₄ was produced from acetate, of which up to 56% was produced from glucose degradation.

18.
DbClustal addresses the important problem of the automatic multiple alignment of the top scoring full-length sequences detected by a database homology search. By combining the advantages of both local and global alignment algorithms into a single system, DbClustal is able to provide accurate global alignments of highly divergent, complex sequence sets. Local alignment information is incorporated into a ClustalW global alignment in the form of a list of anchor points between pairs of sequences. The method is demonstrated using anchors supplied by the Blast post-processing program, Ballast. The rapidity and reliability of DbClustal have been demonstrated using the recently annotated Pyrococcus abyssi proteome, where the number of alignments with totally misaligned sequences was reduced from 20% to <2%. A web site has been implemented that offers BlastP database searches with automatic alignment of the top hits by DbClustal.

19.
The conformation of a polypeptide or protein chain may be specified by stating the orientations of the two linked peptide residues at each alpha carbon atom in the chain, namely the two dihedral angles ϕ, ϕ′ about the single bonds N-αC and αC-C′ from a defined standard conformation. By using certain criteria of minimum contact distances between the various atoms, the allowed ranges of (ϕ, ϕ′) have been worked out for three values of the angle N-αC-C′ (τ), namely 105, 110, and 115° for non-glycyl, and 110 and 115° for glycyl residues. The theory is compared with all the available crystallographic data (up to early 1965) on simple (di- and tri-) peptides, cyclic peptides, polypeptide and protein structures, and the observed data fully support the conclusions from theory. The effect of the gamma carbon atom, in its three possible positions, is also discussed, and is found to alter the outer limits of the allowed region of (ϕ, ϕ′) only slightly. The paper contains exhaustive references to the published data on these structures, using x-ray diffraction.
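A backbone dihedral angle of the kind discussed in this abstract can be computed from four atomic positions. The sketch below uses the standard projection formula and returns a signed angle in degrees, positive when the far bond is rotated clockwise relative to the near bond as seen along the central bond. It does not reproduce the paper's own "standard conformation" zero point, which the abstract does not specify, and the function names and test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# For a backbone torsion about N-αC one would pass C'(i-1), N, αC, C' in that order.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Evaluating this function over every residue of a structure and plotting the pairs of angles gives the kind of map against which the allowed regions described in the abstract are compared.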

20.
Yersinia enterocolitica (Ye) evades the immune system of the host by injection of Yersinia outer proteins (Yops) via a type three secretion system into host cells. In this study, a reporter system comprising a YopE-β-lactamase hybrid protein and a fluorescent staining sensitive to β-lactamase cleavage was used to track Yop injection in cell culture and in an experimental Ye mouse infection model. Experiments with GD25, GD25-β1A, and HeLa cells demonstrated that β1-integrins and RhoGTPases play a role in Yop injection. As demonstrated by infection of splenocyte suspensions in vitro, injection of Yops appears to occur randomly into all types of leukocytes. In contrast, upon infection of mice, Yop injection was detected in 13% of F4/80+, 11% of CD11c+, 7% of CD49b+, 5% of Gr1+ cells, 2.3% of CD19+, and 2.6% of CD3+ cells. Taking the different abundance of these cell types in the spleen into account, the highest total number of Yop-injected cells was found among B cells, particularly CD19+CD21+CD23+ follicular B cells, followed by neutrophils, dendritic cells, and macrophages, suggesting a distinct cellular tropism of Ye. Yop-injected B cells displayed a significantly increased expression of CD69 compared to non-Yop-injected B cells, indicating activation of these cells by Ye. Infection of IFN-γR (receptor)- and TNFRp55-deficient mice resulted in increased numbers of Yop-injected spleen cells for as yet unknown reasons. The YopE-β-lactamase hybrid protein reporter system provides new insights into the modulation of host cell and immune responses by Ye Yops.
