Similar articles
1.
P J Kraulis  T A Jones 《Proteins》1987,2(3):188-201
A method to build a three-dimensional protein model from nuclear magnetic resonance (NMR) data using fragments from a database of crystallographically determined protein structures is presented. The interproton distances derived from the nuclear Overhauser effect (NOE) data are compared to the precalculated distances in the known protein structures. An efficient search algorithm is used, which arranges the distances in matrices akin to a Cα diagonal distance plot, and compares the NOE distance matrices for short sequential zones of the protein to the database matrices. After cluster analysis of the fragments found in this way, the structure is built by aligning fragments in overlapping zones. The sequentially long-range NOEs cannot be used in the initial fragment search but are vital to discriminate between several possible combinations of different groups of fragments. The method has been tested on one simulated NOE data set derived from a crystal structure and one experimental NMR data set. The method produces models that have good local structure, but may contain larger global errors. These models can be used as the starting point for further refinement, e.g., by restrained molecular dynamics or interactive graphics.

2.
In this paper, we address the multiple peak alignment problem in sequential data analysis with an approach based on the Gaussian scale-space theory. We assume that multiple sets of detected peaks are the observed samples of a set of common peaks. We also assume that the locations of the observed peaks follow unimodal distributions (e.g., normal distribution) with their means equal to the corresponding locations of the common peaks and variances reflecting the extent of their variation. Under these assumptions, we convert the problem of estimating locations of the unknown number of common peaks from multiple sets of detected peaks into a much simpler problem of searching for local maxima in the scale-space representation. The optimization of the scale parameter is achieved using an energy minimization approach. We compare our approach with a hierarchical clustering method using both simulated data and real mass spectrometry data. We also demonstrate the merit of extending the binary peak detection method (i.e., a candidate is considered either as a peak or as a nonpeak) with a quantitative scoring measure-based approach (i.e., we assign to each candidate a possibility of being a peak).
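The scale-space idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the evaluation grid, and the fixed scale `sigma` are assumptions made here for illustration, whereas the abstract states that the scale is optimized by energy minimization. The sketch pools the detected peaks from all sets, smooths them with a Gaussian kernel, and reports the local maxima as estimates of the common peak locations.

```python
import math

def scale_space_density(peak_sets, grid, sigma):
    """Sum of Gaussian kernels centered on every detected peak (all sets pooled)."""
    pooled = [p for peaks in peak_sets for p in peaks]
    return [sum(math.exp(-0.5 * ((x - p) / sigma) ** 2) for p in pooled)
            for x in grid]

def local_maxima(grid, values):
    """Grid positions whose value exceeds the left neighbor and is not below the right one."""
    return [grid[i] for i in range(1, len(grid) - 1)
            if values[i] > values[i - 1] and values[i] >= values[i + 1]]

def common_peaks(peak_sets, lo, hi, step, sigma):
    """Estimate common peak locations at one fixed scale (hypothetical helper)."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    return local_maxima(grid, scale_space_density(peak_sets, grid, sigma))

# Three spectra, each with two peaks jittered around 10 and 20.
peak_sets = [[10.0, 20.1], [10.2, 19.9], [9.8, 20.0]]
estimates = common_peaks(peak_sets, lo=0.0, hi=30.0, step=0.1, sigma=0.5)
```

A larger `sigma` merges nearby peaks into one maximum and a smaller one splits them, which is why the choice of scale matters in the approach described above.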

3.
Genetic variances and covariances, summarized in G matrices, are key determinants of the course of adaptive evolution. Consequently, understanding how G matrices vary among populations is critical to answering a variety of questions in evolutionary biology. A method has recently been proposed for generating null distributions of statistics pertaining to differences in G matrices among populations. The general approach facilitated by this method is likely to prove to be very important in studies of the evolution of G. We have identified an issue in the method that will cause it to create null distributions of differences in G matrices that are likely to be far too narrow. The issue arises from the fact that the method as currently used generates null distributions of statistics pertaining to differences in G matrices across populations by simulating breeding value vectors based on G matrices estimated from data, randomizing these vectors across populations, and then calculating null values of statistics from G matrices that are calculated directly from the variances and covariances among randomized vectors. This calculation treats breeding values as quantities that are directly measurable, instead of predicted from G matrices that are themselves estimated from patterns of covariance among kin. The existing method thus neglects a major source of uncertainty in G matrices, which renders it anti‐conservative. We first suggest a correction to the method. We then apply the original and modified methods to a very simple instructive scenario. Finally, we demonstrate the use of both methods in the analysis of a real data set.

4.
MOTIVATION: Peptide sequencing by mass spectrometry follows one of two approaches: database searching or de novo sequencing. The database-searching approach is convenient; however, when the corresponding sequences are not included in the databases, exact identification is difficult. De novo sequencing, on the other hand, needs no preliminary information, but it requires a continuous series of amino acid sequence peaks and the ability to differentiate these peaks. In practice, it is very difficult to obtain and differentiate the peaks of all amino acids from an actual spectrum. We propose a novel de novo sequencing approach using not only mass-to-charge ratio but also ion peak intensity and amino acid cleavage intensity ratio (CIR). RESULTS: Our method compensates for any undetectable amino acid peak intervals by estimating the amino acid set and the probability of peak expression based on amino acid CIR. It identifies sequences more accurately than existing methods, with which sequencing is usually difficult.

5.
MOTIVATION: In recent years, advances have been made in the ability of computational methods to discriminate between homologous and non-homologous proteins in the 'twilight zone' of sequence similarity, where the percent sequence identity is a poor indicator of homology. To make these predictions more valuable to the protein modeler, they must be accompanied by accurate alignments. Pairwise sequence alignments are inferences of orthologous relationships between sequence positions. Evolutionary distance is traditionally modeled using global amino acid substitution matrices. But real differences in the likelihood of substitutions may exist for different structural contexts within proteins, since structural context contributes to the selective pressure. RESULTS: HMMSUM (HMMSTR-based substitution matrices) is a new model for structural context-based amino acid substitution probabilities consisting of a set of 281 matrices, each for a different sequence-structure context. HMMSUM does not require the structure of the protein to be known. Instead, predictions of local structure are made using HMMSTR, a hidden Markov model for local structure. Alignments using the HMMSUM matrices compare favorably to alignments carried out using the BLOSUM matrices or structure-based substitution matrices SDM and HSDM when validated against remote homolog alignments from BAliBASE. HMMSUM has been implemented using local Dynamic Programming and with the Bayesian Adaptive alignment method.

6.
MOTIVATION: Comparative metabolic profiling by nuclear magnetic resonance (NMR) is showing increasing promise for identifying inter-individual differences in drug response. Two-dimensional (2D) ¹H-¹³C NMR can reduce spectral overlap, a common problem of 1D ¹H NMR. However, the peak alignment tools for 1D NMR spectra are not well suited for 2D NMR. An automated and statistically robust method for aligning 2D NMR peaks is required to enable comparative metabonomic analysis using 2D NMR. RESULTS: A novel statistical method was developed to align NMR peaks that represent the same chemical groups across multiple 2D NMR spectra. The degree of local pattern match among peaks in different spectra is assessed using a similarity measure, and a heuristic algorithm maximizes the similarity measure for peaks across the whole spectrum. This peak alignment method was used to align peaks in 2D NMR spectra of endogenous metabolites in liver extracts obtained from four inbred mouse strains in the study of acetaminophen-induced liver toxicity. This automated alignment method was validated by manual examination of the top 50 peaks as ranked by signal intensity. Manual inspection of 1872 peaks in 39 different spectra demonstrated that the automated algorithm correctly aligned 1810 (96.7%) peaks. AVAILABILITY: Algorithm is available upon request.

7.
Serban N 《Biometrics》2007,63(2):531-539
MICE (multiple-peak identification, characterization, and estimation) is a procedure for estimating a lower bound of the number of frequency peaks and for estimating the frequency peak parameters. The leading application is protein structure determination using nuclear magnetic resonance (NMR) experiments. NMR frequency data are multiple-peak data, where each frequency peak corresponds to two connected atoms in the three-dimensional protein structure. We analyze the NMR frequency data through a series of steps: a preliminary step for separating the signal from the background followed by identification of local maxima up to a noise-level-dependent threshold, estimation of the frequency peak parameters using an iterative algorithm, and detection of mixtures of peaks using hypothesis testing.

8.
9.
Leucine zippers are oligomerization domains used in a wide range of proteins. Their structure is based on a highly conserved heptad repeat sequence in which two key positions are occupied by leucines. The leucine zipper of the cell cycle-regulated Nek2 kinase is important for its dimerization and activation. However, the sequence of this leucine zipper is most unusual in that leucines occupy only one of the two hydrophobic positions. The other position, depending on the register of the heptad repeat, is occupied by either acidic or basic residues. Using NMR spectroscopy, we show that this leucine zipper exists in two conformations of almost equal population that exchange with a rate of 17 s⁻¹. We propose that the two conformations correspond to the two possible registers of the heptad repeat. This hypothesis is supported by a cysteine mutant that locks the protein in one of the two conformations. NMR spectra of this mutant showed the predicted 2-fold reduction of peaks in the ¹⁵N HSQC spectrum and the complete removal of cross peaks in exchange spectra. It is possible that interconversion of these two conformations may be triggered by external signals in a manner similar to that proposed recently for the microtubule binding domain of dynein and the HAMP domain. As a result, the leucine zipper of Nek2 kinase is the first example where the frameshift of coiled-coil heptad repeats has been directly observed experimentally.

10.
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence‐search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino‐acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence‐search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z‐score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales‐up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web‐server that is freely available at http://www.bo‐protscience.fr/forsa.

11.
A widely used algorithm for computing an optimal local alignment between two sequences requires a parameter set with a substitution matrix and gap penalties. It is recognized that a proper parameter set should be selected to suit the level of conservation between sequences. We describe an algorithm for selecting an appropriate substitution matrix at given gap penalties for computing an optimal local alignment between two sequences. In the algorithm, a substitution matrix that leads to the maximum alignment similarity score is selected among substitution matrices at various evolutionary distances. The evolutionary distance of the selected substitution matrix is defined as the distance of the computed alignment. To show the effects of gap penalties on alignments and their distances and help select appropriate gap penalties, alignments and their distances are computed at various gap penalties. The algorithm has been implemented as a computer program named SimDist. The SimDist program was compared with an existing local alignment program named SIM for finding reciprocally best-matching pairs (RBPs) of sequences in each of 100 protein families, where RBPs are commonly used as an operational definition of orthologous sequences. SimDist produced more accurate results than SIM on 50 of the 100 families, whereas both programs produced the same results on the other 50 families. SimDist was also used to compare three types of substitution matrices in scoring 444,461 pairs of homologous sequences from the 100 families.
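The selection rule described in this abstract (score the pair under each candidate matrix, keep the matrix with the maximum local alignment score, and report its evolutionary distance) can be sketched as follows. This is not the SimDist program: the Smith-Waterman scorer uses a single linear gap penalty, and the two "matrices" are hypothetical match/mismatch scorers keyed by made-up distance labels, all of which are assumptions for illustration only.

```python
def local_alignment_score(a, b, score, gap):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cell = max(0,
                       prev[j - 1] + score(ca, cb),  # align ca with cb
                       prev[j] - gap,                # gap in b
                       cur[j - 1] - gap)             # gap in a
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best

def toy_matrix(match, mismatch):
    """Hypothetical stand-in for a real substitution matrix."""
    return lambda x, y: match if x == y else mismatch

def select_matrix(a, b, matrices, gap):
    """Return the distance label whose matrix maximizes the score, plus all scores."""
    scores = {dist: local_alignment_score(a, b, m, gap)
              for dist, m in matrices.items()}
    return max(scores, key=scores.get), scores

# Made-up distance labels: strict scoring for close pairs, lenient for distant ones.
matrices = {10: toy_matrix(5, -6), 250: toy_matrix(2, -1)}
close_best, close_scores = select_matrix("ACDEFGHIK", "ACDEFGHIK", matrices, gap=4)
far_best, far_scores = select_matrix("ACDEFGHIKL", "AXDXFXHXKX", matrices, gap=4)
```

In this toy setup the identical pair selects the strict scorer and the divergent pair selects the lenient one, which mirrors the idea that the chosen matrix reflects the level of conservation between the sequences.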

12.
The heterocyclic aromatic amine 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP) has been shown to be carcinogenic in rodents (mice and rats). Following phase I N-hydroxylation and phase II esterification, PhIP exerts its carcinogenic effect by binding to DNA purines. Quantitative and qualitative analysis of its bioactivated metabolites as well as its detoxification products is important in studying its biological effects and inter- and intra-individual exposures. A review is presented with an extensive coverage of publications specifically reporting on the analysis of PhIP and its phase I and II metabolites in biological matrices, foodstuff and beverages. Analytical techniques such as liquid and gas chromatography coupled with various detection techniques (mass spectrometry, ultraviolet or fluorescence detection) were mostly applied. We conclude that since the initial identification of PhIP in 1986 a large set of assays has been developed for the analysis of PhIP and its phase I and phase II metabolites in a wide range of matrices, including food products and biological samples such as plasma, urine and faeces. In addition, it was shown that numerous metabolites were recovered and identified. Thus, we conclude that liquid chromatography coupled to mass spectrometry is clearly the method of choice for sensitive qualitative as well as quantitative analysis with high selectivity, reaching lower quantification levels in the sub pg/mL range. The main aim of this review is to serve other researchers as a resource for method development and optimization of analytical methods for PhIP and its carcinogenic or detoxification products.

13.
MOTIVATION: We propose a general method for deriving amino acid substitution matrices from low resolution force fields. Unlike current popular methods, the approach does not rely on evolutionary arguments or alignment of sequences or structures. Instead, residues are computationally mutated and their contribution to the total energy/score is collected. The average of these values over each position within a set of proteins results in a substitution matrix. RESULTS: Example substitution matrices have been calculated from force fields based on different philosophies and their performance compared with conventional substitution matrices. Although this can produce useful substitution matrices, the methodology highlights the virtues, deficiencies and biases of the source force fields. It also allows a rather direct comparison of sequence alignment methods with the score functions underlying protein sequence to structure threading. AVAILABILITY: Example substitution matrices are available from http://www.rsc.anu.edu.au/~zsuzsa/suppl/matrices.html. SUPPLEMENTARY INFORMATION: The list of proteins used for data collection and the optimized parameters for the alignment are given as supplementary material at http://www.rsc.anu.edu.au/~zsuzsa/suppl/matrices.html.

14.
Susko E 《Systematic biology》2011,60(5):668-675
Generalized least squares (GLS) methods provide a relatively fast means of constructing a confidence set of topologies. Because they utilize information about the covariances between distances, it is reasonable to expect additional efficiency in estimation and confidence set construction relative to other least squares (LS) methods. Difficulties have been found to arise in a number of practical settings due to estimates of covariance matrices being ill conditioned or even noninvertible. We present here new ways of estimating the covariance matrices for distances that are much more likely to be positive definite, as the actual covariance matrices are. A thorough investigation of performance is also conducted. An alternative to GLS that has been proposed for constructing confidence sets of topologies is weighted least squares (WLS). As currently implemented, this approach is equivalent to the use of GLS but with covariances set to zero rather than being estimated. In effect, this approach assumes normality of the estimated distances and zero covariances. As the results here illustrate, this assumption leads to poor performance. A 95% confidence set is almost certain to contain the true topology but will contain many more topologies than are needed. On the other hand, the results here also indicate that, among LS methods, WLS performs quite well at estimating the correct topology. It turns out to be possible to improve the performance of WLS for confidence set construction through a relatively inexpensive normal parametric bootstrap that utilizes the same variances and covariances of GLS. The resulting procedure is shown to perform at least as well as GLS and thus provides a reasonable alternative in cases where covariance matrices are ill conditioned.
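The relationship between GLS and WLS stated in this abstract can be made concrete with a small sketch. It assumes the standard quadratic-form criterion r^T V^(-1) r, where r holds the residuals between estimated and tree-implied distances and V is their covariance matrix; the abstract does not spell out the criterion, so this form, the function names, and the example numbers are assumptions. WLS is obtained by zeroing the off-diagonal covariances.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gls_criterion(residuals, cov):
    """Quadratic form r^T V^-1 r using the full covariance matrix."""
    weighted = solve(cov, residuals)
    return sum(r * w for r, w in zip(residuals, weighted))

def wls_criterion(residuals, cov):
    """Same criterion with the covariances set to zero (variances only)."""
    return sum(r * r / cov[i][i] for i, r in enumerate(residuals))

# Two positively correlated distance estimates with equal residuals.
cov = [[1.0, 0.5], [0.5, 1.0]]
residuals = [1.0, 1.0]
q_gls = gls_criterion(residuals, cov)
q_wls = wls_criterion(residuals, cov)
```

With these numbers the two criteria disagree (4/3 versus 2), showing how ignoring covariances changes the fit statistic. A singular or nearly singular `cov` makes `solve` fail or become unstable, which corresponds to the ill-conditioning problem the abstract describes.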

15.
Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.
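The second stage described in this abstract, learning non-negative weights so that a weighted sum of fixed basis matrices approximates a target, can be sketched with the multiplicative update commonly used in non-negative matrix factorization for squared error. This is an illustration only: the tiny 2x2 "basis matrices", the squared-error objective, and all names are assumptions, and the authors fit evolutionary models from alignment data rather than a given target matrix.

```python
def flatten(matrix):
    """Turn a matrix (list of rows) into a flat list of entries."""
    return [v for row in matrix for v in row]

def fit_weights(basis, target, iterations=500, eps=1e-12):
    """Non-negative weights w minimizing ||target - sum_k w[k] * basis[k]||^2.

    Uses multiplicative updates with the basis held fixed; weights stay
    non-negative because every factor in the update is non-negative.
    """
    vecs = [flatten(b) for b in basis]
    t = flatten(target)
    w = [1.0] * len(vecs)
    for _ in range(iterations):
        approx = [sum(wk * v[i] for wk, v in zip(w, vecs)) for i in range(len(t))]
        w = [wk * sum(a * b for a, b in zip(v, t))
             / (sum(a * b for a, b in zip(v, approx)) + eps)
             for wk, v in zip(w, vecs)]
    return w

def weighted_sum(basis, weights):
    """Combine the basis matrices into one model matrix."""
    rows, cols = len(basis[0]), len(basis[0][0])
    return [[sum(w * b[r][c] for w, b in zip(weights, basis)) for c in range(cols)]
            for r in range(rows)]

# Hypothetical overlapping basis matrices and a target built as 2*B1 + 3*B2.
basis = [[[1.0, 1.0], [0.0, 0.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
target = [[2.0, 5.0], [3.0, 0.0]]
weights = fit_weights(basis, target)
model = weighted_sum(basis, weights)
```

Because only one weight per basis matrix is estimated, the number of free parameters equals the number of basis matrices, which is the parameter saving the abstract refers to.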

16.
Kaur H  Raghava GP 《Proteins》2004,55(1):83-90
In this paper a systematic attempt has been made to develop a better method for predicting alpha-turns in proteins. Most of the commonly used approaches in the field of protein structure prediction have been tried in this study, including the statistical approach "Sequence Coupled Model" and the machine learning approaches (i) artificial neural network (ANN), (ii) Weka (Waikato Environment for Knowledge Analysis) classifiers and (iii) Parallel Exemplar Based Learning (PEBLS). We have also used multiple sequence alignment obtained from PSIBLAST and secondary structure information predicted by PSIPRED. The training and testing of all methods has been performed on a data set of 193 non-homologous protein X-ray structures using five-fold cross-validation. It has been observed that ANN with multiple sequence alignment and predicted secondary structure information outperforms other methods. Based on our observations we have developed an ANN-based method for predicting alpha-turns in proteins. The main components of the method are two feed-forward back-propagation networks with a single hidden layer. The first sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position specific scoring matrices. The initial predictions obtained from the first network and PSIPRED predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. The final network yields an overall prediction accuracy of 78.0% and MCC of 0.16. A web server AlphaPred (http://www.imtech.res.in/raghava/alphapred/) has been developed based on this approach.

17.
The introduction of new paramagnetic shift reagents in the nuclear magnetic resonance (NMR) method has made it possible to distinguish intra- and extracellular ions in tissues or organs in vitro. We measured the intra- and extracellular ²³Na and ¹H in vivo in the gerbil brain and skeletal muscle by NMR spectroscopy employing the shift reagent dysprosium triethylenetetraminehexaacetate (Dy(TTHA)³⁻). Without Dy(TTHA)³⁻, the ²³Na and ¹H signals were each seen only as a single peak, but gradual intravenous infusion of Dy(TTHA)³⁻ separated each signal into two peaks. The unshifted peaks reflected the intracellular ²³Na and ¹H signals, while the shifted peaks reflected the extracellular signals. In the brain spectra, an additional small peak, which represented intravascular signals, was detected and its intensity increased after injection of papaverine hydrochloride. The present method is advantageous over the microelectrode technique because of its nondestructiveness and its capability for obtaining intra- and extracellular volume information from measurements of the ¹H spectra, the peaks of which reflect the intra- and extracellular water amounts. The intracellular Na⁺ increase associated with increased cellular volume after ouabain in the muscle was clearly visualized by this method. The technique is clearly of use for physiological and pathophysiological studies of organs.

18.
Peak lists derived from nuclear magnetic resonance (NMR) spectra are commonly used as input data for a variety of computer assisted and automated analyses. These include automated protein resonance assignment and protein structure calculation software tools. Prior to these analyses, peak lists must be aligned to each other and sets of related peaks must be grouped based on common chemical shift dimensions. Even when programs can perform peak grouping, they require the user to provide uniform match tolerances or use default values. However, peak grouping is further complicated by multiple sources of variance in peak position limiting the effectiveness of grouping methods that utilize uniform match tolerances. In addition, no method currently exists for deriving peak positional variances from single peak lists for grouping peaks into spin systems, i.e. spin system grouping within a single peak list. Therefore, we developed a complementary pair of peak list registration analysis and spin system grouping algorithms designed to overcome these limitations. We have implemented these algorithms into an approach that can identify multiple dimension-specific positional variances that exist in a single peak list and group peaks from a single peak list into spin systems. The resulting software tools generate a variety of useful statistics on both a single peak list and pairwise peak list alignment, especially for quality assessment of peak list datasets. We used a range of low and high quality experimental solution NMR and solid-state NMR peak lists to assess performance of our registration analysis and grouping algorithms. Analyses show that an algorithm using a single iteration and uniform match tolerances approach is only able to recover from 50 to 80% of the spin systems due to the presence of multiple sources of variance. Our algorithm recovers additional spin systems by reevaluating match tolerances in multiple iterations. 
To facilitate evaluation of the algorithms, we developed a peak list simulator within our nmrstarlib package that generates user-defined assigned peak lists from a given BMRB entry or database of entries. In addition, over 100,000 simulated peak lists with one or two sources of variance were generated to evaluate the performance and robustness of these new registration analysis and peak grouping algorithms.

19.
We explore the connection between two problems that have arisen independently in signal processing and related fields: the estimation of the geometric mean of a set of symmetric positive definite (SPD) matrices and their approximate joint diagonalization (AJD). Today there is considerable interest in estimating the geometric mean of an SPD matrix set in the manifold of SPD matrices endowed with the Fisher information metric. The resulting mean has several important invariance properties and has proven very useful in diverse engineering applications such as biomedical and image data processing. While for two SPD matrices the mean has an algebraic closed form solution, for a set of more than two SPD matrices it can only be estimated by iterative algorithms. However, none of the existing iterative algorithms features at the same time fast convergence, low computational complexity per iteration and guarantee of convergence. For this reason, other definitions of geometric mean based on symmetric divergence measures, such as the Bhattacharyya divergence, have recently been considered. The resulting means, although possibly useful in practice, do not satisfy all desirable invariance properties. In this paper we consider geometric means of covariance matrices estimated on high-dimensional time-series, assuming that the data is generated according to an instantaneous mixing model, which is very common in signal processing. We show that in these circumstances we can approximate the Fisher information geometric mean by employing an efficient AJD algorithm. Our approximation is in general much closer to the Fisher information geometric mean than its competitors and satisfies many invariance properties. Furthermore, convergence is guaranteed, the computational complexity is low and the convergence rate is quadratic. The accuracy of this new geometric mean approximation is demonstrated by means of simulations.
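The closed form for two SPD matrices mentioned in this abstract is the well-known expression G = A^(1/2) (A^(-1/2) B A^(-1/2))^(1/2) A^(1/2). The sketch below evaluates it for 2x2 matrices only, using the standard 2x2 identity sqrt(M) = (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M)); restricting to 2x2 and the helper names are simplifications assumed here, and the AJD-based approximation proposed in the paper is not reproduced.

```python
import math

def mat_mul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_inv(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def spd_sqrt(m):
    """Principal square root of a 2x2 SPD matrix (Cayley-Hamilton identity)."""
    (a, b), (c, d) = m
    s = math.sqrt(a * d - b * c)       # sqrt of the determinant
    t = math.sqrt(a + d + 2.0 * s)     # trace of the square root
    return [[(a + s) / t, b / t], [c / t, (d + s) / t]]

def geometric_mean(a, b):
    """Geometric mean of two 2x2 SPD matrices via the closed form."""
    a_half = spd_sqrt(a)
    a_inv_half = mat_inv(a_half)
    middle = spd_sqrt(mat_mul(mat_mul(a_inv_half, b), a_inv_half))
    return mat_mul(mat_mul(a_half, middle), a_half)

A = [[2.0, 1.0], [1.0, 3.0]]
B = [[4.0, -1.0], [-1.0, 2.0]]
G = geometric_mean(A, B)
# The mean is the SPD solution of G * A^-1 * G = B, which gives a direct check.
check = mat_mul(mat_mul(G, mat_inv(A)), G)
```

For commuting inputs the result reduces to the entrywise intuition: the mean of diag(1, 4) and diag(4, 9) is diag(2, 6).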

20.
Poland D 《Biopolymers》2001,58(1):89-105
Experimental data on the temperature dependence of the heat capacity of proteins can be used to calculate approximate enthalpy distributions for these molecules using the maximum-entropy method. C_p(T) data is first used to calculate a set of moments of the enthalpy distribution, and these are then used to estimate the enthalpy distribution. If one knows the temperature expansion of the heat capacity through the (n - 2)th power of ΔT (measured from the expansion center), then this is enough information to calculate the nth moment of the enthalpy distribution. Using four or more moments is in turn enough information to resolve bimodal behavior in the distribution. If the enthalpy distribution of a protein exhibits two distinct peaks, then this is direct experimental confirmation of a two-state mechanism of denaturation, the two peaks corresponding to the enthalpy of the native and unfolded species respectively. If the heat capacity of a protein exhibits a maximum at the denaturation temperature, then there is the possibility that the enthalpy distribution will be bimodal, but the presence of a maximum in the heat capacity is not a sufficient condition for this kind of behavior. We construct a phase diagram in terms of the appropriate variables to indicate when a maximum in the heat capacity will also give rise to bimodal behavior in the enthalpy distribution. We illustrate the phase diagram using literature data for a set of proteins.
