Similar articles
1.
Despite significant successes in structure‐based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest‐energy structures and sequences are found. DEE/A*‐based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap‐free list of low‐energy protein conformations, which is necessary for ensemble‐based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*‐based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs. Proteins 2015; 83:1859–1877. © 2015 Wiley Periodicals, Inc.
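
To make the role of the residue ordering and the conformational bounds concrete, the following is a minimal sketch of A* over rotamer assignments. It is not the authors' implementation: it expands positions in a fixed order (0, 1, 2, ...) rather than the dynamic ordering the abstract describes, and it uses a simple admissible lower bound chosen for illustration. The names `astar_gmec`, `self_e`, and `pair_e`, and the pairwise energy model itself, are assumptions.

```python
import heapq

def astar_gmec(self_e, pair_e):
    """Find the lowest-energy rotamer assignment by A* search.

    self_e[i][r]       : self energy of rotamer r at position i (hypothetical input)
    pair_e[(i, j)][r][s]: pair energy of rotamer r at i and rotamer s at j, for i < j
    Returns (assignment, energy, number_of_expanded_nodes).
    """
    n = len(self_e)

    def pe(i, r, j, s):
        # Pair energies are stored once per unordered pair, keyed with i < j.
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def g(assign):
        # Exact energy of the positions assigned so far.
        d = len(assign)
        return (sum(self_e[i][assign[i]] for i in range(d))
                + sum(pe(i, assign[i], j, assign[j])
                      for i in range(d) for j in range(i + 1, d)))

    def h(assign):
        # Lower bound on the energy still to be added: each unassigned position
        # independently picks its best rotamer, and each pair of unassigned
        # positions is bounded by its minimum pair energy.
        d = len(assign)
        total = 0.0
        for j in range(d, n):
            total += min(
                self_e[j][s]
                + sum(pe(i, assign[i], j, s) for i in range(d))
                + sum(min(pe(j, s, k, t) for t in range(len(self_e[k])))
                      for k in range(j + 1, n))
                for s in range(len(self_e[j])))
        return total

    heap = [(h(()), ())]
    expanded = 0
    while heap:
        f, assign = heapq.heappop(heap)
        if len(assign) == n:
            # h is 0 for a full assignment, so f is the exact energy.
            return assign, f, expanded
        expanded += 1
        # Fixed ordering: the next position is always len(assign).
        for r in range(len(self_e[len(assign)])):
            child = assign + (r,)
            heapq.heappush(heap, (g(child) + h(child), child))
    return None
```

In this sketch, a tighter `h` or a better choice of which position to expand next would both show up as a smaller `expanded` count, which is the quantity the abstract says the proposed improvements reduce.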

2.
In recent years, massive sequencing approaches have allowed us to determine the genomic structures of various organisms rapidly, opening new applications of the resulting high-throughput sequence data in various fields of biological study. We present here a pipeline that searches for microsatellite DNA and designs PCR primers encompassing the microsatellites in genomic sequence data produced by 454 pyrosequencing. We tested this pipeline, called ‘Auto-primer’, on several fish genomic sequences and obtained numerous and varied candidate microsatellite DNA loci useful for detecting intraspecies genetic variability. This in silico search for microsatellite DNA is superior to conventional cloning methods, since any repeat-unit sequence pattern can be screened.
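
The microsatellite search step can be illustrated with a short regular-expression scan. This is a sketch under stated assumptions, not the Auto-primer pipeline: the function name `find_microsatellites`, the unit-length range, and the minimum repeat count are all hypothetical defaults, and primer design is not covered.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, repeat_count) for each tandem repeat found in seq."""
    seq = seq.upper()
    # Group 2 is the repeat unit (shortest unit tried first); the backreference
    # \2 requires at least min_repeats consecutive copies. Group 1 is the run.
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_unit, max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(2)
        # Skip units made of a single base, e.g. "AA", which are homopolymer runs.
        if len(set(unit)) == 1:
            continue
        hits.append((m.start(), unit, len(m.group(1)) // len(unit)))
    return hits
```

Because the unit is matched as an arbitrary string over A, C, G, and T, no list of repeat motifs has to be fixed in advance, which reflects the abstract's point that any repeat-unit pattern can be screened.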

3.
Recent progress in predicting protein sub-subcellular locations
In the last two decades, the number of known protein sequences has increased very rapidly. However, knowledge of protein function exists for only a small portion of these sequences. Since the experimental approaches for determining protein functions are costly and time consuming, in silico methods have been introduced to bridge the gap between knowledge of protein sequences and their functions. Knowing the subcellular location of a protein is considered to be a critical step in understanding its biological functions. Many efforts have been undertaken to predict protein subcellular locations in silico. With the accumulation of available data, the substructures of some subcellular organelles, such as the cell nucleus, mitochondria and chloroplasts, have been taken into consideration by several studies in recent years. These studies create a new research topic, namely 'protein sub-subcellular location prediction', which goes one level deeper than classic protein subcellular location prediction.

4.
Bioinformatics tools for identifying class I-restricted epitopes
The lack of simple methods to identify relevant T-cell epitopes, the high mutation rate of many pathogens, and restriction of T-cell response to epitopes due to human leukocyte antigen (HLA) polymorphism have significantly hindered the development of cytotoxic T-lymphocyte (CTL) epitope-based or "epitope-driven" vaccines. Previously, CTL epitopes were mapped using large arrays of overlapping synthetic peptides. The large number of protein sequences available for mapping is now making this method prohibitively expensive and time-consuming. Bioinformatics tools such as EpiMatrix and Conservatrix, which search for unique or multi-HLA-restricted (promiscuous) T-cell epitopes and identify epitopes that are conserved across variant strains of the same pathogen, accelerate epitope mapping. These tools offer a significant advantage over other methods of epitope selection because high-throughput screening can be performed in silico, followed by confirmatory studies in vitro. CTL epitopes discovered using these tools might be used to develop novel vaccines and therapeutics for the prevention and treatment of infectious diseases such as human immunodeficiency virus, hepatitis C, tuberculosis, and some cancers.

5.
The rational design of loops and turns is a key step towards creating proteins with new functions. We used a computational design procedure to create new backbone conformations in the second turn of protein L. The Protein Data Bank was searched for alternative turn conformations, and sequences optimal for these turns in the context of protein L were identified using a Monte Carlo search procedure and an energy function that favors close packing. Two variants containing 12 and 14 mutations were found to be as stable as wild-type protein L. The crystal structure of one of the variants has been solved at a resolution of 1.9 Å, and the backbone conformation in the second turn is remarkably close to that of the in silico model (1.1 Å RMSD), while it differs significantly from that of wild-type protein L (the turn residues are displaced by an average of 7.2 Å). The folding rates of the redesigned proteins are greater than that of the wild-type protein and, in contrast to wild-type protein L, the second beta-turn appears to be formed at the rate-limiting step in folding.

6.
MOTIVATION: Due to recent advances in mass spectrometry technology, there has been an exponential increase in the amount of data being generated in the past few years. Database searches have not been able to keep up with this data explosion. Thus, speeding up the data searches becomes increasingly important in mass-spectrometry-based applications. Traditional database search methods use one-against-all comparisons of a query spectrum against a very large number of peptides generated from in silico digestion of protein sequences in a database, to filter potential candidates from this database, followed by a detailed scoring and ranking of those filtered candidates.
RESULTS: In this article, we show that we can avoid the one-against-all comparisons. The basic idea is to design a set of hash functions to pre-process peptides in the database such that for each query spectrum we can use the hash functions to find only a small subset of peptide sequences that are most likely to match the spectrum. The construction of each hash function is based on a random spectrum, and the hash value of a peptide is the normalized shared peak counts score (cosine) between the random spectrum and the hypothetical spectrum of the peptide. To implement this idea, we first embed each peptide into a unit vector in a high-dimensional metric space. The random spectrum is represented by a random vector, and we use random vectors to construct a set of hash functions called locality sensitive hashing (LSH) for preprocessing. We demonstrate that our mapping is accurate. We show that our method can filter out >95.65% of the spectra without missing any correct sequences, or gain a 111-fold speedup by filtering out 99.64% of spectra while missing at most 0.19% (2 out of 1014) of the correct sequences. In addition, we show that our method can be effectively used for other mass spectra mining applications such as finding clusters of spectra efficiently and accurately.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
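
The hashing idea can be sketched with the standard sign-random-projection form of LSH for cosine similarity. This is a swapped-in textbook variant, not the paper's exact hash (which uses the cosine score itself as the hash value), and the function names, the vector embedding of peptides, and the number of bits are all assumptions made for illustration.

```python
import math
import random

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def make_hash(dim, n_bits, seed=0):
    """Build one LSH function from n_bits random Gaussian vectors."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def h(v):
        # One bit per random vector: 1 if the dot product is non-negative.
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0)
                     for p in planes)
    return h

def build_index(vectors, h):
    """Pre-process the database: bucket every peptide vector by its hash."""
    index = {}
    for key, v in vectors.items():
        index.setdefault(h(unit(v)), []).append(key)
    return index

def candidates(index, h, query):
    """Return only the peptides in the query's bucket, not the whole database."""
    return index.get(h(unit(query)), [])
```

Vectors with high cosine similarity agree on most bits, so a query lands in the same bucket as its likely matches with high probability; detailed scoring then runs only on that bucket instead of on every peptide.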

7.
Combinatorial protein libraries permit the examination of a wide range of sequences. Such methods are being used for de novo design and to investigate the determinants of protein folding. The exponentially large number of possible sequences, however, necessitates restrictions on the diversity of sequences in a combinatorial library. Recently, progress has been made in developing theoretical tools to bias and characterize the ensemble of sequences that fold into a given structure, tools that can be applied to the design and interpretation of combinatorial experiments.

8.
The creation of novel enzymes capable of catalyzing any desired chemical reaction is a grand challenge for computational protein design. Here we describe two new algorithms for enzyme design that employ hashing techniques to allow searching through large numbers of protein scaffolds for optimal catalytic site placement. We also describe an in silico benchmark, based on the recapitulation of the active sites of native enzymes, that allows rapid evaluation and testing of enzyme design methodologies. In the benchmark test, which consists of designing sites for each of 10 different chemical reactions in backbone scaffolds derived from 10 enzymes catalyzing the reactions, the new methods succeed in identifying the native site in the native scaffold and ranking it within the top five designs for six of the 10 reactions. The new methods can be directly applied to the design of new enzymes, and the benchmark provides a powerful in silico test for guiding improvements in computational enzyme design.

9.
Flexibility and dynamics are important for protein function and a protein's ability to accommodate amino acid substitutions. However, when computational protein design algorithms search over protein structures, the allowed flexibility is often reduced to a relatively small set of discrete side‐chain and backbone conformations. While simplifications in scoring functions and protein flexibility are currently necessary to computationally search the vast protein sequence and conformational space, a rigid representation of a protein causes the search to become brittle and miss low‐energy structures. Continuous rotamers more closely represent the allowed movement of a side chain within its torsional well and have been successfully incorporated into the protein design framework to design biomedically relevant protein systems. The use of continuous rotamers in protein design enables algorithms to search a larger conformational space than previously possible, but adds complexity to the design search. To design large, complex systems with continuous rotamers, new algorithms are needed to increase the efficiency of the search. We present two methods, PartCR and HOT, that greatly increase the speed and efficiency of protein design with continuous rotamers. These methods specifically target the large errors in energetic terms that are used to bound pairwise energies during the design search. By tightening the energy bounds, additional pruning of the conformation space can be achieved, and the number of conformations that must be enumerated to find the global minimum energy conformation is greatly reduced. Proteins 2015; 83:1151–1164. © 2015 Wiley Periodicals, Inc.

10.
11.
We present here an estimate of the upper limit of the number of molecular targets in the human genome that represent an opportunity for further therapeutic treatment. We select approximately 6300 human proteins that are similar to sequences of known protein targets collected from the DrugBank database. Our bioinformatics study estimates the size of the ‘druggable’ human genome to be around 20% of the human proteome, i.e. the number of possible protein targets for small-molecule drug design in medicinal chemistry. We do not take into account any toxicity prediction, the three-dimensional characteristics of the active site in the predicted ‘druggable’ protein families, or detailed chemical analysis of known inhibitors/drugs. Instead we rely on the remote homology detection method Meta-BASIC, which is based on sequence and structural similarity. The prepared dataset of all predicted protein targets from the human genome presents a unique opportunity for developing and benchmarking various in silico chemo/bio-informatics methods in the context of virtual high-throughput screening.

12.
Angiotensin I converting enzyme (ACE) inhibitory peptides can induce antihypertensive effects after oral administration. By means of an ACE inhibitory peptide database, containing about 500 reported sequences and their IC50 values, the different proteins in pea and whey were quantitatively evaluated as precursors for ACE inhibitory peptides. This analysis was combined with experimental data on the evolution of ACE inhibitory activity and protein degradation during in vitro gastrointestinal digestion. Pea proteins produced similar in silico scores and were degraded early in the in vitro digestion. High ACE inhibitory activity was observed after the simulated stomach phase and augmented slightly in the simulated small intestine phase. The major whey protein beta-lactoglobulin obtained the highest in silico scores, which corresponded with the fact that degradation of this protein in vitro only occurred from the simulated small intestine phase on and resulted in a 10-fold increase in the ACE inhibitory activity. Whey protein obtained total in silico scores of about 124 ml/mg, compared to 46 ml/mg for pea protein, indicating that whey protein would be a richer source of ACE inhibitory peptides than pea protein. Although beta-lactoglobulin is only partially digested, a higher ACE inhibitory activity was indeed found in the whey digest (IC50 = 0.048 mg/ml) compared to the pea digest (IC50 = 0.076 mg/ml). In silico gastrointestinal digestion of the highest scoring proteins in pea and whey, vicilin and albumin PA2, and beta-lactoglobulin, respectively, directly released a number of potent ACE inhibitory peptides. Several other ACE inhibitory sequences resisted in silico digestion by gastrointestinal proteases. In brief, the quantitative in silico analysis will facilitate the study of precursor proteins on a large scale and the specific release of bioactive peptides.

13.
The currently available body of decoded amino acid sequences of various proteins far exceeds the experimental capacity for their functional annotation. Therefore, in silico annotation using bioinformatics methods becomes increasingly important. Such annotation is actually a prediction; however, it can be an important starting point for further laboratory research. This work describes a new method for predicting functionally important protein sites, SDPsite, on the basis of identification of specificity determinants. The proposed algorithm utilizes a protein family alignment and a phylogenetic tree to predict the conserved positions and specificity determinants, map them onto the protein structure, and search for clusters of the predicted positions. Comparison of the resulting predictions with experimental data and published predictions of functional sites by other methods demonstrates that the results of SDPsite agree well with experimental data and exceed the results obtained with the majority of previous methods. SDPsite is publicly available at http://bioinf.fbb.msu.ru/SDPsite.

14.
Computer-aided model-building strategies for protein design
C O Pabo, E G Suchanek. Biochemistry 1986, 25(20): 5987-5991
Model-building strategies for protein modification and design are developed. These strategies emphasize simple geometric aspects of protein structure, use local coordinate systems defined at particular residues, and systematically consider a large number of alternative sequences and conformations. We have written a computer program, PROTEUS, to implement these search methods. PROTEUS has been used to find positions where disulfide bonds could be added to the N-terminal domain of the lambda repressor and to predict how a loop on the surface of repressor could be shortened.

15.
16.
Ralstonia solanacearum and Clavibacter michiganensis subsp. sepedonicus are the two most relevant bacterial pathogens of potato, for which a large number of molecular diagnostic methods using specific DNA sequences have been developed. About one hundred oligonucleotides have been described and thoroughly tested experimentally. After all these primers and probes were compiled and evaluated in silico to check their specificity, many discrepancies were found. A detailed analysis permitted the recognition of different possible reasons for such discrepancies: sequencing errors in public sequences, wrongly assumed specificity (sometimes due to sequences more recent than the oligonucleotides being evaluated) or even typing errors in the oligonucleotides. Although this study is an exercise in in silico evaluation using two potato bacterial pathogens as a model, the conclusions provide not only information useful for phytopathologists but, in a broader scope, outline the main situations that can arise during an evaluation of probes and that can surely be found in other scenarios.

17.
An in silico protein model based on the Kauffman NK-landscape, where N is the number of variable positions in a protein and K is the degree of coupling between variable positions, was used to compare alternative search strategies for directed evolution. A simple genetic algorithm (GA) was used to model the performance of a standard DNA shuffling protocol. The search effectiveness of the GA was compared to that of a statistical approach called the protein sequence activity relationship (ProSAR) algorithm, which consists of two steps: model building and library design. A number of parameters were investigated and found to be important for the comparison, including the value of K, the screening size, the system noise and the number of replicates. The statistical model was found to accurately predict the measured activities for small values of the coupling between amino acids, K
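
For readers unfamiliar with the NK model, the sketch below shows one common way to build such a landscape. It is illustrative only and not the model used in the study: the class name `NKLandscape`, the two-letter alphabet, the choice of the next K positions (wrapping around) as coupling partners, and the uniform random contributions are all assumptions.

```python
import random

class NKLandscape:
    """Kauffman NK landscape: N positions, each coupled to K others."""

    def __init__(self, n, k, alphabet="AB", seed=0):
        if not 0 <= k < n:
            raise ValueError("k must satisfy 0 <= k < n")
        self.n, self.k, self.alphabet = n, k, alphabet
        self._rng = random.Random(seed)
        # Position i is coupled to the next k positions, wrapping around.
        self.neighbors = [[(i + j) % n for j in range(1, k + 1)]
                          for i in range(n)]
        # Contribution table, filled lazily so large K stays cheap to set up.
        self._table = {}

    def fitness(self, seq):
        """Mean of the per-position contributions for this sequence."""
        total = 0.0
        for i in range(self.n):
            # The contribution of position i depends on its own state and on
            # the states of its k coupled positions.
            key = (i, seq[i]) + tuple(seq[j] for j in self.neighbors[i])
            if key not in self._table:
                self._table[key] = self._rng.random()
            total += self._table[key]
        return total / self.n
```

With K = 0 every position contributes independently, so the landscape is additive and smooth; as K grows, a single substitution changes K + 1 contributions at once and the landscape becomes more rugged. That is why the abstract treats K as a key parameter when comparing search strategies. Note that, because the table is filled lazily, the random values drawn depend on the order in which sequences are queried, although each instance stays self-consistent.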

18.
Ex vivo stability is a valuable protein characteristic but is laborious to improve experimentally. In addition to biopharmaceutical and industrial applications, stable proteins are important for biochemical and structural studies. Taking advantage of the large number of available genomic sequences and growth temperature data, we present two bioinformatic methods to identify a limited set of amino acids or positions that likely underlie thermostability. Because these methods allow thousands of homologs to be examined in silico, they have the advantage of providing both speed and statistical power. Using these methods, we introduced, via mutation, amino acids from thermoadapted homologs into an exemplar mesophilic membrane protein, and demonstrated significantly increased thermostability while preserving protein activity.

19.
The construction of fitness landscapes has broad implications for understanding molecular evolution, cellular epigenetic state, and protein structures. We studied the problem of constructing the fitness landscape of inverse protein folding, or protein design, with the aim of generating amino acid sequences that would fold into an a priori determined structural fold, which would enable engineering novel or enhanced biochemistry. For this task, an effective fitness function should allow identification of correct sequences that would fold into the desired structure. In this study, we showed that a nonlinear fitness function for protein design can be constructed using a rectangular kernel with a basis set of proteins and decoys chosen a priori. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys via a finite Newton method. A blind test of a simplified version of the fitness function for sequence design was carried out to discriminate simultaneously 428 native sequences not homologous to any training proteins from 11 million challenging protein-like decoys. This simplified function correctly classified 408 native sequences (20 misclassifications, 95% correct rate), which outperforms several other statistical linear scoring functions and optimized linear functions. Our results further suggested that for the task of global sequence design of 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a carefully chosen basis set of just about 3,680 proteins and decoys, and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to construct other types of fitness landscapes.

20.
Computational design of protein-ligand interfaces finds optimal amino acid sequences within a small-molecule binding site of a protein for tight binding of a specific small molecule. It requires a search algorithm that can rapidly sample the vast sequence and conformational space, and a scoring function that can identify low energy designs. This review focuses on recent advances in computational design methods and their application to protein-small molecule binding sites. Strategies for increasing affinity, altering specificity, creating broad-spectrum binding, and building novel enzymes from scratch are described. Future prospects for applications in drug development are discussed, including limitations that will need to be overcome to achieve computational design of protein therapeutics with novel modes of action.
