Similar Articles
 10 similar articles found (search time: 140 ms)
1.
Introduction: Predicting the native structure of a protein from its amino acid sequence is one of the most challenging problems in biophysics and bioinformatics. The difficulty of the problem has two sources. The first is the determination of the potential energy function; an effective energy function is one that can distinguish the native states of protein molecules from non-native states. The second is that the potential energy landscape of the system can be characterized by a multitu…

2.
Protein classification with imbalanced data   (times cited: 1; self-citations: 0; citations by others: 1)
Zhao XM  Li X  Chen L  Aihara K 《Proteins》2008,70(4):1125-1132
Protein classification is generally a multi-class classification problem that can be reduced to a set of binary classification problems, with one classifier designed for each class. The proteins in a class are treated as positive examples and those outside it as negative examples. This reduction, however, creates a class-imbalance problem, because the number of proteins in any one class is usually much smaller than the number of proteins outside it. As a result, classifiers trained on the imbalanced data tend to overfit and to perform poorly, particularly on the minority class. This article presents a new technique for protein classification with imbalanced data. First, we propose a new algorithm that overcomes the imbalance problem in protein classification by combining a new sampling technique with a committee of classifiers. Then, classifiers trained in different feature spaces are combined to further improve classification accuracy. Numerical experiments on benchmark datasets show promising results, which confirm the effectiveness of the proposed method in terms of accuracy. The Matlab code and supplementary materials are available at http://eserver2.sat.iis.u-tokyo.ac.jp/~xmzhao/proteins.html.
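The abstract does not say which sampling technique the authors use, so the sketch below substitutes plain random undersampling to illustrate the general idea of "sampling plus a committee": each committee member sees all minority (positive) examples and an equally sized random subset of the negatives, and the members vote. The nearest-centroid base learner and the names `UndersampledCommittee`, `centroid` and `sq_dist` are hypothetical choices for illustration, not the published method.

```python
import random
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class UndersampledCommittee:
    """Hypothetical sketch: committee of nearest-centroid classifiers, each trained
    on all positives plus a random, equally sized subsample of the negatives."""

    def __init__(self, n_members=5, seed=0):
        self.n_members = n_members
        self.rng = random.Random(seed)
        self.members = []  # one (positive centroid, negative centroid) pair per member

    def fit(self, pos, neg):
        pos_c = centroid(pos)
        k = min(len(pos), len(neg))
        self.members = []
        for _ in range(self.n_members):
            # Random undersampling: draw k negatives without replacement,
            # so every member is trained on a balanced set.
            neg_c = centroid(self.rng.sample(neg, k))
            self.members.append((pos_c, neg_c))
        return self

    def predict(self, x):
        """Majority vote; True means x is assigned to the (minority) positive class."""
        votes = sum(sq_dist(x, p) < sq_dist(x, n) for p, n in self.members)
        return 2 * votes > len(self.members)
```

Because each member draws a different negative subset, the committee uses more of the majority class than any single balanced classifier would, while no individual member is dominated by it.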

3.
Because the human immunodeficiency virus type 1 protease (HIV-1-PR) is an essential enzyme in the viral life cycle, inhibiting it can control AIDS. The folding of single-domain proteins, such as each of the monomers that form the HIV-1-PR homodimer, is controlled by local elementary structures (LES): folding units stabilized by strongly interacting, highly conserved, and usually hydrophobic amino acids. These LES have evolved over countless generations to recognize and strongly attract each other, so that the protein folds quickly and is stable in its native conformation. Consequently, peptides whose sequences are identical to the monomer segments associated with LES are expected to act as competitive inhibitors and thus to destabilize the native structure of the enzyme. Such inhibitors are unlikely to give rise to escape mutants, because they bind to the protease monomers through highly conserved amino acids that play an essential role in the folding process. Spectrophotometric assays and circular dichroism spectroscopy are used to demonstrate the properties of one of the most promising inhibitors of HIV-1-PR monomer folding found among these peptides.

4.
Yeh CW  Chu CP  Wu KR 《Bio Systems》2006,83(1):56-66
Binary optimization is a widely studied topic in integer linear programming. This study proposes a DNA-based computing algorithm for solving large-scale binary integer programming (BIP) problems. The approach is based on Adleman and Lipton's DNA operations. Given its operational time complexity of O(n × k), DNA computation appears promising for the BIP problem.
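Adleman–Lipton style DNA computing follows a generate-and-filter pattern: build a "tube" containing every candidate assignment, then apply one extraction step per constraint. The sequential simulation below is a sketch of that pattern for a BIP of the form "maximise c·x subject to Ax ≤ b, x ∈ {0,1}^n"; the function name `dna_style_bip` and the problem encoding are assumptions. The abstract does not define n and k; reading them as the number of variables and constraints, the O(n × k) bound would count biochemical operations that each act on all strands in parallel, whereas this in-silico version pays for the 2^n candidates explicitly.

```python
from itertools import product


def dna_style_bip(c, A, b):
    """Maximise c.x subject to A x <= b with x in {0,1}^n, by generate-and-filter.

    Hypothetical sequential simulation of a DNA-computing workflow.
    Returns (best_x, best_value), or None if no assignment is feasible.
    """
    n = len(c)
    # "Initial tube": one strand per candidate assignment, 2**n in total.
    tube = list(product((0, 1), repeat=n))
    # One "extract" step per constraint row keeps only the strands that satisfy it.
    for row, limit in zip(A, b):
        tube = [x for x in tube if sum(a * xi for a, xi in zip(row, x)) <= limit]
    if not tube:
        return None

    def value(x):
        return sum(ci * xi for ci, xi in zip(c, x))

    # Final read-out: the surviving assignment with the largest objective value.
    best = max(tube, key=value)
    return best, value(best)
```

For example, `dna_style_bip([3, 2, 4], [[1, 1, 1]], [2])` picks at most two of three items and returns `((1, 0, 1), 7)`.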

5.
Computer simulations of simple exact lattice models aid the study of the protein folding process; some of their predictions have later been confirmed experimentally. Here we propose the contact interactions (CI) method, a new algorithm for conformational search in the low-energy regions of protein chains modeled as copolymers of hydrophobic (H) and polar (P) monomers, configured as self-avoiding walks on square or cubic lattices. It can be regarded as an extension of the standard Monte Carlo method, improved by the concept of cooperativity arising from nonlocal contact interactions. A major difference from other algorithms is that the criteria for accepting new conformations generated during the simulation are not based on the energy of the entire molecule; instead, cooling factors associated with each residue define regions of the model protein with higher or lower mobility. Nine sequences ranging from 20 to 64 residues in length were used on the square lattice, and 15 sequences ranging from 46 to 136 residues were used on the cubic lattice. The CI algorithm proved very efficient in both two and three dimensions, and it located energy minima that other search algorithms described in the literature had not found. Its use is not limited to conformational search: it also allows exploration of the thermodynamic and kinetic behavior of model protein chains.
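Any conformational search in the HP lattice model, the CI method included, is driven by the standard HP energy: −1 for every pair of H monomers that are lattice neighbours but not consecutive along the chain. The helper below is a minimal sketch of that energy for square or cubic lattices; the name `hp_energy` and the coordinate format (integer tuples) are assumptions, and it does not implement the CI method's per-residue cooling factors.

```python
def hp_energy(sequence, coords):
    """HP-model energy of a self-avoiding walk on a square (2D) or cubic (3D) lattice.

    sequence: string of 'H'/'P'; coords: list of integer tuples, one per residue.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        # Consecutive residues must occupy adjacent lattice sites.
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, c in enumerate(coords):
        if sequence[i] != "H":
            continue
        for axis in range(len(c)):
            for step in (-1, 1):
                nb = list(c)
                nb[axis] += step
                j = index.get(tuple(nb))
                # Count each non-bonded H-H contact once (j > i + 1).
                if j is not None and j > i + 1 and sequence[j] == "H":
                    energy -= 1
    return energy
```

On the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]`, the sequence `"HPPH"` has one H-H contact between its ends, so its energy is −1; the same sequence laid out in a straight line has energy 0.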

6.
Predicting the three-dimensional structure of a protein from its amino acid sequence can be treated as a global optimization problem. In this paper, the Chaotic Artificial Bee Colony (CABC) algorithm is introduced and applied to 3D protein structure prediction. Working with the 3D off-lattice AB model, the CABC algorithm combines the global and local search of the Artificial Bee Colony (ABC) algorithm with a chaotic search algorithm, in order to avoid premature convergence and entrapment in local optima. Experiments on the widely used Fibonacci sequences show that the proposed algorithm is an effective, high-performance method for protein structure prediction.
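The objective that CABC minimises is the energy of the 3D off-lattice AB model, evaluated on Fibonacci test sequences (S0 = A, S1 = B, S_{i+1} = S_{i−1} + S_i). The sketch below writes that energy as a bending term over consecutive unit bond vectors, a torsion-like term over bonds two apart, and a species-dependent Lennard-Jones term over non-adjacent residues. The default values κ1 = −1, κ2 = 0.5 and the coupling table C(A,A) = 1, C(B,B) = C(A,B) = 0.5 are assumptions taken from common statements of the model; check them against the definition you are using. The names `fibonacci_sequence` and `ab_energy` are hypothetical.

```python
import math


def fibonacci_sequence(k):
    """Fibonacci AB sequence: S0 = 'A', S1 = 'B', S_{i+1} = S_{i-1} + S_i."""
    a, b = "A", "B"
    if k == 0:
        return a
    for _ in range(k - 1):
        a, b = b, a + b
    return b


def ab_energy(sequence, coords, kappa1=-1.0, kappa2=0.5, coupling=None):
    """Energy of a 3D AB-model conformation (coefficients are assumed defaults)."""
    if coupling is None:
        # Assumed species-dependent coefficients C(xi_i, xi_j).
        coupling = {("A", "A"): 1.0, ("B", "B"): 0.5, ("A", "B"): 0.5, ("B", "A"): 0.5}
    n = len(sequence)
    bonds = []
    for p, q in zip(coords, coords[1:]):
        v = [qi - pi for pi, qi in zip(p, q)]
        norm = math.sqrt(sum(x * x for x in v))
        bonds.append([x / norm for x in v])  # unit bond vector b_i

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    energy = -kappa1 * sum(dot(bonds[i], bonds[i + 1]) for i in range(n - 2))
    energy += -kappa2 * sum(dot(bonds[i], bonds[i + 2]) for i in range(n - 3))
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            energy += 4 * coupling[(sequence[i], sequence[j])] * (r ** -12 - r ** -6)
    return energy
```

`fibonacci_sequence(6)` gives the 13-mer `"ABBABBABABBAB"`. An optimiser such as ABC or CABC would typically search over bond angles, rebuild `coords` from them, and call `ab_energy` as the fitness function.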

7.
We present two parameterized algorithms for the closest string problem. The first runs in O(nL + nd · 17.97^d) time for DNA strings and in O(nL + nd · 61.86^d) time for protein strings, where n is the number of input strings, L is the length of each input string, and d is the given upper bound on the number of mismatches between the center string and each input string. The second runs in O(nL + nd · 13.92^d) time for DNA strings and in O(nL + nd · 47.21^d) time for protein strings. We then extend the first algorithm to a new parameterized algorithm for the closest substring problem that runs in O((n − 1)m^2(L + d · 17.97^d · m^⌊log2(d+1)⌋)) time for DNA strings and in O((n − 1)m^2(L + d · 61.86^d · m^⌊log2(d+1)⌋)) time for protein strings, where n is the number of input strings, L is the length of the center substring, L − 1 + m is the maximum length of a single input string, and d is the given upper bound on the number of mismatches between the center substring and at least one substring of each input string. All of these algorithms significantly improve on the previous best bounds. To verify the theoretical improvements in time complexity experimentally, we implement our algorithm in C and apply the resulting program to the planted (L, d)-motif problem proposed by Pevzner and Sze in 2000. We compare our program with the previously best exact program for the problem, PMSPrune (designed by Davila et al. in 2007). Our experimental data show that our program runs faster for practical cases and also for several challenging cases. Our algorithm also uses less memory.
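For orientation, the sketch below is not the paper's algorithm but the simpler, classic bounded-search-tree method for closest string (Gramm, Niedermeier and Rossmanith), which runs in O(nL + nd · (d + 1)^d) time and is the kind of baseline that bounds such as 17.97^d improve on. It starts from the first input string and, whenever some input is still more than d mismatches away, branches on d + 1 of the mismatching positions; any valid center must agree with that input on at least one of them. The function names are hypothetical.

```python
def hamming(s, t):
    """Number of mismatching positions between two equal-length strings."""
    return sum(a != b for a, b in zip(s, t))


def closest_string(strings, d):
    """Return a string within Hamming distance d of every input, or None.

    Classic bounded search tree; inputs are assumed to have equal length.
    """
    def search(cand, budget):
        # Find an input string that the current candidate is still too far from.
        far = next((t for t in strings if hamming(cand, t) > d), None)
        if far is None:
            return cand
        # Any solution is within `budget` of cand, hence within d + budget of far.
        if budget == 0 or hamming(cand, far) > d + budget:
            return None
        # A solution must agree with `far` on at least one of any d + 1 mismatches.
        diff = [i for i, (a, b) in enumerate(zip(cand, far)) if a != b][: d + 1]
        for i in diff:
            result = search(cand[:i] + far[i] + cand[i + 1:], budget - 1)
            if result is not None:
                return result
        return None

    return search(strings[0], d)
```

For instance, `closest_string(["ACGT", "TCGA"], 1)` returns `"TCGT"`, which is one mismatch from each input, while `closest_string(["AAAA", "BBAA", "AABB"], 1)` returns `None` because the last two strings differ in all four positions.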

8.
The binding equilibrium of deuteroporphyrin IX to human serum albumin and to bovine serum albumin was studied by monitoring protein-induced changes in the porphyrin fluorescence, taking the self-aggregation of the porphyrin into account. To control the latter, the range of porphyrin concentrations was chosen to make (non-covalent) dimers the dominant aggregate. Each protein was found to have one high-affinity site for deuteroporphyrin IX monomers; the equilibrium binding constants (25 °C, neutral pH, phosphate-buffered saline) are 4.5 (± 1.5) × 10^7 M^-1 for human serum albumin and 1.7 (± 0.2) × 10^6 M^-1 for bovine serum albumin. Deuteroporphyrin IX dimers were found to bind directly to the protein with high affinity, each protein binding one dimer. Two models are proposed for the protein binding of porphyrin monomers and dimers in a system containing both species: a competitive model, in which each protein molecule has a single binding site that can be occupied by either a monomer or a dimer; and a non-competitive model, in which each protein molecule has two binding sites, one for monomers and one for dimers. Testing the fit of the data to the two models provides an argument in favour of the non-competitive model, for which the equilibrium binding constants of the dimers (25 °C, neutral pH, phosphate-buffered saline) are 8.0 (± 1.8) × 10^8 M^-1 for human serum albumin and 1.2 (± 0.6) × 10^7 M^-1 for bovine serum albumin.
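To make the reported association constants concrete, the sketch below computes the exact complex concentration for a single 1:1 site, P + L ⇌ PL with K = [PL] / ([P][L]), by solving the mass-balance quadratic. It deliberately ignores porphyrin dimerization, which the study explicitly accounts for, so it illustrates only the monomer-binding step; the function name and the example concentrations are assumptions.

```python
import math


def bound_complex(protein_total, ligand_total, k_assoc):
    """Exact [PL] (M) for one-site 1:1 binding with association constant K (M^-1).

    Solves K = [PL] / ((P_tot - [PL]) * (L_tot - [PL])) for the physical root.
    """
    kd = 1.0 / k_assoc
    s = protein_total + ligand_total + kd
    disc = s * s - 4.0 * protein_total * ligand_total
    # Numerically stable form of (s - sqrt(disc)) / 2, avoiding cancellation.
    return 2.0 * protein_total * ligand_total / (s + math.sqrt(disc))
```

With 1 µM protein and 1 µM porphyrin monomer, `bound_complex(1e-6, 1e-6, 4.5e7)` (the human serum albumin constant) yields more complex than `bound_complex(1e-6, 1e-6, 1.7e6)` (the bovine serum albumin constant), as the roughly 26-fold difference in K implies.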

9.
Predicting protein quaternary structure by pseudo amino acid composition   (times cited: 1; self-citations: 0; citations by others: 1)
Chou KC  Cai YD 《Proteins》2003,53(2):282-289
In the protein universe, many proteins are composed of two or more polypeptide chains, generally referred to as subunits, that associate through noncovalent interactions and, occasionally, disulfide bonds. With the number of protein sequences entering data banks increasing rapidly, we face a challenge: how to develop an automated method for identifying the quaternary attribute of a new polypeptide chain (i.e., whether it forms a monomer, or a dimer, trimer, or other oligomer). This matters because the functions of proteins are closely related to their quaternary attribute. For example, some critical ligands bind only to dimers and not to monomers; some remarkable allosteric transitions occur only in tetramers and not in other oligomers; and some ion channels are formed by tetramers, whereas others are formed by pentamers. To explore this problem, we adopted the pseudo amino acid composition originally proposed for improving the prediction of protein subcellular location (Chou, Proteins, 2001; 43:246-255). The advantage of representing a protein by its pseudo amino acid composition is that it can take a considerable amount of sequence-order information into account, significantly improving prediction quality. Results obtained by resubstitution, jackknife, and independent data set tests indicate that the current approach may be quite promising for dealing with such an extremely complicated and difficult problem.
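The sketch below shows how a Chou-style pseudo amino acid composition vector is commonly assembled: normalized residue frequencies, followed by λ sequence-order correlation factors θ_k (the mean squared difference of standardized physicochemical properties between residues k positions apart), all divided by a common denominator so the vector sums to 1. The weight w = 0.05, the function names and the property tables are assumptions; the real method uses specific property scales for the 20 standard amino acids, which are not reproduced here, so the function accepts any alphabet through its `properties` argument.

```python
from statistics import mean, pstdev


def standardise(prop):
    """Scale a residue -> value table to zero mean and unit (population) std."""
    vals = list(prop.values())
    mu, sd = mean(vals), pstdev(vals)
    return {aa: (v - mu) / sd for aa, v in prop.items()}


def pseaac(seq, properties, lam, w=0.05):
    """Pseudo amino acid composition with len(alphabet) + lam components (sketch).

    properties: list of dicts mapping residue -> raw property value (same keys).
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lambda")
    alphabet = sorted(properties[0])
    h = [standardise(p) for p in properties]

    def corr(a, b):
        # Mean squared difference over all standardized properties.
        return mean((hp[b] - hp[a]) ** 2 for hp in h)

    n = len(seq)
    # theta_k: average correlation between residues k positions apart (k = 1..lam).
    theta = [mean(corr(seq[i], seq[i + k]) for i in range(n - k)) for k in range(1, lam + 1)]
    freq = [seq.count(aa) / n for aa in alphabet]
    denom = sum(freq) + w * sum(theta)
    return [f / denom for f in freq] + [w * t / denom for t in theta]
```

The first block of components reduces to the ordinary composition when w = 0; the extra λ components are what carry the sequence-order effects that the abstract credits with improving prediction quality.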

10.
A microscopic, reversible model for studying protein crystal nucleation and growth is presented. The probability of a monomer attaching to the growing crystal was assumed to be proportional to the protein volume fraction and to an orientational factor representing the anisotropy of protein molecules. The rate of detachment depended on the free energy of association of the given monomer in the lattice, as calculated from the buried surface area. The proposed algorithm allowed the simulation of crystal growth from free monomers to complexes of 10^5 molecules, i.e., microcrystals with already formed faces. These simulations correctly reproduced the crystal morphology of the chosen model system, the tetragonal lysozyme crystal. We predicted that the critical size, beyond which the growth rate increases rapidly, is approximately 50 protein monomers. The major factors determining protein crystallisation kinetics were the geometry of the protein molecules and the resulting number of kinetic traps on the growth pathway.
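A reversible attach/detach model of this kind is naturally simulated with kinetic Monte Carlo. The sketch below is an illustration under stated assumptions, not the authors' algorithm: attachment rate = k_on × volume fraction × orientational factor, as the abstract describes, while the detachment rate uses a hypothetical Arrhenius-like form, k0 · exp(ΔG/kT) with ΔG = −γ × buried area. The parameters `k_on`, `k0`, `gamma`, `kT` and all function names are assumptions. `choose_event` performs one Gillespie-style step: pick an event with probability proportional to its rate and draw an exponential waiting time.

```python
import math
import random


def attach_rate(volume_fraction, orientation_factor, k_on=1.0):
    """Attachment rate proportional to volume fraction and orientational factor."""
    return k_on * volume_fraction * orientation_factor


def detach_rate(buried_area, gamma=0.01, kT=1.0, k0=1.0):
    """Hypothetical detachment rate: association free energy dG = -gamma * area,
    so more buried surface gives a more negative dG and a slower detachment."""
    d_g = -gamma * buried_area
    return k0 * math.exp(d_g / kT)


def choose_event(rates, rng):
    """One kinetic Monte Carlo step over a dict {event_name: rate}.

    Returns (event_name, waiting_time); assumes the total rate is positive.
    """
    total = sum(rates.values())
    r = rng.random() * total
    acc = 0.0
    name = None
    for name, rate in rates.items():
        acc += rate
        if r < acc:
            break
    # Waiting time until the next event is exponential with the total rate.
    return name, rng.expovariate(total)
```

In a full simulation, each surface site would contribute its own attach or detach entry to `rates`, and the crystal and the rate table would be updated after every chosen event.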


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417