
31.

Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition (total citations: 2; self-citations: 0; citations by others: 2)

Shen HB Chou KC 《Biochemical and biophysical research communications》2005,337(3):752-756

The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For an in-depth understanding of the biochemical processes of the nucleus, knowledge of the localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desirable to develop an automated method for rapidly annotating the subnuclear locations of the many newly found nuclear protein sequences, so that they can be used in a timely manner for basic research and drug discovery. In view of this, a novel approach is developed for predicting protein subnuclear location. It features a powerful classifier, the optimized evidence-theoretic K-nearest classifier (OET-KNN), and uses the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates obtained by both the re-substitution test and the jackknife cross-validation test are significantly higher than those of existing classifiers on the same working dataset. It is anticipated that this approach may also become a useful high-throughput vehicle for bridging the huge gap, arising in the post-genomic era, between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.
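The jackknife (leave-one-out) test mentioned in this abstract can be sketched as follows. This is only an illustration under loud assumptions: it uses a plain majority-vote K-nearest-neighbor rule rather than the OET-KNN classifier, the plain 20-dimensional amino acid composition rather than the pseudo amino acid composition, and four invented toy sequences. The names `composition`, `knn_predict` and `jackknife_accuracy` are hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional vector of amino acid frequencies of seq."""
    counts = Counter(seq)
    return [counts.get(a, 0) / len(seq) for a in AMINO_ACIDS]

def knn_predict(train, query, k=1):
    """Majority vote among the k training samples closest to query.

    train is a list of (vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def jackknife_accuracy(samples, k=1):
    """Leave each sample out in turn, predict it from the rest, and
    return the fraction of samples predicted correctly."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, vector, k) == label:
            correct += 1
    return correct / len(samples)

# Invented toy data, not taken from the 370-protein dataset.
toy = [
    ("KKKKRRKKRK", "nucleolus"),
    ("KRKKKRKKKR", "nucleolus"),
    ("SSSSPSSPSS", "nuclear speckle"),
    ("SPSSSSPSSP", "nuclear speckle"),
]
samples = [(composition(seq), label) for seq, label in toy]
print(jackknife_accuracy(samples, k=1))
```

In the re-substitution test, by contrast, every sample is predicted by a classifier that has already seen it, which is why the jackknife figure is usually the more conservative of the two.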

32.

Fidelity Estimation for a Hierarchical Classifier

P. Hufnagl K. Voss 《Biometrical journal. Biometrische Zeitschrift》1985,27(6):659-667

Many different methods exist for estimating the correct classification rate of a classifier (test sample, bootstrap, cross-validation). The test sample method has a very low cost. Sometimes, however, only a small number of objects is available (rare diseases, high experimental costs). When we split the sample into a training set and a test set, we obtain a good or a poor fidelity estimate but, unfortunately, with a correspondingly wide or narrow confidence interval for that estimate. Overcoming this dilemma is only possible for simple classifiers. Such a simple classifier is investigated and a direct fidelity estimation is proposed.
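The dependence of the confidence interval on the test-set size can be made concrete with a small sketch. It assumes the normal-approximation (Wald) interval for a binomial proportion, which is a textbook choice and not the direct fidelity estimation proposed in the paper. The function name `wald_interval` and the counts are hypothetical.

```python
import math

def wald_interval(correct, n, z=1.96):
    """Approximate 95% confidence interval for the correct classification
    rate, given `correct` hits on a test set of size n.
    The bounds are clipped to the valid range [0, 1]."""
    p = correct / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Same observed rate (90%), two different test-set sizes.
for correct, n in [(18, 20), (180, 200)]:
    low, high = wald_interval(correct, n)
    print(f"n={n}: rate={correct / n:.2f}, interval=({low:.3f}, {high:.3f})")
```

The Wald interval is known to behave poorly for very small samples, so for the small-sample situations the abstract describes it should be read as an illustration of the trade-off only.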

33.

Hill’s small systems nanothermodynamics: a simple macromolecular partition problem with a statistical perspective

Hong Qian 《Journal of biological physics》2012,38(2):201-207

Using a simple example of biological macromolecules which are partitioned between bulk solution and membrane, we investigate T.L. Hill’s phenomenological nanothermodynamics for small systems. By introducing a system size-dependent equilibrium constant for the bulk-membrane partition, we obtain Hill’s results on differential and integral chemical potentials $\mu$ and $\hat{\mu}$ from computations based on standard Gibbsian equilibrium statistical mechanics. It is shown that their difference can be understood from an equilibrium re-partitioning between bulk and membrane fractions upon a change in the system’s size; it is closely related to the system’s fluctuations and inhomogeneity. These results provide a better understanding of nanothermodynamics and clarify its logical relation with the theory of statistical mechanics.

34.

Ensemble Modeling for Robustness Analysis in engineering non-native metabolic pathways

《Metabolic engineering》2014

Metabolic pathways in cells must be sufficiently robust to tolerate fluctuations in expression levels and changes in environmental conditions. Perturbations in expression levels may lead to system failure due to the disappearance of a stable steady state. Increasing evidence has suggested that biological networks have evolved such that they are intrinsically robust in their network structure. In this article, we present Ensemble Modeling for Robustness Analysis (EMRA), which combines a continuation method with the Ensemble Modeling approach, for investigating the robustness of non-native pathways. EMRA investigates a large ensemble of reference models with different parameters, and determines the effects of parameter drifting until a bifurcation point, beyond which a stable steady state disappears and system failure occurs. A pathway is considered to have high bifurcational robustness if the probability of system failure is low in the ensemble. To demonstrate the utility of EMRA, we investigate the bifurcational robustness of two synthetic central metabolic pathways that achieve carbon conservation: non-oxidative glycolysis and the reverse glyoxylate cycle. With EMRA, we determined the probability of system failure of each design and demonstrated that alternative designs of these pathways indeed display varying degrees of bifurcational robustness. Furthermore, we demonstrated that target selection for flux improvement should consider the trade-offs between robustness and performance.

35.

Structural Insights into the Unique Modes of Relaxin-Binding and Tethered-Agonist Mediated Activation of RXFP1 and RXFP2

《Journal of molecular biology》2021,433(21):167217

Our poor understanding of the mechanism by which the peptide hormone H2 relaxin activates its G protein-coupled receptor, RXFP1, and the related receptor RXFP2 has hindered progress in its therapeutic development. Both receptors possess large ectodomains, which bind H2 relaxin, and contain an N-terminal LDLa module that is essential for receptor signaling and postulated to be a tethered agonist. Here, we show that a conserved motif (GDxxGWxxxF), C-terminal to the LDLa module, is critical for receptor activity. Importantly, this motif adopts different structures in RXFP1 and RXFP2, suggesting distinct activation mechanisms. For RXFP1, the motif is flexible, weakly associates with the LDLa module, and requires H2 relaxin binding to stabilize an active conformation. Conversely, the GDxxGWxxxF motif in RXFP2 is more closely associated with the LDLa module, forming an essential binding interface for H2 relaxin. These differences in the activation mechanism will aid drug development targeting these receptors.

36.

A 23-gene prognostic classifier for prediction of recurrence and survival for Asian breast cancer patients

Ting-Hao Chen Jian-Ying Chiu Kuan-Hui Shih 《Bioscience reports》2020,40(12)

We report a 23-gene classifier profiled from Asian women, with the primary purpose of assessing its clinical utility for improved risk stratification for relapse in breast cancer patients from Asian cohorts within 10 years following mastectomy. Four hundred and twenty-two breast cancer patients who underwent mastectomy were used to train the classifier with a logistic regression model. A subset of 197 patients was entered into post-mastectomy follow-up studies and examined to determine patterns of recurrence and survival based on gene expression of the gene classifier, age at diagnosis, tumor stage and lymph node status, over 5- and 10-year follow-up periods. Metastasis to lymph node (N2-N3) with N0 as the reference (N2 vs. N0 hazard ratio: 2.02 (1.05–8.70), N3 vs. N0 hazard ratio: 4.32 (1.41–13.22) for 5 years) and gene expression of the 23-gene panel (P=0.06, 5 years and 0.02, 10 years, log-rank test) were found to have significant discriminatory effects on the risk of relapse (HR (95%CI): 2.50 (0.95–6.50)). Furthermore, survival curves for subgroup analysis with N0-N1 and T1-T2 predicted patients with higher risk scores. The study provides robust evidence of the effectiveness of the 23-gene classifier, which could be used to determine the risk of a relapse event (locoregional and distant recurrence) in Asian patients, leading to a meaningful reduction in chemotherapy recommendations.

37.

nDNA-prot: identification of DNA-binding proteins based on unbalanced classification

Li Song Dapeng Li Xiangxiang Zeng Yunfeng Wu Li Guo Quan Zou 《BMC bioinformatics》2014,15(1)

Background

DNA-binding proteins are vital for the study of cellular processes. In recent genome engineering studies, the identification of proteins with certain functions has become increasingly important and needs to be performed rapidly and efficiently. In previous years, several approaches have been developed to improve the identification of DNA-binding proteins. However, the currently available resources are insufficient to accurately identify these proteins. Because of this, the previous research has been limited by the relatively unbalanced accuracy rate and the low identification success of the current methods.

Results

In this paper, we explored the practicality of modelling DNA binding identification while employing an ensemble classifier, and designed a new predictor (nDNA-Prot). The presented framework comprises two stages: a 188-dimension feature extraction method to obtain the protein structure and an ensemble classifier designated as imDC. Experiments using different datasets showed that our method is more successful than the traditional methods in identifying DNA-binding proteins. The identification was conducted using features selected by minimum Redundancy Maximum Relevance (mRMR). An accuracy rate of 95.80% and an Area Under the Curve (AUC) value of 0.986 were obtained in cross-validation. On a test dataset, our method reached 86% accuracy, versus 76% for iDNA-Prot and 68% for DNA-Prot.
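For readers less familiar with the two metrics quoted above, the sketch below computes accuracy and AUC from scratch. The AUC is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The scores, labels and function names are invented for illustration and are not data from the nDNA-Prot study.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the actual label."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of every
    positive (label 1) score against every negative (label 0) score."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented classifier scores for four proteins (1 = DNA-binding).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(predicted, labels), auc(scores, labels))
```

Accuracy depends on the chosen decision threshold (0.5 here), whereas AUC summarizes ranking quality over all thresholds, which is why both are commonly reported for unbalanced datasets.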

Conclusions

Our method can help to accurately identify DNA-binding proteins, and the web server is accessible at http://datamining.xmu.edu.cn/~songli/nDNA. In addition, we also predicted possible DNA-binding protein sequences in all of the sequences from the UniProtKB/Swiss-Prot database.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-298) contains supplementary material, which is available to authorized users.

38.

Impact of training sets on classification of high-throughput bacterial 16S rRNA gene surveys

Jeffrey J Werner Omry Koren Philip Hugenholtz Todd Z DeSantis William A Walters J Gregory Caporaso Largus T Angenent Rob Knight Ruth E Ley 《The ISME journal》2012,6(1):94-103

Taxonomic classification of the thousands to millions of 16S rRNA gene sequences generated in microbiome studies is often achieved using a naïve Bayesian classifier (for example, the Ribosomal Database Project II (RDP) classifier), due to favorable trade-offs among automation, speed and accuracy. The resulting classification depends on the reference sequences and taxonomic hierarchy used to train the model; although the influence of primer sets and classification algorithms has been explored in detail, the influence of the training set has not been characterized. We compared classification results obtained using three different publicly available databases as training sets, applied to five different bacterial 16S rRNA gene pyrosequencing data sets (generated from human body, mouse gut, python gut, soil and anaerobic digester samples). We observed numerous advantages to using the largest, most diverse training set available, which we constructed from the Greengenes (GG) bacterial/archaeal 16S rRNA gene sequence database and the latest GG taxonomy. Phylogenetic clusters of previously unclassified experimental sequences were identified with notable improvements (for example, 50% reduction in reads unclassified at the phylum level in mouse gut, soil and anaerobic digester samples), especially for phylotypes belonging to specific phyla (Tenericutes, Chloroflexi, Synergistetes and Candidate phyla TM6, TM7). Trimming the reference sequences to the primer region resulted in systematic improvements in classification depth, with the greatest gains at higher confidence thresholds. Phylotypes unclassified at the genus level represented a greater proportion of the total community variation than classified operational taxonomic units in mouse gut and anaerobic digester samples, underscoring the need for greater diversity in existing reference databases.

39.

Using Chou’s pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location

Jiang X Wei R Zhao Y Zhang T 《Amino acids》2008,34(4):669-675

Knowledge of subnuclear localization in eukaryotic cells is essential for understanding the life function of the nucleus. Developing prediction methods and tools for protein subnuclear localization has become an important research field in protein science because of the special characteristics of the cell nucleus. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a Pseudo Amino Acid (PseAA) composition based on the concept of approximate entropy (ApEn), which reflects the complexity of a time series. A novel ensemble classifier is designed that incorporates three AdaBoost classifiers. The base classifier algorithms in the three AdaBoost classifiers are decision stumps, a fuzzy K-nearest-neighbors classifier, and radial basis function support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published works are used to validate the performance of the proposed approach. The results obtained in the jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. The promising results indicate that the proposed approach is effective and practical. It might become a useful tool in protein subnuclear localization. The software in Matlab and supplementary materials are freely available by contacting the corresponding author.
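Approximate entropy itself is a standard quantity and can be sketched in a few lines. The code follows the usual definition (embedding dimension m, tolerance r, Chebyshev distance, self-matches included). How the authors map a protein sequence to a numeric series, and which m and r they use, is not stated in the abstract, so the series below and the default parameters are assumptions.

```python
import math

def _phi(series, m, r):
    """Average log-fraction of length-m windows lying within tolerance r
    (Chebyshev distance) of each window; self-matches are counted."""
    n = len(series) - m + 1
    windows = [series[i:i + m] for i in range(n)]
    total = 0.0
    for w in windows:
        matches = sum(
            1 for v in windows
            if max(abs(a - b) for a, b in zip(w, v)) <= r
        )
        total += math.log(matches / n)
    return total / n

def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) = phi_m - phi_(m+1); lower values mean a more regular series."""
    return _phi(series, m, r) - _phi(series, m + 1, r)

# Hypothetical numeric series: perfectly constant and strictly alternating.
constant = [1.0] * 30
alternating = [0.0, 1.0] * 20
print(approximate_entropy(constant), approximate_entropy(alternating))
```

Both example series are highly regular, so their ApEn values are at or near zero; an irregular series would give a larger value, which is the sense in which ApEn "reflects the complexity of a time series".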

40.

Optimized intermolecular potential for nitriles based on Anisotropic United Atoms model

Hadj-Kali MK Gerbaud V Joulia X Lacaze-Dufaure C Mijoule C Ungerer P 《Journal of molecular modeling》2008,14(7):571-580

An extension of the anisotropic united atoms intermolecular potential model is proposed for nitriles. The electrostatic part of the intermolecular potential is calculated using atomic charges obtained by a simple Mulliken population analysis. The repulsion-dispersion interaction parameters for methyl and methylene groups are taken from transferable AUA4 literature parameters [Ungerer et al., J. Chem. Phys., 2000, 112, 5499]. Non-bonding Lennard-Jones intermolecular potential parameters are regressed for the carbon and nitrogen atoms of the nitrile group (–C≡N) from experimental vapor-liquid equilibrium data of acetonitrile. Agreement between Gibbs Ensemble Monte Carlo simulations and experimental data is very good for acetonitrile, and better than that of the previous molecular potential proposed by Hloucha et al. [J. Chem. Phys., 2000, 113, 5401]. The transferability of the resulting potential is then successfully tested, without any further readjustment, by predicting the vapor-liquid phase equilibrium of propionitrile and n-butyronitrile.
Figure: Saturated vapour pressure of nitriles calculated in this work by molecular simulation compared to experimental data: (a) for acetonitrile and (b) for both propionitrile and butyronitrile.
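The two pair-energy terms named in this abstract have standard functional forms, sketched below: the 12-6 Lennard-Jones potential for repulsion-dispersion and a point-charge Coulomb term for electrostatics. The parameter values are placeholders, not the regressed AUA parameters, and the sketch omits the displacement of the force centre that distinguishes anisotropic united atoms models. Units are assumed to be Å, kJ/mol and elementary charges.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q_i, q_j, k_e=1389.35):
    """Point-charge electrostatic energy k_e*q_i*q_j/r.
    k_e is the Coulomb constant in kJ mol^-1 Angstrom e^-2 (assumed units)."""
    return k_e * q_i * q_j / r

# Placeholder parameters for illustration only.
epsilon, sigma = 1.0, 3.0
r_min = 2.0 ** (1.0 / 6.0) * sigma  # separation at the energy minimum
print(lennard_jones(sigma, epsilon, sigma), lennard_jones(r_min, epsilon, sigma))
```

The energy is zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) sigma, so regressing epsilon and sigma against vapor-liquid equilibrium data amounts to tuning the depth and position of this well.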