20 similar articles found (search time: 15 ms)
1.
Óscar Gallardo, David Ovelleiro, Marina Gay, Montserrat Carrascal, Joaquin Abian 《Proteomics》2014,14(20):2275-2279
We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front‐end graphical user interface that combines several Thermo RAW formats to MASCOT Generic Format extractors (EasierMgf), two graphical user interfaces for search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for processing data and feeding it into our LymPHOS database. Applications were designed modularly and can be used standalone. These tools are written in Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode.
2.
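He Bing, Song Xiaofeng 《Progress in Modern Biomedicine》2012,12(18):3573-3576
Ubiquitination is a post-translational modification that currently attracts wide attention and plays an important regulatory role in many cellular processes, including protein degradation and DNA repair. Drawing on domestic and international research on the prediction of protein ubiquitination sites, this paper analyzes the feature attributes used to predict ubiquitination sites, summarizes the feature selection methods used to optimize these features, and gives an overview of the machine learning classifiers used in the prediction process.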
3.
《IRBM》2020,41(4):229-239
Feature selection algorithms are the cornerstone of machine learning. By increasing the properties of the samples and samples, the feature selection algorithm selects the significant features. The general name of the methods that perform this function is the feature selection algorithm. The general purpose of feature selection algorithms is to select the most relevant properties of data classes and to increase the classification performance. Thus, we can select features based on their classification performance. In this study, we have developed a feature selection algorithm based on decision support vectors classification performance. The method can work according to two different selection criteria. We tested the classification performances of the features selected with P-Score with three different classifiers. Besides, we assessed P-Score performance with 13 feature selection algorithms in the literature. According to the results of the study, the P-Score feature selection algorithm has been determined as a method which can be used in the field of machine learning.
4.
Novel and improved computational tools are required to transform large-scale proteomics data into valuable information of biological relevance. To this end, we developed ProteoConnections, a bioinformatics platform tailored to address the pressing needs of proteomics analyses. The primary focus of this platform is to organize peptide and protein identifications, evaluate the quality of the acquired data set, profile abundance changes, and accelerate data interpretation. Peptide and protein identifications are stored into a relational database to facilitate data mining and to evaluate the quality of data sets using graphical reports. We integrated databases of known PTMs and other bioinformatics tools to facilitate the analysis of phosphoproteomics data sets and to provide insights for subsequent biological validation experiments. Phosphorylation sites are also annotated according to kinase consensus motifs, contextual environment, protein domains, binding motifs, and evolutionary conservation across different species. The practical application of ProteoConnections is further demonstrated for the analysis of the phosphoproteomics data sets from rat intestinal IEC-6 cells where we identified 9615 phosphorylation sites on 2108 phosphoproteins. Combined proteomics and bioinformatics analyses revealed valuable biological insights on the regulation of phosphoprotein functions via the introduction of new binding sites on scaffold proteins or the modulation of protein-protein, protein-DNA, or protein-RNA interactions. Quantitative proteomics data can be integrated into ProteoConnections to determine the changes in protein phosphorylation under different cell stimulation conditions or kinase inhibitors, as demonstrated here for the MEK inhibitor PD184352.
5.
Huang T, Zhang J, Xu ZP, Hu LL, Chen L, Shao JL, Zhang L, Kong XY, Cai YD, Chou KC 《Biochimie》2012,94(4):1017-1025
Longevity is one of the most basic and one of the most essential properties of all living organisms. Identification of genes that regulate longevity would increase understanding of the mechanisms of aging, so as to help facilitate anti-aging intervention and extend the life span. In this study, based on the network features and the biochemical/physicochemical features of the deletion network and deletion genes, as well as their functional features, a two-layer model was developed for predicting the deletion effects on yeast longevity. The first stage of our prediction approach was to identify whether the deletion of one gene would change the life span of yeast; if it did, the second stage of our procedure would automatically proceed to predict whether the deletion of one gene would increase or decrease the life span. It was observed by analyzing the predicted results that the functional features (such as mitochondrial function and chromatin silencing), the network features (such as the edge density and edge weight density of the deletion network), and the local centrality of the deletion gene have an important impact on predicting the deletion effects on longevity. It is anticipated that our model may become a useful tool for studying longevity from the angle of genes and networks. Moreover, it has not escaped our notice that, after some modification, the current model can also be used to study many other phenotype prediction problems from the angle of systems biology.
6.
7.
A variety of water quality indices have been used to assess the state of waterbodies all over the world. In calculating a Water Quality Index (WQI), traditional methods require the evaluation of many water quality parameters, making them costly and time-consuming. In recent years, machine learning (ML) algorithms have emerged as an effective tool to solve many environmental problems, including water quality management. In this study, we investigate the performance of the ML-based method in calculating the WQI. We apply several feature selection techniques to select the key parameters fed to the ML models. Experiments are carried out to evaluate the WQI based on a dataset collected from 2007 to 2020 of the An Kim Hai system, one of the most important irrigation systems in the north of Vietnam. The obtained results show that the application of selection methods allows a significant reduction in the number of water quality parameters fed to the ML models without losing accuracy. In particular, by using the embedded method, we identify four important parameters, namely Coliform, DO, Turbidity, and TSS, that have the greatest impact on water quality. Based on these parameters, the Random Forest model provides the best accuracy in predicting the WQI values from the An Kim Hai system with a Similarity of 0.94. The combination of feature selection and ML methods is then considered an effective alternative for calculating the WQI, leading to a desirable performance and a reduction of input parameters. This makes water quality monitoring less costly and reduces the effort and time it requires.
8.
《Journal of Bionic Engineering》2024,21(1)
Feature Subset Selection (FSS) is an NP-hard problem to remove redundant and irrelevant features particularly from medical data, and it can be effectively addressed by metaheuristic algorithms. However, existing binary versions of metaheuristic algorithms have issues with convergence and lack an effective binarization method, resulting in suboptimal solutions that hinder diagnosis and prediction accuracy. This paper aims to propose an Improved Binary Quantum-based Avian Navigation Optimizer Algorithm (IBQANA) for FSS in medical data preprocessing to address the suboptimal solutions arising from binary versions of metaheuristic algorithms. The proposed IBQANA's contributions include the Hybrid Binary Operator (HBO) and the Distance-based Binary Search Strategy (DBSS). HBO is designed to convert continuous values into binary solutions, even for values outside the [0, 1] range, ensuring accurate binary mapping. On the other hand, DBSS is a two-phase search strategy that enhances the performance of inferior search agents and accelerates convergence. By combining exploration and exploitation phases based on an adaptive probability function, DBSS effectively avoids local optima. The effectiveness of applying HBO is compared with five transfer function families and thresholding on 12 medical datasets, with feature numbers ranging from 8 to 10,509. IBQANA's effectiveness is evaluated regarding the accuracy, fitness, and selected features and compared with seven binary metaheuristic algorithms. Furthermore, IBQANA is utilized to detect COVID-19. The results reveal that the proposed IBQANA outperforms all comparative algorithms on COVID-19 and 11 other medical datasets. The proposed method presents a promising solution to the FSS problem in medical data preprocessing.
9.
Breast cancer is one of the most prevalent types of cancer in females and has become rampant all over the world in recent years. The survival rate of breast cancer patients is considerably lower for patients diagnosed at an advanced stage than for those diagnosed at an early stage. The objective of this study is twofold. The first aim is to find the most relevant biomarkers of breast cancer that can be obtained from regular blood analysis and anthropometric measurements. The second is to improve the performance of current computer-aided diagnosis (CAD) systems for early breast cancer detection. This study utilized a recent data set containing nine anthropometric and clinical attributes. In our methodology, first, we performed multicollinearity analysis and ranked the features based on the weighted average score obtained from four filter-based feature evaluation methods: F-score, information gain, chi-square statistic, and Minimum Redundancy Maximum Relevance. Next, to improve the separability of the target classes, we scaled and weighted the dataset using min-max normalization and similarity-based attribute weighting by the k-means clustering algorithm, respectively. Finally, we trained standard machine learning (ML) models and evaluated the performance metrics by the 10-fold cross-validation method. Our support vector machine (SVM) model with radial basis function (RBF) kernel appeared to be the most successful classifier by utilizing six features, namely, Body Mass Index (BMI), Age, Glucose, MCP-1, Resistin, and Insulin. The obtained classification accuracy, sensitivity, and specificity are 93.9% (95% CI: 93.2–94.6%), 95.1% (95% CI: 94.4–95.8%), and 94.0% (95% CI: 93.3–94.7%), respectively; these performance metrics outperformed state-of-the-art methods reported in the literature.
The developed model could potentially assist medical experts in the early diagnosis of breast cancer by employing a set of attributes that can be easily obtained from regular blood analysis and anthropometric measurements.
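The min-max normalization step named in the abstract above can be illustrated with a minimal sketch. The function name `min_max_scale` and the sample values are hypothetical and chosen only for illustration; the study's own preprocessing code and its attribute weighting step are not given in the abstract and are not reproduced here.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute values, for illustration only (not from the study).
bmi = [18.5, 22.0, 30.5, 25.0]
scaled = min_max_scale(bmi)
```

After scaling, the smallest value maps to 0 and the largest to 1, so attributes measured in different units contribute on a comparable scale to distance-based classifiers such as an SVM with an RBF kernel.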
10.
11.
Absolute protein concentration determination is becoming increasingly important in a number of fields including diagnostics, biomarker discovery and systems biology modeling. The recently introduced quantification concatamer methodology provides a novel approach to performing such determinations, and it has been applied to both microbial and mammalian systems. While a number of software tools exist for performing analyses of quantitative data generated by related methodologies such as SILAC, there is currently no analysis package dedicated to the quantification concatamer approach. Furthermore, most tools that are currently available in the field of quantitative proteomics do not manage storage and dissemination of such data sets.
12.
Two-dimensional gel electrophoresis (2-DE) is the most established protein separation method used in expression proteomics. Despite the existence of sophisticated software tools, 2-DE gel image analysis still remains a serious bottleneck. The low accuracies of commercial software packages and the extensive manual calibration that they often require for acceptable results show that we are far from achieving the goal of a fully automated and reliable, high-throughput gel processing system. We present a novel spot detection and quantification methodology which draws heavily from unsupervised machine-learning methods. Using the proposed hierarchical machine learning-based segmentation methodology reduces both the number of faint spots missed (improves sensitivity) and the number of extraneous spots introduced (improves precision). The detection and quantification performance has been thoroughly evaluated and is shown to compare favorably (higher F-measure) to a commercially available software package (PDQuest). The whole image analysis pipeline that we have developed is fully automated and can be used for high-throughput proteomics analysis since it does not require any manual intervention for recalibration every time a new 2-DE gel image is to be analyzed. Furthermore, it can be easily parallelized for high performance and also applied without any modification to prealigned group average gels.
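The abstract above reports results as an F-measure, which combines sensitivity (fewer missed spots) and precision (fewer extraneous spots) as their harmonic mean. The sketch below shows that standard calculation; the function name `f_measure` and the counts in the example are hypothetical, and the paper's own evaluation procedure is not described in the abstract.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return the harmonic mean of precision and sensitivity (the F1 score)."""
    detected = true_positives + false_positives  # spots the method reported
    actual = true_positives + false_negatives    # spots really present
    if true_positives == 0 or detected == 0 or actual == 0:
        return 0.0
    precision = true_positives / detected
    sensitivity = true_positives / actual
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, for illustration only: 80 correct spots,
# 20 extraneous spots, 20 missed spots.
score = f_measure(80, 20, 20)
```

Because the harmonic mean is dominated by the smaller of the two terms, a method cannot reach a high F-measure by improving precision at the expense of sensitivity, or the reverse.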
13.
14.
《IRBM》2022,43(5):470-478
Background and objective: Heart murmur characterization is a crucial part of cardiac auscultation for determining the potential etiology and severity of heart diseases. One such helpful murmur characterization is the sonic qualities, which reflect both structural and hemodynamical states of the heart. Therefore, the objective is to develop a machine learning based solution for classifying murmur qualities. Methods: Four medically defined murmur qualities, namely the musical quality, blowing-like quality, coarse quality, and soft quality, were examined. Features were extracted from heart murmur signals in their time domain, frequency domain, time-frequency domain, and phase space domain. Sequential forward floating selection (SFFS) was implemented along with three classifiers, including k-nearest neighbor (KNN), Naïve-Bayes (NB), and linear support vector machine (SVM). Results: It was found that multi-domain features are suited for better classification results and linear SVM was able to achieve a better balance between performance and the size of feature subsets among tested classifiers. Using the derived features, classification accuracies of 86%, 91%, 90%, and 84% were achieved for musical quality, blowing-like quality, coarse quality, and soft quality classifications respectively. Conclusions: The study demonstrated that it is possible to effectively characterize heart murmur through its diagnostic characteristics instead of drawing direct conclusions, which is helpful for retaining versatility and generality found in the conventional cardiac auscultation.
15.
Pengyi Yang, Ellis Patrick, Sean J. Humphrey, Shila Ghazanfar, David E. James, Raja Jothi, Jean Yee Hwa Yang 《Proteomics》2016,16(13):1868-1871
Mass spectrometry (MS)‐based quantitative phosphoproteomics has become a key approach for proteome‐wide profiling of phosphorylation in tissues and cells. Traditional experimental design often compares a single treatment with a control, whereas increasingly more experiments are designed to compare multiple treatments with respect to a control. To this end, the development of bioinformatic tools that can integrate multiple treatments and visualise kinases and substrates under combinatorial perturbations is vital for dissecting concordant and/or independent effects of each treatment. Here, we propose a hypothesis driven kinase perturbation analysis (KinasePA) to annotate and visualise kinases and their substrates that are perturbed by various combinatorial effects of treatments in phosphoproteomics experiments. We demonstrate the utility of KinasePA through its application to two large‐scale phosphoproteomics datasets and show its effectiveness in dissecting kinases and substrates within signalling pathways driven by unique combinations of cellular stimuli and inhibitors. We implemented and incorporated KinasePA as part of the “directPA” R package available from the comprehensive R archive network (CRAN). Furthermore, KinasePA also has an interactive web interface that can be readily applied to annotate user provided phosphoproteomics data (http://kinasepa.pengyiyang.org).
16.
Traditional laboratory experiments, rehabilitation clinics, and wearable sensors offer biomechanists a wealth of data on healthy and pathological movement. To harness the power of these data and make research more efficient, modern machine learning techniques are starting to complement traditional statistical tools. This survey summarizes the current usage of machine learning methods in human movement biomechanics and highlights best practices that will enable critical evaluation of the literature. We carried out a PubMed/Medline database search for original research articles that used machine learning to study movement biomechanics in patients with musculoskeletal and neuromuscular diseases. Most studies that met our inclusion criteria focused on classifying pathological movement, predicting risk of developing a disease, estimating the effect of an intervention, or automatically recognizing activities to facilitate out-of-clinic patient monitoring. We found that research studies build and evaluate models inconsistently, which motivated our discussion of best practices. We provide recommendations for training and evaluating machine learning models and discuss the potential of several underutilized approaches, such as deep learning, to generate new knowledge about human movement. We believe that cross-training biomechanists in data science and a cultural shift toward sharing of data and tools are essential to maximize the impact of biomechanics research.
17.
King GJ 《Seminars in cell & developmental biology》2004,15(6):721-731
Bioinformatics is an integral aspect of plant and crop science research. Developments in data management and analytical software are reviewed with an emphasis on applications in functional genomics. This includes information resources for Arabidopsis and crop species, and tools available for analysis and visualisation of comparative genomic data. Approaches used to explore relationships between plant genes and expressed sequences are compared, including use of ontologies. The impact of bioinformatics in forward and reverse genetics is described, together with the potential from data mining. The role of bioinformatics is explored in the wider context of plant and crop science.
18.
《IRBM》2022,43(5):434-446
Objective: The initial principal task of Brain-Computer Interfacing (BCI) research is to extract the best feature set from a raw EEG (Electroencephalogram) signal so that it can be used for the classification of two or multiple different events. The main goal of the paper is to develop a comparative analysis among different feature extraction techniques and classification algorithms. Materials and methods: In this present investigation, four different methodologies have been adopted to classify the recorded MI (motor imagery) EEG signal, and their comparative study has been reported. Haar Wavelet Energy (HWE), Band Power, Cross-correlation, and Spectral Entropy (SE) based Cross-correlation feature extraction techniques have been considered to obtain the necessary feature set from the raw EEG signals. Four different machine learning algorithms, viz. LDA (Linear Discriminant Analysis), QDA (Quadratic Discriminant Analysis), Naïve Bayes, and Decision Tree, have been used to classify the features. Results: The best average classification accuracies are 92.50%, 93.12%, 72.26%, and 98.71% using the four methods. Further, these results have been compared with some recent existing methods. Conclusion: The comparative results indicate a significant improvement in accuracy of the proposed methods with respect to the existing ones. Hence, the presented work can guide the selection of the best feature extraction method and classifier algorithm for MI-based EEG signals.
19.
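To address the problem of identifying coding regions in DNA sequences, this study proposes a model that combines feature vectors with logistic regression. DNA sequences are first processed numerically and converted into feature vectors, whose elements are extracted using the k-character relative frequency technique; a binary logistic regression algorithm is then used to accurately distinguish coding regions from non-coding regions. Five-fold cross-validation on two benchmark data sets, HMR195 and BG570, gave average AUC (Area Under Curve) values of 0.9813 and 0.9874, respectively, clearly outperforming methods such as the traditional Bayesian discriminant method and VOSSDFT. In addition, the proposed feature vector has very low dimensionality, which improves computational efficiency. The combined model can therefore identify protein-coding regions efficiently and accurately.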
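The k-character relative frequency idea in the abstract above can be sketched as follows. This is a generic k-mer frequency encoding, written under the assumption that each feature is the share of one length-k substring among all length-k windows; the function name `kmer_relative_frequencies` is hypothetical, the abstract does not specify the paper's exact (low-dimensional) encoding, and the logistic regression step is not shown.

```python
from itertools import product


def kmer_relative_frequencies(sequence, k=2, alphabet="ACGT"):
    """Return the relative frequency of every length-k string over `alphabet`.

    The result is ordered lexicographically by k-mer, so it always has
    len(alphabet) ** k entries regardless of the input sequence.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Hypothetical toy sequence, for illustration only.
features = kmer_relative_frequencies("ACGT", k=1)
```

A vector built this way has a fixed length for any sequence, which is what allows sequences of different lengths to be passed to a single binary classifier.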
20.