Found 20 similar articles (search time: 15 ms)
1.
In recent years, the idea of “cancer big data” has emerged as a result of the significant expansion of fields such as clinical research, genomics, proteomics and public health records. Advances in omics technologies are making a significant contribution to cancer big data in biomedicine and disease diagnosis. The increasing availability of extensive cancer big data has set the stage for the development of multimodal artificial intelligence (AI) frameworks. These frameworks aim to analyze high-dimensional multi-omics data, extracting meaningful information that is challenging to obtain manually. Although interpretability and data quality remain critical challenges, these methods hold great promise for advancing our understanding of cancer biology and improving patient care and clinical outcomes. Here, we provide an overview of cancer big data and explore the applications of both traditional machine learning and deep learning approaches in cancer genomic and proteomic studies. We briefly discuss the challenges and potential of AI techniques in the integrated analysis of omics data, as well as the future direction of personalized treatment options in cancer.
2.
3.
J. F. R. Robertson D. Pearson M. R. Price C. Selby J. Pearson R. W. Blamey A. Howell 《Cancer immunology, immunotherapy : CII》1991,33(6):403-410
Summary: Retrospective analysis previously identified significant elevation of five tumour markers, carcinoembryonic antigen (CEA), ferritin, orosomucoid, C-reactive protein and erythrocyte sedimentation rate (ESR), in patients with systemic breast cancer and showed that changes in each of these markers individually correlated significantly with therapeutic response. In this study we have prospectively tested these findings. None of the five markers was significantly elevated in primary breast cancer compared to normal control or benign breast disease groups. They therefore appear to have no role either in screening or in the differential diagnosis of breast cancer. There was a significant elevation of all five markers in patients with systemic breast cancer (P < 0.0001; analysis of variance) but sequential changes in CEA and ESR only correlated significantly with the UICC-assessed response. Prospective confirmation of the correlation between changes in serum CEA and ESR provides the basis for using these markers in the assessment of response to therapy in patients with systemic breast cancer.
4.
The use of antigenicity scales based on physicochemical properties and the sliding window method in combination with an averaging algorithm and subsequent search for the maximum value is the classical method for B-cell epitope prediction. However, recent studies have demonstrated that the best classical methods provide a poor correlation with experimental data. We review both classical and novel algorithms and present our own implementation of the algorithms. The AAPPred software is available at http://www.bioinf.ru/aappred/.
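The classical procedure named in this abstract (a per-residue scale, a sliding window, averaging, then a search for the maximum) can be sketched roughly as follows. This is only an illustration and not the AAPPred implementation: `TOY_SCALE` holds made-up values rather than a published antigenicity scale, and the function names and the default window size are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical per-residue values, for illustration only; a real run would
# substitute a published antigenicity or physicochemical scale.
TOY_SCALE: Dict[str, float] = {
    "A": 0.1, "D": 0.9, "E": 0.8, "G": 0.3, "K": 1.0, "L": -0.5, "V": -0.4,
}


def window_averages(seq: str, scale: Dict[str, float], window: int = 3) -> List[float]:
    """Average the scale over every full window of `window` residues."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    # Residues missing from the scale contribute 0.0 (an assumption).
    values = [scale.get(residue, 0.0) for residue in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def best_window(seq: str, scale: Dict[str, float], window: int = 3) -> Tuple[int, float]:
    """Return (start index, average) of the highest-scoring window."""
    averages = window_averages(seq, scale, window)
    start = max(range(len(averages)), key=averages.__getitem__)
    return start, averages[start]
```

For example, `best_window("LVAKDEG", TOY_SCALE)` picks the window starting at index 3 (`KDE`), the stretch with the highest toy values.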
5.
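MicroRNAs (miRNAs) are a class of small regulatory RNAs that do not encode proteins and play broad and important regulatory roles in eukaryotes. Because miRNA expression is spatiotemporally specific, computational prediction followed by targeted experimental validation is an important route to miRNA discovery. Reducing the false-positive rate is a major challenge for miRNA prediction methods. In this study, we used ensemble learning to build SVMbagging, a classifier for predicting miRNA precursors. Results on the training set, test set and independent test set show that the method is robust, has a low false-positive rate and generalizes well. In particular, at a threshold of 0.9 the specificity reaches 99.90% while the sensitivity remains above 26%, which makes the method suitable for genome-wide prediction. Applying SVMbagging to the whole human genome at a threshold of 0.9 yielded 14,933 candidate miRNA precursors. Comparison with high-throughput small RNA sequencing data showed that 4,481 of these precursors have perfectly matching small RNA sequences, very close to the theoretically estimated number of true positives. Finally, 32 candidate miRNAs were tested experimentally, and 2 of them were confirmed as genuine miRNAs.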
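The abstract does not say how SVMbagging combines its base classifiers, so the following is only a generic bagging sketch under stated assumptions: each base model is a hypothetical callable that returns a yes/no vote, the ensemble score is the fraction of positive votes, and a candidate is accepted when that score reaches the threshold (0.9 here, mirroring the strict cutoff mentioned above).

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(data) items with replacement, one resample per base model."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


def ensemble_score(models: Sequence[Callable[[T], bool]], x: T) -> float:
    """Fraction of base models that vote positive for x."""
    if not models:
        raise ValueError("at least one base model is required")
    return sum(1 for model in models if model(x)) / len(models)


def predict(models: Sequence[Callable[[T], bool]], x: T, threshold: float = 0.9) -> bool:
    """Accept x only if the vote fraction reaches the threshold."""
    return ensemble_score(models, x) >= threshold
```

Raising the threshold trades sensitivity for specificity, which is the trade-off the abstract reports at 0.9.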
6.
WESLEY M. HOCHACHKA RICH CARUANA DANIEL FINK ART MUNSON MIREK RIEDEWALD DARIA SOROKINA STEVE KELLING 《The Journal of wildlife management》2007,71(7):2427-2437
ABSTRACT Most ecologists use statistical methods as their main analytical tools when analyzing data to identify relationships between a response and a set of predictors; thus, they treat all analyses as hypothesis tests or exercises in parameter estimation. However, little or no prior knowledge about a system can lead to creation of a statistical model or models that do not accurately describe major sources of variation in the response variable. We suggest that under such circumstances data mining is more appropriate for analysis. In this paper we 1) present the distinctions between data-mining (usually exploratory) analyses and parametric statistical (confirmatory) analyses, 2) illustrate 3 strengths of data-mining tools for generating hypotheses from data, and 3) suggest useful ways in which data mining and statistical analyses can be integrated into a thorough analysis of data to facilitate rapid creation of accurate models and to guide further research.
7.
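The study of protein function is fundamental to unraveling the mysteries of life, and machine learning techniques are already widely used in this field. Using the support vector machine (SVM) method, we built a general platform for predicting protein functional sites. The platform first extracts non-homologous protein sequences and then encodes features for them (including basic sequence information, physicochemical properties, structural information and sequence conservation). The encoded samples serve as training data for an SVM, yielding evaluation metrics such as sensitivity, specificity, the Matthews correlation coefficient, accuracy and the ROC curve. After repeated testing, the SVM model with the best evaluation metrics can be used to predict functional sites on protein sequences. Beyond functional-site prediction, the platform can also be applied to the prediction and analysis of disease-associated single nucleotide polymorphisms (SNPs), protein domain prediction and interactions between biomolecules.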
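The evaluation metrics listed in this abstract have standard definitions in terms of the confusion-matrix counts (true/false positives and negatives). A minimal sketch follows; the function name is an assumption, and returning 0.0 when a denominator is zero is a convention chosen here, not something the abstract specifies.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""

    def ratio(numerator: float, denominator: float) -> float:
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),             # true-positive rate
        "specificity": ratio(tn, tn + fp),             # true-negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

Unlike accuracy, the Matthews correlation coefficient stays informative when positive and negative classes are very unbalanced, which is typical for functional-site prediction.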
8.
9.
James Longden Xavier Robin Mathias Engel Jesper Ferkinghoff-Borg Ida Kjær Ivan D. Horak Mikkel W. Pedersen Rune Linding 《Cell reports》2021,34(3):108657
10.
Andrew W. Senior Richard Evans John Jumper James Kirkpatrick Laurent Sifre Tim Green Chongli Qin Augustin Žídek Alexander W. R. Nelson Alex Bridgland Hugo Penedones Stig Petersen Karen Simonyan Steve Crossan Pushmeet Kohli David T. Jones David Silver Koray Kavukcuoglu Demis Hassabis 《Proteins》2019,87(12):1141-1148
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
11.
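Taking a drug from development to clinical use takes a long time, and the investment during development can exceed one billion yuan. With the integration of pharmaceutical R&D and artificial intelligence and the rapid development of bioinformatics, data on drug activity have grown sharply, and traditional experimental approaches to drug activity prediction can no longer meet the needs of drug development. Using algorithms to assist drug development and to solve the problems that arise in it can greatly accelerate the process. Traditional machine learning methods, especially random forests, support vector machines and artificial neural networks, can achieve high prediction accuracy for drug activity. Because deep learning uses multilayer neural networks, its models can accept high-dimensional input variables without manually specified input features and can fit relatively complex functions; applied to drug development, it can further improve efficiency at every stage. The deep learning models most widely used in drug activity prediction are deep neural networks (DNNs), recurrent neural networks (RNNs) and autoencoders (AEs), while generative adversarial networks (GANs), thanks to their ability to generate data, are often combined with other models for data augmentation. Recent reviews of research on and applications of deep learning in predicting the activity of drug molecules indicate that deep learning models are more accurate and more efficient than both traditional experimental methods and traditional machine learning methods. Deep learning models are therefore expected to become the most important computational aid in drug development over the next decade.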
12.
James R. Bradford Chris J. Needham Philip Tedder Matthew A. Care Andrew J. Bulpitt David R. Westhead 《The Plant journal : for cell and molecular biology》2010,61(4):713-721
Despite recent advances, accurate gene function prediction remains an elusive goal, with very few methods directly applicable to the plant Arabidopsis thaliana. In this study, we present GO‐At (gene ontology prediction in A. thaliana), a method that combines five data types (co‐expression, sequence, phylogenetic profile, interaction and gene neighbourhood) to predict gene function in Arabidopsis. Using a simple, yet powerful two‐step approach, GO‐At first generates a list of genes ranked in descending order of probability of functional association with the query gene. Next, a prediction score is automatically assigned to each function in this list based on the assumption that functions appearing most frequently at the top of the list are most likely to represent the function of the query gene. In this way, the second step provides an effective alternative to simply taking the ‘best hit’ from the first list, and achieves success rates of up to 79%. GO‐At is applicable across all three GO categories: molecular function, biological process and cellular component, and can assign functions at multiple levels of annotation detail. Furthermore, we demonstrate GO‐At’s ability to predict functions of uncharacterized genes by identifying ten putative golgins/Golgi‐associated proteins amongst 8219 genes of previously unknown cellular component and present independent evidence to support our predictions. A web‐based implementation of GO‐At ( http://www.bioinformatics.leeds.ac.uk/goat ) is available, providing a unique resource for plant researchers to make predictions for uncharacterized genes and predict novel functions in Arabidopsis.
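The second step described in this abstract, scoring each function by how often it appears near the top of the ranked gene list, might look roughly like the sketch below. The abstract does not give GO‐At's actual scoring formula, so the reciprocal-rank weighting, the `top_n` cutoff and all names here are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def score_functions(
    ranked_genes: Sequence[str],
    annotations: Dict[str, Set[str]],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Score functions by their rank-weighted frequency among the top genes.

    ranked_genes: genes in descending order of association with the query.
    annotations: known functions (e.g. GO terms) for each gene.
    """
    scores: Dict[str, float] = defaultdict(float)
    for rank, gene in enumerate(ranked_genes[:top_n], start=1):
        for term in annotations.get(gene, ()):
            # Assumed weighting: a hit at rank r contributes 1/r.
            scores[term] += 1.0 / rank
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With this weighting, a function shared by the genes at ranks 2 to 4 outscores a function carried only by the rank-1 gene, which illustrates how frequency-based scoring can differ from simply taking the best hit.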
13.
Proteins play important roles in living organisms, and their function is directly linked with their structure. Due to the growing gap between the number of proteins being discovered and their functional characterization (in particular as a result of experimental limitations), reliable prediction of protein function through computational means has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the metamorphosis in the features used by these algorithms from classical physicochemical properties and amino acid composition, up to text-derived features from biomedical literature and learned feature representations using autoencoders, together with feature selection and dimensionality reduction techniques, are also reviewed. The success stories in the application of these techniques to both general and specific protein function prediction are discussed.
14.
NetCGlyc 1.0: prediction of mammalian C-mannosylation sites (total citations: 2; self-citations: 0; citations by others: 2)
Julenius K 《Glycobiology》2007,17(8):868-876
15.
Biological and functional analysis of statistically significant pathways deregulated in colon cancer by using gene expression profiles
Distaso A Abatangelo L Maglietta R Creanza TM Piepoli A Carella M D'Addabbo A Ancona N 《International journal of biological sciences》2008,4(6):368-378
Gene expression profiling offers a great opportunity for studying multi-factor diseases and for understanding the key role of genes in mechanisms which drive a normal cell to a cancer state. Single gene analysis is insufficient to describe the complex perturbations responsible for cancer onset, progression and invasion. A deeper understanding of the mechanisms of tumorigenesis can be reached by focusing on deregulation of gene sets or pathways rather than on individual genes. We apply two known and statistically well-founded methods for finding pathways and biological processes deregulated in pathological conditions by analyzing gene expression profiles. In particular, we measure the amount of deregulation and assess the statistical significance of predefined pathways belonging to a curated collection (Molecular Signature Database) in a colon cancer data set. We find that pathways strongly involved in different tumors are strictly connected with colon cancer. Moreover, our experimental results show that the study of complex diseases through pathway analysis is able to highlight genes weakly connected to the phenotype which may be difficult to detect by using classical univariate statistics. Our study shows the importance of using gene sets rather than single genes for understanding the main biological processes and pathways involved in colorectal cancer. Our analysis shows that many of the genes involved in these pathways are strongly associated with colorectal tumorigenesis. In this new perspective, the focus shifts from finding differentially expressed genes to identifying biological processes, cellular functions and pathways perturbed in the phenotypic conditions by analyzing genes co-expressed in a given pathway as a whole, taking into account the possible interactions among them and, more importantly, the correlation of their expression with the phenotypical conditions.
16.
Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in, for example, instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.
17.
We developed a method called residue contact frequency (RCF), which uses the complex structures generated by the protein–protein docking algorithm ZDOCK to predict interface residues. Unlike interface prediction algorithms that are based on monomers alone, RCF is binding partner specific. We evaluated the performance of RCF using the area under the precision‐recall (PR) curve (AUC) on a large protein docking Benchmark. RCF (AUC = 0.44) performed as well as meta‐PPISP (AUC = 0.43), which is one of the best monomer‐based interface prediction methods. In addition, we test a support vector machine (SVM) to combine RCF with meta‐PPISP and another monomer‐based interface prediction algorithm Evolutionary Trace to further improve the performance. We found that the SVM that combined RCF and meta‐PPISP achieved the best performance (AUC = 0.47). We used RCF to predict the binding interfaces of proteins that can bind to multiple partners and RCF was able to correctly predict interface residues that are unique for the respective binding partners. Furthermore, we found that residues that contributed greatly to binding affinity (hotspot residues) had significantly higher RCF than other residues. Proteins 2014; 82:57–66. © 2013 Wiley Periodicals, Inc.
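The idea behind a contact-frequency score can be illustrated with a short sketch. It assumes that each docked pose has already been reduced to the set of residue indices in contact with the partner; how the paper defines a contact, how many ZDOCK poses it uses and how it normalizes the counts are not stated in the abstract, so those details and all names below are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def residue_contact_frequency(poses: Sequence[Iterable[int]]) -> Dict[int, float]:
    """Fraction of docked poses in which each residue contacts the partner.

    poses: one collection of contacting residue indices per docked pose.
    """
    if not poses:
        raise ValueError("at least one docked pose is required")
    counts: Counter = Counter()
    for contacts in poses:
        # Count each residue at most once per pose.
        counts.update(set(contacts))
    return {residue: n / len(poses) for residue, n in counts.items()}
```

Because the poses come from docking against a particular partner, the resulting frequencies are partner specific, which is the property the abstract contrasts with monomer-based predictors.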
18.
19.
In this article, we describe our efforts in contact prediction in the CASP13 experiment. We employed a new deep learning-based contact prediction tool, DeepMetaPSICOV (or DMP for short), together with new methods and data sources for alignment generation. DMP evolved from MetaPSICOV and DeepCov and combines the input feature sets used by these methods as input to a deep, fully convolutional residual neural network. We also improved our method for multiple sequence alignment generation and included metagenomic sequences in the search. We discuss successes and failures of our approach and identify areas where further improvements may be possible. DMP is freely available at: https://github.com/psipred/DeepMetaPSICOV .