Found 20 similar articles; search took 15 ms.
1.
Chunyang Li Xiaoxi Zeng Haopeng Yu Yonghong Gu Wei Zhang 《World journal of surgical oncology》2018,16(1):223
Background
Pancreatic cancer is one of the most lethal tumors, with a poor prognosis, and it lacks effective biomarkers for diagnosis and treatment. The aim of this investigation was to identify hub genes in pancreatic cancer that could serve as potential biomarkers for cancer diagnosis and therapy in the future.
Methods
Two expression profiles, GSE16515 and GSE22780, from the Gene Expression Omnibus (GEO) database were combined to serve as the training set. Differentially expressed genes (DEGs) with top 25% variance were identified, followed by protein-protein interaction (PPI) network analysis, to find candidate genes. Hub genes were then further screened by survival and Cox analyses in The Cancer Genome Atlas (TCGA) database. Finally, the hub genes were validated in the GSE15471 dataset from GEO using two supervised learning methods, the k-nearest neighbor (kNN) and random forest algorithms.
Results
After quality control and batch effect elimination in the training set, 181 DEGs with top 25% variance were identified as candidate genes. Two of them, MMP7 and ITGA2, which correlated with the diagnosis and prognosis of pancreatic cancer, were then selected as hub genes by the bioinformatics methods described above. Finally, the hub genes were shown to distinguish tumor samples from normal tissues, with predictive accuracies of 93.59% and 81.31% using the kNN and random forest algorithms, respectively.
Conclusions
All the hub genes were associated with regulation of the tumor microenvironment, which is implicated in tumor proliferation, progression, migration, and metastasis. Our results provide a novel prospect for the diagnosis and treatment of pancreatic cancer, which may find further application in clinical practice.
2.
Yan A Kloczkowski A Hofmann H Jernigan RL 《Journal of biomolecular structure & dynamics》2007,25(3):275-288
We develop ways to predict the side chain orientations of residues within a protein structure by using several different statistical machine learning methods. Here the side chain orientation of a given residue i is measured by an angle Ω(i) between the vector pointing from the center of the protein structure to the Cα atom of residue i and the vector pointing from that Cα atom to the center of its side chain atoms. To predict the Ω(i) angles, we construct statistical models by using several different methods such as general linear regression, a regression tree and bagging, a neural network, and a support vector machine. The root mean square errors for the different models range only from 36.67 to 37.60 degrees and the correlation coefficients are all between 30% and 34%. The performances of different models in the test set are, thus, quite similar, and show the relative predictive power of these models to be significant in comparison with random side chain orientations.
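The orientation angle (Omega) defined in this abstract can be computed directly from atomic coordinates. The following is a minimal sketch rather than the authors' code: the function names `centroid` and `omega_angle` are hypothetical, the protein center is assumed to be supplied by the caller, and the coordinates in the usage line are made up.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def omega_angle(protein_center, c_alpha, side_chain_atoms):
    """Angle in degrees between (center -> C-alpha) and (C-alpha -> side chain centroid).

    All names here are illustrative assumptions, not taken from the paper.
    """
    v1 = tuple(a - c for a, c in zip(c_alpha, protein_center))
    sc = centroid(side_chain_atoms)
    v2 = tuple(s - a for s, a in zip(sc, c_alpha))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("angle is undefined for a zero-length vector")
    cos_omega = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_omega = max(-1.0, min(1.0, cos_omega))
    return math.degrees(math.acos(cos_omega))

# Made-up coordinates: a side chain pointing sideways relative to the radial vector.
print(omega_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), [(1.0, 1.0, 0.0), (1.0, 3.0, 0.0)]))
```

Under this definition, small angles correspond to side chains pointing away from the protein center and angles near 180 degrees to side chains pointing inward.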
3.
Komarova AV Combredet C Meyniel-Schicklin L Chapelle M Caignard G Camadro JM Lotteau V Vidalain PO Tangy F 《Molecular & cellular proteomics : MCP》2011,10(12):M110.007443
RNA viruses exhibit small-sized genomes encoding few proteins, but still establish complex networks of interactions with host cell components to achieve replication and spreading. Ideally, these virus-host protein interactions should be mapped directly in infected cell culture, but such a high standard is often difficult to reach when using conventional approaches. We thus developed a new strategy based on recombinant viruses expressing tagged viral proteins to capture both direct and indirect physical binding partners during infection. As a proof of concept, we engineered a recombinant measles virus (MV) expressing one of its virulence factors, the MV-V protein, with a One-STrEP amino-terminal tag. This allowed virus-host protein complex analysis directly from infected cells by combining modified tandem affinity chromatography and mass spectrometry analysis. Using this approach, we established an extensive list of 245 cellular proteins interacting either directly or indirectly with MV-V, including four of the nine already known partners of this viral factor. These interactions were highly specific to MV-V because they were not recovered when the nucleoprotein MV-N, instead of MV-V, was tagged. Besides key components of the antiviral response, cellular proteins from mitochondria, ribosomes, endoplasmic reticulum, protein phosphatase 2A, and histone deacetylase complex were identified for the first time as prominent targets of MV-V, and the critical role of the latter protein family in MV replication was addressed. Most interestingly, MV-V showed some preferential attachment to essential proteins in the human interactome network, as assessed by centrality and interconnectivity measures. Furthermore, the list of MV-V interactors also showed a massive enrichment for well-known targets of other viruses. Altogether, this clearly supports our approach based on reverse genetics of viruses combined with high-throughput proteomics to probe the interaction network that viruses establish in infected cells.
4.
5.
YuChen Xiang Kai Ling C. Seow Carl Paterson Peter Török 《Journal of biophotonics》2021,14(7):e202000508
Brillouin imaging relies on the reliable extraction of subtle spectral information from hyperspectral datasets. To date, the mainstream practice has been to use line fitting of spectral features to retrieve the average peak shift and linewidth parameters. Good results, however, depend heavily on sufficient signal-to-noise ratio and may not be attainable in complex samples that consist of spectral mixtures. In this work, we thus propose the use of various multivariate algorithms that can be used to perform supervised or unsupervised analysis of the hyperspectral data, with which we explore advanced image analysis applications, namely unmixing, classification and segmentation in a phantom and live cells. The resulting images are shown to provide more contrast and detail, and are obtained on a timescale roughly 10^2 times faster than fitting. The estimated spectral parameters are consistent with those calculated from pure fitting.
6.
Bridges M Heron EA O'Dushlaine C Segurado R;International Schizophrenia Consortium 《PloS one》2011,6(5):e14802
There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.
7.
Training data in a supervised learning problem consist of the class label and its potential predictors for a set of observations. Constructing effective classifiers from training data is the goal of supervised learning. In biomedical sciences and other scientific applications, class labels may be subject to errors. We consider a setting where there are two classes but observations with labels corresponding to one of the classes may in fact be mislabeled. The application concerns the use of protein mass-spectrometry data to discriminate between serum samples from cancer and noncancer patients. The patients in the training set are classified on the basis of tissue biopsy. Although biopsy is 100% specific in the sense that a tissue that shows itself to have malignant cells is certainly cancer, it is less than 100% sensitive. Reference gold standards that are subject to this special type of misclassification due to imperfect diagnosis certainty arise in many fields. We consider the development of a supervised learning algorithm under these conditions and refer to it as partially supervised learning. Boosting is a supervised learning algorithm geared toward high-dimensional predictor data, such as those generated in protein mass-spectrometry. We propose a modification of the boosting algorithm for partially supervised learning. The proposal is to view the true class membership of the samples that are labeled with the error-prone class label as missing data, and apply an algorithm related to the EM algorithm for minimization of a loss function. To assess the usefulness of the proposed method, we artificially mislabeled a subset of samples and applied the original and EM-modified boosting (EM-Boost) algorithms for comparison. Notable improvements in misclassification rates are observed with EM-Boost.
8.
Many individuals tested for inherited cancer susceptibility at the BRCA1 gene locus are discovered to have variants of unknown clinical significance (UCVs). Most UCVs cause a single amino acid residue (missense) change in the BRCA1 protein. They can be biochemically assayed, but such evaluations are time-consuming and labor-intensive. Computational methods that classify and suggest explanations for UCV impact on protein function can complement functional tests. Here we describe a supervised learning approach to classification of BRCA1 UCVs. Using a novel combination of 16 predictive features, the algorithms were applied to retrospectively classify the impact of 36 BRCA1 C-terminal (BRCT) domain UCVs biochemically assayed to measure transactivation function and to blindly classify 54 documented UCVs. Majority vote of three supervised learning algorithms is in agreement with the assay for more than 94% of the UCVs. Two UCVs found deleterious by both the assay and the classifiers reveal a previously uncharacterized putative binding site. Clinicians may soon be able to use computational classifiers such as those described here to better inform patients. These classifiers can be adapted to other cancer susceptibility genes and systematically applied to prioritize the growing number of potential causative loci and variants found by large-scale disease association studies.
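The majority-vote step mentioned in this abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names `majority_vote` and `agreement_rate`, the two class labels, and all votes and assay calls below are hypothetical, and an odd number of classifiers with two classes is assumed so that ties cannot occur.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by most classifiers (odd count, two classes assumed)."""
    return Counter(labels).most_common(1)[0][0]

def agreement_rate(per_variant_votes, assay_labels):
    """Fraction of variants whose majority-vote call matches the assay call."""
    if len(per_variant_votes) != len(assay_labels):
        raise ValueError("one assay label is needed per variant")
    matches = sum(
        majority_vote(votes) == assay
        for votes, assay in zip(per_variant_votes, assay_labels)
    )
    return matches / len(assay_labels)

# Made-up calls from three hypothetical classifiers for four made-up variants.
votes = [
    ("deleterious", "deleterious", "neutral"),
    ("neutral", "neutral", "neutral"),
    ("deleterious", "neutral", "neutral"),
    ("deleterious", "deleterious", "deleterious"),
]
assay = ["deleterious", "neutral", "deleterious", "deleterious"]
print(agreement_rate(votes, assay))
```

Combining classifiers this way trades the detail of individual scores for robustness: a single classifier's error is outvoted whenever the other two agree.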
9.
Background
Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG) are measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition.
10.
While the distribution patterns of cold-water corals, such as Paragorgia arborea, have received increasing attention in recent studies, little is known about their in situ activity patterns. In this paper, we examine polyp activity in P. arborea using machine learning techniques to analyze high-resolution time series data and photographs obtained from an autonomous lander cluster deployed in the Stjernsund, Norway. An interactive illustration of the models derived in this paper is provided online as supplementary material. We find that the best predictor of the degree of extension of the coral polyps is current direction with a lag of three hours. Other variables that are not directly associated with water currents, such as temperature and salinity, offer much less information concerning polyp activity. Interestingly, the degree of polyp extension can be predicted more reliably by sampling the laminar flows in the water column above the measurement site than by sampling the more turbulent flows in the direct vicinity of the corals. Our results show that the activity patterns of the P. arborea polyps are governed by the strong tidal current regime of the Stjernsund. It appears that P. arborea does not react to shorter changes in the ambient current regime but instead adjusts its behavior in accordance with the large-scale pattern of the tidal cycle itself in order to optimize nutrient uptake.
11.
《Cell cycle (Georgetown, Tex.)》2013,12(9):1319-1323
A simple, efficient system has been developed to produce high titers of infectious human papillomavirus type 18 (HPV-18) in organotypic raft cultures of primary human keratinocytes (PHKs). Molecular characterization elucidated key early and late events in the reproductive program. The system obviates the need for immortalized cells and allows the analyses of mutant HPV genomes not previously possible. An E6 deletion mutant incapable of causing p53 degradation is defective in viral DNA amplification and capsid protein production. The high levels of p53 protein which accumulated in numerous cells did not lead to apoptosis over a prolonged duration. Time course and metabolic labeling experiments revealed novel interactions with the host. Notably, post-mitotic, differentiated cells are induced by HPV E7 expression to reenter S phase, whereupon host chromosomes replicate, but HPV DNA does not amplify until the cells have progressed to and are arrested in G2 phase. Here, we present data that strongly suggest that the abundant cytoplasmic viral E1^E4 protein is not responsible for this G2 arrest, as described in the literature upon ectopic expression in cell lines. We provide additional insights into the viral life cycle and contrast them to conclusions derived from experiments in cell lines.
12.
13.
The spread of drug resistance through malaria parasite populations calls for the development of new therapeutic strategies. However, the seemingly promising genomics-driven target identification paradigm is hampered by the weak annotation coverage. To identify potentially important yet uncharacterized proteins, we apply support vector machines using profile kernels, a supervised discriminative machine learning technique for remote homology detection, as a complement to the traditional alignment-based algorithms. In this study, we focus on the prediction of proteases, which have long been considered attractive drug targets because of their indispensable roles in parasite development and infection. Our analysis demonstrates that an abundant and complex repertoire is conserved in five Plasmodium parasite species. Several putative proteases may be important components in networks that mediate cellular processes, including hemoglobin digestion, invasion, trafficking, cell cycle fate, and signal transduction. This catalog of proteases provides a short list of targets for functional characterization and rational inhibitor design.
Rui Kuang and Jianying Gu have contributed equally to this work.
An erratum to this article has been published.
14.
15.
Computational prediction of protein complex structures through docking offers a means to gain a mechanistic understanding of protein interactions that mediate biological processes. This is particularly important as the number of experimentally determined structures of isolated proteins exceeds the number of structures of complexes. A comprehensive docking procedure is described in which efficient sampling of conformations is achieved by matching surface normal vectors, fast filtering for shape complementarity, clustering by RMSD, and scoring the docked conformations using a supervised machine learning approach. Contacting residue pair frequencies, residue propensities, evolutionary conservation, and shape complementarity score for each docking conformation are used as input data to a Random Forest classifier. The performance of the Random Forest approach for selecting correctly docked conformations was assessed by cross-validation using a nonredundant benchmark set of X-ray structures for 93 heterodimer and 733 homodimer complexes. The single highest-ranked docking solution was the correct (near-native) structure for slightly more than one third of the complexes. Furthermore, the fraction of highly ranked correct structures was significantly higher than the overall fraction of correct structures, for almost all complexes. A detailed analysis of the difficult-to-predict complexes revealed that the majority of the homodimer cases were explained by incorrect oligomeric state annotation. Evolutionary conservation and shape complementarity score as well as both underrepresented and overrepresented residue types and residue pairs were found to make the largest contributions to the overall prediction accuracy. Finally, the method was also applied to docking unbound subunit structures from a previously published benchmark set.
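The "clustering by RMSD" step named in this abstract can be sketched as follows. The abstract does not say which clustering algorithm was used, so the greedy leader-style scheme below is a swapped-in assumption, as are the function names `rmsd` and `greedy_cluster`, the cutoff value, and the toy poses. No superposition is performed; the poses are assumed to share a common reference frame.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for xa, xb in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))

def greedy_cluster(poses, cutoff):
    """Assign each pose to the first cluster whose representative lies within cutoff.

    Illustrative assumption only: a pose that fits no existing cluster
    becomes the representative of a new one.
    """
    representatives = []  # index of the first pose in each cluster
    clusters = []         # lists of pose indices
    for idx, pose in enumerate(poses):
        for c, rep in enumerate(representatives):
            if rmsd(pose, poses[rep]) <= cutoff:
                clusters[c].append(idx)
                break
        else:
            representatives.append(idx)
            clusters.append([idx])
    return clusters

# Made-up two-atom poses: the first two are nearly identical, the third is far away.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],
]
print(greedy_cluster(poses, cutoff=1.0))
```

Grouping near-duplicate poses before scoring reduces the number of conformations the classifier has to rank; the result of a greedy scheme like this one depends on the order in which poses are visited.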
16.
Recently, developments have been made in predicting the structure of docked complexes when the coordinates of the components are known. The process generally consists of a stage during which the components are combined rigidly and then a refinement stage. Several rapid new algorithms have been introduced in the rigid docking problem and promising refinement techniques have been developed, based on modified molecular mechanics force fields and empirical measures of desolvation, combined with minimisations that switch on the short-range interactions gradually. There has also been progress in developing a benchmark set of targets for docking and a blind trial, similar to the trials of protein structure prediction, has taken place.
17.
Rowland JJ 《Bio Systems》2003,72(1-2):187-196
The expressive power, powerful search capability, and the explicit nature of the resulting models make evolutionary methods very attractive for supervised learning applications in bioinformatics. However, their characteristics also make them highly susceptible to overtraining or to discovering chance relationships in the data. Identification of appropriate criteria for terminating evolution and for selecting an appropriately validated model is vital. Some approaches that are commonly applied to other modelling methods are not necessarily applicable in a straightforward manner to evolutionary methods. An approach to model selection is presented that is not unduly computationally intensive. To illustrate the issues and the technique two bioinformatic datasets are used, one relating to metabolite determination and the other to disease prediction from gene expression data.
18.
Most algorithms currently used to model synaptic plasticity in self-organizing cortical networks suppose that the change in synaptic efficacy is governed by the same structuring factor, i.e., the temporal correlation of activity between pre- and postsynaptic neurons. Functional predictions generated by such algorithms have been tested electrophysiologically in the visual cortex of anesthetized and paralyzed cats. Supervised learning procedures were applied at the cellular level to change receptive field (RF) properties during the time of recording of an individual functionally identified cell. The protocols were devised as cellular analogs of the plasticity of RF properties, which is normally expressed during a critical period of postnatal development. We summarize here evidence demonstrating that changes in covariance between afferent input and postsynaptic response imposed during extracellular and intracellular conditioning can acutely induce selective long-lasting up- and down-regulations of visual responses. The functional properties that could be modified in 40% of cells submitted to differential pairing protocols include ocular dominance, orientation selectivity and orientation preference, interocular orientation disparity, and the relative dominance of ON and OFF responses. Since changes in RF properties can be induced in the adult as well, our findings also suggest that similar activity-dependent processes may occur during development and during active phases of learning under the supervision of behavioral attention or contextual signals. Such potential for plasticity in primary visual cortical neurons suggests the existence of a hidden connectivity expressing a wider functional competence than the one revealed at the spiking level. In particular, in the spatial domain the sensory synaptic integration field is larger than the classical discharge field. 
It can be shaped by supervised learning and its subthreshold extent can be unmasked by the pharmacological blockade of intracortical inhibition.
19.
Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Total citations: 84 (self-citations: 0; citations by others: 84)
Margaret A Shipp Ken N Ross Pablo Tamayo Andrew P Weng Jeffery L Kutok Ricardo C T Aguiar Michelle Gaasenbeek Michael Angelo Michael Reich Geraldine S Pinkus Tane S Ray Margaret A Koval Kim W Last Andrew Norton T Andrew Lister Jill Mesirov Donna S Neuberg Eric S Lander Jon C Aster Todd R Golub 《Nature medicine》2002,8(1):68-74
Diffuse large B-cell lymphoma (DLBCL), the most common lymphoid malignancy in adults, is curable in less than 50% of patients. Prognostic models based on pre-treatment characteristics, such as the International Prognostic Index (IPI), are currently used to predict outcome in DLBCL. However, clinical outcome models identify neither the molecular basis of clinical heterogeneity, nor specific therapeutic targets. We analyzed the expression of 6,817 genes in diagnostic tumor specimens from DLBCL patients who received cyclophosphamide, adriamycin, vincristine and prednisone (CHOP)-based chemotherapy, and applied a supervised learning prediction method to identify cured versus fatal or refractory disease. The algorithm classified two categories of patients with very different five-year overall survival rates (70% versus 12%). The model also effectively delineated patients within specific IPI risk categories who were likely to be cured or to die of their disease. Genes implicated in DLBCL outcome included some that regulate responses to B-cell-receptor signaling, critical serine/threonine phosphorylation pathways and apoptosis. Our data indicate that supervised learning classification techniques can predict outcome in DLBCL and identify rational targets for intervention.
20.
《International journal of bio-medical computing》1990,25(2-3):151-167
A neural network processing scheme is proposed which utilizes a self-organizing Kohonen feature map as the front end to a feedforward classifier network. The results of a series of benchmarking studies based upon artificial statistical pattern recognition tasks indicate that the proposed architecture performs significantly better than conventional feedforward classifier networks when the decision regions are disjoint. This is attributed to the fact that the self-organization process allows internal units in the succeeding classifier network to be sensitive to a specific set of features in the input space at the outset of training.