Similar articles
Found 20 similar articles (search time: 10 ms)
1.
《IRBM》2022,43(6):678-686
Objectives

Feature selection is an important task that helps alleviate various machine learning and data mining issues. The main objective of a feature selection method is to build simpler and more understandable classifier models in order to improve data mining and processing performance. Therefore, a comparative evaluation of the Chi-square method, the recursive feature elimination method, and a tree-based method (using Random Forest), applied to three common machine learning methods (K-Nearest Neighbor, naïve Bayesian classifier and decision tree classifier), is performed to select the most relevant primitives from a large set of attributes. Furthermore, the most suitable couple (i.e., feature selection method and machine learning method) that provides the best performance is determined.

Materials and methods

In this paper, an overview of the most common feature selection techniques is first provided: the Chi-Square method, the Recursive Feature Elimination method (RFE) and the tree-based method (using Random Forest). A comparative evaluation of the improvement brought by these feature selection methods to the three common machine learning methods (K-Nearest Neighbor, naïve Bayesian classifier and decision tree classifier) is then performed. For evaluation purposes, micro-F1, accuracy and root mean square error are measured on the stroke disease data set.

Results

The results show that the proposed approach (i.e., the Tree-Based Method using Random Forest, TBM-RF, paired with the decision tree classifier, DTC) provides an accuracy higher than 85% and an F1-score higher than 88%, outperforming KNN and NB combined with the Chi-Square, RFE and TBM-RF methods.

Conclusion

This study shows that the couple formed by the Tree-Based Method using Random Forest (TBM-RF) and the decision tree classifier successfully and efficiently finds the most relevant features and predicts and classifies patients suffering from stroke.
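The Chi-square filter mentioned in this abstract can be illustrated with a small, dependency-free sketch. It scores each categorical feature by the chi-square statistic of its contingency table against the class label and keeps the top k features. The function names `chi_square_score` and `select_top_k`, the feature names and the toy data are illustrative assumptions and are not taken from the paper or its stroke data set.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Chi-square statistic of the feature-vs-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            diff = observed.get((f_value, l_value), 0) - expected
            score += diff * diff / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    ranked = sorted(
        columns,
        key=lambda name: chi_square_score(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: 1 = stroke, 0 = no stroke.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "hypertension": [1, 1, 1, 0, 0, 0, 0, 0],  # tracks the label closely
    "smoker": [1, 0, 1, 0, 1, 0, 1, 0],        # independent of the label
}
print(select_top_k(columns, labels, k=1))
```

A filter of this kind is independent of the downstream classifier, which is why it can be paired with any of the three learners; RFE and tree-based importance, by contrast, depend on a fitted model.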

2.
The availability of complete pathogen genomes has renewed interest in the development of diagnostics for infectious diseases. Synthetic peptide microarrays provide a rapid, high-throughput platform for immunological testing of potential B-cell epitopes. However, their current capacity prevents the experimental screening of complete “peptidomes”. Therefore, computational approaches for prediction and/or prioritization of diagnostically relevant peptides are required. In this work we describe a computational method to assess a defined set of molecular properties for each potential diagnostic target in a reference genome. Properties such as sub-cellular localization or expression level were evaluated for the whole protein. At a higher resolution (short peptides), we assessed a set of local properties, such as repetitive motifs, disorder (structured vs. natively unstructured regions), trans-membrane spans, genetic polymorphisms (conserved vs. divergent regions), predicted B-cell epitopes, and sequence similarity against human proteins and other potential cross-reacting species (e.g. other pathogens endemic in overlapping geographical locations). A scoring function based on these different features was developed and used to rank all peptides from a large eukaryotic pathogen proteome. We applied this method to the identification of candidate diagnostic peptides in the protozoan Trypanosoma cruzi, the causative agent of Chagas disease. We measured the performance of the method by analyzing the enrichment of validated antigens in the high-scoring top of the ranking. Based on this measure, our integrative method outperformed alternative prioritizations based on individual properties (such as B-cell epitope predictors alone). Using this method we ranked 10 million 12-mer overlapping peptides derived from the complete T. cruzi proteome.
Experimental screening of 190 high-scoring peptides allowed the identification of 37 novel epitopes with diagnostic potential, while none of the low-scoring peptides showed significant reactivity. Many of the metrics employed depend on standard bioinformatic tools and data, so the method can be easily extended to other pathogen genomes.
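The ranking step described in this abstract (cut each protein into overlapping 12-mers, compute per-peptide properties, combine them into one score, sort) might look roughly like the sketch below. The abstract does not give the form of the scoring function, so the weighted sum, the weights, the `toy_properties` stand-in and the example sequence are all assumptions made for illustration.

```python
def overlapping_peptides(protein, length=12, step=1):
    """Yield (offset, peptide) for every window of the given length."""
    for start in range(0, len(protein) - length + 1, step):
        yield start, protein[start:start + length]


def combined_score(properties, weights):
    """Weighted sum of property values (an assumed form of the scoring function)."""
    return sum(weights[name] * value for name, value in properties.items())


def rank_peptides(scored):
    """Sort (peptide, score) pairs from highest to lowest score."""
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Assumed weights: reward predicted antigenicity and disorder,
# penalise similarity to human proteins (possible cross-reactivity).
WEIGHTS = {"epitope": 1.0, "disorder": 0.5, "human_similarity": -2.0}


def toy_properties(peptide):
    """Stand-in property values in [0, 1]; real values would come from predictors."""
    hydrophilic = sum(peptide.count(aa) for aa in "DEKRNQ") / len(peptide)
    proline = peptide.count("P") / len(peptide)
    return {"epitope": hydrophilic, "disorder": proline, "human_similarity": 0.0}


protein = "MKTAYIAKQRDEEKPNPSDKLLVAG"  # hypothetical sequence, not from T. cruzi
scored = [
    (peptide, combined_score(toy_properties(peptide), WEIGHTS))
    for _, peptide in overlapping_peptides(protein)
]
top = rank_peptides(scored)[:3]
for peptide, score in top:
    print(peptide, round(score, 3))
```

A sequence of length n yields n - 11 overlapping 12-mers at step 1, which is how a single proteome expands to millions of candidate peptides.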

3.

Introduction

In this article, we report 7 novel KRAS gene mutations discovered while retrospectively studying the prevalence and pattern of KRAS mutations in cancerous tissue obtained from 56 Saudi sporadic colorectal cancer patients from the Eastern Province.

Methods

Genomic DNA was extracted from formalin-fixed, paraffin-embedded cancerous and noncancerous colorectal tissues. Successful and specific PCR products were then bi-directionally sequenced to detect exon 4 mutations while Mutector II Detection Kits were used for identifying mutations in codons 12, 13 and 61. The functional impact of the novel mutations was assessed using bioinformatics tools and molecular modeling.

Results

KRAS gene mutations were detected in the cancer tissue of 24 cases (42.85%). Of these, 11 had exon 4 mutations (19.64%). They harbored 8 different mutations, all but two of which altered the KRAS protein amino acid sequence and all but one of which were novel according to the COSMIC database. The detected novel mutations were found to be somatic. One mutation is predicted to be benign. The remaining mutations are predicted to cause substantial changes in the protein structure. Of these, the Q150X nonsense mutation is the second truncating mutation to be reported in colorectal cancer in the literature.

Conclusions

Our discovery of novel exon 4 KRAS mutations that are, so far, unique to Saudi colorectal cancer patients may be attributed to environmental factors and/or racial/ethnic variations due to genetic differences. Alternatively, it may be related to the paucity of clinical studies on mutations other than those in codons 12, 13, 61 and 146. Further KRAS testing on a large number of patients of various ethnicities, particularly beyond the most common hotspot alleles in exons 2 and 3, is needed to assess the prevalence and explore the exact prognostic and predictive significance of the discovered novel mutations, as well as their possible role in colorectal carcinogenesis.

4.
Abstract

Oncogenic mutations in expressed proteins are of primary interest for understanding tumor formation, but their structural consequences for protein function are not clearly understood. In this contribution I report on two illustrative examples, p21ras and p53, revealing that such mutations affect specific structural deficiencies in the packing of the protein structure, i.e., backbone hydrogen bonds insufficiently shielded from water attack. These structural deficiencies in the wild type are typically “corrected intermolecularly” by protein complexation or protein-ligand association. However, in the oncogenic mutants, these binding signals are partially or completely suppressed: the mutated residues properly wrap or desolvate the hydrogen bonds intramolecularly. Thus, the interactivity of the proteins becomes impaired: their binding affinity decreases sharply, as there is no thermodynamic benefit from removing water surrounding properly desolvated hydrogen bonds. The results, specialized for p21ras and p53, reveal how oncogenic mutations hinder GAP-induced hydrolysis (p21) and decrease binding affinity for DNA (p53). Furthermore, the oncogenic potential of mutations in residues not directly engaged in the interface electrostatics is assessed. The results suggest that a high sensitivity of structural defects to genetic accident might be a necessary condition to establish the existence of a proto-oncogene, an angle that merits a systematic study.

5.
In the Balkans and Taiwan, the relationship between exposure to aristolochic acid and risk of urothelial neoplasms was inferred from the A>T genetic hallmark in the TP53 gene from malignant cells. This study aimed to characterize the TP53 mutational spectrum in urothelial cancers following Aristolochic Acid Nephropathy in Belgium. Serial frozen tumor sections from female patients (n = 5) exposed to aristolochic acid during a weight-loss regimen were alternately used either for p53 immunostaining or laser microdissection. Tissue areas with at least 60% p53-positive nuclei were selected for microdissecting sections according to p53-positive matching areas. All areas appeared to be carcinoma in situ. After DNA extraction, mutations in the TP53 hot spot region (exons 5–8) were identified using nested PCR and sequencing. False-negative controls consisted of microdissecting fresh-frozen tumor tissues both from a patient with Li-Fraumeni syndrome who carried a p53 constitutional mutation, and from KRas-mutated adenocarcinomas. To rule out false-positive results potentially generated by microdissection and nested PCR, a phenacetin-associated urothelial carcinoma and normal fresh ureteral tissues (n = 4) were processed with high laser power. As no unexpected results were identified, molecular analysis was pursued on malignant tissues, showing at least one mutation in all (six different mutations in two) patients, with 13/16 exonic (nonsense, 2; missense, 11) and 3/16 intronic (one splice site) mutations. They were distributed as transitions (n = 7) or transversions (n = 9), with an equal prevalence of A>T and G>T (3/16 each).
While current results are in line with the A>T prevalence previously reported in Balkan and Taiwan studies, they also demonstrate that multiple mutations in the TP53 hot spot region and a high frequency of G>T transversion appear as a complementary signature reflecting the toxicity of a cumulative dose of aristolochic acid ingested over a short period of time.
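The split into transitions and transversions reported in this abstract follows a fixed rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. A minimal sketch of that rule follows; the function name and the example list of substitutions are illustrative and are not the study's data.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Hypothetical list of substitutions written as (reference, alternate) bases.
substitutions = [("A", "T"), ("G", "T"), ("G", "A"), ("C", "T")]
tally = Counter(classify_substitution(ref, alt) for ref, alt in substitutions)
print(dict(tally))
```

Under this rule both A>T and G>T are transversions, which is consistent with the abstract counting them among the nine transversions.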

6.
Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many methods for predicting protein complexes from protein-protein interactions have been developed, such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt only with complexes of size greater than three because they are often based on some density measure of subgraphs. However, heterodimeric protein complexes, which consist of two distinct proteins, account for a large share of known complexes according to several comprehensive databases. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, a domain composition kernel and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. The results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes.

7.
Phosphatidylinositol 3-kinases (PI3Ks) are important regulators of signaling pathways. To determine whether PI3Ks are genetically altered in human cancers, we recently analyzed the sequences of the PI3K gene family and discovered that one member, the PIK3CA gene encoding the p110α catalytic subunit, was frequently mutated in cancers of the colon, breast, brain and lung. The majority of mutations clustered near two positions within the PI3K helical or catalytic domains, and at least one hotspot mutation appeared to increase kinase activity. PIK3CA represents one of the most highly mutated oncogenes identified in human cancers and may be a useful diagnostic and therapeutic target.

8.

Background

Multiplex detection of low-level mutant alleles in the presence of wild-type DNA would be useful for several fields of medicine including cancer, pre-natal diagnosis and infectious diseases. COLD-PCR is a recently developed method that enriches low-level mutations during PCR cycling, thus enhancing downstream detection without the need for special reagents or equipment. The approach relies on the differential denaturation of DNA strands which contain Tm-lowering mutations or mismatches, versus ‘homo-duplex’ wild-type DNA. Enabling multiplex-COLD-PCR that can enrich mutations in several amplicons simultaneously is desirable but technically difficult to accomplish. Here we describe the proof of principle of an emulsion-PCR based approach that demonstrates the feasibility of multiplexed-COLD-PCR within a single tube, using commercially available mutated cell lines. This method works best with short amplicons; therefore, it could potentially be used on highly fragmented samples obtained from biological material or FFPE specimens.

Methods

Following a multiplex pre-amplification of TP53 exons from genomic DNA, emulsions which incorporate the multiplex product, PCR reagents and primers specific for a given TP53 exon are prepared. Emulsions with different TP53 targets are then combined in a single tube and a fast-COLD-PCR program that gradually ramps up the denaturation temperature over several PCR cycles is applied (temperature-tolerant, TT-fast-eCOLD-PCR). The range of denaturation temperatures applied encompasses the critical denaturation temperature (Tc) corresponding to all the amplicons included in the reaction, resulting in a gradual enrichment of mutations within all amplicons encompassed by the emulsion.

Results

Validation for TT-fast-eCOLD-PCR is provided for TP53 exons 6–9. Using dilutions of mutated cell-line DNA into wild-type DNA, we demonstrate simultaneous mutation enrichment of 7- to 15-fold in all amplicons examined.

Conclusions

TT-fast-eCOLD-PCR expands the versatility of COLD-PCR and enables high-throughput enrichment of low-level mutant alleles over multiple sequences in a single tube.

9.
Bruck syndrome (BS) is an extremely rare form of osteogenesis imperfecta characterized by congenital joint contracture, multiple fractures and short stature. We describe the phenotypes of BS in two Chinese patients for the first time. The novel compound heterozygous mutations c.764_772dupACGTCCTCC (p.255_257dupHisValLeu) in exon 5 and c.1405G>T (p.Gly469X) in exon 9 of FKBP10 were identified in one proband. The novel compound heterozygous mutations c.1624delT (p.Tyr542Thrfs*18) in exon 14 and c.1880T>C (p.Val627Ala) in exon 17 of PLOD2 were identified in the other proband. Intravenous zoledronate was a potent agent for these patients, confirming the efficacy of bisphosphonates in this disease. In conclusion, the novel causative mutations identified in the patients expand the genotypic spectrum of BS.

10.
We determined the frequency and types of K-ras mutations in colorectal and lung cancer. The ADx-K-ras kit (real-time/double-loop probe PCR) was used to detect somatic tumor gene mutations and was compared with Sanger DNA sequencing using 583 colorectal and 244 lung cancer paraffin-embedded clinical samples. Genomic DNA was used in both methods; mutation rates at codons 12/13 and the frequency of each mutation were detected and compared. The data show that 91.4% of colorectal and 59.0% of lung carcinoma samples were detected conclusively by DNA sequencing, whereas 100% of colorectal and lung samples were detected by the ADx-K-ras kit. K-ras gene mutations were detected in 32.9% and 27.4% of colorectal samples using the kit and sequencing methods, respectively, whereas 10.6% and 8.3% of lung cancer samples were positive by the kit and sequencing methods, respectively. Notably, 172/677 showed mutations and 467/677 showed wild type by both methods; 38 samples showed mutations with the kit but wild type with sequencing. Mutations in colorectal samples were as follows: GGT → GAT/codon-12 (35.1%); GGC → GAC/codon-13 (26.6%); GGT → GTT/codon-12 (18.2%); and GGT → GCT/codon-12 (1.6%). Mutations in lung samples were as follows: GGT → GTT/codon-12 (40.9%) and GGT → GCT/codon-12 (4.5%). In conclusion, K-ras mutations involved 32.2% colorectal and 10.6% lung samples among this cohort. ADx-K-ras real-time PCR showed higher detection rates (P < 0.05). The kit method has good clinical applicability as it is simple, fast and less prone to contamination, and hence can be used effectively and reliably for clinical screening of somatic tumor gene mutations.
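The paired counts quoted in this abstract (172 mutant by both methods, 467 wild type by both, 38 mutant by kit only, out of 677) form a 2×2 agreement table, so the overall agreement and a chance-corrected agreement can be worked out directly. The sketch below does that arithmetic. Using Cohen's kappa is a choice made here for illustration and is not a statistic reported in the abstract, and the sequencing-only cell is inferred as the remainder.

```python
def agreement_stats(both_pos, both_neg, a_only, b_only):
    """Observed agreement and Cohen's kappa for two binary tests, A and B."""
    total = both_pos + both_neg + a_only + b_only
    observed = (both_pos + both_neg) / total
    a_pos = (both_pos + a_only) / total
    b_pos = (both_pos + b_only) / total
    # Agreement expected by chance if the two tests were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Counts taken from the abstract; A = kit, B = Sanger sequencing.
both_mutant = 172
both_wild_type = 467
kit_only_mutant = 38
total = 677
# Inferred as the remainder of the 677 paired samples.
sequencing_only_mutant = total - both_mutant - both_wild_type - kit_only_mutant

observed, kappa = agreement_stats(
    both_mutant, both_wild_type, kit_only_mutant, sequencing_only_mutant
)
print(f"sequencing-only mutant calls: {sequencing_only_mutant}")
print(f"overall agreement: {observed:.1%}, kappa: {kappa:.2f}")
```

Because the three quoted counts already sum to 677, the remainder is zero: every discordant sample in this table is a kit-positive, sequencing-negative call.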

11.

Purpose

To examine whether GNAQ and GNA11 somatic mutations previously identified in uveal melanomas of Caucasians are associated with uveal melanomas in Chinese patients.

Methods

Uveal melanomas treated by primary enucleation in Chinese patients underwent a mutation analysis of GNAQ and GNA11 with sequencing of exon 5 and exon 4.

Results

The study included 50 patients with uveal melanoma, with a mean age of 47.6±13.0 years. During a follow-up of at least 3 years, 20 (40%) patients developed extraocular metastases. The frequencies of GNAQ and GNA11 somatic mutations in uveal melanoma were 18% (9/50) and 20% (10/50), respectively. The mutations occurred exclusively in codon 209 of exon 5. No mutations were detected in exon 4. Mutations affecting codon 209 in GNAQ were c.626A>C (Q209P) (78%) and c.626A>T (Q209L) (22%). Mutations affecting codon 209 in GNA11 were exclusively c.626A>T (Q209L) (100%). No BRAF or NRAS mutations were detected in any of the tumors. GNAQ/11 mutations were marginally (P = 0.045) associated with optic disc involvement. In Kaplan-Meier analysis, metastasis-free survival was not significantly (P = 0.94) associated with GNAQ/11 mutations.

Conclusions

Mutations of GNAQ and GNA11 can be found in Chinese patients as in Caucasian patients with uveal melanoma, with a higher frequency reported for Caucasian patients.

12.
Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the “optimal coding problem,” has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
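The idea of aggregating binary classifiers under a coding strategy, with one weight per binary classifier, can be sketched as a weighted vote over a code matrix. This is only an illustration of the general scheme: the weights are supplied by hand here, whereas the abstract describes tuning them from observed data, and the function names and example outputs are assumptions.

```python
from itertools import combinations


def one_vs_rest_code(n_classes):
    """Code matrix with one column per class: +1 for that class, -1 otherwise."""
    return [[1 if c == j else -1 for j in range(n_classes)] for c in range(n_classes)]


def one_vs_one_code(n_classes):
    """Code matrix with one column per class pair (a, b): +1 for a, -1 for b, else 0."""
    pairs = list(combinations(range(n_classes), 2))
    return [
        [1 if c == a else -1 if c == b else 0 for a, b in pairs]
        for c in range(n_classes)
    ]


def weighted_vote(code, outputs, weights):
    """Score each class as the weighted agreement between its code row and the outputs."""
    scores = [
        sum(w * row[j] * outputs[j] for j, w in enumerate(weights))
        for row in code
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


code = one_vs_one_code(3)
# Hypothetical signed outputs of the binary classifiers for pairs (0,1), (0,2), (1,2).
outputs = [0.2, -0.9, 0.8]

print(weighted_vote(code, outputs, [1.0, 1.0, 1.0])[0])  # uniform weights: simple voting
print(weighted_vote(code, outputs, [1.0, 2.0, 0.1])[0])  # non-uniform weights
```

With uniform weights this reduces to plain voting; changing the weights can change the predicted class, which is the degree of freedom the weight-tuning framework exploits.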

13.
Acute myeloid leukemia (AML) is characterized by multiple mutagenic events that affect proliferation, survival, as well as differentiation. Recently, gain-of-function mutations in the α helical structure within the linker sequence of the E3 ubiquitin ligase CBL have been associated with AML. We identified four novel CBL mutations, including a point mutation (Y371H) and a putative splice site mutation in AML specimens. Characterization of these two CBL mutants revealed that coexpression with the receptor tyrosine kinases FLT3 (Fms-like tyrosine kinase 3) or KIT induced ligand-independent growth or ligand hyperresponsiveness, respectively. Growth of cells expressing mutant CBL required expression and kinase activity of FLT3. In addition to the CBL-dependent phosphorylation of FLT3 and CBL itself, transformation was associated with activation of Akt and STAT5 and required functional expression of the small GTPases Rho, Rac, and Cdc42. Furthermore, the mutations led to constitutively elevated intracellular reactive oxygen species levels, which are commonly linked to increased glucose metabolism in cancer cells. Inhibition of hexokinase with 2-deoxyglucose blocked the transforming activity of CBL mutants and reduced activation of signaling mechanisms. Overall, our data demonstrate that mutations of CBL alter cellular biology at multiple levels and require not only the activation of receptor proximal signaling events but also an increase in cellular glucose metabolism. Pathways that are activated by CBL gain-of-function mutations can be efficiently targeted by small molecule drugs.

14.
One of the major breakthroughs in oncogenesis research in recent years is the discovery that, in most patients, oncogenic mutations are concentrated in a few core biological functional pathways. This discovery indicates that oncogenic mechanisms are highly related to the dynamics of biological regulatory networks, which govern the behaviour of functional pathways. Here, we propose that oncogenic mutations found in different biological functional pathways are closely related to the parameter sensitivity of the corresponding networks. To test this hypothesis, we focus on the DNA damage-induced apoptotic pathway, the most important safeguard against oncogenesis. We first built the regulatory network that governs the apoptosis pathway, and then translated the network into dynamical equations. Using sensitivity analysis of the network parameters and comparing the results with cancer gene mutation spectra, we found that parameters that significantly affect the bifurcation point correspond to high-frequency oncogenic mutations. This result shows that the position of the bifurcation point is a better measure of the functionality of a biological network than gene expression levels of certain key proteins. It further demonstrates the suitability of applying systems-level analysis to biological networks as opposed to studying genes or proteins in isolation.

15.
O-Linked β-N-acetylglucosamine (O-GlcNAc) is a carbohydrate post-translational modification on hydroxyl groups of serine and/or threonine residues of cytosolic and nuclear proteins. Analogous to phosphorylation, O-GlcNAcylation plays crucial regulatory roles in cellular signaling. Recent work indicates that increased O-GlcNAcylation is a general feature of cancer and contributes to transformed phenotypes. In this minireview, we discuss how hyper-O-GlcNAcylation may be linked to various hallmarks of cancer, including cancer cell proliferation, survival, invasion, and metastasis; energy metabolism; and epigenetics. We also discuss potential therapeutic modulation of O-GlcNAc levels in cancer treatment.

16.
17.
Most tumors arise from epithelial tissues, such as mammary glands and lobules, and their initiation is associated with the disruption of a finely defined epithelial architecture. Progression from intraductal to invasive tumors is related to genetic mutations that occur at a subcellular level but manifest themselves as functional and morphological changes at the cellular and tissue scales, respectively. Elevated proliferation and loss of epithelial polarization are the two most noticeable changes in cell phenotypes during this process. As a result, many three-dimensional cultures of tumorigenic clones show highly aberrant morphologies when compared to regular epithelial monolayers enclosing the hollow lumen (acini). In order to shed light on phenotypic changes associated with tumor cells, we applied the bio-mechanical IBCell model of normal epithelial morphogenesis quantitatively matched to data acquired from the non-tumorigenic human mammary cell line, MCF10A. We then used a high-throughput simulation study to reveal how modifications in model parameters influence changes in the simulated architecture. Three parameters have been considered in our study, which define cell sensitivity to proliferative, apoptotic and cell-ECM adhesive cues. By mapping experimental morphologies of four MCF10A-derived cell lines carrying different oncogenic mutations onto the model parameter space, we identified changes in cellular processes potentially underlying structural modifications of these mutants. As a case study, we focused on MCF10A cells expressing an oncogenic mutant HER2-YVMA to quantitatively assess changes in cell doubling time, cell apoptotic rate, and cell sensitivity to ECM accumulation when compared to the parental non-tumorigenic cell line. By mapping in vitro mutant morphologies onto in silico ones we have generated a means of linking the morphological and molecular scales via computational modeling. 
Thus, IBCell in combination with 3D acini cultures can form a computational/experimental platform for suggesting the relationship between the histopathology of neoplastic lesions and their underlying molecular defects.

18.
19.
This article focuses on improving machine learning models that predict protein expression levels from their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown that the models' predictive ability can be improved by using two more input features, codon identification number and codon count, in addition to the already used codon bias and minimum free energy. In addition, applying ensemble averaging to the SVR or PLS models improves the results even further. The present work motivates testing different ensembles and features with the aim of improving the prediction models, whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and enhancement of protein expression levels in synthetic biology problems.
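Ensemble averaging, as mentioned in this abstract, simply averages the predictions of several fitted models per sample, and the result is then judged by its correlation with the measured values. The sketch below shows that step with plain Python. The two "models" are hard-coded prediction lists standing in for fitted SVR or PLS models, and the numbers are constructed so that their errors cancel; they are not results from the article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ensemble_average(predictions):
    """Average the predictions of several models, sample by sample."""
    return [sum(column) / len(column) for column in zip(*predictions)]


# Hypothetical measured expression levels and two sets of model predictions.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.5, 1.5, 3.5, 3.5, 5.5]
model_b = [0.5, 2.5, 2.5, 4.5, 4.5]

averaged = ensemble_average([model_a, model_b])
for name, predicted in [("A", model_a), ("B", model_b), ("ensemble", averaged)]:
    print(name, round(pearson(predicted, measured), 3))
```

Averaging helps to the extent that the individual models' errors are not strongly correlated; when the models make the same mistakes, the ensemble gains little.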

20.