1.
2.
Purpose
This study aims to investigate the use of machine learning models for delivery error prediction in proton pencil beam scanning (PBS) delivery.
Methods
A dataset of planned and delivered PBS spot parameters was generated from a set of 20 prostate patient treatments. Planned spot parameters (spot position, MU and energy) were extracted from the treatment planning system (TPS) for each beam. Delivered spot parameters were extracted from irradiation log-files for each beam delivery following treatment. The dataset was used as a training dataset for three machine learning models which were trained to predict delivered spot parameters based on planned parameters. K-fold cross validation was employed for hyper-parameter tuning and model selection where the mean absolute error (MAE) was used as the model evaluation metric. The model with lowest MAE was then selected to generate a predicted dose distribution for a test prostate patient within a commercial TPS.
Results
Analysis of the spot position delivery error between planned and delivered values resulted in standard deviations of 0.39 mm and 0.44 mm for x and y spot positions respectively. Prediction error standard deviation values of spot positions using the selected model were 0.22 mm and 0.11 mm for x and y spot positions respectively. Finally, a three-way comparison of dose distributions and DVH values for select OARs indicates that the random-forest-predicted dose distribution within the test prostate patient was in closer agreement to the delivered dose distribution than the planned distribution.
Conclusions
PBS delivery error can be accurately predicted using machine learning techniques.
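The model-selection step in this abstract (k-fold cross-validation scored by MAE, keeping the model with the lowest error) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the two candidate "models" (`fit_identity`, `fit_constant_offset`), the toy spot positions, and the unshuffled contiguous folds are all assumptions made for the example.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling, for brevity)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Hypothetical candidates: each "fit" function returns a predict function.
def fit_identity(x_train, y_train):
    """Predict that the delivered value equals the planned value."""
    return lambda x: x


def fit_constant_offset(x_train, y_train):
    """Predict planned value plus the mean delivery error seen in training."""
    offset = mean(y - x for x, y in zip(x_train, y_train))
    return lambda x: x + offset


def cross_validated_mae(fit, x, y, k=5):
    """Mean of the per-fold MAE values for one candidate model."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(x), k):
        predict = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(x[i]) for i in test_idx]
        fold_scores.append(mean_absolute_error([y[i] for i in test_idx], y_pred))
    return mean(fold_scores)


def select_model(candidates, x, y, k=5):
    """Return the name of the candidate with the lowest cross-validated MAE."""
    scores = {name: cross_validated_mae(fit, x, y, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Toy data (assumed): planned positions and delivered positions with a fixed shift.
planned = [float(i) for i in range(10)]
delivered = [p + 0.3 for p in planned]
best, scores = select_model(
    {"identity": fit_identity, "constant_offset": fit_constant_offset},
    planned,
    delivered,
    k=5,
)
```

In practice the folds would normally be shuffled and the candidates would be real regressors; the selection logic stays the same.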
3.
Erika Alessandra Pellison Nunes da Costa, Cassiano Victória, Carlos Magno Castelo Branco Fortaleza 《PLoS neglected tropical diseases》2021,15(8)
American trypanosomiasis (Chagas disease, CD) affects circa 7 million persons worldwide. While most of those persons present the asymptomatic, indeterminate chronic form (ICF), many will eventually progress to cardiac or digestive disorders. We studied a nonconcurrent (retrospective) cohort of patients attending an outpatient CD clinic in Southeastern Brazil, who were admitted while presenting the ICF in the period from 1998 through 2018 and followed until 2019. The outcomes of interest were the progression to cardiac or digestive CD forms. We were also interested in analyzing the impact of Benznidazole therapy on the progression of the disease. Extensive review of medical charts and laboratory files was conducted, collecting data up to year 2019. Demographics (upon inclusion), body mass index, comorbidities (including the Charlson index) and use of Benznidazole were recorded. The outcomes were defined by abnormalities in those tests that could not be attributed to other causes. Statistical analysis included univariate and multivariable Cox regression models. Among 379 subjects included in the study, 87 (22.9%) and 100 (26.4%) progressed to cardiac and digestive forms, respectively. In the final multivariable model, cardiac disorders were positively associated with previous coronary syndrome (Hazard Ratio [HR], 2.42; 95% Confidence Interval [CI], 1.53–3.81) and negatively associated with Benznidazole therapy (HR, 0.26; 95%CI, 0.11–0.60). On the other hand, female gender was the only independent predictor of progression to digestive forms (HR, 1.56; 95%CI, 1.03–2.38). Our results point to the impact of comorbidities on progression to cardiac CD, with possible benefit of the use of Benznidazole.
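The hazard ratios and confidence intervals reported above follow the usual Cox-model convention, in which HR = exp(β) and the 95% interval is exp(β ± 1.96·SE). The sketch below shows that relationship in both directions. The function names are hypothetical, and the use of a Wald-type interval is an assumption; the abstract does not state how the intervals were computed.

```python
from math import exp, log


def hazard_ratio_with_ci(beta, standard_error, z=1.96):
    """Convert a Cox coefficient and its standard error to (HR, lower, upper)."""
    return (
        exp(beta),
        exp(beta - z * standard_error),
        exp(beta + z * standard_error),
    )


def log_scale_summary(hr, lower, upper, z=1.96):
    """Back out the coefficient and standard error implied by a reported HR and CI.

    Assumes a Wald interval, i.e. one that is symmetric on the log scale.
    """
    beta = log(hr)
    standard_error = (log(upper) - log(lower)) / (2 * z)
    return beta, standard_error


# Example with the values reported for previous coronary syndrome.
beta_coronary, se_coronary = log_scale_summary(2.42, 1.53, 3.81)
```

A hazard ratio below 1 (as reported for Benznidazole) corresponds to a negative coefficient, and an interval that excludes 1 corresponds to a coefficient whose interval excludes 0.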
4.
A. A. C. Alves, R. Espigolan, T. Bresolin, R. M. Costa, G. A. Fernandes Júnior, R. V. Ventura, R. Carvalheiro, L. G. Albuquerque 《Animal genetics》2021,52(1):32-46
This study aimed to assess the predictive ability of different machine learning (ML) methods for genomic prediction of reproductive traits in Nellore cattle. The studied traits were age at first calving (AFC), scrotal circumference (SC), early pregnancy (EP) and stayability (STAY). The numbers of genotyped animals and SNP markers available were 2342 and 321 419 (AFC), 4671 and 309 486 (SC), 2681 and 319 619 (STAY) and 3356 and 319 108 (EP). The predictive abilities of support vector regression (SVR), Bayesian regularized artificial neural network (BRANN) and random forest (RF) were compared with results obtained using parametric models (genomic best linear unbiased predictor, GBLUP, and Bayesian least absolute shrinkage and selection operator, BLASSO). A 5-fold cross-validation strategy was performed and the average prediction accuracy (ACC) and mean squared errors (MSE) were computed. The ACC was defined as the linear correlation between predicted and observed breeding values for categorical traits (EP and STAY) and as the correlation between predicted and observed adjusted phenotypes divided by the square root of the estimated heritability for continuous traits (AFC and SC). The average ACC varied from low to moderate depending on the trait and model under consideration, ranging between 0.56 and 0.63 (AFC), 0.27 and 0.36 (SC), 0.57 and 0.67 (EP), and 0.52 and 0.62 (STAY). SVR provided slightly better accuracies than the parametric models for all traits, increasing the prediction accuracy for AFC by around 6.3 and 4.8% compared with GBLUP and BLASSO respectively. Likewise, there was an increase of 8.3% for SC, 4.5% for EP and 4.8% for STAY, comparing SVR with both GBLUP and BLASSO. In contrast, the RF and BRANN did not present competitive predictive ability compared with the parametric models. The results indicate that SVR is a suitable method for genome-enabled prediction of reproductive traits in Nellore cattle. Further, the optimal kernel bandwidth parameter in the SVR model was trait-dependent; thus, fine-tuning of this hyper-parameter in the training phase is crucial.
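The two accuracy definitions given in this abstract can be written out directly. The sketch below is only an illustration of those definitions; the function names and the toy numbers in the usage lines are assumptions, not values from the study.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Linear (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def accuracy_categorical(predicted, observed_breeding_values):
    """ACC for categorical traits: correlation of predicted and observed breeding values."""
    return pearson_correlation(predicted, observed_breeding_values)


def accuracy_continuous(predicted, adjusted_phenotypes, heritability):
    """ACC for continuous traits: correlation divided by sqrt(estimated heritability)."""
    return pearson_correlation(predicted, adjusted_phenotypes) / sqrt(heritability)


def mean_squared_error(predicted, observed):
    """Average squared difference between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Toy usage with made-up numbers (assumed, for illustration only).
predictions = [0.2, 0.5, 0.1, 0.9, 0.4]
phenotypes = [0.3, 0.4, 0.0, 1.1, 0.6]
acc = accuracy_continuous(predictions, phenotypes, heritability=0.36)
mse = mean_squared_error(predictions, phenotypes)
```

Dividing by the square root of the heritability rescales the phenotype correlation, because adjusted phenotypes are only a noisy proxy for the true breeding values.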
5.
Matthew Mort, Timothy Sterne-Weiler, Biao Li, Edward V Ball, David N Cooper, Predrag Radivojac, Jeremy R Sanford, Sean D Mooney 《Genome biology》2014,15(1):R19
We have developed a novel machine-learning approach, MutPred Splice, for the identification of coding region substitutions that disrupt pre-mRNA splicing. Applying MutPred Splice to human disease-causing exonic mutations suggests that 16% of mutations causing inherited disease and 10 to 14% of somatic mutations in cancer may disrupt pre-mRNA splicing. For inherited disease, the main mechanism responsible for the splicing defect is splice site loss, whereas for cancer the predominant mechanism of splicing disruption is predicted to be exon skipping via loss of exonic splicing enhancers or gain of exonic splicing silencer elements. MutPred Splice is available at http://mutdb.org/mutpredsplice.
6.
7.
《Biochimica et Biophysica Acta - Proteins and Proteomics》2020,1868(6):140406
Phage virion protein (PVP) identification plays a key role in elucidating relationships between phages and hosts. Moreover, PVP identification can facilitate the design of related biochemical entities. Recently, several machine learning approaches have emerged for this purpose and have shown their potential. In this study, the proposed PVP identifiers are systematically reviewed, and the related algorithms and tools are comprehensively analyzed. We summarized the common framework of these PVP identifiers and constructed our own novel identifiers based upon the framework. Furthermore, we focus on a performance comparison of all PVP identifiers by using a training dataset and an independent dataset. Highlighting the pros and cons of these identifiers demonstrates that g-gap DPC (dipeptide composition) features are capable of representing characteristics of PVPs. Moreover, SVM (support vector machine) is shown to be the more effective classifier for distinguishing PVPs from non-PVPs.
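The g-gap dipeptide composition highlighted above is commonly computed as the frequency of each residue pair separated by g intervening positions, giving a 400-dimensional vector. The sketch below follows that common definition. The function name and the handling of non-standard residues are assumptions, and the reviewed tools may differ in detail.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def g_gap_dipeptide_composition(sequence, gap=0):
    """Return the 400 g-gap dipeptide frequencies of a protein sequence.

    A pair is formed by residues at positions i and i + gap + 1, so gap=0
    gives the ordinary (adjacent) dipeptide composition. Pairs containing a
    non-standard residue are skipped (an assumption of this sketch), so the
    frequencies are normalized by the number of pairs actually counted.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - gap - 1):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[pair] / total for pair in DIPEPTIDES]


# Usage: feature vectors for one toy sequence at two gap sizes.
features_gap0 = g_gap_dipeptide_composition("ACAC", gap=0)
features_gap1 = g_gap_dipeptide_composition("ACAC", gap=1)
```

Vectors of this kind are what a classifier such as an SVM would take as input, one vector per protein.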
8.
Fast Fourier transform-based support vector machine for subcellular localization prediction using different substitution models (total citations: 2; self-citations: 0; citations by others: 2)
There are approximately 10^9 proteins in a cell. A hotspot in bioinformatics is how to identify a protein's subcellular localization, if its sequence is known. In this paper, a method using fast Fourier transform-based support vector machine is developed to predict the subcellular localization of proteins from their physicochemical properties and structural parameters. The prediction accuracies reached 83% in prokaryotic organisms and 84% in eukaryotic organisms with the substitution model of the c-p-v matrix (c, composition; p, polarity; and v, molecular volume). The overall prediction accuracy was also evaluated using the "leave-one-out" jackknife procedure. The influence of the substitution model on prediction accuracy has also been discussed in the work. The source code of the new program is available on request from the authors.
9.
《Genomics》2022,114(5):110454
Cis-regulatory elements (CREs) are non-coding parts of the genome that play a critical role in gene expression regulation. Enhancers, as an important example of CREs, interact with genes to influence complex traits like disease, heat tolerance and growth rate. Much of what is known about enhancers comes from studies of humans and a few model organisms like mouse, with little known about other mammalian species. Previous studies have attempted to identify enhancers in less studied mammals using comparative genomics but with limited success. Recently, Machine Learning (ML) techniques have shown promising results in predicting enhancer regions. Here, we investigated the ability of ML methods to identify enhancers in three non-model mammalian species (cattle, pig and dog) using human and mouse enhancer data from VISTA and publicly available ChIP-seq. We tested nine models, using four different representations of the DNA sequences in cross-species prediction using both the VISTA dataset and species-specific ChIP-seq data. We identified between 809,399 and 877,278 enhancer-like regions (ELRs) in the study species (11.6–13.7% of each genome). These predictions were close to the ~8% of the human genome covered by ELRs. We propose that our ML methods have predictive ability for identifying enhancers in non-model mammalian species. We have provided a list of high confidence enhancers at https://github.com/DaviesCentreInformatics/Cross-species-enhancer-prediction and believe these enhancers will be of great use to the community.
10.
Host load prediction using linear models (total citations: 11; self-citations: 0; citations by others: 11)
This paper evaluates linear models for predicting the Digital Unix five-second host load average from 1 to 30 seconds into the future. A detailed statistical study of a large number of long, fine-grain load traces from a variety of real machines leads to consideration of the Box–Jenkins models (AR, MA, ARMA, ARIMA), and the ARFIMA models (due to self-similarity). We also consider a simple windowed-mean model. The computational requirements of these models span a wide range, making some more practical than others for incorporation into an online prediction system. We rigorously evaluate the predictive power of the models by running a large number of randomized test cases on the load traces and then data-mining their results. The main conclusions are that load is consistently predictable to a very useful degree, and that the simple, practical models such as AR are sufficient for host load prediction. We recommend AR(16) models or better for host load prediction. We implement an online host load prediction system around the AR(16) model and evaluate its overhead, finding that it uses minuscule amounts of CPU time and network bandwidth.
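To make the two simplest predictors named above concrete, the sketch below implements a windowed-mean forecast and a multi-step AR(p) forecast. It assumes the AR coefficients and the mean load level have already been estimated elsewhere (fitting is not shown), and the function names and toy values are hypothetical rather than taken from the paper's system.

```python
def windowed_mean_forecast(history, window):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def ar_forecast(history, coefficients, mean_level=0.0, steps=1):
    """Iterate an AR(p) model `steps` samples into the future.

    Model: x_t - mu = sum_{i=1..p} phi_i * (x_{t-i} - mu), where
    `coefficients` is [phi_1, ..., phi_p] and `mean_level` is mu.
    Each forecast is fed back in as if it had been observed.
    """
    p = len(coefficients)
    if len(history) < p:
        raise ValueError("need at least p past samples for an AR(p) forecast")
    centered = [x - mean_level for x in history[-p:]]
    forecasts = []
    for _ in range(steps):
        # centered[-i] is the (possibly forecast) value i steps back.
        next_value = sum(
            phi * centered[-i] for i, phi in enumerate(coefficients, start=1)
        )
        forecasts.append(next_value + mean_level)
        centered.append(next_value)
    return forecasts


# Toy usage with made-up load samples and made-up AR(2) coefficients.
load_trace = [1.0, 2.0, 4.0]
two_step = ar_forecast(load_trace, [0.5, 0.25], mean_level=0.0, steps=2)
baseline = windowed_mean_forecast(load_trace, window=2)
```

An AR(16) predictor has the same shape with sixteen coefficients; the per-forecast cost is a handful of multiply-adds, which is consistent with the low overhead the abstract reports.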
11.
Qi Zhao, Zheng Zhao, Xiaoya Fan, Zhengwei Yuan, Qian Mao, Yudong Yao 《PLoS computational biology》2021,17(8)
Secondary structure plays an important role in determining the function of noncoding RNAs. Hence, identifying RNA secondary structures is of great value to research. Computational prediction is a mainstream approach for predicting RNA secondary structure. Unfortunately, even though new methods have been proposed over the past 40 years, the performance of computational prediction methods has stagnated in the last decade. Recently, with the increasing availability of RNA structure data, new methods based on machine learning (ML) technologies, especially deep learning, have alleviated the issue. In this review, we provide a comprehensive overview of RNA secondary structure prediction methods based on ML technologies and a tabularized summary of the most important methods in this field. The current pending challenges in the field of RNA secondary structure prediction and future trends are also discussed.
12.
Nicolás Pedrini, Sergio J. Mijailovsky, Juan R. Girotti, Raúl Stariolo, Rubén M. Cardozo, Alberto Gentile, M. Patricia Juárez 《PLoS neglected tropical diseases》2009,3(5)
Background
Triatoma infestans-mediated transmission of Trypanosoma cruzi, the causative agent of Chagas disease, remains a major health issue in southern South America. Key factors of T. infestans prevalence in specific areas of the geographic Gran Chaco region (which extends through northern Argentina, Bolivia, and Paraguay) are both recurrent reinfestations after insecticide spraying and emerging pyrethroid resistance over the past ten years. Among alternative control tools, the pathogenicity of entomopathogenic fungi against triatomines is already known; furthermore, these fungi have the ability to fully degrade hydrocarbons from T. infestans cuticle and to utilize them as fuel and for incorporation into cellular components.
Methodology and Findings
Here we provide evidence of resistance-related cuticle differences; capillary gas chromatography coupled to mass spectrometry analyses revealed that pyrethroid-resistant bugs have significantly larger amounts of surface hydrocarbons, peaking 56.2±6.4% higher than susceptible specimens. Also, a thicker cuticle was detected by scanning electron microscopy (32.1±5.9 µm and 17.8±5.4 µm for pyrethroid-resistant and pyrethroid-susceptible, respectively). In laboratory bioassays, we showed that the virulence of the entomopathogenic fungus Beauveria bassiana against T. infestans was significantly enhanced after fungal adaptation to grow on a medium containing insect-like hydrocarbons as the carbon source, regardless of bug susceptibility to pyrethroids. We designed an attraction-infection trap based on manipulating T. infestans behavior in order to facilitate close contact with B. bassiana. Field assays performed in rural village houses infested with pyrethroid-resistant insects showed 52.4% bug mortality. Using available mathematical models, we predicted that further fungal applications could eventually halt infection transmission.
Conclusions
This low cost, low tech, ecologically friendly methodology could help in controlling the spread of pyrethroid-resistant bugs.
13.
Disulfide bridges stabilize protein structures covalently and play an important role in protein folding. Predicting disulfide connectivity precisely helps towards the solution of protein structure prediction. Previous methods for disulfide connectivity prediction either infer the bonding potential of cysteine pairs or rank alternative disulfide bonding patterns. As a result, these methods encode data according to cysteine pairs (pair-wise) or disulfide bonding patterns (pattern-wise). However, using either encoding scheme alone cannot fully utilize the local and global information of proteins, so the accuracies of previous methods are limited. In this work, we propose a novel two-level framework to predict disulfide connectivity. With this framework, both the pair-wise and pattern-wise encoding schemes are considered. Our models were validated on the datasets derived from SWISS-PROT 39 and 43, and the results demonstrate that our models can combine both local and global information. Compared to previous methods, significant improvements were obtained by our models. Our work may also provide insights for further improvements of disulfide connectivity prediction and increase its applicability in protein structure analysis and prediction.
14.
Genetic relationships and population structure of 8 horse breeds in the Czech and Slovak Republics were investigated using
classification methods for breed discrimination. To demonstrate genetic differences among these breeds, we used genetic information
— genotype data of microsatellite markers and classification algorithms — to perform a probabilistic prediction of an individual’s
breed. In total, 932 unrelated animals were genotyped for 17 microsatellite markers recommended by the ISAG for parentage
testing (AHT4, AHT5, ASB2, HMS3, HMS6, HMS7, HTG4, HTG10, VHL20, HTG6, HMS2, HTG7, ASB17, ASB23, CA425, HMS1, LEX3). Algorithms
of classification methods — J48 (decision trees); Naive Bayes, Bayes Net (probability predictors); IB1, IB5 (instance-based
machine learning methods); and JRip (decision rules) — were used for analysis of their classification performance and of results
of classification on this genotype dataset. Selected classification methods (Naive Bayes, Bayes Net, IB1), based on machine
learning and principles of artificial intelligence, appear usable for these tasks.
15.
16.
Letizia Lamperti, Théophile Sanchez, Sara Si Moussi, David Mouillot, Camille Albouy, Benjamin Flück, Morgane Bruno, Alice Valentini, Loïc Pellissier, Stéphanie Manel 《Molecular ecology resources》2023,23(8):1946-1958
Environmental DNA (eDNA) metabarcoding provides an efficient approach for documenting biodiversity patterns in marine and terrestrial ecosystems. The complexity of these data prevents current methods from extracting and analyzing all the relevant ecological information they contain, and new methods may provide better dimensionality reduction and clustering. Here we present two new deep learning-based methods that combine different types of neural networks (NNs) to ordinate eDNA samples and visualize ecosystem properties in a two-dimensional space: the first is based on variational autoencoders and the second on deep metric learning. The strength of our new methods lies in the combination of two inputs: the number of sequences found for each molecular operational taxonomic unit (MOTU) detected and their corresponding nucleotide sequence. Using three different datasets, we show that our methods accurately represent several biodiversity indicators in a two-dimensional latent space: MOTU richness per sample, sequence α-diversity per sample, Jaccard's and sequence β-diversity between samples. We show that our nonlinear methods are better at extracting features from eDNA datasets while avoiding the major biases associated with eDNA. Our methods outperform traditional dimension reduction methods such as Principal Component Analysis, t-distributed Stochastic Neighbour Embedding, Nonmetric Multidimensional Scaling and Uniform Manifold Approximation and Projection for dimension reduction. Our results suggest that NNs provide a more efficient way of extracting structure from eDNA metabarcoding data, thereby improving their ecological interpretation and thus biodiversity monitoring.
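Two of the biodiversity indicators named above, MOTU richness per sample and Jaccard β-diversity between samples, have simple standard definitions that can be sketched directly from per-MOTU read counts. The data layout (a dict mapping MOTU identifiers to read counts), the function names, and the toy samples are assumptions made for illustration; the neural-network ordination itself is not reproduced here.

```python
def motu_richness(sample_counts):
    """Number of MOTUs detected (read count > 0) in one sample."""
    return sum(1 for count in sample_counts.values() if count > 0)


def jaccard_dissimilarity(sample_a, sample_b):
    """Jaccard dissimilarity between two samples, based on MOTU presence/absence.

    Returns 1 - |A ∩ B| / |A ∪ B|, where A and B are the sets of MOTUs
    detected in each sample. Two empty samples are treated as identical.
    """
    present_a = {motu for motu, count in sample_a.items() if count > 0}
    present_b = {motu for motu, count in sample_b.items() if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


# Toy samples (hypothetical MOTU identifiers and read counts).
site_1 = {"motu_1": 10, "motu_2": 0, "motu_3": 5}
site_2 = {"motu_1": 3, "motu_4": 7}
richness_1 = motu_richness(site_1)
beta_12 = jaccard_dissimilarity(site_1, site_2)
```

Indicators computed this way on the raw data are the kind of reference values against which a two-dimensional embedding can be checked.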
17.
Background
Long noncoding RNAs (lncRNAs) are widely involved in the initiation and development of cancer. Although some computational methods have been proposed to identify cancer-related lncRNAs, there is still a demand to improve prediction accuracy and efficiency. In addition, rapidly updated cancer data, together with the discovery of new mechanisms, make it possible to improve cancer-related lncRNA prediction algorithms. In this study, we introduced CRlncRC, a novel Cancer-Related lncRNA Classifier that integrates manifold features with five machine-learning techniques.
Results
CRlncRC was built by integrating four categories of features: genomic, expression, epigenetic and network. Five learning techniques were used to develop the classification model: Random Forest (RF), Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR) and K-Nearest Neighbors (KNN). Using ten-fold cross-validation, we showed that RF is the best model for classifying cancer-related lncRNAs (AUC = 0.82). The feature importance analysis indicated that epigenetic and network features play key roles in the classification. In addition, compared with other existing classifiers, CRlncRC exhibited better performance in both sensitivity and specificity. We further applied CRlncRC to lncRNAs from the TANRIC (The Atlas of non-coding RNA in Cancer) dataset, and identified 121 cancer-related lncRNA candidates. These candidate lncRNAs showed cancer-related indications, and many of them are supported by convincing evidence in the literature.
Conclusions
Our results indicate that CRlncRC is a powerful method for identifying cancer-related lncRNAs. Machine-learning-based integration of multiple features, especially epigenetic and network features, contributed greatly to cancer-related lncRNA prediction. RF outperformed the other learning techniques in terms of model sensitivity and specificity. In addition, using the CRlncRC method, we predicted a set of cancer-related lncRNAs, all of which displayed a strong relevance to cancer, providing a valuable starting point for further functional studies of cancer-related lncRNAs.
18.
19.