Similar Literature
20 similar documents found.
1.
Several research initiatives have been undertaken to map fishing effort at high spatial resolution using the Vessel Monitoring System (VMS). An alternative to the VMS is the Automatic Identification System (AIS), which in the EU became compulsory in May 2014 for all fishing vessels above 15 meters in length. The aim of this paper is to assess the uptake of the AIS in the EU fishing fleet and the feasibility of producing a map of fishing effort with high spatial and temporal resolution at the European scale. After analysing a large AIS dataset for the period January-August 2014, covering most EU waters, we show that AIS was adopted by around 75% of EU fishing vessels above 15 meters in length. Using the Swedish fleet as a case study, we developed a method to identify fishing activity based on the analysis of individual vessels' speed profiles and to produce a high-resolution map of fishing effort based on AIS data. The method was validated against detailed logbook data and proved sufficiently accurate and computationally efficient to identify fishing grounds and effort for trawlers, which represent the largest portion of the EU fishing fleet above 15 meters in length. Issues still to be addressed before extending the exercise to the entire EU fleet are the assessment of AIS coverage levels for all EU waters and the identification of fishing activity for vessels other than trawlers.
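
A minimal sketch of the speed-profile idea described above, assuming AIS position reports with timestamps and speed-over-ground values; the trawling speed window here is a hypothetical constant, whereas the paper derives it per vessel from its own speed profile.

```python
import pandas as pd

# Hypothetical AIS records: one vessel, speed over ground (SOG) in knots.
ais = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2014-03-01 06:00", "2014-03-01 06:10",
        "2014-03-01 06:20", "2014-03-01 06:30",
    ]),
    "sog_knots": [9.8, 3.1, 3.4, 10.2],
})

# Assumed trawling speed window (knots); in the paper the window is
# inferred from each vessel's own bimodal speed histogram.
TRAWL_MIN, TRAWL_MAX = 2.0, 5.0

ais["fishing"] = ais["sog_knots"].between(TRAWL_MIN, TRAWL_MAX)

# Fishing effort: time spent in the fishing state, per reporting interval.
ais["hours"] = ais["timestamp"].diff().dt.total_seconds().div(3600).fillna(0)
effort_hours = ais.loc[ais["fishing"], "hours"].sum()
print(f"Estimated fishing effort: {effort_hours:.2f} h")
```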

2.

Context

Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem.

Objective

This paper describes a new method for pattern mining that isolates design patterns and the relationships between them, together with a related tool, DLA-DNA, applied to all implemented patterns and all projects used for evaluation. DLA-DNA, which is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, achieves acceptable precision and recall compared with the other evaluated tools.

Method

The proposed method mines structural design patterns in object-oriented source code and extracts the strong and weak relationships between them, enabling analysts and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than the other available tools, namely Pinot, PTIDEJ and DPJF.

Results

The results demonstrate that whether the source code is built in a standard or a non-standard way based on design patterns, the proposed method performs close to DPJF and better than Pinot and PTIDEJ. The proposed model was tested on several source codes and compared with related models and available tools; on average, the precision and recall of the proposed method are 20% and 9.6% higher than Pinot, 27% and 31% higher than PTIDEJ, and 3.3% and 2% higher than DPJF, respectively.

Conclusion

The proposed method is organized in two steps: in the first step, elemental design patterns are identified; in the second, these are composed to recognize actual design patterns.

3.
《IRBM》2022,43(1):49-61
Background and objective

Breast cancer is the most invasive form of cancer affecting women globally. After lung cancer, it accounts for the greatest number of cancer deaths among women. In recent times, several intelligent methodologies have emerged for building effective detection and classification of this noxious type of cancer. To further improve the rate of early diagnosis and increase the life span of victims, focused research on breast cancer classification is essential. Accordingly, a new customized method is introduced that integrates deep learning with the extreme learning machine (ELM), optimized using an improved crow-search algorithm (ICS-ELM). Thus, to enhance the state of the art, an improved deep-feature-based crow-search-optimized extreme learning machine is proposed for addressing this health-care problem. The work first detects input mammograms as either normal or abnormal, and subsequently classifies the abnormal cases by severity, i.e., benign or malignant.

Materials and methods

The digital mammograms for this work are taken from the Curated Breast Imaging Subset of DDSM (CBIS-DDSM), the Mammographic Image Analysis Society (MIAS) database, and the INbreast dataset. The work employs 570 digital mammograms (250 normal, 200 benign and 120 malignant cases) from the CBIS-DDSM dataset, 322 digital mammograms (207 normal, 64 benign and 51 malignant cases) from the MIAS database and 179 full-field digital mammograms (66 normal, 56 benign and 57 malignant cases) from the INbreast dataset for evaluation. The work utilizes ResNet-18-based deep extracted features with the proposed Improved Crow-Search Optimized Extreme Learning Machine (ICS-ELM) algorithm.

Results

The proposed work is compared with existing Support Vector Machines (RBF kernel), ELM, particle swarm optimization (PSO) optimized ELM, and crow-search optimized ELM; the maximum overall classification accuracy is obtained for the proposed method, with 97.193% for DDSM, 98.137% for MIAS and 98.266% for INbreast, respectively.

Conclusion

The obtained results reveal that the proposed Computer-Aided-Diagnosis (CAD) tool is robust for the automatic detection and classification of breast cancer.
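
To illustrate the ELM component named above, here is a minimal extreme learning machine classifier: random hidden-layer weights, a sigmoid activation, and output weights solved by regularized least squares. The crow-search step that would tune the hidden-layer size and regularization coefficient is omitted, and all array names and dimensions are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y_onehot, n_hidden=50, reg=1e-2):
    """Fit ELM output weights by ridge-regularized least squares."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden layer
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y_onehot)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta  # argmax over columns gives the class

# Toy 3-class example standing in for deep mammogram features.
X = rng.normal(size=(120, 16))
y = rng.integers(0, 3, size=120)
Y = np.eye(3)[y]                                  # one-hot labels
W, b, beta = elm_train(X, Y)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```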

4.

Background

Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study.

Methods

The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators.

Results

After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and between total bilirubin and current smokers (p < 0.001).

Conclusion

The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and was demonstrated to be a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
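
A compressed sketch of the three-step pipeline, using scikit-learn stand-ins: IterativeImputer for chained-equation imputation, gradient boosting for the biomarker screening step, and plain logistic regression on the survivors. Survey weights and the pooling across 20 imputations are omitted for brevity; all data here are synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 67))            # 67 biomarkers (toy data)
X[rng.random(X.shape) < 0.1] = np.nan     # inject missingness
y = rng.integers(0, 2, size=300)          # depressed / not depressed

# Step 1: impute missing biomarker values via chained regressions.
X_imp = IterativeImputer(random_state=0).fit_transform(X)

# Step 2: boosted trees screen biomarkers by importance.
gbm = GradientBoostingClassifier(random_state=0).fit(X_imp, y)
top = np.argsort(gbm.feature_importances_)[::-1][:21]   # keep top 21

# Step 3: traditional logistic regression on the screened set,
# whose exponentiated coefficients are odds ratios.
logit = LogisticRegression(max_iter=1000).fit(X_imp[:, top], y)
print("odds ratios:", np.exp(logit.coef_).round(2))
```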

5.
There has been some improvement in the treatment of preterm infants, which has helped to increase their chance of survival. However, the rate of premature births is still increasing globally. As a result, this group of infants is most at risk of developing severe medical conditions that can affect the respiratory, gastrointestinal, immune, central nervous, auditory and visual systems. In extreme cases, this can also lead to long-term conditions such as cerebral palsy, mental retardation and learning difficulties, as well as poor health and growth. In the US alone, the societal and economic cost of preterm births was estimated in 2005 to be $26.2 billion per annum. In the UK, the figure was close to £2.95 billion in 2009. Many believe that a better understanding of why preterm births occur, and a strategic focus on prevention, will help to improve the health of children and reduce healthcare costs. At present, most methods of preterm birth prediction are subjective. However, a strong body of evidence suggests that the analysis of uterine electrical signals (electrohysterography) could provide a viable way of diagnosing true labour and predicting preterm deliveries. Most electrohysterography studies focus on detecting true labour during the final seven days before delivery. The challenge is to utilise electrohysterography techniques to predict preterm delivery earlier in the pregnancy. This paper explores this idea further and presents a supervised machine learning approach that classifies term and preterm records using an open source dataset containing 300 records (38 preterm and 262 term). The synthetic minority oversampling technique is used to oversample the minority preterm class, and cross-validation techniques are used to evaluate the approach against other similar studies. Our approach shows an improvement on existing studies, with 96% sensitivity, 90% specificity, a 95% area under the curve and 8% global error using the polynomial classifier.
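
A minimal sketch of the oversampling-plus-cross-validation setup, using the imbalanced-learn SMOTE implementation inside a pipeline so oversampling touches only the training folds. The dataset shape mirrors the 38/262 class split; the polynomial-kernel SVM is a stand-in for the study's polynomial classifier, and the features are synthetic.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))             # EHG-derived features (toy)
y = np.array([1] * 38 + [0] * 262)         # 38 preterm, 262 term

# SMOTE inside the pipeline: synthetic minority samples are created
# from training folds only, avoiding leakage into the test folds.
clf = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("svm", SVC(kernel="poly", degree=3)),  # polynomial classifier stand-in
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", scores.round(2))
```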

6.
Small-angle x-ray scattering (SAXS) of biological macromolecules in solutions is a widely employed method in structural biology. SAXS patterns include information about the overall shape and low-resolution structure of dissolved particles. Here, we describe how to transform experimental SAXS patterns to feature vectors and how a simple k-nearest neighbor approach is able to retrieve information on overall particle shape and maximal diameter (Dmax) as well as molecular mass directly from experimental scattering data. Based on this transformation, we develop a rapid multiclass shape-classification ranging from compact, extended, and flat categories to hollow and random-chain-like objects. This classification may be employed, e.g., as a decision block in automated data analysis pipelines. Further, we map protein structures from the Protein Data Bank into the classification space and, in a second step, use this mapping as a data source to obtain accurate estimates for the structural parameters (Dmax, molecular mass) of the macromolecule under study based on the experimental scattering pattern alone, without inverse Fourier transform for Dmax. All methods presented are implemented in a Fortran binary DATCLASS, part of the ATSAS data analysis suite, available on Linux, Mac, and Windows and free for academic use.
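
A schematic of the k-nearest-neighbor lookup, assuming each SAXS pattern has already been reduced to a fixed-length feature vector; the feature transformation itself (DATCLASS's actual contribution) is only stubbed here with random vectors, and the shape labels and Dmax values are placeholders.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

rng = np.random.default_rng(3)
SHAPES = ["compact", "extended", "flat", "hollow", "random-chain"]

# Toy library: feature vectors from simulated scattering curves, each
# labeled with a shape class and a maximal diameter Dmax (angstroms).
library_feats = rng.normal(size=(500, 8))
library_shape = rng.integers(0, len(SHAPES), size=500)
library_dmax = rng.uniform(20, 400, size=500)

shape_knn = KNeighborsClassifier(n_neighbors=5).fit(library_feats, library_shape)
dmax_knn = KNeighborsRegressor(n_neighbors=5).fit(library_feats, library_dmax)

# An "experimental" pattern reduced to the same feature space.
query = rng.normal(size=(1, 8))
print("shape class:", SHAPES[shape_knn.predict(query)[0]])
print("Dmax estimate:", round(float(dmax_knn.predict(query)[0]), 1), "A")
```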

7.
Optimization of learning machine prediction has been attempted by the addition of a “width” parameter proposed by Wangen et al. Predictions were made for the production of phenazine compounds and the utilization of d-sorbitol, n-hexadecane, and histamine by pseudomonads. In all but one of the cases examined, the “stability” of prediction was much improved when a high positive value was assigned to the parameter. A slight increase was often found in the hit-to-miss ratio, but the χ2 test revealed that it was statistically insignificant.

8.
Boolean implications (if-then rules) provide a conceptually simple, uniform and highly scalable way to find associations between pairs of random variables. In this paper, we propose to use Boolean implications to find relationships between variables of different data types (mutation, copy number alteration, DNA methylation and gene expression) from the glioblastoma (GBM) and ovarian serous cystadenoma (OV) data sets from The Cancer Genome Atlas (TCGA). We find hundreds of thousands of Boolean implications from these data sets. A direct comparison of the relationships found by Boolean implications and those found by commonly used methods for mining associations show that existing methods would miss relationships found by Boolean implications. Furthermore, many relationships exposed by Boolean implications reflect important aspects of cancer biology. Examples of our findings include cis relationships between copy number alteration, DNA methylation and expression of genes, a new hierarchy of mutations and recurrent copy number alterations, loss-of-heterozygosity of well-known tumor suppressors, and the hypermethylation phenotype associated with IDH1 mutations in GBM. The Boolean implication results used in the paper can be accessed at http://crookneck.stanford.edu/microarray/TCGANetworks/.
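
A minimal sketch of Boolean implication testing between two binarized variables: the implication "X implies Y" holds when the (X true, Y false) corner of the 2x2 contingency table is nearly empty relative to what independence predicts. The threshold and sparsity statistic below are simplified stand-ins for the published criteria, and the data are synthetic.

```python
import numpy as np

def implication_x_implies_y(x, y, max_ratio=0.1):
    """Test 'if x then y' on two boolean arrays.

    The implication holds when the count of (x=1, y=0) points is small
    compared with the count expected under independence.
    """
    n = len(x)
    observed = np.sum(x & ~y)
    expected = x.sum() * (~y).sum() / n   # independence expectation
    return expected > 0 and observed / expected < max_ratio

rng = np.random.default_rng(4)
# Toy example: high methylation (x) almost always silences expression,
# i.e. 'methylated implies not expressed'.
methylated = rng.random(1000) < 0.3
expressed = ~methylated | (rng.random(1000) < 0.02)

print(implication_x_implies_y(methylated, ~expressed))  # True expected
print(implication_x_implies_y(expressed, methylated))   # False expected
```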

9.
《IRBM》2022,43(1):2-12
Objectives

This study focuses on the integration of anatomical left-ventricle myocardium features and an optimized extreme learning machine (ELM) for discrimination of subjects with normal, mild, moderate and severely abnormal ejection fraction (EF). The physiological alterations in the myocardium have diagnostic relevance to the etiology of cardiovascular diseases (CVD) with reduced EF.

Materials and Methods

This assessment is carried out on cardiovascular magnetic resonance (CMR) images of 104 subjects available in the Kaggle Second Annual Data Science Bowl. The Segment CMR framework is used to segment the myocardium from cardiac MR images, and it is subdivided into 16 sectors. 86 clinically significant anatomical features are extracted and fed to the ELM framework. The regularization coefficient and the number of hidden neurons influence the prediction accuracy of the ELM; optimal values for these parameters are found with the butterfly optimizer (BO). A comparative study of the BOELM framework with different activation functions and feature sets has been conducted.

Results

Among the individual feature sets, myocardial volume at ED gives the best classification accuracy, 83.3%. Further, the BOELM framework provides a higher multi-class accuracy of 95.2% with the entire feature set than plain ELM. Healthy and moderately abnormal subjects are discriminated better than the other subgroups.

Conclusion

The combined anatomical sector-wise myocardial features assisting BOELM are able to predict the severity levels of CVDs. Thus, this study supports radiologists in the mass diagnosis of cardiac disorders.

10.
The optic nerve head cup, the optic cup to disc ratio and the neural rim configuration are regarded as important indicators for detecting glaucoma at an early stage in clinical practice. The main clinical indicator of glaucoma, the optic cup to disc ratio, is currently determined manually, limiting its potential for mass screening. This paper proposes methods for automatic cup to disc ratio determination. In the first part of the work, the optic disc region of the fundus image is considered. K-means clustering is used to automatically extract the optic disc, with the K value selected automatically by a hill-climbing algorithm. The segmented contour of the optic cup is smoothed by two methods, namely elliptical fitting and morphological fitting. The cup to disc ratio is calculated for 50 normal fundus images and 50 fundus images of glaucoma patients. Throughout this paper, the same set of images has been used, and for these images the cup to disc ratio values provided by an ophthalmologist are taken as the gold standard. The error is calculated with reference to this gold standard throughout the paper for cup to disc ratio comparison. The mean error of the K-means clustering method for elliptical and morphological fitting is 4.5% and 4.1%, respectively. Since this error is high, fuzzy C-means clustering was then chosen; its mean error for elliptical and morphological fitting is 3.83% and 3.52%. The error can be further minimized by considering inter-pixel relations, which is achieved with Spatially Weighted Fuzzy C-Means (SWFCM) clustering. The optic disc and optic cup are clustered and segmented by SWFCM clustering, whose mean error for elliptical and morphological fitting is 3.06% and 1.67%, respectively. In this work, fundus images were collected from Aravind Eye Hospital, Pondicherry.
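
A small sketch of the final ratio computation, assuming binary optic-disc and optic-cup masks have already been produced by one of the clustering methods above; the CDR is taken here as the ratio of vertical diameters, which is one common convention, and the masks are toy rectangles.

```python
import numpy as np

def vertical_diameter(mask):
    """Vertical extent (in pixels) of a binary segmentation mask."""
    rows = np.flatnonzero(mask.any(axis=1))
    return 0 if rows.size == 0 else rows[-1] - rows[0] + 1

def cup_to_disc_ratio(cup_mask, disc_mask):
    d = vertical_diameter(disc_mask)
    return vertical_diameter(cup_mask) / d if d else float("nan")

# Toy masks standing in for SWFCM segmentation output.
disc = np.zeros((100, 100), dtype=bool)
cup = np.zeros((100, 100), dtype=bool)
disc[20:80, 30:70] = True     # disc spans 60 rows
cup[35:65, 40:60] = True      # cup spans 30 rows

cdr = cup_to_disc_ratio(cup, disc)
print(f"CDR = {cdr:.2f}")     # 0.50; ratios above ~0.6 are often flagged
```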

11.
Since its identification in 1983, HIV-1 has been the focus of a research effort unprecedented in scope and difficulty, whose ultimate goals, a cure and a vaccine, remain elusive. One of the fundamental challenges in accomplishing these goals is the tremendous genetic variability of the virus, with some genes differing at as many as 40% of nucleotide positions among circulating strains. Because of this, the genetic bases of many viral phenotypes, most notably the susceptibility to neutralization by a particular antibody, are difficult to identify computationally. Drawing upon open-source general-purpose machine learning algorithms and libraries, we have developed a software package, IDEPI (IDentify EPItopes), for learning genotype-to-phenotype predictive models from sequences with known phenotypes. IDEPI can apply learned models to classify sequences of unknown phenotypes, and also identify specific sequence features which contribute to a particular phenotype. We demonstrate that IDEPI achieves performance similar to or better than that of previously published approaches on four well-studied problems: finding the epitopes of broadly neutralizing antibodies (bNab), determining coreceptor tropism of the virus, identifying compartment-specific genetic signatures of the virus, and deducing drug-resistance associated mutations. The cross-platform Python source code (released under the GPL 3.0 license), documentation, issue tracking, and a pre-configured virtual machine for IDEPI can be found at https://github.com/veg/idepi.

12.
13.
In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide the choice of the appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting, where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study, where the aim is to predict fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players chose to use support vector machines, LASSO, and random forests, respectively.
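
A minimal sketch of the comparison protocol: each "player" supplies a fitted model, and all are scored with the same strictly proper scoring rule (the Brier score, as one example) on the same bootstrap cross-validation splits. The three strategies match those named above; the data are synthetic stand-ins for the few-subjects, many-predictors setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import brier_score_loss
from sklearn.utils import resample

X, y = make_classification(n_samples=80, n_features=200, n_informative=5,
                           random_state=0)   # few subjects, many predictors

players = {
    "SVM": SVC(probability=True, random_state=0),
    "LASSO": LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    "random forest": RandomForestClassifier(random_state=0),
}

rng = np.random.default_rng(5)
scores = {name: [] for name in players}
for _ in range(20):                          # bootstrap cross-validation rounds
    idx = resample(np.arange(len(y)), random_state=int(rng.integers(1 << 30)))
    oob = np.setdiff1d(np.arange(len(y)), idx)   # out-of-bag test set
    for name, model in players.items():
        model.fit(X[idx], y[idx])
        p = model.predict_proba(X[oob])[:, 1]
        scores[name].append(brier_score_loss(y[oob], p))

for name, s in scores.items():
    print(f"{name}: mean Brier score {np.mean(s):.3f}")   # lower is better
```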

14.
15.
Market impact cost is the most significant portion of implicit transaction costs; although it cannot be measured directly, reducing it can lower the overall transaction cost. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural networks, Gaussian processes, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single-transaction data from the US stock market via the Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art parametric benchmark, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives for reducing transaction costs by considerably improving prediction performance.
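
A compressed sketch of the model comparison, with three illustrative inputs (relative order size, volatility, inverse turnover) and a synthetic impact-cost target; the scikit-learn Gaussian process and support vector regressors stand in for the paper's full model set, and the variable names are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(6)
n = 400
# Three illustrative inputs: order size / daily volume, daily volatility,
# inverse turnover. Target: market impact cost in basis points (toy rule).
X = np.column_stack([rng.uniform(0.001, 0.2, n),
                     rng.uniform(0.005, 0.05, n),
                     rng.uniform(0.1, 5.0, n)])
cost_bps = 80 * X[:, 0] ** 0.6 * X[:, 1] ** 0.4 + rng.normal(0, 0.2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, cost_bps, random_state=0)
for name, model in [("SVR", SVR(C=10.0)),
                    ("Gaussian process", GaussianProcessRegressor())]:
    model.fit(X_tr, y_tr)
    err = mean_absolute_error(y_te, model.predict(X_te))
    print(f"{name}: MAE {err:.3f} bps")
```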

16.
Under the network environment, the trading volume and asset price of a financial commodity or instrument are affected by various complicated factors. Machine learning and sentiment analysis provide powerful tools to collect a great deal of data from websites and retrieve useful information for effectively forecasting the financial risk of associated companies. This article studies trading volume and asset price risk when sentiment-bearing financial information is available, using both sentiment analysis and popular machine learning approaches: the artificial neural network (ANN) and the support vector machine (SVM). Nonlinear GARCH-based mining models are developed by integrating GARCH (generalized autoregressive conditional heteroskedasticity) theory with ANN and SVM. Empirical studies in the U.S. stock market show that the proposed approach achieves favorable forecast performance. GARCH-based SVM outperforms GARCH-based ANN for volatility forecasting, whereas GARCH-based ANN achieves a better result for the volatility trend. Results also indicate a strong correlation between information sentiment and both trading volume and asset price volatility.
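
A minimal sketch of the GARCH-based SVM idea, assuming the Python arch package for the GARCH(1,1) stage: the fitted conditional volatility is lagged and fed to a support vector regressor as the nonlinear stage. In the paper, sentiment features would enter as extra columns alongside the lags; the returns here are simulated.

```python
import numpy as np
from arch import arch_model
from sklearn.svm import SVR

rng = np.random.default_rng(7)
returns = rng.standard_t(df=5, size=1000) * 0.8   # toy daily returns (%)

# Stage 1: GARCH(1,1) produces the conditional volatility series.
garch = arch_model(returns, vol="Garch", p=1, q=1).fit(disp="off")
vol = np.asarray(garch.conditional_volatility)

# Stage 2: SVR learns a nonlinear map from lagged volatility (plus, in
# the paper, sentiment features) to next-day volatility.
lags = 5
X = np.column_stack([vol[i:len(vol) - lags + i] for i in range(lags)])
y = vol[lags:]
svr = SVR(C=1.0, epsilon=0.01).fit(X[:-1], y[:-1])
print("next-day volatility forecast:", float(svr.predict(X[-1:])[0]))
```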

17.
The goal of this work is to introduce new metrics to assess the risk of Alzheimer's disease (AD), which we call AD Pattern Similarity (AD-PS) scores. These metrics are the conditional probabilities modeled by large-scale regularized logistic regression. The AD-PS scores derived from structural MRI and cognitive test data were tested across different situations using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. The scores were computed across groups of participants stratified by cognitive status, age and functional status. Cox proportional hazards regression was used to evaluate associations with the distribution of conversion times from mild cognitive impairment to AD. The performance of classifiers developed using data from different types of brain tissue was systematically characterized across cognitive status groups. We also explored the performance of anatomical and cognitive-anatomical composite scores generated by combining the outputs of classifiers developed using different types of data. In addition, we report the performance of the AD-PS scores relative to other metrics used in the field, including the Spatial Pattern of Abnormalities for Recognition of Early AD (SPARE-AD) index and total hippocampal volume, for the variables examined.
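
A schematic of the AD-PS construction: an L2-regularized logistic regression is fitted on high-dimensional features, and the modeled conditional probability of the AD class is itself the risk score. Feature dimensions and data are illustrative placeholders, not ADNI values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(8)
X = rng.normal(size=(400, 5000))          # voxel-level MRI features (toy)
y = rng.integers(0, 2, size=400)          # AD vs. control labels (toy)

# Large-scale regularized logistic regression; C controls the L2 penalty.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=0.01, max_iter=2000),
)
model.fit(X, y)

# The AD-PS score of a new participant is the modeled conditional
# probability of the AD class given their features.
new_scan = rng.normal(size=(1, 5000))
ad_ps_score = model.predict_proba(new_scan)[0, 1]
print(f"AD-PS score: {ad_ps_score:.3f}")
```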

18.
《Biophysical journal》2020,118(5):1165-1176
All medications have adverse effects. Among the most serious of these are cardiac arrhythmias. Current paradigms for drug safety evaluation are costly, lengthy, conservative, and impede efficient drug development. Here, we combine multiscale experiment and simulation, high-performance computing, and machine learning to create a risk estimator to stratify new and existing drugs according to their proarrhythmic potential. We capitalize on recent developments in machine learning and integrate information across 10 orders of magnitude in space and time to provide a holistic picture of the effects of drugs, either individually or in combination with other drugs. We show, both experimentally and computationally, that drug-induced arrhythmias are dominated by the interplay between two currents with opposing effects: the rapid delayed rectifier potassium current and the L-type calcium current. Using Gaussian process classification, we create a classifier that stratifies drugs into safe and arrhythmic domains for any combinations of these two currents. We demonstrate that our classifier correctly identifies the risk categories of 22 common drugs exclusively on the basis of their concentrations at 50% current block. Our new risk assessment tool explains under which conditions blocking the L-type calcium current can delay or even entirely suppress arrhythmogenic events. Using machine learning in drug safety evaluation can provide a more accurate and comprehensive mechanistic assessment of the proarrhythmic potential of new drugs. Our study paves the way toward establishing science-based criteria to accelerate drug development, design safer drugs, and reduce heart rhythm disorders.
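
A small sketch of the classification stage: a Gaussian process classifier over the two-dimensional space of fractional current block (rapid delayed rectifier potassium vs. L-type calcium), trained on labeled safe/arrhythmic points. The training labels here come from a toy rule mirroring the opposing-currents interplay, not from the paper's simulated drug responses.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(9)
n = 200
# Features: fraction of IKr (potassium) block and of ICaL (calcium) block.
ikr_block = rng.uniform(0, 1, n)
ical_block = rng.uniform(0, 1, n)
X = np.column_stack([ikr_block, ical_block])

# Toy label rule: strong IKr block is arrhythmogenic unless ICaL block
# compensates (mirroring the opposing effects described in the abstract).
y = (ikr_block - 0.7 * ical_block > 0.4).astype(int)   # 1 = arrhythmic

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(0.3), random_state=0)
gpc.fit(X, y)

# Risk estimate for a drug blocking 60% of IKr and 20% of ICaL at dose.
drug = np.array([[0.6, 0.2]])
print(f"P(arrhythmic) = {gpc.predict_proba(drug)[0, 1]:.2f}")
```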

19.
《IRBM》2022,43(6):678-686
Objectives

Feature selection in data sets is an important task that alleviates various machine learning and data mining issues. The main objectives of a feature selection method are to build simpler and more understandable classifier models in order to improve data mining and processing performance. Therefore, a comparative evaluation of the Chi-square method, the recursive feature elimination method, and the tree-based method (using Random Forest), applied with three common machine learning methods (K-Nearest Neighbor, naïve Bayesian classifier and decision tree classifier), is performed to select the most relevant primitives from a large set of attributes. Furthermore, the most suitable couple (i.e., feature selection method and machine learning method) providing the best performance is determined.

Materials and methods

In this paper, an overview of the most common feature selection techniques is first provided: the Chi-square method, the Recursive Feature Elimination method (RFE) and the tree-based method (using Random Forest). A comparative evaluation of the improvement brought by these feature selection methods to the three common machine learning methods (K-Nearest Neighbor, naïve Bayesian classifier and decision tree classifier) is then performed. For evaluation purposes, the following measures are used on the stroke disease data set: micro-F1, accuracy and root mean square error.

Results

The obtained results show that the proposed approach (i.e., the tree-based method using Random Forest, TBM-RF, with the decision tree classifier, DTC) provides accuracy higher than 85% and an F1-score higher than 88%, thus better than KNN and NB combined with the Chi-square, RFE and TBM-RF methods.

Conclusion

This study shows that the couple of the tree-based method using Random Forest (TBM-RF) and the decision tree classifier successfully and efficiently contributes to finding the most relevant features and to predicting and classifying patients suffering from stroke disease.
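
A condensed sketch of the comparison grid: the three selectors (Chi-square, RFE, random-forest-based) crossed with the three classifiers (KNN, naive Bayes, decision tree), scored by cross-validated accuracy. The data are synthetic non-negative features standing in for the stroke data set, since the chi-square test requires non-negative inputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, RFE, SelectFromModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=30, n_informative=8,
                           random_state=0)
X = X - X.min(axis=0)   # chi-square needs non-negative features

selectors = {
    "Chi-square": SelectKBest(chi2, k=10),
    "RFE": RFE(LogisticRegression(max_iter=1000), n_features_to_select=10),
    "TBM-RF": SelectFromModel(RandomForestClassifier(random_state=0),
                              max_features=10),
}
classifiers = {
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "DTC": DecisionTreeClassifier(random_state=0),
}

# Every (selector, classifier) couple is evaluated on identical folds.
for s_name, selector in selectors.items():
    for c_name, clf in classifiers.items():
        acc = cross_val_score(make_pipeline(selector, clf), X, y, cv=5).mean()
        print(f"{s_name} + {c_name}: accuracy {acc:.3f}")
```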

20.
