Similar literature
 Found 20 similar articles (search time: 357 ms)
1.
Alburnus alburnus alborella is a fish species native to northern Italy. It has suffered a very sharp decrease in population over the last 20 years due to human impact. Therefore, it was selected for reintroduction projects. In this research project, support vector machines (SVMs) were tested as possible tools for building reliable models of presence/absence of the species. A system of 198 sites located along the rivers of Piedmont in north-western Italy was investigated. At each site, 19 physical-chemical and environmental variables were measured. We verified that performance did not improve after feature selection but instead decreased slightly (from Correctly Classified Instances [CCI] = 84.34 and Cohen's k [k] = 0.69 to CCI = 82.81 and k = 0.66). However, feature selection is crucial in identifying the relevant features for the presence/absence of the species. We then compared SVM performance with that of decision trees (DTs) and artificial neural networks (ANNs) built using the same dataset. SVMs outperformed DTs (CCI = 81.39 and k = 0.63) but not ANNs (CCI = 83.03 and k = 0.66), showing that SVMs and ANNs are the best performing models and suggesting that their application in freshwater management is more promising than that of traditional and other machine-learning techniques.
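The two scores quoted in this abstract, CCI and Cohen's k, can both be read off a 2x2 presence/absence confusion table. The sketch below uses the standard definition of Cohen's kappa (observed agreement corrected for chance agreement). The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
def cci_and_kappa(tp, fn, fp, tn):
    """Return (CCI in percent, Cohen's kappa) for a 2x2 presence/absence table.

    tp: presences predicted as present   fn: presences predicted as absent
    fp: absences predicted as present    tn: absences predicted as absent
    """
    n = tp + fn + fp + tn
    # Observed agreement: fraction of sites classified correctly.
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return 100.0 * observed, kappa
```

For example, with made-up counts `cci_and_kappa(40, 10, 10, 40)`, 80 of 100 sites are classified correctly while chance alone would give 50, so kappa comes out well below the raw accuracy.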

2.
Although developed for completely different applications, data analysis methods known as “data mining” have great technological potential, which is increasingly being realized for efficiently analyzing optimization potential and for troubleshooting in many application areas of process technology. This paper presents the successful application of data mining methods to the optimization of a fermentation process and discusses diverse characteristics of data mining for biological processes. For the optimization of biological processes, a huge number of possibly relevant process parameters exists. These input variables can be device parameters as well as process control parameters. The main challenge of such optimizations is to robustly identify relevant combinations of parameters among this huge number of process parameters. For the underlying process, applying data mining methods showed that the moment at which a special carbohydrate component is added has a strong impact on the formation of secondary components. The yield could also be increased by using 2 m³ fermentors instead of 1 m³ fermentors.

3.
One of the challenges in oceanography is to understand the influence of environmental factors on the abundances of prokaryotes and viruses. Generally, conventional statistical methods resolve trends well, but more complex relationships are difficult to explore. In such cases, Artificial Neural Networks (ANNs) offer an alternative way for data analysis. Here, we developed ANN-based models of prokaryotic and viral abundances in the Arctic Ocean. The models were used to identify the best predictors for prokaryotic and viral abundances including cytometrically-distinguishable populations of prokaryotes (high and low nucleic acid cells) and viruses (high- and low-fluorescent viruses) among salinity, temperature, depth, day length, and the concentration of Chlorophyll-a. The best performing ANNs to model the abundances of high and low nucleic acid cells used temperature and Chl-a as input parameters, while the abundances of high- and low-fluorescent viruses used depth, Chl-a, and day length as input parameters. Decreasing viral abundance with increasing depth and decreasing system productivity was captured well by the ANNs. Despite identifying the same predictors for the two populations of prokaryotes and viruses, respectively, the structure of the best performing ANNs differed between high and low nucleic acid cells and between high- and low-fluorescent viruses. Also, the two prokaryotic and viral groups responded differently to changes in the predictor parameters; hence, the cytometric distinction between these populations is ecologically relevant. The models imply that temperature is the main factor explaining most of the variation in the abundances of high nucleic acid cells and total prokaryotes and that the mechanisms governing the reaction to changes in the environment are distinctly different among the prokaryotic and viral populations.

4.
With the advent of the -omics era, classical technology platforms, such as hyphenated mass spectrometry, are currently undergoing a transformation toward high-throughput application. These novel platforms yield highly detailed metabolite profiles in large numbers of samples. Such profiles can be used as fingerprints for the accurate identification and classification of samples as well as for the study of effects of experimental conditions on the concentrations of specific metabolites. Challenges for the application of these methods lie in the acquisition of high-quality data, data normalization, and data mining. Here, a high-throughput fingerprinting approach based on analysis of headspace volatiles using ultrafast gas chromatography coupled to time of flight mass spectrometry (ultrafast GC/TOF-MS) was developed and evaluated for classification and screening purposes in food fermentation. GC-MS mass spectra of headspace samples of milk fermented by different mixed cultures of lactic acid bacteria (LAB) were collected and preprocessed in MetAlign, a dedicated software package for the preprocessing and comparison of liquid chromatography (LC)-MS and GC-MS data. The Random Forest algorithm was used to detect mass peaks that discriminated combinations of species or strains used in fermentations. Many of these mass peaks originated from key flavor compounds, indicating that the presence or absence of individual strains or combinations of strains significantly influenced the concentrations of these components. We demonstrate that the approach can be used for purposes like the selection of strains from collections based on flavor characteristics and the screening of (mixed) cultures for the presence or absence of strains. In addition, we show that strain-specific flavor characteristics can be traced back to genetic markers when comparative genome hybridization (CGH) data are available.

5.
Volker Bahn, Brian J. McGill. Oikos, 2013, 122(3): 321-331
Distribution models are used to predict the likelihood of occurrence or abundance of a species at locations where census data are not available. An integral part of modelling is the testing of model performance. We compared different schemes and measures for testing model performance using 79 species from the North American Breeding Bird Survey. The four testing schemes we compared featured increasing independence between test and training data: resubstitution, random data hold-out and two spatially segregated data hold-out designs. The different testing measures also addressed different levels of information content in the dependent variable: regression R² for absolute abundance, squared correlation coefficient r² for relative abundance and AUC/Somers' D for presence/absence. We found that higher levels of independence between test and training data lead to lower assessments of prediction accuracy. Even for data collected independently, spatial autocorrelation leads to dependence between random hold-out test data and training data, and thus to inflated measures of model performance. While there is a general awareness of the importance of autocorrelation to model building and hypothesis testing, its consequences via violation of independence between training and testing data have not been addressed systematically and comprehensively before. Furthermore, increasing information content (from correctly classifying presence/absence, to predicting relative abundance, to predicting absolute abundance) leads to decreasing predictive performance. The current tests for presence/absence distribution models are typically overly optimistic because a) the test and training data are not independent and b) the correct classification of presence/absence has a relatively low information content and thus a lower capability to address ecological and conservation questions than a prediction of abundance. Meaningful evaluation of model performance requires testing on spatially independent data if the intended application of the model is to predict into new geographic or climatic space, which arguably is the case for most applications of distribution models.
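The contrast between a random hold-out and a spatially segregated hold-out can be illustrated with a small sketch. It assumes each site is an `(x, y)` coordinate tuple and that the spatial split is a single east-west boundary. Both are simplifications chosen for illustration; the abstract does not describe the authors' actual designs at this level of detail, and the function names are hypothetical.

```python
import random


def random_holdout(sites, test_fraction=0.25, seed=0):
    """Split sites at random; test sites can lie right next to training sites."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def spatial_holdout(sites, boundary_x):
    """Train on sites west of boundary_x and test on the rest.

    Test sites are then geographically separated from the training sites,
    which reduces the dependence created by spatial autocorrelation.
    """
    train = [s for s in sites if s[0] < boundary_x]
    test = [s for s in sites if s[0] >= boundary_x]
    return train, test
```

With the random split, a test site's nearest neighbour is often a training site, which is the situation the abstract identifies as inflating performance measures. The spatial split trades that optimism for a harder, more realistic test.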

6.
In Piedmont (Italy) the environmental changes due to human impact have had profound effects on rivers and their inhabitants. Thus, it is necessary to develop practical tools providing accurate ecological assessments of river and species conditions. We focus our attention on Salmo marmoratus, an endangered salmonid which is characteristic of the Po river system in Italy. In order to contribute to the management of the species, four different approaches were used to assess its presence: discriminant function analysis, logistic regression, decision tree models and artificial neural networks. Either all the 20 environmental variables measured in the field or the 7 coming from feature selection were used to classify sites as positive or negative for S. marmoratus. The performances of the different models were compared. Discriminant function analysis, logistic regression, and decision tree models (unpruned and pruned) had relatively high percentages of correctly classified instances. Although neither tree-pruning technique improved the reliability of the models significantly, they did reduce the tree complexity and hence increased the clarity of the models. The artificial neural network (ANN) approach, especially the model built with the 7 inputs coming from feature selection, showed better performance than all the others. The relative contribution of each independent variable to this model was determined by using the sensitivity analysis technique. Our findings proved that the ANNs were more effective than the other classification techniques. Moreover, ANNs achieved their high potentials when they were applied in models used to make decisions regarding river and conservation management.

7.
8.
ModEco: an integrated software package for ecological niche modeling (cited 2 times: 0 self-citations, 2 by others)
Qinghua Guo, Yu Liu. Ecography, 2010, 33(4): 637-642
ModEco is a software package for ecological niche modeling. It integrates a range of niche modeling methods within a geographical information system. ModEco provides a user-friendly platform that enables users to explore, analyze, and model species distribution data with relative ease. ModEco has several unique features: 1) it deals with different types of ecological observation data, such as presence and absence data, presence-only data, and abundance data; 2) it provides a range of models when dealing with presence-only data, such as presence-only models, pseudo-absence models, background vs presence data models, and ensemble models; and 3) it includes relatively comprehensive tools for data visualization, feature selection, and accuracy assessment.

9.
Many recent studies of extinction risk have attempted to determine what differences exist between threatened and non-threatened species. One potential problem in such studies is that species-level data may contain phylogenetic non-independence. However, the use of phylogenetic comparative methods (PCM) to account for non-independence remains controversial, and some recent studies of extinction have recommended other methods that do not account for phylogenetic non-independence, notably decision trees (DTs). Here we perform a systematic comparison of techniques, comparing the performance of PCM regression models with corresponding non-phylogenetic regressions and DTs over different clades and response variables. We found that predictions were broadly consistent among techniques, but that predictive precision varied across techniques with PCM regression and DTs performing best. Additionally, despite their inability to account for phylogenetic non-independence, DTs were useful in highlighting interaction terms for inclusion in the PCM regression models. We discuss the implications of these findings for future comparative studies of extinction risk.

10.
Deep learning (DL) is one of the most powerful data-driven machine-learning techniques in artificial intelligence (AI). It can automatically learn from raw data without manual feature selection. DL models have led to remarkable advances in data extraction and analysis for medical imaging. Magnetic resonance imaging (MRI) has proven useful in delineating the characteristics and extent of breast lesions and tumors. This review summarizes the current state-of-the-art applications of DL models in breast MRI. Many recent DL models were examined in this field, along with several advanced learning approaches and methods for data normalization and breast and lesion segmentation. For clinical applications, DL-based breast MRI models were proven useful in five aspects: diagnosis of breast cancer, classification of molecular types, classification of histopathological types, prediction of neoadjuvant chemotherapy response, and prediction of lymph node metastasis. For subsequent studies, further improvement in data acquisition and preprocessing is necessary, additional DL techniques in breast MRI should be investigated, and wider clinical applications need to be explored.

11.
Dou Y, Mi H, Zhao L, Ren Y, Ren Y. Analytical Biochemistry, 2006, 351(2): 174-180
A method for simultaneous, nondestructive analysis of aminopyrine and phenacetin in compound aminopyrine phenacetin tablets with different concentrations has been developed using principal component artificial neural networks (PC-ANNs) on near-infrared (NIR) spectroscopy. In PC-ANN models, the spectral data were initially analyzed by principal component analysis. Then the scores of the principal components were chosen as input nodes for the input layer instead of the spectral data. The artificial neural network models using the spectral data as input nodes were also established and compared with the PC-ANN models. Four different preprocessing methods (first-derivative, second-derivative, standard normal variate (SNV), and multiplicative scatter correction) were applied to three sets of NIR spectra of compound aminopyrine phenacetin tablets. The PC-ANN approach with SNV-preprocessed spectra was found to provide the best results. The degree of approximation was used as the selection criterion for the optimum network parameters.
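Of the preprocessing methods listed, standard normal variate (SNV) is the simplest to write down: each spectrum is centred on its own mean and scaled by its own standard deviation. The sketch below follows that conventional definition and assumes the sample standard deviation (n - 1 denominator). The abstract does not state which variant the authors used, and the function name is hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Apply standard normal variate scaling to one spectrum.

    spectrum: list of absorbance values, one per wavelength.
    Returns a new list with mean 0 and sample standard deviation 1.
    """
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation; an assumption here
    return [(value - m) / s for value in spectrum]
```

Because the correction is applied spectrum by spectrum, it needs no reference spectrum, unlike multiplicative scatter correction. In a PC-ANN pipeline as described above, the SNV-corrected spectra would then go through principal component analysis, and the resulting scores would feed the network's input layer.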

12.
The extensive data requirements of three-dimensional inverse dynamics and joint modelling to estimate spinal loading prevent the implementation of these models in industry and may hinder development of advanced injury prevention standards. This work examines the potential of feed-forward artificial neural networks (ANNs) as a data reduction approach and compares their predictions to those of rigid link and EMG-assisted models. Ten males and ten females performed dynamic lifts; all approaches were applied, and the predicted joint moments and joint forces were compared. While the ANN under-predicted peak extension moments (p = 0.0261) and joint compression (p < 0.0001), predictions of cumulative extension moments (p = 0.8293) and cumulative joint compression (p = 0.9557) were not different. Therefore, the proposed ANNs may be used to obtain estimates of cumulative exposure variables with reduced input demands; however, they should not be applied to determine peak demands of a worker's exposure.

13.
M. Nie, W. Q. Zhang, M. Xiao, J. L. Luo, K. Bao, J. K. Chen, B. Li. Journal of Phytopathology, 2007, 155(6): 364-367
A rapid spectroscopic approach for whole-organism fingerprinting, Fourier transform infrared (FT-IR) spectroscopy, was used to analyse 16 isolates from five closely related species of Fusarium: F. graminearum, F. moniliforme, F. nivale, F. semitectum and F. oxysporum. Principal components analysis and hierarchical cluster analysis were used to study the clusters in the data. On visual inspection of the clusters from both methods, the spectra were not differentiated into five separate clusters corresponding to species, and these unsupervised methods failed to identify these fungal strains. When artificial neural networks (ANNs) were trained on the data by the back-propagation algorithm, with principal component scores of the spectra used as input nodes, the strains were accurately predicted and recognized. The results in this study show that FT-IR spectroscopy in combination with principal component artificial neural networks (PC-ANNs) is well suited for identifying Fusarium spp. It would be advantageous to establish a comprehensive database of taxonomically well-defined Fusarium species to aid the identification of unknown strains.

14.
Species distribution models (SDMs) increasingly have been used to anticipate the spread of invasive species. However, these models are powerful conservation tools only if they are biologically relevant, and thus validation of these models is essential. Here, we evaluate four model selection frameworks for their ability to identify a best fit model of spread under low data conditions early in an invasion, specifically testing the efficacy of methods that utilize absence data in addition to presence data in evaluating models. We test this question using a simulation where we generated data with varying confidence in the accuracy of the absence data, as absences in early invasions may become presences in the future, and increasing quantity of observation data to test the models. We create these simulations based on a real-world example of a newly emergent, invasive fungal pathogen, Batrachochytrium salamandrivorans (Bsal). The simulation demonstrated that AIC and Likelihood consistently outperform both Kappa and AUC in selecting the true model as the best model when data are limited and absence data are low quality, with AIC providing the most conservative results due to penalties for overparameterization. With these results, we then used these techniques to compare five candidate models for predicting the spread of Bsal. Consistent with the simulation, the best fit model of the candidate models for Bsal was inconsistent across the four metrics. However, AIC, which performed best in the simulation study, suggested that the spread of Bsal into Western Europe was best predicted by a combination of bioclimatic suitability, salamander richness, and number of salamander imports. Our results highlight the difficulty in evaluating predictive models when data are limited and of low quality, as with a newly invasive species, but show that these challenges can be partially addressed with the appropriate model selection approach. Use of this approach is critical as SDMs of invasive species are often used to inform conservation policy and efforts before the invasion spreads, when limited data are available.
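The penalty for overparameterization mentioned above comes directly from the definition AIC = 2k - 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The sketch below ranks candidate models by that formula. The model names, log-likelihoods and parameter counts in the usage note are invented for illustration and are not the study's values.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_by_aic(models):
    """Sort model names from best (lowest AIC) to worst.

    models: dict mapping a model name to (log_likelihood, n_params).
    """
    return sorted(models, key=lambda name: aic(*models[name]))
```

With hypothetical inputs such as `{"climate": (-100.0, 3), "climate+imports": (-99.5, 5)}`, the richer model fits slightly better, but its two extra parameters cost more than the gain in likelihood, so the simpler model ranks first. Ranking on likelihood alone would pick the richer model.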

15.

Background

The Eocene, a time of fluctuating environmental change and biome evolution, was generally driven by exceptionally warm temperatures. The Messel (47.8 Ma) and Eckfeld (44.3 Ma) deposits offer a rare opportunity to take a census of two deep-time ecosystems occurring during a greenhouse system. An understanding of the long-term consequences of extreme warming and cooling events during this interval, particularly on angiosperms and insects that dominate terrestrial biodiversity, can provide insights into the biotic consequences of current global climatic warming.

Methodology/Principal Findings

We compare insect-feeding damage within two middle Eocene fossil floras, Messel and Eckfeld, in Germany. From these small lake deposits, we studied 16,082 angiosperm leaves and scored each specimen for the presence or absence of 89 distinctive and diagnosable insect damage types (DTs), each of which was allocated to a major functional feeding group, including four varieties of external foliage feeding, piercing-and-sucking, leaf mining, galling, seed predation, and oviposition. Methods used for treatment of presence-absence data included general linear models and standard univariate, bivariate and multivariate statistical techniques.

Conclusions/Significance

Our results show an unexpectedly higher diversity and level of insect feeding than in comparable, penecontemporaneous floras from North and South America. In addition, we found a higher level of herbivory on evergreen, rather than deciduous, taxa at Messel. This pattern is explained by a ca. 2.5-fold increase in atmospheric CO₂ that overwhelmed evergreen antiherbivore defenses, subsequently lessened during the more ameliorated levels of Eckfeld times. These patterns reveal important, previously undocumented features of plant-host and insect-herbivore diversification during the European mid Eocene.

16.
Two rapid vibrational spectroscopic approaches (diffuse reflectance-absorbance Fourier transform infrared [FT-IR] and dispersive Raman spectroscopy), and one mass spectrometric method based on in vacuo Curie-point pyrolysis (PyMS), were investigated in this study. A diverse range of unprocessed, industrial fed-batch fermentation broths containing the fungus Gibberella fujikuroi producing the natural product gibberellic acid, were analyzed directly without a priori chromatographic separation. Partial least squares regression (PLSR) and artificial neural networks (ANNs) were applied to all of the information-rich spectra obtained by each of the methods to obtain quantitative information on the gibberellic acid titer. These estimates were of good precision, and the typical root-mean-square error for predictions of concentrations in an independent test set was <10% over a very wide titer range from 0 to 4925 ppm. However, although PLSR and ANNs are very powerful techniques they are often described as "black box" methods because the information they use to construct the calibration model is largely inaccessible. Therefore, a variety of novel evolutionary computation-based methods, including genetic algorithms and genetic programming, were used to produce models that allowed the determination of those input variables that contributed most to the models formed, and to observe that these models were predominantly based on the concentration of gibberellic acid itself. This is the first time that these three modern analytical spectroscopies, in combination with advanced chemometric data analysis, have been compared for their ability to analyze a real commercial bioprocess. The results demonstrate unequivocally that all methods provide very rapid and accurate estimates of the progress of industrial fermentations, and indicate that, of the three methods studied, Raman spectroscopy is the ideal bioprocess monitoring method because it can be adapted for on-line analysis.

17.

Background  

State-of-the-art signal processing methods are known to detect information in single-trial event-related EEG data, a crucial aspect in development of real-time applications such as brain computer interfaces. This paper investigates one such novel approach, evaluating how individual classifier and feature subset tailoring affects classification of single-trial EEG finger movements. The discrete wavelet transform was used to extract signal features that were classified using linear regression and non-linear neural network models, which were trained and architecturally optimized with evolutionary algorithms. The input feature subsets were also allowed to evolve, thus performing feature selection in a wrapper fashion. Filter approaches were implemented as well by limiting the degree of optimization.
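To make the feature-extraction step concrete, the sketch below performs one level of a discrete wavelet transform using the Haar wavelet. The choice of Haar is an assumption made for brevity, since the abstract does not say which wavelet family was used, and the function name is hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    # Scaled pairwise sums capture the slow trend of the signal.
    approx = [(signal[i] + signal[i + 1]) / scale
              for i in range(0, len(signal), 2)]
    # Scaled pairwise differences capture fast changes.
    detail = [(signal[i] - signal[i + 1]) / scale
              for i in range(0, len(signal), 2)]
    return approx, detail
```

Applying `haar_step` repeatedly to the approximation output gives coefficients at coarser time scales. A wrapper-style feature selection, as described in the abstract, would then search over subsets of these coefficients using classifier performance as the selection score.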

18.
Artificial neural networks (ANNs) have been used for the recognition of non-linear patterns, a characteristic of bioprocesses like wine production. In this work, ANNs were tested to predict problems of wine fermentation. A database of about 20,000 data from industrial fermentations of Cabernet Sauvignon and 33 variables was used. Two different ways of inputting data into the model were studied, by points and by fermentation. Additionally, different sub-cases were studied by varying the predictor variables (total sugar, alcohol, glycerol, density, organic acids and nitrogen compounds) and the time of fermentation (72, 96 and 256 h). The input of data by fermentations gave better results than the input of data by points. In fact, it was possible to predict 100% of normal and problematic fermentations using three predictor variables: sugars, density and alcohol at 72 h (3 days). Overall, ANNs were capable of obtaining 80% of prediction using only one predictor variable at 72 h; however, it is recommended to add more fermentations to confirm this promising result.

19.
In this study, the applicability of three modelling approaches was determined in an effort to describe complex relationships between process parameters and to predict the performance of an integrated process, which consisted of a fluidized bed bioreactor for Fe³⁺ regeneration and a gravity settler for precipitative iron removal. Self-organizing maps were used to visually evaluate the associations between variables prior to the comparison of two different modelling methods, the multiple regression modelling and artificial neural network (ANN) modelling, for predicting Fe(III) precipitation. With the ANN model, an excellent match between the predicted and measured data was obtained (R² = 0.97). The best-fitting regression model also gave a good fit (R² = 0.87). This study demonstrates that ANNs and regression models are robust tools for predicting iron precipitation in the integrated process and can thus be used in the management of such systems.
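The goodness-of-fit values quoted above are coefficients of determination. A minimal sketch of one common definition, R² = 1 - SS_res / SS_tot, is given below. The abstract does not state which definition the authors used (a squared correlation coefficient is another possibility), so treat this as an assumption. The function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    observed, predicted: equal-length sequences of numbers.
    """
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: what the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under this definition a model that always predicts the mean of the measurements scores 0 and a perfect model scores 1, so a value of 0.97 means the residual scatter is 3% of the total variance in the measured data.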

20.