Similar literature
1.
In the last decade, bacterial taxonomy has witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests have resulted in sensitivity values of 0.847, 0.901 and 0.708, respectively. The random forest models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning is very useful for FAME-based bacterial species identification.
Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy.
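The sensitivity values quoted above can be read as the fraction of profiles of a species that the model assigns to that species. The sketch below shows one way to compute it per class and then average it. The abstract does not say how the per-species values were aggregated, so the macro-average used here is an assumption, and the labels are toy data invented for illustration.

```python
from collections import defaultdict

def per_class_sensitivity(true_labels, predicted_labels):
    """Return {label: TP / (TP + FN)} for every label present in true_labels."""
    true_positives = defaultdict(int)
    totals = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == guess:
            true_positives[truth] += 1
    return {label: true_positives[label] / totals[label] for label in totals}

def macro_sensitivity(true_labels, predicted_labels):
    """Unweighted mean of the per-class sensitivities (an assumed aggregation)."""
    scores = per_class_sensitivity(true_labels, predicted_labels)
    return sum(scores.values()) / len(scores)

# Toy labels, invented for illustration only.
truth = ["B. subtilis", "B. subtilis", "B. cereus", "B. cereus", "B. cereus"]
guess = ["B. subtilis", "B. cereus", "B. cereus", "B. cereus", "B. subtilis"]

scores = per_class_sensitivity(truth, guess)
overall = macro_sensitivity(truth, guess)
```

A macro-average weights every species equally, which matters when, as in this data set, the number of profiles per species varies widely.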

2.
Obtaining training data for constructing artificial neural networks (ANNs) to identify microbiological taxa is not always easy. Often, only small data sets with different numbers of observations per taxon are available. Here, the effect of both the size of the training data set and an imbalanced number of training patterns for different taxa is investigated using radial basis function ANNs to identify up to 60 species of marine microalgae. The best networks trained to discriminate 20, 40 and 60 species gave overall correct identification rates of 92, 84 and 77%, respectively. Between 100 and 200 patterns per species were sufficient in networks trained to discriminate 20, 40 or 60 species. For the 40- and 60-species data sets, an imbalance in the number of training patterns per species always affected training success: the greater the imbalance, the greater the effect. However, this could be largely compensated for by adjusting the networks using a posteriori probabilities, estimated as network output values.
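The abstract does not spell out how the outputs were adjusted. One common correction, shown here purely as an assumed illustration, treats the network outputs as posterior probabilities, divides each by the class prior implied by the training counts, reweights by the desired prior and renormalizes. The function name and the numbers are hypothetical.

```python
def rebalance_posteriors(outputs, train_counts, target_priors=None):
    """Rescale posterior-like outputs to undo the prior implied by imbalanced training.

    outputs       -- network outputs per class, read as posterior probabilities
    train_counts  -- number of training patterns per class
    target_priors -- priors to assume at prediction time (uniform if omitted)
    """
    total = sum(train_counts)
    train_priors = [count / total for count in train_counts]
    if target_priors is None:
        target_priors = [1 / len(outputs)] * len(outputs)
    weighted = [o * t / p for o, t, p in zip(outputs, target_priors, train_priors)]
    norm = sum(weighted)
    return [w / norm for w in weighted]

# Toy case: class 0 had 300 training patterns, class 1 only 100.
adjusted = rebalance_posteriors([0.6, 0.4], [300, 100])
```

In this toy case the under-represented class becomes the more probable one after correction, which is the kind of compensation the abstract describes in general terms.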

3.
4.

Background  

Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification.

5.
Limited information is available regarding the composition of cellular fatty acids in Armillaria and the extent to which fatty acid profiles can be used to characterize species in this genus. Fatty acid methyl ester (FAME) profiles generated from cultures of A. tabescens, A. mellea, and A. gallica consisted of 16–18 fatty acids ranging from 12 to 24 carbons in length, although some of these were present only in trace amounts. Across the three species, 9-cis,12-cis-octadecadienoic acid (9,12-C18:2), hexadecanoic acid (16:0), heneicosanoic acid (21:0), 9-cis-octadecenoic acid (9-C18:1), and 2-hydroxy-docosanoic acid (OH-22:0) were the most abundant fatty acids. FAME profiles from different thallus morphologies (mycelium, sclerotial crust, or rhizomorphs) displayed by cultures of A. gallica showed that thallus type had no significant effect on cellular fatty acid composition (P > 0.05), suggesting that FAME profiling is sufficiently robust for species differentiation despite potential differences in thallus morphology within and among species. The three Armillaria species included in this study could be distinguished from other lignicolous basidiomycete species commonly occurring on peach (Schizophyllum commune, Ganoderma lucidum, Stereum hirsutum, and Trametes versicolor) on the basis of FAME profiles using stepwise discriminant analysis (average squared canonical correlation = 0.953), whereby 9-C18:1, 9,12-C18:2, and 10-cis-hexadecenoic acid (10-C16:1) were the three strongest contributors. In a separate stepwise discriminant analysis, A. tabescens, A. mellea, and A. gallica were separated from one another based on their fatty acid profiles (average squared canonical correlation = 0.924), with 11-cis-octadecenoic acid (11-C18:1), 9-C18:1, and 2-hydroxy-hexadecanoic acid (OH-16:0) being most important for species separation.
When fatty acids were extracted directly from mycelium dissected from naturally infected host tissue, the FAME-based discriminant functions developed in the preceding experiments classified all samples (n = 16) as A. tabescens; when applied to cultures derived from the same naturally infected samples, all unknowns were similarly classified as A. tabescens. Thus, FAME species classification of Armillaria unknowns directly from infected tissues may be feasible. Species designation of unknown Armillaria cultures by FAME analysis was identical to that indicated by IGS-RFLP classification with AluI.

6.
Identification of bacterial species by profiling fatty acid methyl esters (FAMEs) has commonly been carried out by using a 20-min capillary gas chromatographic procedure followed by library matching of FAME profiles using commercial MIDI databases and proprietary pattern recognition software. Fast GC (5 min) FAME procedures and mass spectrometric methodologies that require no lipid separation have also been reported. In this study, bacterial identification based on the rapid (2 min) infrared measurement of FAME mixtures was demonstrated. The microorganisms investigated included the Gram-positive bacteria Staphylococcus aureus, Listeria monocytogenes, Bacillus anthracis, and Bacillus cereus; Gram-negative bacteria from the family Enterobacteriaceae: Yersinia enterocolitica, Salmonella typhimurium, Shigella sonnei, and Escherichia coli (four strains of E. coli); and the non-Enterobacteriaceae Vibrio cholerae, Vibrio vulnificus, and Vibrio parahaemolyticus. Mixtures of FAMEs from foodborne bacteria were measured by using an attenuated total reflection (ATR)-Fourier transform infrared (FTIR) spectroscopic procedure and discriminated by multivariate analysis. Results showed that the Enterobacteriaceae could be discriminated from the vibrios. The identification was at the level of species (for the Bacillus and Vibrio genera) or strains (for the E. coli species). A series of bacterial FAME test samples were prepared and analyzed for accuracy of identification, and all were correctly identified. Our results suggest that this infrared strategy could be used to identify foodborne pathogens.

7.
Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin ("assignment tests"). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high F(ST)), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower for simulated data sets compared to k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0-2.8% lower error rates). The performance of each machine learning classifier improved relative to likelihood estimators for empirical data sets, suggesting an ability to "learn" and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to the population of origin due to the intricacies in development and evaluation of artificial neural networks.
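To make the likelihood-based assignment test concrete, here is a minimal sketch. It assumes Hardy-Weinberg proportions within each population and independence across loci, which is the usual textbook formulation and is not stated in the abstract. The population names, allele labels and frequencies are invented for illustration.

```python
import math

def genotype_log_likelihood(genotype, allele_freqs):
    """Log-likelihood of a multilocus diploid genotype in one population.

    genotype     -- list of (allele, allele) pairs, one pair per locus
    allele_freqs -- list of {allele: frequency} dicts, one dict per locus
    Frequencies must be non-zero for the observed alleles.
    """
    log_l = 0.0
    for (a1, a2), freqs in zip(genotype, allele_freqs):
        p, q = freqs[a1], freqs[a2]
        # Hardy-Weinberg: p^2 for homozygotes, 2pq for heterozygotes.
        log_l += math.log(p * p if a1 == a2 else 2 * p * q)
    return log_l

def assign(genotype, populations):
    """Return the name of the population with the highest log-likelihood."""
    return max(populations, key=lambda name: genotype_log_likelihood(genotype, populations[name]))

# Hypothetical allele frequencies at two loci in two populations.
populations = {
    "lake_a": [{"A": 0.9, "B": 0.1}, {"C": 0.8, "D": 0.2}],
    "lake_b": [{"A": 0.2, "B": 0.8}, {"C": 0.3, "D": 0.7}],
}
individual = [("A", "A"), ("C", "D")]
origin = assign(individual, populations)
```

The machine learning classifiers compared in the abstract replace this explicit genetic model with a function learned from labelled genotypes.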

8.
This work describes an application of artificial neural networks to a small data set of sesquiterpene lactones (STLs) from three tribes of the family Asteraceae. Structurally different types of representative STLs from seven subtribes of the tribes Eupatorieae, Heliantheae and Vernonieae were selected as input data for self-organizing neural networks. Encoding the 3D molecular structures of STLs and projecting them onto Kohonen maps allowed the classification of Asteraceae into tribes and subtribes. This approach allowed the evaluation of structural similarities among different sets of 3D structures of sesquiterpene lactones and their correlation with the current taxonomic classification of the family. The networks were also used to predict the occurrence of STLs in a plant species according to the taxon it belongs to. The methodology used in this work can be applied to chemosystematic or chemotaxonomic studies of Asteraceae.

9.
DNA barcoding as a method for species identification is rapidly increasing in popularity. However, there are still relatively few rigorous methodological tests of DNA barcoding. Current distance-based methods are frequently criticized for treating the nearest neighbor as the closest relative via a raw similarity score, lacking an objective set of criteria to delineate taxa, or for being incongruent with classical character-based taxonomy. Here, we propose an artificial intelligence-based approach, inferring species membership via DNA barcoding with back-propagation neural networks (named BP-based species identification), as a new addition to the spectrum of available methods. We demonstrate the value of this approach with simulated data sets representing different levels of sequence variation under coalescent simulations with various evolutionary models, as well as with two empirical data sets of COI sequences from East Asian ground beetles (Carabidae) and Costa Rican skipper butterflies. With a 630- to 690-bp fragment of the COI gene, we identified 97.50% of 80 unknown sequences of ground beetles, and 95.63%, 96.10%, and 100% of 275, 205, and 9 unknown sequences of the neotropical skipper butterfly, respectively, to their correct species. Our simulation studies indicate that the success rates of species identification depend on the divergence of sequences, the length of sequences, and the number of reference sequences. Particularly in cases involving incomplete lineage sorting, this new BP-based method appears to be superior to commonly used methods for DNA-based species identification.

10.
The utility of fatty acid methyl ester (FAME) profiles for characterization and differentiation of isolates of Fusarium oxysporum f. sp. lycopersici and F. oxysporum f. sp. radicis-lycopersici was investigated. Two fatty acid analysis protocols, the normal MIDI method and a modified MIDI method, were evaluated for their utility. Only the modified MIDI method allowed a clear differentiation between F. oxysporum f. sp. lycopersici and F. oxysporum f. sp. radicis-lycopersici. FAME profiles obtained using the modified MIDI method gave the most consistent and reproducible fatty acid data. Evaluation of the FAME profiles based on cluster analysis and principal-component analysis revealed that FAME profiles from tested isolates were more strongly correlated with vegetative compatibility groups (VCGs) than with races in F. oxysporum f. sp. lycopersici. Results indicated that FAME profiles could be an additional tool useful for characterizing isolates and formae speciales of F. oxysporum obtained from tomato.

11.
New statistical modelling methods, such as neural networks (NNs), allow us to take a step further in the understanding of complex relations in aquatic ecosystems. In this paper the results from the analysis of macro-invertebrate communities in a complex riverine environment are presented. We attempted to explain observed changes in species composition and abundance with neural network modelling methods and compared the results to linear regression. The NN method used is an improved form of the RF5 algorithm, developed to effectively discover numeric laws from data. RF5 uses Product Unit Networks (PUNs), which are in effect multivariate non-discrete power functions. The data set consisted of a 10-year time series of monthly samples of macro-invertebrates on artificial substrates in the rivers Rhine and Meuse in the Netherlands. During this period the invertebrate community changed substantially, coinciding with the invasion of Ponto-Caspian crustaceans. We used physical–chemical data and data on the abundance of the invasive taxa Corophium curvispinum and Dikerogammarus villosus to explain the observed changes in the resident invertebrate community. The analyses showed temperature, abundance of invasive taxa and peak discharges to be important factors. Comparison of the results from NN modelling to linear regression revealed that the factors temperature and abundance of Dikerogammarus villosus explained the changes equally well in both cases. Only the neural network was able to use information on peak discharge and timing of the peak in the previous winter to improve model performance. Neural networks are known to yield excellent modelling results; a drawback, however, is their lack of transparency or their ‘black box’ character. The use of relatively easily interpretable (white box) PUNs allows us to investigate the extracted relations in more detail and can enhance our understanding of ecosystem functioning.
Our results show that peak discharges might be an important factor structuring invertebrate communities in rivers and hint at the existence of interacting effects from invasive species and discharge peaks. Finally, they show the value of biological data sets that are collected over a long period and in a highly standardised way.
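A product unit, the building block of the PUNs mentioned above, multiplies its inputs after raising each to a real-valued exponent, which is why a trained unit can be read as a power law. The sketch below shows a single unit only; it is not the RF5 algorithm, and the inputs and exponents are arbitrary illustrative values.

```python
import math

def product_unit(inputs, exponents):
    """Compute prod(x_i ** w_i) for positive inputs x_i and real exponents w_i."""
    result = 1.0
    for x, w in zip(inputs, exponents):
        result *= x ** w
    return result

# Arbitrary example: y = x1^3 * x2^0.5
y = product_unit([2.0, 4.0], [3.0, 0.5])

# On a log scale the unit is linear in log(x_i), which is what makes
# the fitted exponents directly interpretable.
log_y = 3.0 * math.log(2.0) + 0.5 * math.log(4.0)
```

Reading the exponents off a fitted unit gives the kind of "white box" relation the abstract contrasts with conventional neural networks.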

12.
The fatty acid methyl ester composition of a total of 71 marine strains representing the genera Alteromonas, Deleya, Oceanospirillum, and Vibrio was determined by gas-liquid chromatographic analysis. Over 70 different fatty acids were found. The predominant fatty acids were 16:0, 16:1 cis 9, summed-in-feature (SIF) 4 (15:0 iso 2OH and/or 16:1 trans 9) and SIF 7 (18:1 cis 11, 18:1 trans 9, and/or 18:1 trans 6) for all the strains considered, but minor quantitative variations could be used to distinguish the different genera. In addition to a conventional statistical processing method to analyze the data and draw comparisons between species and genera, an approach involving neural network-based elaboration was applied. The statistical analysis and dendrogram representation gave a comparison of the species considered, while the neural network computation provided a more accurate assignment of species to their genera. Moreover, by using neural networks, it was possible to conclude that only 22 fatty acids were important for the identification of the marine genera considered. A database of Alteromonas, Deleya, Oceanospirillum, and Vibrio fatty acid methyl ester profiles was generated and is now routinely used to identify fresh marine isolates.

13.
In this paper we present a method for unbiased/unsupervised classification and identification of closely related fungi, using chemical analysis of secondary metabolite profiles created by HPLC with UV diode array detection. For two chromatographic data matrices, a vector of locally aligned full spectral similarities is calculated along the retention time axis. The vector describes the similarity between two fungal extracts based upon the eluted compounds and their corresponding UV-absorbance spectra. To assess the chemotaxonomic grouping, the vector is condensed to a single value describing the overall degree of similarity between the profiles. Two sets of data were used in this study: one for method development and a second for method validation. First we developed a method for evaluating the secondary metabolite production of closely related Penicillium species. Then the algorithm was validated on fungal isolates belonging to the genus Alternaria. The results showed that the species may be segregated into taxa in full accordance with published taxonomy.
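The similarity-vector idea can be sketched as follows. Each data matrix is taken as a list of UV spectra indexed by retention time. The cosine measure, the fixed alignment window and the mean used to condense the vector are all assumptions chosen for illustration, since the abstract does not name the specific measures used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two spectra; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_vector(matrix_a, matrix_b, window=1):
    """Best spectral match within +/- window retention steps, for each time point.

    Assumes both matrices cover the same number of retention time points.
    """
    sims = []
    for t, spectrum in enumerate(matrix_a):
        lo, hi = max(0, t - window), min(len(matrix_b), t + window + 1)
        sims.append(max(cosine(spectrum, matrix_b[s]) for s in range(lo, hi)))
    return sims

def overall_similarity(matrix_a, matrix_b, window=1):
    """Condense the similarity vector to one number (here: its mean)."""
    sims = similarity_vector(matrix_a, matrix_b, window)
    return sum(sims) / len(sims)

# Toy extracts: the first two spectra elute in swapped order in extract_b.
extract_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extract_b = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
aligned = overall_similarity(extract_a, extract_b, window=1)
unaligned = overall_similarity(extract_a, extract_b, window=0)
```

In the toy data the local alignment window absorbs a one-step retention shift, so the aligned score is close to 1 while the unaligned score drops to about one third.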

14.
15.
A classification methodology based on morphometric data and supervised artificial neural networks (ANNs) was tested on five fly species of the parasitoid genera Tachina and Ectophasia (Diptera, Tachinidae). Specimens were first photographed and digitized; the picture was then scaled and measured by means of an image analyser. The 16 variables used for classification included the lengths of different wing veins or their parts and the widths of antennal segments. Sex was found to have some influence on the data and was included in the study as another input variable. Better and more reliable classification was obtained when data from both the right and left wings were entered; however, data from one wing were found to be sufficient. The prediction success (correct identification of unknown test samples) varied from 88 to 100% throughout the study, depending especially on the number of specimens in the training set. Classification of the studied Diptera species using ANNs is possible provided that a sufficiently high number (tens) of specimens of each species is available for ANN training. The methodology proposed is quite general and can be applied to all biological objects for which it is possible to define adequate diagnostic characters and create the appropriate database.

16.
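邹应斌, 米湘成, 石纪成. 《生态学报》 (Acta Ecologica Sinica), 2004, 24(12): 2967-2972
Taking the tillering dynamics of rice populations as an example, this study trained and simulated a back-propagation (BP) network model of rice growth using cross-validation and independent validation, and compared the results with the accumulated-temperature statistical model, the basic dynamic model and the composite tillering model of rice population tillering. The results show that the neural network model has some ability to extrapolate, but that this ability depends on a large number of training samples. The neural network model fits well because it has many model parameters, so training it requires a large number of samples to ensure that the parameters are not overfitted. For a neural network model to extrapolate, the minimum number of training samples should be more than 6.75 times and less than 13.5 times the number of network parameters. Therefore, when a neural network model includes many input variables, techniques such as principal component analysis and correspondence analysis can be used to condense the information in the inputs and reduce the number of network parameters accordingly. Conversely, when training samples are insufficient, the neural network model is best used only to simulate conditions within the same system, and extrapolation should be undertaken with caution. Neural network models give crop simulation researchers a "point-and-shoot" tool: for agricultural researchers unfamiliar with mathematical modelling, artificial neural networks can stand in for mathematical modelling in simulation experiments; for those proficient in mathematical modelling, they are at least a complementary nonlinear data-processing method that can serve as a basis for comparison.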
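The sample-size rule above can be turned into a quick calculation. The sketch assumes a fully connected BP network with one bias per non-input neuron, and reads the 6.75 and 13.5 multipliers as lower and upper bounds on the minimum sample count. The layer sizes are hypothetical, since the abstract does not give the architecture.

```python
def count_bp_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def training_sample_range(layer_sizes, low=6.75, high=13.5):
    """Bounds on the minimum training-set size implied by the 6.75x / 13.5x rule."""
    n_params = count_bp_parameters(layer_sizes)
    return low * n_params, high * n_params

# Hypothetical network: 4 inputs, 3 hidden neurons, 1 output.
n_params = count_bp_parameters([4, 3, 1])
bounds = training_sample_range([4, 3, 1])
```

Dropping one input through principal component analysis, as the abstract suggests, removes three weights in this toy architecture and lowers both bounds proportionally.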

17.
We describe here the application of a type of artificial neural network, the Gaussian radial basis function (RBF) network, in the identification of a large number of phytoplankton strains from their 11-dimensional flow cytometric characteristics measured by the European Optical Plankton Analyser instrument. The effect of network parameters on optimization is examined. Optimized RBF networks recognized 34 species of marine and freshwater phytoplankton with 91.5% success overall. The relative importance of each measured parameter in discriminating these data and the behavior of RBF networks in response to data from "novel" species (species not present in the training data) were analyzed.
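A Gaussian RBF responds strongly only near its centre, which is what lets such a network flag data from "novel" species. The sketch below is a deliberate simplification: one centre per species, no trained output layer, and two dimensions instead of the 11 flow cytometric parameters. The centres, width and rejection threshold are invented values.

```python
import math

def gaussian_rbf(x, centre, width):
    """Gaussian basis function: exp(-||x - centre||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-sq_dist / (2 * width ** 2))

def classify_with_rejection(x, centres, width=1.0, threshold=0.1):
    """Return the best-matching label, or None if no basis function fires strongly."""
    activations = {label: gaussian_rbf(x, c, width) for label, c in centres.items()}
    best = max(activations, key=activations.get)
    return best if activations[best] >= threshold else None

# Hypothetical species centres in a 2-D feature space.
centres = {"species_a": [0.0, 0.0], "species_b": [3.0, 3.0]}
known = classify_with_rejection([0.2, -0.1], centres)
novel = classify_with_rejection([10.0, 10.0], centres)
```

A point far from every centre produces near-zero activations everywhere, so it is rejected rather than forced into the nearest known class.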

18.
Simulation of biomass gasification with a hybrid neural network model
Gasification of several types of biomass has been conducted in a fluidized bed gasifier at atmospheric pressure with steam as the fluidizing medium. In order to obtain the gasification profiles for each type of biomass, an artificial neural network model has been developed to simulate these gasification processes. Model-predicted gas production rates in these biomass gasification processes were consistent with the experimental data. Therefore, the gasification profiles generated by neural networks are considered to have properly reflected the real gasification process of a biomass. Gasification profiles identified by the neural network suggest that the gasification behavior of arboreal types of biomass is significantly different from that of herbaceous ones.

19.
We describe here the application of a type of artificial neural network, the Gaussian radial basis function (RBF) network, in the identification of a large number of phytoplankton strains from their 11-dimensional flow cytometric characteristics measured by the European Optical Plankton Analyser instrument. The effect of network parameters on optimization is examined. Optimized RBF networks recognized 34 species of marine and freshwater phytoplankton with 91.5% success overall. The relative importance of each measured parameter in discriminating these data and the behavior of RBF networks in response to data from “novel” species (species not present in the training data) were analyzed.

20.