A novel approach to phenotype analysis of fermentation‐based bioprocesses based on unsupervised learning (clustering) is presented. Because prior identification of phenotypes and conditional interrelations is desired to control fermentation performance, an automated learning method that outputs reference phenotypes (defined as vectors of biomass‐specific rates) was developed, and the necessary computing process and parameters were assessed. For its demonstration, time series data of 90 Clostridium pasteurianum cultivations were used, which feature a broad spectrum of solventogenic and acidogenic phenotypes, and 14 clusters of phenotypic manifestations were identified. The analysis of reference phenotypes showed distinct differences, and potential conditionalities were isolated as examples. Further, cluster‐based balancing of carbon and ATP and the use of reference phenotypes as indicators for bioprocess monitoring were demonstrated to highlight the benefits of this approach. Overall, such analysis depends strongly on the quality of the data, and experimental validation will be required before conclusions are drawn. However, the automated, streamlined and abstracted approach diminishes the need for individual evaluation of every noisy dataset and showed promising results, which could be transferred to strains with comparably wide‐ranging phenotypic manifestations or used as indicators for repeated bioprocesses with a clearly defined target.

Increasingly, animal behavior studies are enhanced through the use of accelerometry. Translating raw accelerometer data into animal behaviors requires the development of classifiers. Here, we present the “rabc” (r for animal behavior classification) package to assist researchers with the interactive development of such animal behavior classifiers in a supervised classification approach. The package uses datasets consisting of accelerometer data with their corresponding animal behaviors (e.g., for triaxial accelerometer data along the x, y and z axes arranged as “x, y, z, x, y, z,…, behavior”). Using an example dataset collected on white stork (Ciconia ciconia), we illustrate the workflow of this package, including accelerometer data visualization, feature calculation, feature selection, feature visualization, extreme gradient boost model training, validation, and, finally, a demonstration of the behavior classification results.

Brillouin imaging relies on the reliable extraction of subtle spectral information from hyperspectral datasets. To date, the mainstream practice has been to use line fitting of spectral features to retrieve the average peak shift and linewidth parameters. Good results, however, depend heavily on sufficient signal-to-noise ratio and may not be applicable in complex samples that consist of spectral mixtures. In this work, we thus propose the use of various multivariate algorithms that can be used to perform supervised or unsupervised analysis of the hyperspectral data, with which we explore advanced image analysis applications, namely unmixing, classification and segmentation in a phantom and live cells. The resulting images are shown to provide more contrast and detail, and obtained on a timescale ∼10² faster than fitting. The estimated spectral parameters are consistent with those calculated from pure fitting.

Objectives: This paper presents a novel wearable system for in-home and long-term fetal movement monitoring on a reliable and easily accessible basis. Material and methods: The system mainly consists of four accelerometers for fetal movement signal acquisition, a microcontroller for signal processing and an Android-based device interacting with the microcontroller via Bluetooth Low Energy (BLE), providing the mother with information related to the fetal movement in an intelligible way. Results: The proposed system can deliver reliable results with a specificity of 0.99 and a sensitivity of 0.77 for fetal movement time series signal classification. Conclusion: The proposed wearable system will provide a good alternative to optimize the use of medical professionals and hospital resources, and has potential applications in the field of e-Health home care. Besides, the fetal movement acceleration signals acquired with volunteers (pregnant women) help establish an initial database for future medical analysis of sensor-recorded fetal behaviors.

  1. Recent advances in digital data collection have spurred accumulation of immense quantities of data that have potential to lead to remarkable ecological insight, but that also present analytic challenges. In the case of biologging data from birds, common analytical approaches to classifying movement behaviors are largely inappropriate for these massive data sets.
  2. We apply a framework for using K‐means clustering to classify bird behavior using points from short time interval GPS tracks. K‐means clustering is a well‐known and computationally efficient statistical tool that has been used in animal movement studies primarily for clustering segments of consecutive points. To illustrate the utility of our approach, we apply K‐means clustering to six focal variables derived from GPS data collected at 1–11 s intervals from free‐flying bald eagles (Haliaeetus leucocephalus) throughout the state of Iowa, USA. We illustrate how these data can be used to identify behaviors and life‐stage‐ and age‐related variation in behavior.
  3. After filtering for data quality, the K‐means algorithm identified four clusters in >2 million GPS telemetry data points. These four clusters corresponded to three movement states: ascending, flapping, and gliding flight; and one non‐moving state: perching. Mapping these states illustrated how they corresponded tightly to expectations derived from natural history observations; for example, long periods of ascending flight were often followed by long gliding descents, and birds alternated between flapping and gliding flight.
  4. The K‐means clustering approach we applied is both an efficient and effective mechanism to classify and interpret short‐interval biologging data to understand movement behaviors. Furthermore, because it can apply to an abundance of very short, irregular, and high‐dimensional movement data, it provides insight into small‐scale variation in behavior that would not be possible with many other analytical approaches.
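
The point-wise clustering described in items 2 and 3 can be sketched with a plain implementation of Lloyd's K-means algorithm. This is only an illustration under stated assumptions: the two variables (ground speed and climb rate), the sample values, and the choice of starting centroids are hypothetical, and the study itself used six focal variables whose definitions are not given here.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical GPS fixes: (ground speed in m/s, climb rate in m/s).
fixes = [(0.1, 0.0), (0.2, 0.1), (0.0, -0.1), (12.0, 2.0), (11.5, 2.4), (12.5, 1.8)]
labels, centroids = kmeans(fixes, centroids=[fixes[0], fixes[3]])
print(labels)
```

With real telemetry, variables measured in different units would normally be put on a comparable scale before clustering, since Euclidean distance is otherwise dominated by the variable with the largest range.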

When modeling longitudinal biomedical data, dimensionality reduction and dynamic modeling in the resulting latent representation are often both needed. This can be achieved by artificial neural networks for dimension reduction and differential equations for dynamic modeling of individual-level trajectories. However, such approaches so far assume that parameters of individual-level dynamics are constant throughout the observation period. Motivated by an application from psychological resilience research, we propose an extension where different sets of differential equation parameters are allowed for observation subperiods. Still, estimation for intra-individual subperiods is coupled so that the model can also be fitted to a relatively small dataset. We subsequently derive prediction targets from individual dynamic models of resilience in the application. These serve as outcomes for predicting resilience from characteristics of individuals, measured at baseline and a follow-up time point, and selecting a small set of important predictors. Our approach is seen to successfully identify individual-level parameters of dynamic models that allow stable selection of predictors, that is, resilience factors. Furthermore, we can identify those characteristics of individuals that are the most promising for updates at follow-up, which might inform future study design. This underlines the usefulness of our proposed deep dynamic modeling approach with changes in parameters between observation subperiods.

Giribet, G. 2010. A new dimension in combining data? The use of morphology and phylogenomic data in metazoan systematics. Acta Zoologica (Stockholm) 91: 11–19. Animal phylogenies have been traditionally inferred by using the character state information derived from the observation of a diverse array of morphological and anatomical features, but the incorporation of molecular data into the toolkit of phylogenetic characters has drastically shifted the way researchers infer phylogenies. A main reason for this is the ease with which molecular data can be obtained, compared to, e.g., traditional histological and microscopical techniques. Researchers now routinely use genomic data for reconstructing relationships among animal phyla (using whole genomes or Expressed Sequence Tags) but the amount of morphological data available to study the same phylogenetic patterns has not grown accordingly. Given the disparity between the amounts of molecular and morphological data, some authors have questioned entire morphological programs. In this review I discuss issues related to the combinability of genomic and morphological data, the informativeness of each set of characters, and conclude with a discussion of how morphology could be made scalable by utilizing new techniques that allow for non‐intrusive examination of large amounts of preserved museum specimens. Morphology should therefore remain a strong field in evolutionary and comparative biology, as it continues to provide information for inferring phylogenetic patterns, is an important complement for the patterns derived from the molecular data, and is the common nexus that allows studying fossil taxa with large data sets of molecular data.

Qiao X, Liu Y. Biometrics 2009, 65(1): 159-168
Summary. In multicategory classification, standard techniques typically treat all classes equally. This treatment can be problematic when the dataset is unbalanced in the sense that certain classes have very small class proportions compared to others. The minority classes may be ignored or discounted during the classification process due to their small proportions. This can be a serious problem if those minority classes are important. In this article, we study the problem of unbalanced classification and propose new criteria to measure classification accuracy. Moreover, we propose three different weighted learning procedures, two one-step weighted procedures, as well as one adaptive weighted procedure. We demonstrate the advantages of the new procedures, using multicategory support vector machines, through simulated and real datasets. Our results indicate that the proposed methodology can handle unbalanced classification problems effectively.

As a powerful diagnostic tool, optical coherence tomography (OCT) has been widely used in various clinical settings. However, OCT images are susceptible to inherent speckle noise that may contaminate subtle structure information, due to the low-coherence interferometric imaging procedure. Many supervised learning-based models have achieved impressive performance in reducing speckle noise of OCT images when trained with a large number of noisy-clean paired OCT images, which are not commonly feasible in clinical practice. In this article, we conducted a comparative study to investigate the denoising performance of OCT images over different deep neural networks through an unsupervised Noise2Noise (N2N) strategy, which is trained only with noisy OCT samples. Four representative network architectures including U-shaped model, multi-information stream model, straight-information stream model and GAN-based model were investigated on an OCT image dataset acquired from healthy human eyes. The results demonstrated that all four unsupervised N2N models offered denoised OCT images with a performance comparable with that of supervised learning models, illustrating the effectiveness of unsupervised N2N models in denoising OCT images. Furthermore, U-shaped models and GAN-based models using a UNet network as generator are two preferred and suitable architectures for reducing speckle noise of OCT images and preserving fine structure information of retinal layers under unsupervised N2N circumstances.

Accelerometers are increasingly used tools for gait analysis, but there remains a lack of research on their application to running and their ability to classify running patterns. The purpose of this study was to conduct an exploratory examination into the capability of a tri-axial accelerometer to classify runners of different training backgrounds and experience levels, according to their 3-dimensional (3D) accelerometer data patterns. Training background was examined with 14 competitive soccer players and 12 experienced marathon runners, and experience level was examined with 16 first-time and the same 12 experienced marathon runners. Discrete variables were extracted from 3D accelerations during a short run using root mean square, wavelet transformation, and autocorrelation procedures. A principal component analysis (PCA) was conducted on all variables, including gait speed to account for covariance. Eight PCs were retained, explaining 88% of the variance in the data. A stepwise discriminant analysis of PCs was used to determine the binary classification accuracy for training background and experience level, with and without the PC of Speed. With Speed, the accelerometer correctly classified 96% of runners for both training background and experience level. Without Speed, the accelerometer correctly classified 85% of runners based on training background, but only 68% based on experience level. These findings suggest that the accelerometer is effective in classifying athletes of different training backgrounds, but is less effective for classifying runners of different experience levels where gait speed is the primary discriminator.
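
Two of the feature-extraction procedures named in this abstract, root mean square and autocorrelation, are simple enough to sketch directly. The code below is a minimal illustration, not the study's pipeline: the synthetic trace and its 8-sample period are assumptions, and the autocorrelation uses the common biased estimator (the sum is normalized by the zero-lag energy, not by the number of overlapping samples).

```python
import math

def rms(signal):
    """Root mean square amplitude of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def autocorrelation(signal, lag):
    """Normalized autocorrelation of the mean-centred signal at a given lag."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    energy = sum(c * c for c in centred)
    overlap = sum(centred[i] * centred[i + lag] for i in range(n - lag))
    return overlap / energy

# Hypothetical vertical acceleration: a pure oscillation repeating every 8 samples.
vertical = [math.sin(2 * math.pi * i / 8) for i in range(64)]

print(round(rms(vertical), 4))
print(round(autocorrelation(vertical, 8), 4))  # strongly positive at one full period
print(round(autocorrelation(vertical, 4), 4))  # strongly negative at half a period
```

For a periodic gait signal, a peak in the autocorrelation at the stride period is one way such a feature can summarize step regularity.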

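Tree species diversity is a central topic in ecological research, and information on the species and spatial distribution of trees can effectively support sustainable forest management. Under complex stand conditions, however, obtaining highly accurate classification results is difficult. Unmanned aerial vehicle (UAV) remote sensing can acquire ultra-fine local-scale data, which offers a way to improve tree species classification accuracy. Based on multi-source UAV remote sensing data, including visible-light, hyperspectral, and LiDAR data, this study explored their potential for tree species classification under subtropical stand conditions. The results show that: (1) the random forest classifier achieved the highest overall accuracy and the highest F1 score for each tree species, making it suitable for classification mapping of multiple subtropical tree species; it distinguished 13 classes (8 tree, 4 herbaceous) with an overall accuracy of 95.63% and a Kappa coefficient of 0.948; (2) using multi-source data significantly improved classification accuracy, with the all-feature model being the most accurate; hyperspectral and LiDAR data significantly affected the accuracy of the all-feature model, whereas visible-light texture data contributed little; (3) ranked from most to least important, the classification features were structural information, vegetation indices, texture information, and minimum noise fraction components.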
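
The two summary figures reported in this abstract, overall accuracy and the Kappa coefficient, are both derived from a confusion matrix. The sketch below shows the standard Cohen's kappa calculation; the three-class matrix is invented for illustration and has no connection to the study's data.

```python
def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows are reference classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical three-class confusion matrix.
confusion = [
    [45, 5, 0],
    [3, 40, 2],
    [0, 4, 51],
]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
print(round(accuracy, 4), round(kappa, 4))
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance alone.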

Species are considered to be the basic unit of ecological and evolutionary studies. As multilocus genomic data are increasingly available, there has been considerable interest in the use of DNA sequence data to delimit species. In this study, we show that machine learning can be used for species delimitation. Our method treats the species delimitation problem as a classification problem for identifying the category of a new observation on the basis of training data. Extensive simulation is first conducted over a broad range of evolutionary parameters for training purposes. Each pair of known populations is combined to form training samples with a label of “same species” or “different species”. We use a support vector machine (SVM) to train a classifier using a set of summary statistics computed from training samples as features. The trained classifier can classify a test sample into one of two outcomes: “same species” or “different species”. Given multilocus genomic data of multiple related organisms or populations, our method (called CLADES) performs species delimitation by first classifying pairs of populations. CLADES then delimits species by maximizing the likelihood of species assignment for multiple populations. CLADES is evaluated through extensive simulation and also tested on real genetic data. We show that CLADES is both accurate and efficient for species delimitation when compared with existing methods. CLADES can be useful especially when existing methods have difficulty in delimitation, for example with short species divergence time and gene flow.

We assessed the effects of repeated extinction and reversals of two conditional stimuli (CS+/CS−) on an appetitive conditioned approach response in rats. Three results were observed that could not be accounted for by a simple linear operator model such as the one proposed by Rescorla and Wagner (1972): (1) responding to a CS− declined faster when a CS+ was simultaneously extinguished; (2) reacquisition of pre-extinction performance recovered rapidly within one session; and (3) reversal of CS+/CS− contingencies resulted in a more rapid recovery to the current CS− (former CS+) than the current CS+, accompanied by a slower acquisition of performance to the current CS+. An arousal parameter that mediates learning was introduced to a linear operator model to account for these effects. The arousal-mediated learning model adequately fit the data and predicted data from a second experiment with different rats in which only repeated reversals of CS+/CS− were assessed. According to this arousal-mediated learning model, learning is accelerated by US-elicited arousal and it slows down in the absence of US. Because arousal varies faster than conditioning, the model accounts for the decline in responding during extinction mainly through a reduction in arousal, not a change in learning. By preserving learning during extinction, the model is able to account for relapse effects like rapid reacquisition, renewal, and reinstatement.
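
The baseline that this abstract argues against can be sketched for a single stimulus. In the simple linear operator (Rescorla-Wagner type) update, associative strength V moves a fixed fraction of the way toward the trial's outcome: V ← V + rate × (λ − V). The learning rate, trial counts, and function name below are assumptions for illustration, and the abstract's arousal-mediated extension is not reproduced because its equations are not given.

```python
def linear_operator(outcomes, learning_rate=0.2, v0=0.0):
    """Associative strength after each trial under a simple linear operator rule.

    outcomes: per-trial asymptote lambda (1.0 = reinforced, 0.0 = not reinforced).
    """
    v = v0
    history = []
    for lam in outcomes:
        v += learning_rate * (lam - v)  # move a fixed fraction toward the outcome
        history.append(v)
    return history

# 20 acquisition trials followed by 20 extinction trials.
strength = linear_operator([1.0] * 20 + [0.0] * 20)
print(round(strength[19], 4), round(strength[-1], 4))

# Reacquisition after extinction starts from almost zero again.
first_acquisition = strength[:5]
reacquisition = linear_operator([1.0] * 5, v0=strength[-1])
print(round(first_acquisition[-1], 4), round(reacquisition[-1], 4))
```

Because extinction drives V back toward zero in this rule, reacquisition follows nearly the same curve as the original acquisition, which is why a model of this form cannot produce the rapid reacquisition reported in result (2).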

This paper describes an iterative learning control scheme for fed-batch operation where repetitive trajectory tracking tasks are required. The proposed learning strategy is model-independent; it takes advantage of the repetitive nature of system operations with a certain degree of intelligence and requires only a small dynamic database for the learning process. The convergence of the learning process is proven. An example of simultaneously tracking two predefined trajectories by iterative learning control with two control inputs is given to illustrate the methodology. Satisfactory performance of the learning system can be observed from the simulation results.
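
The general idea of iterative learning control can be sketched with a textbook P-type update, in which the input for the next batch is the previous input corrected by the previous tracking error. This is a swapped-in illustration, not the paper's scheme: the first-order toy plant, its coefficients, the gain, and the single-input setup are all assumptions (the paper tracks two trajectories with two inputs in a fed-batch process).

```python
import math

def run_batch(u, a=0.3, b=1.0):
    """Simulate one batch of a toy plant x[t+1] = a*x[t] + b*u[t], with x[0] = 0."""
    x = 0.0
    outputs = []
    for u_t in u:
        x = a * x + b * u_t
        outputs.append(x)  # response to u[t], i.e. x[t+1]
    return outputs

def iterative_learning_control(reference, gain=0.8, batches=30):
    """P-type update: u_next[t] = u[t] + gain * error[t], repeated batch after batch."""
    u = [0.0] * len(reference)
    max_errors = []
    for _ in range(batches):
        y = run_batch(u)
        error = [r - y_t for r, y_t in zip(reference, y)]
        max_errors.append(max(abs(e) for e in error))
        # Only the measured error from the last batch is used, not a or b.
        u = [u_t + gain * e_t for u_t, e_t in zip(u, error)]
    return u, max_errors

# Hypothetical target trajectory over a 20-step batch.
reference = [math.sin(math.pi * t / 20) for t in range(1, 21)]
u_final, max_errors = iterative_learning_control(reference)
print(max_errors[0], max_errors[5], max_errors[-1])
```

The update never reads the plant coefficients, which is the sense in which such a scheme is model-independent; for this toy plant the chosen gain makes the worst-case tracking error shrink from one batch to the next.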

Ecologists collect their data manually by visiting multiple sampling sites. Since there can be multiple species at the multiple sampling sites, manually classifying them can be a daunting task. Much work in the literature has focused on statistical methods for classifying single species, with very few studies on classifying multiple species. In addition to looking at multiple species, we noted that classification of multiple species results in a multi-class imbalance problem. This study proposes a machine learning approach to classify multiple species in population ecology. In particular, bagging (random forests (RF) and bagging classification trees (bagCART)) and boosting (boosting classification trees (bootCART), gradient boosting machines (GBM) and adaptive boosting classification trees (AdaBoost)) classifiers were evaluated for their performance on an imbalanced multi-species fish dataset. The recall and F1-score performance metrics were used to select the best classifier for the dataset. The bagging classifiers (RF and bagCART) achieved high performance on the imbalanced dataset, while the boosting classifiers (bootCART, GBM and AdaBoost) achieved lower performance. We found that some machine learning classifiers were sensitive to the imbalanced dataset and hence require data resampling to improve their performance. After resampling, the bagging classifiers (RF and bagCART) performed well compared to the boosting classifiers (bootCART, GBM and AdaBoost). The strong performance shown by the bagging classifiers (RF and bagCART) suggests that they can be used for classifying multiple species in ecological studies.
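
The recall and F1-score metrics used in this abstract are computed per class, which is what makes them informative on imbalanced data. The following sketch uses made-up species labels and predictions to show how a rare class can score poorly even when overall accuracy looks high.

```python
from collections import Counter

def per_class_recall_f1(y_true, y_pred):
    """Return {label: (recall, f1)} for every class present in y_true."""
    actual = Counter(y_true)
    predicted = Counter(y_pred)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = {}
    for label in actual:
        recall = true_pos[label] / actual[label]
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (recall, f1)
    return scores

# Hypothetical imbalanced sample: 8 of the common species, 2 of the rare one.
y_true = ["bream"] * 8 + ["pike"] * 2
y_pred = ["bream"] * 8 + ["bream", "pike"]

scores = per_class_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, scores)
```

Here overall accuracy is 0.9, yet the rare class has a recall of only 0.5, which is the kind of gap that accuracy alone would hide.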

Protein–protein interactions play a key role in many biological systems. High‐throughput methods can directly detect the set of interacting proteins in yeast, but the results are often incomplete and exhibit high false‐positive and false‐negative rates. Recently, many different research groups independently suggested using supervised learning methods to integrate direct and indirect biological data sources for the protein interaction prediction task. However, the data sources, approaches, and implementations varied. Furthermore, the protein interaction prediction task itself can be subdivided into prediction of (1) physical interaction, (2) co‐complex relationship, and (3) pathway co‐membership. To investigate systematically the utility of different data sources and the way the data is encoded as features for predicting each of these types of protein interactions, we assembled a large set of biological features and varied their encoding for use in each of the three prediction tasks. Six different classifiers were used to assess the accuracy in predicting interactions, Random Forest (RF), RF similarity‐based k‐Nearest‐Neighbor, Naïve Bayes, Decision Tree, Logistic Regression, and Support Vector Machine. For all classifiers, the three prediction tasks had different success rates, and co‐complex prediction appears to be an easier task than the other two. Independently of prediction task, however, the RF classifier consistently ranked as one of the top two classifiers for all combinations of feature sets. Therefore, we used this classifier to study the importance of different biological datasets. First, we used the splitting function of the RF tree structure, the Gini index, to estimate feature importance. Second, we determined classification accuracy when only the top‐ranking features were used as an input in the classifier. We find that the importance of different features depends on the specific prediction task and the way they are encoded. 
Strikingly, gene expression is consistently the most important feature for all three prediction tasks, while the protein interactions identified using the yeast‐2‐hybrid system were not among the top‐ranking features under any condition. Proteins 2006. © 2006 Wiley‐Liss, Inc.
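
The Gini index mentioned in this abstract as the Random Forest splitting function can be illustrated on a single split. The sketch below computes Gini impurity and the impurity decrease produced by one candidate split; the labels and the split are hypothetical. A common way to turn this into a feature importance is to accumulate these decreases over every split that uses a feature across all trees, though the abstract does not spell out its exact procedure.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Hypothetical protein pairs at one tree node, split on some feature threshold.
parent = ["interact"] * 4 + ["no"] * 4
left = ["interact"] * 3 + ["no"] * 1
right = ["interact"] * 1 + ["no"] * 3

print(gini(parent), gini(left), gini_decrease(parent, left, right))
```

A split that separates the classes more cleanly yields a larger decrease, so features that repeatedly produce such splits rank as more important.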

Enzymes are critical proteins in every organism. They speed up essential chemical reactions, help fight diseases, and are widely used in the pharmaceutical and manufacturing industries. Wet lab experiments to determine an enzyme's function are time consuming and expensive. Therefore, computational approaches to address this problem are becoming necessary. Usually, an enzyme is extremely specific in performing its function. However, there exist enzymes that can perform multiple functions. A multi‐functional enzyme has vast potential as it reduces the need to discover or use different enzymes for different functions. We propose an approach to predict a multi‐functional enzyme's function up to the most specific fourth level of the hierarchy of the Enzyme Commission (EC) number. Previous studies can only predict the function of the enzyme up to level 1. Using a dataset of 2,583 multi‐functional enzymes, we achieved a hierarchical subset accuracy of 71.4% and a Macro F1 Score of 96.1% at the fourth level. The robustness of the network was further tested on a multi‐functional isoforms dataset. Our method is broadly applicable and may be used to discover better enzymes. The web‐server can be freely accessed at http://hecnet.cbrlab.org/.

The present study proposed a deep learning (DL) algorithm to predict survival in patients with colon adenocarcinoma (COAD) based on multiomics integration. The survival-sensitive model was constructed using an autoencoder for DL implementation based on The Cancer Genome Atlas (TCGA) data of patients with COAD. The autoencoder framework was compared with PCA, NMF, t-SNE, and univariable Cox-PH model for identifying survival-related features. The prognostic robustness of the inferred survival risk groups was validated using three independent confirmation cohorts. Differential expression analysis, Pearson’s correlation analysis, construction of miRNA–target gene network, and function enrichment analysis were performed. Two risk groups with significant survival differences were identified in TCGA set using the autoencoder-based model (log-rank P-value = 5.51e−07). The autoencoder framework showed superior performance compared with PCA, NMF, t-SNE, and the univariable Cox-PH model based on the C-index, log-rank P-value, and Brier score. The robustness of the classification model was successfully verified in three independent validation sets. There were 1271 differentially expressed genes, 10 differentially expressed miRNAs, and 12 hypermethylated genes between the survival risk groups. Among these, miR-133b and its target genes (GNB4, PTPRZ1, RUNX1T1, EPHA7, GPM6A, BICC1, and ADAMTS5) were used to construct a network. These genes were significantly enriched in ECM–receptor interaction, focal adhesion, PI3K–Akt signaling pathway, and glucose metabolism-related pathways. The risk subgroups obtained through a multiomics data integration pipeline using the DL algorithm had good robustness. miR-133b and its target genes could be potential diagnostic markers. The results would assist in elucidating the possible pathogenesis of COAD.
