1.
Carla Rodríguez, Irene Estévez, Emilio González-Arnay, Juan Campos, Angel Lizana 《Journal of biophotonics》2023,16(4):e202200308
Polarimetric data are now used to build recognition models for characterizing organic tissues and for the early detection of some diseases. Different Mueller matrix-derived polarimetric observables, each allowing a physical interpretation of a specific sample characteristic, have been proposed in the literature to feed the required recognition algorithms. However, they are obtained through mathematical transformations of the Mueller matrix, and this process may lose relevant sample information in the search for physical interpretation. In this work, we present a thorough comparison of 12 classification models based on different polarimetric datasets to find the ideal polarimetric framework for constructing tissue classification models. The study is conducted on experimental Mueller matrix images measured on different tissues (muscle, tendon, myotendinous junction and bone) from a collection of 165 ex-vivo chicken thighs. Three polarimetric datasets are analyzed: (A) a selection of the most representative metrics presented in the literature; (B) Mueller matrix elements; and (C) the combination of sets (A) and (B). Results highlight the importance of using raw Mueller matrix elements in the design of classification models.
2.
Butte AJ 《Trends in biotechnology》2001,19(5):159-160
3.
4.
5.
Schematic showing the main categories of models incorporating structured biological data covered in this review. The first panel shows an example of a model operating on sequence data, the second panel shows a model in which dimension reduction is influenced by the connections in a gene network, and the third panel shows a neural network with structure constrained by a phylogeny or ontology. The ‘x’ values in the data tables represent gene expression measurements.
6.
Azuaje F 《Comparative and Functional Genomics》2002,3(1):28-31
Research on biological data integration has traditionally focused on the development of systems for the maintenance and interconnection of databases. In the next few years, public and private biotechnology organisations will expand their actions to promote the creation of a post-genome semantic web. It has commonly been accepted that artificial intelligence and data mining techniques may support the interpretation of huge amounts of integrated data. But at the same time, these research disciplines are contributing to the creation of content markup languages and sophisticated programs able to exploit the constraints and preferences of user domains. This paper discusses a number of issues on intelligent systems for the integration of bioinformatic resources.
7.
8.
SUMMARY: VizRank is a tool that finds interesting two-dimensional projections of class-labeled data. When applied to multi-dimensional functional genomics datasets, VizRank can systematically find relevant biological patterns. AVAILABILITY: http://www.ailab.si/supp/bi-vizrank SUPPLEMENTARY INFORMATION: http://www.ailab.si/supp/bi-vizrank.
9.
For MALDI-TOF mass spectrometry, we show that the intensity of a peptide-ion peak is directly correlated with its sequence, with the residues M, H, P, R, and L having the most substantial effect on ionization. We developed a machine learning approach that exploits this relationship to significantly improve peptide mass fingerprint (PMF) accuracy based on training data sets from both true-positive and false-positive PMF searches. The model's cross-validated accuracy in distinguishing real versus false-positive database search results is 91%, rivaling the accuracy of MS/MS-based protein identification.
10.
11.
Predictive models based on radiomics and machine learning (ML) need large, annotated datasets for training, which are often difficult to collect. We designed an operative pipeline for model training that exploits data already available to the scientific community. The aim of this work was to explore the capability of radiomic features to predict tumor histology and stage in patients with non-small cell lung cancer (NSCLC). We analyzed the radiotherapy planning thoracic CT scans of a proprietary sample of 47 subjects (L-RT) and integrated this dataset with a publicly available set of 130 patients from the MAASTRO NSCLC collection (Lung1). We implemented intra- and inter-sample cross-validation (CV) strategies for evaluating the performance of ML predictive models on modestly sized datasets. We carried out two classification tasks: histology classification (three classes) and overall stage classification (two classes: stage I and II). In the first task, the best performance was obtained by a Random Forest classifier once the analysis was restricted to stage I and II tumors of the merged Lung1 and L-RT dataset (AUC = 0.72 ± 0.11). For the overall stage classification, the best results were obtained when training on Lung1 and testing on the L-RT dataset (AUC = 0.72 ± 0.04 for Random Forest and AUC = 0.84 ± 0.03 for a linear-kernel Support Vector Machine). Depending on the classification task to be accomplished and on the heterogeneity of the available dataset(s), different CV strategies have to be explored and compared to make a robust assessment of the potential of a predictive model based on radiomics and ML.
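The inter-sample strategy described in this abstract (train on one cohort, test on another, report AUC) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the one-feature centroid model, the function names and the toy numbers are hypothetical and are not the authors' pipeline, which used Random Forest and Support Vector Machine classifiers on radiomic features.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for the positive class and 0 for the negative class.
    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def fit_centroids(features: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical toy model: the mean feature value of each class."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def score(model: Tuple[float, float], features: Sequence[float]) -> List[float]:
    """Higher score = closer to the positive centroid than to the negative one."""
    pos_mean, neg_mean = model
    return [abs(x - neg_mean) - abs(x - pos_mean) for x in features]


# Inter-sample validation: fit on cohort A only, evaluate on cohort B only.
# The numbers are made up; they stand in for one radiomic feature per patient.
cohort_a_x, cohort_a_y = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8], [0, 0, 0, 1, 1, 1]
cohort_b_x, cohort_b_y = [0.9, 2.1, 1.9, 3.1], [0, 0, 1, 1]

model = fit_centroids(cohort_a_x, cohort_a_y)
auc_on_b = roc_auc(cohort_b_y, score(model, cohort_b_x))
```

An intra-sample strategy would instead split a single cohort into folds and average the per-fold AUC; keeping the two cohorts strictly separate, as here, tests whether a model transfers across acquisition sites.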
12.
Cheng Q 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2010,7(4):636-646
Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations, and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and to capture essential characteristics of the data. In this paper, we present a way of constructing multivariate features and then classifying the data into proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting the biases. We use conjugate gradient-based primal-dual interior-point techniques for large-scale problems. We apply our procedure to microarray gene analysis. The effectiveness of our method is confirmed by experimental results.
13.
Sehgal AK, Das S, Noto K, Saier MH, Elkan C 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(3):851-857
With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.
14.
15.
Advances in the field of targeted proteomics and mass spectrometry have significantly improved assay sensitivity and multiplexing capacity. The high-throughput nature of targeted proteomics experiments has increased the rate of data production, which requires development of novel analytical tools to keep up with data processing demand. Currently, development and validation of targeted mass spectrometry assays require manual inspection of chromatographic peaks from large datasets to ensure quality, a process that is time consuming, prone to inter- and intra-operator variability and limits the efficiency of data interpretation from targeted proteomics analyses. To address this challenge, we have developed TargetedMSQC, an R package that facilitates quality control and verification of chromatographic peaks from targeted proteomics datasets. This tool calculates metrics to quantify several quality aspects of a chromatographic peak, e.g. symmetry, jaggedness and modality, co-elution and shape similarity of monitored transitions in a peak group, as well as the consistency of transitions’ ratios between endogenous analytes and isotopically labeled internal standards and consistency of retention time across multiple runs. The algorithm takes advantage of supervised machine learning to identify peaks with interference or poor chromatography based on a set of peaks that have been annotated by an expert analyst. Using TargetedMSQC to analyze targeted proteomics data reduces the time spent on manual inspection of peaks and improves both speed and accuracy of interference detection. Additionally, by allowing the analysts to customize the tool for application on different datasets, TargetedMSQC gives the users the flexibility to define the acceptable quality for specific datasets. 
Furthermore, automated and quantitative assessment of peak quality offers a more objective and systematic framework for high throughput analysis of targeted mass spectrometry assay datasets and is a step towards more robust and faster assay implementation.
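To make the idea of per-peak quality metrics concrete, here is a minimal sketch of two such metrics. The definitions are assumptions chosen for illustration and are not taken from TargetedMSQC, whose actual formulas and function names are not given in this abstract.

```python
from typing import Sequence


def jaggedness(intensities: Sequence[float]) -> float:
    """Hypothetical jaggedness score in [0, 1].

    Counts sign changes of the point-to-point slope, ignoring flat steps.
    One change (rising to falling at the apex) is expected in a clean peak,
    so it is not counted against the peak.
    """
    diffs = [b - a for a, b in zip(intensities, intensities[1:])]
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    if len(signs) < 2:
        return 0.0
    changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    return max(changes - 1, 0) / (len(signs) - 1)


def symmetry(intensities: Sequence[float]) -> float:
    """Hypothetical symmetry score in [0, 1].

    Ratio of the smaller to the larger summed intensity on either side
    of the apex; 1.0 means the two sides carry equal signal.
    """
    apex = max(range(len(intensities)), key=lambda i: intensities[i])
    left = sum(intensities[:apex])
    right = sum(intensities[apex + 1:])
    if max(left, right) == 0:
        return 1.0
    return min(left, right) / max(left, right)


# Made-up chromatographic traces for illustration only.
smooth_peak = [0, 1, 3, 6, 3, 1, 0]
jagged_peak = [0, 3, 1, 6, 2, 4, 0]

smooth_scores = (jaggedness(smooth_peak), symmetry(smooth_peak))
jagged_scores = (jaggedness(jagged_peak), symmetry(jagged_peak))
```

In a supervised setting like the one the abstract describes, scores of this kind would form the feature vector for each peak, with the expert analyst's annotations serving as training labels.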
16.
Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution coming from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of 53,345 shark images covering 219 species of sharks, and packaged object-detection and image classification models into a Shark Detector bundle. The Shark Detector recognizes and classifies sharks from videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches for sharks: collecting occurrence records from photographs taken by the public or citizen scientists, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested genus and species prediction correctness as a function of training data quantity. The Shark Detector can classify 47 species belonging to 26 genera. It sorted heterogeneous datasets of images sourced from Instagram with 91% accuracy and classified species with 70% accuracy. It located sharks in baited remote footage and YouTube videos with 89% accuracy, and classified located subjects to the species level with 69% accuracy. All data-generation methods were processed without manual interaction. As media-based remote monitoring appears to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.
17.
Chalup SK 《International journal of neural systems》2002,12(6):447-465
Incremental learning concepts are reviewed in machine learning and neurobiology. They are identified in evolution, neurodevelopment and learning. A timeline of qualitative axon, neuron and synapse development summarizes the review on neurodevelopment. A discussion of experimental results on data incremental learning with recurrent artificial neural networks reveals that incremental learning often seems to be more efficient or powerful than standard learning but can produce unexpected side effects. A characterization of incremental learning is proposed which takes the elaborated biological and machine learning concepts into account.
18.
19.
《Neuron》2022,110(23):3866-3881
20.
Diagnosis of triple negative breast cancer using expression data with several machine learning tools
Breast cancer is one of the three most common cancers worldwide. Triple Negative Breast Cancer (TNBC), a subtype of breast cancer, lacks expression of the oestrogen receptor, progesterone receptor, and HER2. This makes the prognosis poor and early detection hard. Therefore, AI-based neural models such as Binary Logistic Regression, Multi-Layer Perceptron (MLP) and Radial Basis Functions (RBF) were used for differential diagnosis of normal samples and TNBC samples, using signal intensity data from microarray experiments. Genes that were significantly upregulated in TNBC were compared with healthy controls. The MLP model classified TNBC and normal cells with an accuracy of 93.4%. However, RBF gave 74% accuracy, and the binary Logistic Regression model showed an accuracy of 90.0% in identifying TNBC cases.