Similar literature
Found 20 similar articles; search took 20 ms
1.

Background

Images of frozen-hydrated (vitrified) virus particles, which contain structural signals at high spatial frequencies, were taken close to focus in an electron microscope. These images had very low contrast because of the high level of noise present. The low contrast made particle selection, classification and orientation determination very difficult. The ultimate purpose of classification is to improve the signal-to-noise ratio of the particle representing each class, which is usually the class average. The method proposed in this paper is based on wavelet filtering and multi-resolution processing for the classification and reconstruction of such very noisy data. Multivariate statistical analysis (MSA) is used for the classification.

Results

The MSA classification method is noise dependent. A set of 2600 projections from a 3D map of a herpes simplex virus, to which noise was added, was classified by MSA. The classification shows the power of wavelet filtering in enhancing the quality of class averages (used in 3D reconstruction) compared to Fourier band-pass filtering. A 3D reconstruction of a recombinant virus (VP5-VP19C) is presented as an application of multi-resolution processing for classification and reconstruction.

Conclusion

The wavelet filtering and multi-resolution processing method proposed in this paper offers a new way of processing very noisy images obtained from electron cryo-microscopes. Multi-resolution processing and filtering improve the speed and accuracy of classification, which is vital for the 3D reconstruction of biological objects. The VP5-VP19C recombinant virus reconstruction presented here is an example that demonstrates the power of this method. Without this processing, it is not possible to obtain the correct 3D map of this virus.
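
As a rough illustration of the wavelet filtering idea in this abstract, the sketch below denoises a one-dimensional signal by decomposing it over several resolution levels, shrinking the detail coefficients, and reconstructing. It is not the authors' pipeline: the Haar basis, soft thresholding, the 1D setting (the paper works on 2D micrographs), and every function name here are assumptions chosen to keep the example self-contained.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def haar_denoise(signal, threshold, levels=2):
    """Multi-level decomposition, detail thresholding, then reconstruction."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(soft_threshold(detail, threshold))
    # Rebuild from the coarsest level back to the original resolution.
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx

# Hypothetical noisy step signal: two plateaus with small fluctuations.
noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.0]
smooth = haar_denoise(noisy, threshold=0.5, levels=2)
```

With a threshold of zero the transform reconstructs the input exactly, so all smoothing comes from the thresholding step. In practice the threshold would be tied to an estimate of the noise level rather than fixed by hand.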

2.

Background  

A recent publication described a supervised classification method for microarray data: Between Group Analysis (BGA). This method, which is based on performing multivariate ordination of groups, proved to be very efficient both for classifying samples into pre-defined groups and for predicting the disease class of new unknown samples. Classification and prediction with BGA are classically performed using the whole set of genes, and no variable selection is required. We hypothesize that an optimized selection of highly discriminating genes might improve the predictive power of BGA.

3.

Background  

With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes for, e.g., disease diagnosis. Several widely used gene selection methods select top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these methods is that they may produce gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analyses. Some recent studies show that incorporating gene-to-gene correlations into gene selection can remove redundant genes and improve classification accuracy.
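
The contrast drawn in this abstract, ranking genes individually versus also accounting for gene-to-gene correlation, can be sketched as follows. This is a generic greedy filter, not the method of the paper: the signal-to-noise score, the Pearson cut-off `max_corr`, the two-class 0/1 labels, and the gene names are all assumptions made for illustration.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    """Population standard deviation."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def signal_to_noise(values, labels):
    """Two-class score |mean0 - mean1| / (std0 + std1); labels are 0 or 1."""
    g0 = [v for v, y in zip(values, labels) if y == 0]
    g1 = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(g0) - mean(g1)) / (std(g0) + std(g1) + 1e-12)

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_genes(expr, labels, k, max_corr=0.9):
    """Greedy selection: walk genes from best to worst individual score and
    skip any gene too correlated with one that is already selected."""
    ranked = sorted(expr, key=lambda g: signal_to_noise(expr[g], labels), reverse=True)
    chosen = []
    for gene in ranked:
        if all(abs(pearson(expr[gene], expr[c])) < max_corr for c in chosen):
            chosen.append(gene)
        if len(chosen) == k:
            break
    return chosen

# Hypothetical data: g2 is nearly a copy of g1, g3 carries a weaker but
# less redundant signal.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],
    "g2": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
    "g3": [1.0, 3.0, 2.0, 2.5, 4.5, 3.5],
}
print(select_genes(expr, labels, k=2))
```

Setting `max_corr` above 1 disables the redundancy check and reduces the function to plain top-k ranking, which makes the effect of the correlation filter easy to compare on the same data.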

4.

Background  

State-of-the-art signal processing methods are known to detect information in single-trial event-related EEG data, a crucial aspect in the development of real-time applications such as brain-computer interfaces. This paper investigates one such novel approach, evaluating how individual classifier and feature subset tailoring affects classification of single-trial EEG finger movements. The discrete wavelet transform was used to extract signal features that were classified using linear regression and non-linear neural network models, which were trained and architecturally optimized with evolutionary algorithms. The input feature subsets were also allowed to evolve, thus performing feature selection in a wrapper fashion. Filter approaches were implemented as well by limiting the degree of optimization.

5.

Background  

Identification of molecular markers for the classification of microarray data is a challenging task. Despite the evident dissimilarity in various characteristics of biological samples belonging to the same category, most marker-selection and classification methods do not consider this variability. In general, feature selection methods aim at identifying a common set of genes whose combined expression profiles can accurately predict the category of all samples. Here, we argue that this simplified approach is often unable to capture the complexity of a disease phenotype, and we propose an alternative method that takes into account the individuality of each patient sample.

6.

Background  

The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies such as linkage analysis and gene expression profiling tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease-causal genes.

7.
M Seo, S Oh. PLoS ONE, 2012, 7(7): e40419

Background

The goal of feature selection is to select useful features and simultaneously exclude garbage features from a given dataset for classification purposes. This is expected to reduce processing time and improve classification accuracy.

Methodology

In this study, we devised a new feature selection algorithm (CBFS) based on clearness of features. Feature clearness expresses the separability among classes in a feature. Highly clear features contribute towards obtaining high classification accuracy. CScore is a measure that scores the clearness of each feature and is based on how samples cluster around the centroids of the classes in that feature. We also suggest combining CBFS with other algorithms to improve classification accuracy.

Conclusions/Significance

From the experiments we confirm that CBFS is superior to recent feature selection algorithms, including FeaLect. CBFS can be applied to microarray gene selection, text categorization, and image classification.
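
The abstract describes CScore only briefly, so the sketch below is one plausible reading rather than the published definition. It assumes that, for a single feature, each class centroid is the mean of that class's values, each sample is assigned to its nearest centroid, and the score is the fraction of samples assigned to their own class. The function name `cscore` and the toy data are hypothetical.

```python
def cscore(values, labels):
    """Fraction of samples whose nearest class centroid, measured on this
    one feature, belongs to the sample's own class (assumed definition)."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [v for v, y in zip(values, labels) if y == c]
        centroids[c] = sum(members) / len(members)
    hits = 0
    for v, y in zip(values, labels):
        nearest = min(classes, key=lambda c: abs(v - centroids[c]))
        if nearest == y:
            hits += 1
    return hits / len(values)

def rank_features(columns, labels):
    """Order feature names from clearest to least clear."""
    return sorted(columns, key=lambda name: cscore(columns[name], labels), reverse=True)

# Hypothetical two-class data: one well-separated feature, one mixed feature.
labels = ["a", "a", "a", "b", "b", "b"]
columns = {
    "mixed": [1.0, 5.0, 1.2, 4.8, 1.1, 5.2],
    "clear": [1.0, 1.2, 0.8, 5.0, 5.1, 4.9],
}
print(rank_features(columns, labels))
```

Under this reading a score of 1.0 means the feature separates the classes perfectly around their centroids, and a feature selection step would keep the features with the highest scores.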

8.

Background  

The selection of genes that discriminate disease classes from microarray data is widely used for the identification of diagnostic biomarkers. Although various gene selection methods are currently available and some of them have shown excellent performance, no single method retains the best performance for all types of microarray datasets. It is desirable to use a comparative approach to find the best gene selection result after rigorous testing of different methodological strategies for a given microarray dataset.

9.

Motivation

DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data in order to classify and diagnose disease. Because a dataset contains so many irrelevant and insignificant genes, a feature selection approach must be employed in data analysis. The performance of cluster analysis on such high-throughput data depends on whether the feature selection approach chooses the genes most relevant to the disease classes.

Results

Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub’s leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.

10.

Background

The goal of this work is to develop a non-invasive method to help detect Alzheimer's disease in its early stages by implementing voice analysis techniques based on machine learning algorithms.

Methods

We extract temporal and acoustical voice features (e.g. jitter and harmonics-to-noise ratio) from read speech of patients in the Early Stage of Alzheimer's Disease (ES-AD), patients with Mild Cognitive Impairment (MCI), and a Healthy Control (HC) group. Three classification methods are used to evaluate the efficiency of these features, namely kNN, SVM and decision tree. To assess the effectiveness of this set of features, we compare them with two sets of feature parameters that are widely used in speech and speaker recognition applications. A two-stage feature selection process is conducted to optimize classification performance. For these experiments, the data samples of the HC, ES-AD and MCI groups were collected at AP-HP Broca Hospital in Paris.

Results

First, a wrapper feature selection method for each feature set is evaluated and the relevant features for each classifier are selected. By combining, for each classifier, the features selected from each initial set, we improve the classification accuracy by a relative gain of more than 30% for all classifiers. Then the same feature selection procedure is performed anew on the combination of selected feature sets, resulting in an additional significant improvement of classification accuracy.

Conclusion

The proposed method improved the classification accuracy for the ES-AD, MCI and HC groups and shows the promise of speech analysis and machine learning techniques for helping to detect pathological diseases.
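
Wrapper feature selection, as mentioned in this abstract, scores candidate feature subsets by the accuracy of the classifier that will actually use them. The sketch below shows the general pattern with greedy forward selection around a 1-nearest-neighbour classifier evaluated by leave-one-out. The search strategy, the choice of k = 1, the evaluation protocol and the toy data are assumptions; the abstract does not specify how the authors' wrapper was configured.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_d, best_j = d, j
        if y[best_j] == y[i]:
            correct += 1
    return correct / len(X)

def forward_select(X, y):
    """Greedy wrapper: repeatedly add the feature that raises accuracy most,
    and stop as soon as no remaining feature improves it."""
    remaining = list(range(len(X[0])))
    selected, best_acc = [], 0.0
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        acc, f = max(scored, key=lambda t: t[0])
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

# Hypothetical data: column 0 separates the classes, column 1 is noise on a
# larger scale that would dominate the distance if it were included.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
     [1.0, 5.1], [1.1, 1.2], [1.2, 8.9]]
y = [0, 0, 0, 1, 1, 1]
print(forward_select(X, y))
```

Because the wrapper re-evaluates the classifier for every candidate subset, its cost grows quickly with the number of features, which is one reason it is often combined with a cheaper filter step first.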

11.

Background  

The classification of protein domains in the CATH resource is primarily based on structural comparisons, sequence similarity and manual analysis. One of the main bottlenecks in the processing of new entries is the evaluation of 'borderline' cases by human curators with reference to the literature, and better tools for helping both expert and non-expert users quickly identify relevant functional information from text are urgently needed. A text-based method for protein classification is presented, which complements the existing sequence- and structure-based approaches, especially in cases exhibiting low similarity to existing members and requiring manual intervention. The method is based on the assumption that textual similarity between sets of documents relating to proteins reflects similarities in biological function and can be exploited to make classification decisions.

12.

Background  

Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have also been widely used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and only recently have several researchers compared these techniques on several public datasets. It has been verified, however, that differently selected features reflect different aspects of the dataset and that some selected features yield better solutions on certain problems. At the same time, faced with a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we attempt to introduce a combinational feature selection method in conjunction with ensemble neural networks to generally improve the accuracy and robustness of sample classification.

13.
14.

Background  

The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry.

15.

Background  

Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high-dimensional data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes.

16.

Background  

The number of sequences compiled in many genome projects is growing exponentially, but most of them have not been characterized experimentally. An automatic annotation scheme is urgently needed to reduce the gap between the number of new sequences produced and reliable functional annotation. This work proposes rules for automatically classifying fungal genes. The approach involves elucidating the enzyme classification rules hidden in the UniProt protein knowledgebase and then applying them for classification. The association algorithm Apriori is utilized to mine the relationship between the enzyme class and significant InterPro entries. The candidate rules are evaluated for their classificatory capacity.
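
Association rules of the kind this abstract describes are usually judged by support (how often the rule applies in the dataset) and confidence (how often it is right when it applies). The sketch below computes both for rules of the form "protein has entry X, therefore enzyme class C" and keeps those above two cut-offs. It covers only single-entry antecedents, i.e. the first pass of Apriori, and the identifiers `IPR_A`, `IPR_B`, `IPR_C`, `EC1` and `EC3` are made-up placeholders, not real InterPro or EC identifiers.

```python
def rule_stats(records, entry, enzyme_class):
    """Support and confidence of the rule: entry -> enzyme_class.
    `records` is a list of (set_of_entries, enzyme_class) pairs."""
    with_entry = [r for r in records if entry in r[0]]
    both = [r for r in with_entry if r[1] == enzyme_class]
    support = len(both) / len(records)
    confidence = len(both) / len(with_entry) if with_entry else 0.0
    return support, confidence

def mine_rules(records, min_support, min_confidence):
    """Enumerate single-entry rules and keep those meeting both thresholds."""
    entries = sorted({e for items, _ in records for e in items})
    classes = sorted({c for _, c in records})
    rules = []
    for entry in entries:
        for enzyme_class in classes:
            support, confidence = rule_stats(records, entry, enzyme_class)
            if support >= min_support and confidence >= min_confidence:
                rules.append((entry, enzyme_class, support, confidence))
    return rules

# Hypothetical annotated proteins: (signature entries, enzyme class).
records = [
    ({"IPR_A", "IPR_B"}, "EC1"),
    ({"IPR_A"}, "EC1"),
    ({"IPR_A", "IPR_C"}, "EC1"),
    ({"IPR_B", "IPR_C"}, "EC3"),
    ({"IPR_C"}, "EC3"),
]
for rule in mine_rules(records, min_support=0.4, min_confidence=0.6):
    print(rule)
```

A full Apriori run would go on to combine the frequent single entries into larger antecedents, pruning any combination whose subsets already fall below the support threshold.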

17.

Background  

Tiling array data is hard to interpret due to noise. The wavelet transformation is a widely used technique in signal processing for elucidating the true signal from noisy data. Consequently, we attempted to denoise representative tiling array datasets for ChIP-chip experiments using wavelets. In doing this, we used specific wavelet basis functions, Coiflets, since their triangular shape closely resembles the expected profiles of true ChIP-chip peaks.

18.

Background  

Microarray data analysis is notorious for involving a huge number of genes compared to a relatively small number of samples. Gene selection aims to detect the genes that are most significantly differentially expressed under different conditions, and it has been a central research focus. In general, a better gene selection method can improve classification performance significantly. One of the difficulties in gene selection is that the number of samples varies widely between conditions.

19.

Background  

Designing appropriate machine learning methods for identifying genes that have significant discriminating power for disease outcomes has become increasingly important for our understanding of diseases at the genomic level. Although many machine learning methods have been developed and applied to microarray gene expression data analysis, the majority are based on linear models, which are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods also tend to introduce false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that may be singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have two critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potential to achieve this goal.

20.

Background  

Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data.
