Similar documents
20 similar documents found.
1.

Background  

With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes, e.g., for disease diagnosis. Several widely used gene selection methods select the top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these methods is that they may produce gene sets with some redundancy and an unnecessarily large number of candidate genes for classification analyses. Some recent studies show that incorporating gene-to-gene correlations into gene selection can remove redundant genes and improve classification accuracy.
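The following is a minimal Python sketch of correlation-aware selection. It ranks genes by a two-sample t-like score, then greedily skips any gene that is too strongly correlated with one already chosen. The score, the greedy filter, the `max_corr = 0.8` cutoff and the binary 0/1 labels are illustrative assumptions. This is not the method of any specific study listed here.

```python
import numpy as np

def t_scores(X, y):
    """Absolute two-sample t-like score per gene (column) for labels y in {0, 1}."""
    a, b = X[y == 0], X[y == 1]
    # Welch-style standard error; the epsilon avoids division by zero for constant genes
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def select_nonredundant(X, y, k, max_corr=0.8):
    """Greedily pick up to k top-ranked genes, skipping any gene whose absolute
    Pearson correlation with an already selected gene exceeds max_corr."""
    order = np.argsort(-t_scores(X, y))  # best-ranked genes first
    corr = np.abs(np.corrcoef(X, rowvar=False))  # gene-by-gene correlation matrix
    selected = []
    for g in order:
        if all(corr[g, s] <= max_corr for s in selected):
            selected.append(int(g))
        if len(selected) == k:
            break
    return selected
```

A purely rank-based filter would return the two highest-scoring genes even if one is nearly a copy of the other. The correlation check replaces the copy with the next non-redundant gene.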

2.

Background  

For the last eight years, microarray-based classification has been a major topic in statistics, bioinformatics and biomedicine research. Traditional methods often yield unsatisfactory results or may even be inapplicable in the so-called "p ≫ n" setting, where the number of predictors p far exceeds the number of observations n; hence the term "ill-posed problem". Careful model selection and evaluation that satisfies accepted good-practice standards is a very complex task for statisticians without experience in this area or for scientists with limited statistical background. The multiplicity of available methods for class prediction based on high-dimensional data is an additional practical challenge for inexperienced researchers.
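A few lines of NumPy show why the p ≫ n setting is ill-posed. Take ordinary least squares as the "traditional method". The p x p matrix X^T X has rank at most n, so when p > n it is singular and the normal equations have no unique solution. The random Gaussian matrix and the helper name `gram_rank` are hypothetical stand-ins for real expression data.

```python
import numpy as np

def gram_rank(n_samples, n_genes, seed=0):
    """Rank of X^T X for a random n_samples x n_genes matrix X (a stand-in for expression data)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_genes))
    # X^T X is n_genes x n_genes, but its rank cannot exceed min(n_samples, n_genes)
    return int(np.linalg.matrix_rank(X.T @ X))

# p >> n: a 500 x 500 matrix of rank at most 20, hence singular
print(gram_rank(n_samples=20, n_genes=500))
# n >> p: full rank, so least squares has a unique solution
print(gram_rank(n_samples=200, n_genes=10))
```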

3.

Background  

Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained.

4.

Background  

Microarray data analysis is notorious for involving a huge number of genes compared to a relatively small number of samples. Gene selection aims to detect the most significantly differentially expressed genes under different conditions, and it has been a central research focus. In general, a better gene selection method can significantly improve classification performance. One of the difficulties in gene selection is that the number of samples can vary greatly between conditions.
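Welch's t-statistic is one standard way to compare two conditions whose sample sizes and variances differ. It does not pool the variances and uses the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below is offered only as a reference point. It is not necessarily the approach taken in the cited work.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom for two
    samples that may differ in size and variance."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t([1, 2, 3], [4, 5, 6, 7, 8])
```

Each group contributes its own variance divided by its own size, so a small group is not swamped by a large one.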

5.

Background  

A recent publication described a supervised classification method for microarray data: Between Group Analysis (BGA). This method, which is based on performing multivariate ordination of groups, proved to be very efficient both for classifying samples into pre-defined groups and for predicting the disease class of new, unknown samples. Classification and prediction with BGA are classically performed using the whole set of genes, and no variable selection is required. We hypothesize that an optimized selection of highly discriminating genes might improve the prediction power of BGA.
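Here is a minimal sketch of the BGA idea, assuming the PCA variant. It ordinates the matrix of class mean profiles, then assigns each sample to the nearest class centroid in the resulting space. Several choices are assumptions for illustration: the class name `BetweenGroupPCA`, the use of PCA (BGA can also be run with correspondence analysis) and the Euclidean nearest-centroid rule.

```python
import numpy as np

class BetweenGroupPCA:
    """Sketch of between-group analysis: PCA on class means, then nearest-centroid
    classification in the resulting low-dimensional space."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.center_ = X.mean(axis=0)
        # one row per class: that class's mean expression profile, centered
        M = np.vstack([X[y == c].mean(axis=0) for c in self.classes_]) - self.center_
        # with K classes, at most K - 1 between-group axes carry information
        _, _, vt = np.linalg.svd(M, full_matrices=False)
        self.axes_ = vt[: len(self.classes_) - 1].T
        self.centroids_ = M @ self.axes_  # class means in the reduced space
        return self

    def transform(self, X):
        return (X - self.center_) @ self.axes_

    def predict(self, X):
        Z = self.transform(X)
        # squared Euclidean distance of every sample to every class centroid
        d = ((Z[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]
```

All genes enter the ordination here, which matches the classical use described above. A gene-selection step would simply restrict the columns of `X` before calling `fit`.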

6.

Introduction  

Rheumatoid arthritis (RA) is a complex polygenic disease of unknown etiology. HLA-DRB1 alleles encoding the shared epitope (SE) (RAA amino acid pattern in positions 72 to 74 of the third hypervariable region of the DRβ1 chain) are associated with RA susceptibility. A new classification of HLA-DRB1 SE alleles has been developed by Tezenas du Montcel and colleagues to refine the association between HLA-DRB1 and RA. In the present study, we used RA samples collected worldwide to investigate the relevance of this new HLA-DRB1 classification in terms of RA susceptibility across various Caucasoid and non-Caucasoid patients.

7.

Background  

In the clinical context, samples assayed by microarray are often classified by cell line or tumour type, and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems in which three tumour types and nine cell lines, respectively, must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step, whereby the most informative genes are selected from the genes assayed.
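A compact genetic algorithm over gene subsets illustrates the evolutionary approach. Subsets are encoded as bit masks and evolved by tournament selection, one-point crossover and bit-flip mutation. The two best subsets are carried over unchanged each generation. The operators, the parameter values and the caller-supplied `fitness` function are assumptions, not the authors' algorithm. In practice, `fitness` might be cross-validated accuracy on the selected genes.

```python
import random

def evolve_gene_subset(fitness, n_genes, pop_size=30, generations=40,
                       mutation_rate=0.02, seed=0):
    """Minimal genetic algorithm over gene subsets encoded as 0/1 lists.
    `fitness` maps a list of selected gene indices to a score to maximize."""
    rng = random.Random(seed)

    def score(mask):
        return fitness([i for i, bit in enumerate(mask) if bit])

    # sparse random initial population: each gene included with probability 0.1
    pop = [[int(rng.random() < 0.1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        next_pop = ranked[:2]  # elitism: keep the two best subsets unchanged
        while len(next_pop) < pop_size:
            # tournament selection: best of three random individuals, twice
            p1 = max(rng.sample(ranked, 3), key=score)
            p2 = max(rng.sample(ranked, 3), key=score)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation adds or drops genes at random
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return [i for i, bit in enumerate(best) if bit]
```

Because of elitism, the best score in the population never decreases from one generation to the next. Fitness is recomputed freely here for brevity. A real microarray run would cache scores, since each evaluation may involve training a classifier.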

8.

Background  

Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database. With the rapid increase in the number of new protein structures, automated and accurate methods for protein classification are increasingly needed.

9.

Background  

The moss Physcomitrella patens is an emerging model in comparative plant science. At present, the Physcomitrella genome is being sequenced at the Joint Genome Institute (USA). In this study we present our results on the development of expressed sequence tag-derived microsatellite markers for Physcomitrella patens, their classification, and their applicability as genetic markers at both the intra- and interspecies level. We were severely restricted in comparing our results on Physcomitrella with earlier studies of other plant species because of varying microsatellite search criteria and a limited selection of analysed species. As a consequence, we performed a side-by-side analysis of expressed sequence tag-derived microsatellites among 24 plant species covering a broad phylogenetic range and present our results on the observed frequencies.
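Microsatellite counts depend heavily on the search criteria, which is the comparability problem described above. The sketch below makes those criteria explicit parameters in a regex-based search for perfect tandem repeats of 1 to 6 bp motifs. The thresholds in `DEFAULT_MIN_REPEATS` are hypothetical, not the criteria used by the authors. Imperfect and compound repeats are ignored.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length (bp).
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_microsatellites(seq, min_repeats=DEFAULT_MIN_REPEATS):
    """Return (start, motif, copies) for perfect tandem repeats of 1-6 bp motifs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # a k-bp motif followed by at least n - 1 further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if re.fullmatch(r"(.+?)\1+", motif):
                continue  # e.g. "ACAC" is a dinucleotide repeat, reported at k = 2
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

seq = "GG" + "AC" * 7 + "T" + "A" * 11 + "G"
```

Changing a single entry in `min_repeats` can add or remove hits, which is why frequencies from studies with different criteria are hard to compare.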

10.

Background  

The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.

11.

Background  

There is evidence from previous work that bacterial secondary metabolism may be stimulated by genetic manipulation of RNA polymerase (RNAP). In this study we have used rifampicin selection as a strategy to genetically improve the erythromycin producer Saccharopolyspora erythraea.

12.

Background  

SpectraClassifier (SC) is a Java solution for designing and implementing Magnetic Resonance Spectroscopy (MRS)-based classifiers. The main goal of SC is to allow users with minimal background knowledge of multivariate statistics to perform a fully automated pattern recognition analysis. SC incorporates feature selection (greedy stepwise approach, either forward or backward) and feature extraction (PCA). Fisher Linear Discriminant Analysis is the method of choice for classification. Classifier evaluation is performed through various methods: display of the confusion matrices of the training and testing datasets; K-fold cross-validation, leave-one-out and bootstrapping; and Receiver Operating Characteristic (ROC) curves.
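SpectraClassifier itself is written in Java. The following Python sketch only illustrates the combination described above: greedy forward stepwise selection, scored by the K-fold cross-validated accuracy of a two-class Fisher LDA. The function names, the interleaved fold assignment and the stop-when-no-improvement rule are assumptions, not SC's implementation.

```python
import numpy as np

def fisher_lda_fit(X, y):
    """Two-class Fisher LDA: w = pinv(Sw) (m1 - m0), threshold at the midpoint of the projected means."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix
    sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.pinv(sw) @ (m1 - m0)  # pseudo-inverse tolerates a singular scatter matrix
    return w, w @ (m0 + m1) / 2

def cv_accuracy(X, y, k=5):
    """K-fold cross-validated accuracy; fold i holds samples i, i + k, i + 2k, ..."""
    idx = np.arange(len(y))
    correct = 0
    for i in range(k):
        test = idx[i::k]
        train = np.setdiff1d(idx, test)
        w, thr = fisher_lda_fit(X[train], y[train])
        correct += int(np.sum((X[test] @ w > thr).astype(int) == y[test]))
    return correct / len(y)

def forward_stepwise(X, y, max_features=5, k=5):
    """Greedy forward selection: repeatedly add the feature that most improves CV accuracy."""
    chosen, best = [], 0.0
    while len(chosen) < max_features:
        candidates = [f for f in range(X.shape[1]) if f not in chosen]
        if not candidates:
            break
        scores = {f: cv_accuracy(X[:, chosen + [f]], y, k) for f in candidates}
        f, s = max(scores.items(), key=lambda kv: kv[1])
        if s <= best:
            break  # stop when no remaining feature improves accuracy
        chosen.append(f)
        best = s
    return chosen, best
```

The backward variant would start from all features and repeatedly drop the one whose removal hurts accuracy least. Leave-one-out is the special case `k = len(y)`.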

13.

Background  

Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high-dimensional data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes.

14.

Background  

Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have also been widely used and compared. However, most published articles on tumor classification have applied a single technique to a single dataset, and several researchers have recently compared these techniques on several public datasets. It has also been shown that differently selected features reflect different aspects of the dataset, and that some feature sets yield better solutions on certain problems. At the same time, faced with large amounts of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics of the data using traditional methods. In this paper, we introduce a combinational feature selection method in conjunction with ensemble neural networks to improve the accuracy and robustness of sample classification in general.

15.

Background  

Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. The experimental design of mass-spectrometry studies has come under close scrutiny, and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. The main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods.

16.

Background  

Feature selection is an approach to overcoming the 'curse of dimensionality' in complex research problems such as disease classification using microarrays. Statistical methods are the most widely used in this domain, but most of them do not suit a wide range of datasets. Transform-oriented signal processing techniques have been little explored here, although other fields such as image and video processing use them extensively. Wavelets, one such technique, have the potential to be used for feature selection. The aim of this paper is to assess the capability of the Haar wavelet power spectrum for clustering and gene selection based on expression data in the context of disease classification, and to propose a method based on the Haar wavelet power spectrum.
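The sketch below computes a Haar wavelet power spectrum for one expression profile treated as a 1-D signal. At each level, adjacent pairs of values are split into scaled sums (approximation) and scaled differences (detail). The power at that level is the energy of the detail coefficients. This does not follow how the paper orders, pads or normalizes the expression data, which is not stated here. The power-of-two length requirement is an assumption of this sketch.

```python
import math

def haar_power_spectrum(signal):
    """Energy of the Haar detail coefficients at each decomposition level,
    finest level first. The input length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, powers = [float(v) for v in signal], []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]   # local differences
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]   # local averages (scaled)
        powers.append(sum(d * d for d in detail))
    return powers

spectrum = haar_power_spectrum([4, 2, 3, 1])
```

The orthonormal Haar transform preserves energy. The level powers plus the square of the final approximation coefficient therefore add up to the sum of squares of the input (for `[4, 2, 3, 1]`: 4 + 1 + 25 = 30).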

17.
M. Seo, S. Oh. PLoS ONE, 2012, 7(7): e40419

Background

The goal of feature selection is to select useful features and simultaneously exclude irrelevant features from a given dataset for classification purposes. This is expected to reduce processing time and improve classification accuracy.

Methodology

In this study, we devised a new feature selection algorithm (CBFS) based on the clearness of features. Feature clearness expresses the separability among classes in a feature. Highly clear features contribute to high classification accuracy. CScore is a measure of the clearness of each feature; it is based on how closely samples cluster around the centroids of their classes in that feature. We also suggest combining CBFS with other algorithms to improve classification accuracy.
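The abstract does not give the CScore formula, so the sketch below is a hypothetical reading of it. A feature scores highly when most samples, looking at that feature alone, lie closest to the centroid of their own class. The names `clearness_scores` and `cbfs_select` are illustrative. The published CScore may be defined differently.

```python
import numpy as np

def clearness_scores(X, y):
    """Hypothetical clearness score per feature: the fraction of samples whose value
    lies closest to their own class centroid on that feature alone."""
    classes = np.unique(y)
    # centroids[c, j] = mean of feature j over the samples of class c
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    # distance of every sample to every class centroid, feature by feature
    dist = np.abs(X[:, None, :] - centroids[None, :, :])  # (samples, classes, features)
    nearest = classes[dist.argmin(axis=1)]  # nearest class label, shape (samples, features)
    return (nearest == y[:, None]).mean(axis=0)

def cbfs_select(X, y, k):
    """Keep the indices of the k features with the highest clearness scores."""
    return np.argsort(-clearness_scores(X, y), kind="stable")[:k]
```

Each feature is scored independently, which keeps the ranking cheap even for thousands of genes. Combining it with other algorithms, as the authors suggest, could mean using this ranking as a pre-filter.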

Conclusions/Significance

From the experiments we confirm that CBFS outperforms up-to-date feature selection algorithms, including FeaLect. CBFS can be applied to microarray gene selection, text categorization, and image classification.

18.

Background  

Cernunnos-XLF is a nonhomologous end-joining factor that is mutated in patients with a rare immunodeficiency with microcephaly. Several other microcephaly-associated genes such as ASPM and microcephalin experienced recent adaptive evolution apparently linked to brain size expansion in humans. In this study we investigated whether Cernunnos-XLF experienced similar positive selection during human evolution.

19.

Background  

Protein-coding change is one possible genetic mechanism underlying the evolution of adaptive wing colour pattern variation in Heliconius butterflies. Here we determine whether 38 putative genes within two major Heliconius patterning loci, HmYb and HmB, show evidence of positive selection. Ratios of nonsynonymous to synonymous nucleotide changes (ω) were used to test for selection, as a means of identifying candidate genes within each locus that control wing pattern.
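For reference, here is the counting (Nei-Gojobori style) definition of ω. Here N_d and S_d are the observed nonsynonymous and synonymous differences, and N and S are the numbers of nonsynonymous and synonymous sites. The proportions p_N and p_S become d_N and d_S after a multiple-hit correction; the Jukes-Cantor form is shown.

```latex
\omega = \frac{d_N}{d_S}, \qquad
p_N = \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S}, \qquad
d = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p\right)
```

ω > 1 is taken as evidence of positive selection, ω ≈ 1 of neutral evolution and ω < 1 of purifying selection. Likelihood-based codon models estimate ω differently, so this shows the quantity being tested rather than the study's exact procedure.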

20.

Background  

Genes involved in male reproduction are often the targets of natural and/or sexual selection. SCML1 is a recently identified X-linked gene with preferential expression in testis. To test whether SCML1 is the target of selection in primates, we sequenced and compared the coding region of SCML1 in major primate lineages, and we observed the signature of positive selection in primates.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417