期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Feature construction from synergic pairs to improve microarray-based classification

Hanczar B Zucker JD Henegar C Saitta L 《Bioinformatics (Oxford, England)》2007,23(21):2866-2872

相似文献

2.

Small sample results for ratio estimators

RAO PODURI S. R. S.; RAO J. N. K. 《Biometrika》1971,58(3):625-630

相似文献

3.

Effective sample selection for classification of pre-miRNAs

Han K 《Genetics and molecular research : GMR》2011,10(1):506-518

To solve the class imbalance problem in the classification of pre-miRNAs with the ab initio method, we developed a novel sample selection method according to the characteristics of pre-miRNAs. Real/pseudo pre-miRNAs are clustered based on their stem similarity and their distribution in high dimensional sample space, respectively. The training samples are selected according to the sample density of each cluster. Experimental results are validated by the cross-validation and other testing datasets composed of human real/pseudo pre-miRNAs. When compared with the previous method, microPred, our classifier miRNAPred is nearly 12% more accurate. The selected training samples also could be used to train other SVM classifiers, such as triplet-SVM, MiPred, miPred, and microPred, to improve their classification performance. The sample selection algorithm is useful for constructing a more efficient classifier for the classification of real pre-miRNAs and pseudo hairpin sequences. 相似文献

4.

A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification 总被引：2，自引：0，他引：2

Alexander Statnikov Lily Wang Constantin F Aliferis

《BMC bioinformatics》

Background

Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. 相似文献

5.

Multivariate adaptation and classification method for characteristic sample properties

Steven R. Talbot 《Engineering in Life Science》2011,11(5):463-467

Characteristic properties of samples can be measured by spectrometers, cameras or other applicable equipment. To achieve meaningful classification results with a user‐friendly arrangement of the overall system, a new approach is pursued in which principally unaltered input data are reduced to their essential content using a Wavelet transform and are refined with a special smoothing method in such a manner that certain dimension‐reducing techniques can also be employed in a numerically stable way for discontinuous data sets as they occur, for example, in classification tasks. The introduced multivariate adaptive embedding (MAE) process acts as a universal approximator in the adaptation phase, to a very large extent without iterations and parameter adjustments, and deduces a redundancy‐free model with which untrained input data with outstanding generalization properties, in terms of an application, can be processed in the application phase. While taking advantage of the proximity relationships of the data points, the entire information is mapped into a low‐dimensional coordinate system using a supervised learning process and is scaled and adapted to the respective application using an unsupervised learning process. This approach allows classification of highly related and confused data as they may occur in identification/classification setups for bacteria and other substances using spectroscopic methods. 相似文献

6.

Small sample probability points for the D test of normality

D'AGOSTESTO RALPH B. 《Biometrika》1972,59(1):219-221

相似文献

7.

"Small sample asymptotics for type I censored data"

Wu J 《Biometrical journal. Biometrische Zeitschrift》2005,47(5):763; author reply 764-763; author reply 765

相似文献

8.

Simultaneous gene clustering and subset selection for sample classification via MDL

Jörnsten R Yu B 《Bioinformatics (Oxford, England)》2003,19(9):1100-1109

相似文献

9.

A consensus multi-view multi-objective gene selection approach for improved sample classification

Acharya Sudipta Cui Laizhong Pan Yi 《BMC bioinformatics》2020,21(13):1-15

Background

High-dimensional flow cytometry and mass cytometry allow systemic-level characterization of more than 10 protein profiles at single-cell resolution and provide a much broader landscape in many biological applications, such as disease diagnosis and prediction of clinical outcome. When associating clinical information with cytometry data, traditional approaches require two distinct steps for identification of cell populations and statistical test to determine whether the difference between two population proportions is significant. These two-step approaches can lead to information loss and analysis bias.

Results

We propose a novel statistical framework, called LAMBDA (Latent Allocation Model with Bayesian Data Analysis), for simultaneous identification of unknown cell populations and discovery of associations between these populations and clinical information. LAMBDA uses specified probabilistic models designed for modeling the different distribution information for flow or mass cytometry data, respectively. We use a zero-inflated distribution for the mass cytometry data based the characteristics of the data. A simulation study confirms the usefulness of this model by evaluating the accuracy of the estimated parameters. We also demonstrate that LAMBDA can identify associations between cell populations and their clinical outcomes by analyzing real data. LAMBDA is implemented in R and is available from GitHub (https://github.com/abikoushi/lambda).

相似文献

10.

Improved analytical methods for microarray-based genome-composition analysis

下载免费PDF全文

Kim CC Joyce EA Chan K Falkow S 《Genome biology》2002,3(11):research0065.1-research006517

Background

Whereas genome sequencing has given us high-resolution pictures of many different species of bacteria, microarrays provide a means of obtaining information on genome composition for many strains of a given species. Genome-composition analysis using microarrays, or 'genomotyping', can be used to categorize genes into 'present' and 'divergent' categories based on the level of hybridization signal. This typically involves selecting a signal value that is used as a cutoff to discriminate present (high signal) and divergent (low signal) genes. Current methodology uses empirical determination of cutoffs for classification into these categories, but this methodology is subject to several problems that can result in the misclassification of many genes. 相似文献

11.

Small sample properties of the Mantel-Haenszel test

LI SHOU-HUA; SIMON RICHARD M.; GART JOHN J. 《Biometrika》1979,66(1):181-183

相似文献

12.

Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data 总被引：3，自引：0，他引：3

Xuegong Zhang Xin Lu Qian Shi Xiu-qin Xu Hon-chiu E Leung Lyndsay N Harris James D Iglehart Alexander Miron Jun S Liu Wing H Wong 《BMC bioinformatics》2006,7(1):197-13

Background

Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data. 相似文献

13.

A fisheye viewer for microarray-based gene expression data

Min Wu Cheng Thao Xiangming Mu Ethan V Munson 《BMC bioinformatics》2006,7(1):452-7

相似文献

14.

Oligonucleotide fingerprint identification for microarray-based pathogen diagnostic assays 总被引：1，自引：0，他引：1

Tembe W Zavaljevski N Bode E Chase C Geyer J Wasieloski L Benson G Reifman J 《Bioinformatics (Oxford, England)》2007,23(1):5-13

MOTIVATION: Advances in DNA microarray technology and computational methods have unlocked new opportunities to identify 'DNA fingerprints', i.e. oligonucleotide sequences that uniquely identify a specific genome. We present an integrated approach for the computational identification of DNA fingerprints for design of microarray-based pathogen diagnostic assays. We provide a quantifiable definition of a DNA fingerprint stated both from a computational as well as an experimental point of view, and the analytical proof that all in silico fingerprints satisfying the stated definition are found using our approach. RESULTS: The presented computational approach is implemented in an integrated high-performance computing (HPC) software tool for oligonucleotide fingerprint identification termed TOFI. We employed TOFI to identify in silico DNA fingerprints for several bacteria and plasmid sequences, which were then experimentally evaluated as potential probes for microarray-based diagnostic assays. Results and analysis of approximately 150 in silico DNA fingerprints for Yersinia pestis and 250 fingerprints for Francisella tularensis are presented. AVAILABILITY: The implemented algorithm is available upon request. 相似文献

15.

Resource and hardware options for microarray-based experimentation.

Nabeel A Affara 《Briefings in Functional Genomics and Prot》2003,2(1):7-20

DNA microarray technology permits the study of biological systems and processes on a genome-wide scale. Arrays based on cDNA clones, oligonucleotides and genomic clones have been developed for investigations of gene expression, genetic analysis and genomic changes associated with disease. Over the past 3-4 years, microarrays have become more widely available to the research community. This has occurred through increased commercial availability of custom and generic arrays and the development of robotic equipment that has enabled array printing and analysis facilities to be established in academic research institutions. This brief review examines the public and commercial resources, the microarray fabrication and data capture and analysis equipment currently available to the user. 相似文献

16.

Interval-valued analysis for discriminative gene selection and tissue sample classification using microarray data

Yunsong Qi Xibei Yang 《Genomics》2013,101(1):38-48

An important application of gene expression data is to classify samples in a variety of diagnostic fields. However, high dimensionality and a small number of noisy samples pose significant challenges to existing classification methods. Focused on the problems of overfitting and sensitivity to noise of the dataset in the classification of microarray data, we propose an interval-valued analysis method based on a rough set technique to select discriminative genes and to use these genes to classify tissue samples of microarray data. We first select a small subset of genes based on interval-valued rough set by considering the preference-ordered domains of the gene expression data, and then classify test samples into certain classes with a term of similar degree. Experiments show that the proposed method is able to reach high prediction accuracies with a small number of selected genes and its performance is robust to noise. 相似文献

17.

Novel microarray-based method for estimating exposure to ionizing radiation 总被引：1，自引：0，他引：1

Gruel G Lucchesi C Pawlik A Frouin V Alibert O Kortulewski T Zarour A Jacquelin B Gidrol X Tronik-Le Roux D 《Radiation research》2006,166(5):746-756

相似文献

18.

Nucleic acid amplification strategies for DNA microarray-based pathogen detection 总被引：10，自引：0，他引：10

Vora GJ Meador CE Stenger DA Andreadis JD 《Applied and environmental microbiology》2004,70(5):3047-3054

DNA microarray-based screening and diagnostic technologies have long promised comprehensive testing capabilities. However, the potential of these powerful tools has been limited by front-end target-specific nucleic acid amplification. Despite the sensitivity and specificity associated with PCR amplification, the inherent bias and limited throughput of this approach constrain the principal benefits of downstream microarray-based applications, especially for pathogen detection. To begin addressing alternative approaches, we investigated four front-end amplification strategies: random primed, isothermal Klenow fragment-based, phi29 DNA polymerase-based, and multiplex PCR. The utility of each amplification strategy was assessed by hybridizing amplicons to microarrays consisting of 70-mer oligonucleotide probes specific for enterohemorrhagic Escherichia coli O157:H7 and by quantitating their sensitivities for the detection of O157:H7 in laboratory and environmental samples. Although nearly identical levels of hybridization specificity were achieved for each method, multiplex PCR was at least 3 orders of magnitude more sensitive than any individual random amplification approach. However, the use of Klenow-plus-Klenow and phi29 polymerase-plus-Klenow tandem random amplification strategies provided better sensitivities than multiplex PCR. In addition, amplification biases among the five genetic loci tested were 2- to 20-fold for the random approaches, in contrast to >4 orders of magnitude for multiplex PCR. The same random amplification strategies were also able to detect all five diagnostic targets in a spiked environmental water sample that contained a 63-fold excess of contaminating DNA. The results presented here underscore the feasibility of using random amplification approaches and begin to systematically address the versatility of these approaches for unbiased pathogen detection from environmental sources. 相似文献

19.

Merging microfluidics with microarray-based bioassays 总被引：1，自引：0，他引：1

Situma C Hashimoto M Soper SA 《Biomolecular engineering》2006,23(5):213-231

Microarray technologies provide powerful tools for biomedical researchers and medicine, since arrays can be configured to monitor the presence of molecular signatures in a highly parallel fashion and can be configured to search either for nucleic acids (DNA microarrays) or proteins (antibody-based microarrays) as well as different types of cells. Microfluidics on the other hand, provides the ability to analyze small volumes (micro-, nano- or even pico-liters) of sample and minimize costly reagent consumption as well as automate sample preparation and reduce sample processing time. The marriage of microarray technologies with the emerging field of microfluidics provides a number of advantages such as, reduction in reagent cost, reductions in hybridization assay times, high-throughput sample processing, and integration and automation capabilities of the front-end sample processing steps. However, this potential marriage is also fraught with some challenges as well, such as developing low-cost manufacturing methods of the fluidic chips, providing good interfaces to the macro-world, minimizing non-specific analyte/wall interactions due to the high surface-to-volume ratio associated with microfluidics, the development of materials that accommodate the optical readout phases of the assay and complete integration of peripheral components (optical and electrical) to the microfluidic to produce autonomous systems appropriate for point-of-care testing. In this review, we provide an overview and recent advances on the coupling of DNA, protein and cell microarrays to microfluidics and discuss potential improvements required for the implementation of these technologies into biomedical and clinical applications. 相似文献

20.

Optimal number of features as a function of sample size for various classification rules

Hua J Xiong Z Lowey J Suh E Dougherty ER 《Bioinformatics (Oxford, England)》2005,21(8):1509-1515

MOTIVATION: Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. RESULTS: Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as resource for those working with small-sample classification. AVAILABILITY: For the companion website, please visit http://public.tgen.org/tamu/ofs/ CONTACT: e-dougherty@ee.tamu.edu. 相似文献