Similar Articles
 Found 20 similar articles (search time: 15 ms)
3.

Background  

Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.
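Claims of this kind are usually settled with cross-validated accuracy. Below is a minimal Python sketch of such a comparison harness, using a toy nearest-centroid model as a stand-in for the SVM and random forest learners discussed above (all function names and the data setup are illustrative, not the authors' pipeline):

```python
import random
from statistics import mean

def kfold_accuracy(X, y, fit, k=5, seed=0):
    """Generic k-fold cross-validation: returns mean held-out accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        accs.append(mean(1.0 if model(X[i]) == y[i] else 0.0 for i in fold))
    return mean(accs)

def fit_nearest_centroid(X, y):
    """Toy stand-in for a real classifier (e.g. an SVM or random forest)."""
    cents = {}
    for c in set(y):
        pts = [x for x, yy in zip(X, y) if yy == c]
        cents[c] = [mean(col) for col in zip(*pts)]
    def predict(x):
        return min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(x, cents[c])))
    return predict
```

In a real comparison one would pass fitting functions wrapping each candidate learner to the same `kfold_accuracy` harness, so both algorithms are scored on identical folds.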

4.
To solve the class imbalance problem in the classification of pre-miRNAs with the ab initio method, we developed a novel sample selection method based on the characteristics of pre-miRNAs. Real and pseudo pre-miRNAs are clustered based on their stem similarity and their distribution in high-dimensional sample space, respectively. Training samples are then selected according to the sample density of each cluster. Experimental results are validated by cross-validation and by additional testing datasets composed of human real/pseudo pre-miRNAs. Compared with the previous method, microPred, our classifier miRNAPred is nearly 12% more accurate. The selected training samples can also be used to train other SVM classifiers, such as triplet-SVM, MiPred, miPred, and microPred, to improve their classification performance. The sample selection algorithm is useful for constructing more efficient classifiers for separating real pre-miRNAs from pseudo hairpin sequences.
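The density-guided selection step might be sketched as follows. This is a deliberately simplified stand-in: the budget-proportional rule and the cluster contents are illustrative, whereas the paper's actual density computation over stem-similarity clusters is more involved.

```python
import random

def select_by_density(clusters, budget, seed=0):
    """Pick a balanced training set: draw from each cluster in proportion
    to its size, so dense regions are thinned instead of dominating the
    training data (a rough antidote to class imbalance)."""
    rng = random.Random(seed)
    total = sum(len(c) for c in clusters)
    chosen = []
    for c in clusters:
        k = max(1, round(budget * len(c) / total))
        chosen.extend(rng.sample(c, min(k, len(c))))
    return chosen
```

The selected subset can then be fed to any downstream SVM trainer in place of the full, imbalanced sample pool.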

5.
Characteristic properties of samples can be measured by spectrometers, cameras, or other suitable equipment. To achieve meaningful classification results with a user-friendly overall system, a new approach is pursued: essentially unaltered input data are reduced to their essential content using a wavelet transform and refined with a special smoothing method, so that dimension-reducing techniques can be applied in a numerically stable way even to discontinuous data sets such as those arising in classification tasks. The introduced multivariate adaptive embedding (MAE) process acts as a universal approximator in the adaptation phase, largely without iterations or parameter adjustments, and derives a redundancy-free model with which untrained input data can be processed in the application phase with excellent generalization properties. Exploiting the proximity relationships of the data points, the entire information is mapped into a low-dimensional coordinate system by a supervised learning process, then scaled and adapted to the respective application by an unsupervised learning process. This approach allows classification of highly related, easily confused data such as occur in identification/classification setups for bacteria and other substances using spectroscopic methods.
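The wavelet-reduction idea can be illustrated with a single Haar level in Python. The one-level depth and the threshold value are simplifications; the method described above uses a full wavelet transform plus a dedicated smoothing step.

```python
def haar_reduce(x, thresh=0.1):
    """One level of the Haar wavelet transform: split a signal into
    coarse pairwise averages and fine details, then drop details below
    `thresh` -- reducing the data to its essential content."""
    avg = [(a + b) / 2 for a, b in zip(x[::2], x[1::2])]
    det = [(a - b) / 2 for a, b in zip(x[::2], x[1::2])]
    det = [d if abs(d) >= thresh else 0.0 for d in det]
    return avg, det

def haar_restore(avg, det):
    """Invert the one-level transform from (possibly thresholded) parts."""
    out = []
    for a, d in zip(avg, det):
        out += [a + d, a - d]
    return out
```

For a smooth spectrum, nearly all detail coefficients fall below the threshold, so half the numbers suffice to reconstruct the signal to within the threshold.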

8.
Kim CC, Joyce EA, Chan K, Falkow S. Genome Biology 2002, 3(11): research0065.1–0065.17

Background  

Whereas genome sequencing has given us high-resolution pictures of many different species of bacteria, microarrays provide a means of obtaining information on genome composition for many strains of a given species. Genome-composition analysis using microarrays, or 'genomotyping', can be used to categorize genes into 'present' and 'divergent' categories based on the level of hybridization signal. This typically involves selecting a signal value that is used as a cutoff to discriminate present (high signal) and divergent (low signal) genes. Current methodology uses empirical determination of cutoffs for classification into these categories, but this methodology is subject to several problems that can result in the misclassification of many genes.
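One way to make the cutoff data-driven rather than empirical is to place it where the between-class variance of the signal histogram is maximized (Otsu's method), sketched below. This illustrates threshold selection in general and is not the specific algorithm the authors propose.

```python
def otsu_cutoff(signals, bins=64):
    """Choose a present/divergent cutoff that maximizes between-class
    variance of the hybridization signals, instead of a hand-picked
    empirical threshold."""
    lo, hi = min(signals), max(signals)
    width = (hi - lo) / bins or 1.0
    hist = [0] * bins
    for s in signals:
        hist[min(bins - 1, int((s - lo) / width))] += 1
    total = len(signals)
    sum_all = sum((lo + (i + 0.5) * width) * h for i, h in enumerate(hist))
    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += (lo + (i + 0.5) * width) * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, lo + (i + 1) * width
    return best_t

def classify(signals, cutoff):
    """Label each gene by its hybridization signal relative to the cutoff."""
    return ['present' if s >= cutoff else 'divergent' for s in signals]
```

With clearly bimodal signals the cutoff lands in the gap between the two modes, so the present/divergent split no longer depends on an analyst-chosen value.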

10.
Acharya S, Cui L, Pan Y. BMC Bioinformatics 2020, 21(13): 1–15
Background

High-dimensional flow cytometry and mass cytometry allow systemic-level characterization of more than 10 protein profiles at single-cell resolution and provide a much broader landscape in many biological applications, such as disease diagnosis and prediction of clinical outcome. When associating clinical information with cytometry data, traditional approaches require two distinct steps: identification of cell populations, and a statistical test to determine whether the difference between two population proportions is significant. Such two-step approaches can lead to information loss and analysis bias.

Results

We propose a novel statistical framework, called LAMBDA (Latent Allocation Model with Bayesian Data Analysis), for simultaneous identification of unknown cell populations and discovery of associations between these populations and clinical information. LAMBDA uses probabilistic models tailored to the distributional characteristics of flow and mass cytometry data, respectively. For mass cytometry data we use a zero-inflated distribution, based on the characteristics of the data. A simulation study confirms the usefulness of the model by evaluating the accuracy of the estimated parameters. We also demonstrate on real data that LAMBDA can identify associations between cell populations and clinical outcomes. LAMBDA is implemented in R and is available from GitHub (https://github.com/abikoushi/lambda).

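The zero-inflation idea can be sketched with a zero-inflated Poisson, a simple discrete analogue; the paper's actual model for continuous mass cytometry intensities differs, and the parameter values below are illustrative.

```python
import math

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: a point mass at zero with weight pi, mixed
    with an ordinary Poisson(lam) carrying the remaining weight."""
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * pois

def excess_zero_weight(counts, lam):
    """Estimate the zero-inflation weight pi from the observed zero
    fraction, given the Poisson rate: P(0) = pi + (1 - pi) * exp(-lam)."""
    p0 = sum(c == 0 for c in counts) / len(counts)
    e = math.exp(-lam)
    return max(0.0, min(1.0, (p0 - e) / (1 - e)))
```

The extra point mass at zero is what lets the model account for the abundance of exact zeros in mass cytometry channels that a plain count or intensity distribution would underpredict.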

11.
Occupancy models may be used to estimate the probability that a randomly selected site in an area of interest is occupied by a species (ψ), given imperfect detection (p). Given multiple survey periods, the method can be extended to estimate seasonal probabilities of occupancy (ψ), colonization (γ), persistence (φ), and extinction (1 − φ) in season t. We evaluated the sampling properties of estimators of these parameters using data simulated across a range of parameter values and differing numbers of sites and visits, with a published dynamic occupancy model (Royle and Kery 2007). Bias depended largely on p and the number of visits, but also on the number of sites, ψt, γ, and 1 − φ. To decrease bias in all parameters to near zero, our results suggest that the required number of visits will depend on p, such that the probability of detection at an occupied site is near 0.9, and that the required number of sites will be near 60 for ψt estimation and 120 or greater for γ and 1 − φ estimation. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.
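The detection bias that motivates these models is easy to reproduce in simulation. The sketch below (parameter values are illustrative) shows the naive occupancy estimate shrinking toward ψ · (1 − (1 − p)^K) for K visits, which is why model-based estimators are needed when p < 1.

```python
import random

def simulate_detections(n_sites, n_visits, psi, p, rng):
    """Simulate single-season occupancy data: each site is occupied with
    probability psi; an occupied site is detected on each visit with
    probability p (unoccupied sites are never detected)."""
    hist = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        hist.append([int(occupied and rng.random() < p)
                     for _ in range(n_visits)])
    return hist

def naive_occupancy(hist):
    """Fraction of sites with at least one detection; biased low whenever
    detection is imperfect, since some occupied sites are never seen."""
    return sum(any(row) for row in hist) / len(hist)
```

Increasing the number of visits raises the chance that an occupied site is detected at least once, which is the simulation-based version of the visit-number recommendation above.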

13.
DNA microarray technology permits the study of biological systems and processes on a genome-wide scale. Arrays based on cDNA clones, oligonucleotides and genomic clones have been developed for investigations of gene expression, genetic analysis and genomic changes associated with disease. Over the past 3-4 years, microarrays have become more widely available to the research community. This has occurred through increased commercial availability of custom and generic arrays and the development of robotic equipment that has enabled array printing and analysis facilities to be established in academic research institutions. This brief review examines the public and commercial resources, the microarray fabrication and data capture and analysis equipment currently available to the user.

14.
MOTIVATION: Advances in DNA microarray technology and computational methods have unlocked new opportunities to identify 'DNA fingerprints', i.e. oligonucleotide sequences that uniquely identify a specific genome. We present an integrated approach for the computational identification of DNA fingerprints for design of microarray-based pathogen diagnostic assays. We provide a quantifiable definition of a DNA fingerprint stated both from a computational as well as an experimental point of view, and the analytical proof that all in silico fingerprints satisfying the stated definition are found using our approach. RESULTS: The presented computational approach is implemented in an integrated high-performance computing (HPC) software tool for oligonucleotide fingerprint identification termed TOFI. We employed TOFI to identify in silico DNA fingerprints for several bacteria and plasmid sequences, which were then experimentally evaluated as potential probes for microarray-based diagnostic assays. Results and analysis of approximately 150 in silico DNA fingerprints for Yersinia pestis and 250 fingerprints for Francisella tularensis are presented. AVAILABILITY: The implemented algorithm is available upon request.
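The core of an in silico fingerprint search can be illustrated as a set difference over k-mers. Real pipelines such as TOFI also screen near-matches, melting temperature, and secondary structure, none of which is modeled in this toy sketch; the sequences and the value of k are made up.

```python
def kmers(seq, k):
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def fingerprints(target, backgrounds, k=8):
    """k-mers present in the target genome but absent from every
    background sequence -- candidate sequence-uniqueness fingerprints."""
    unique = kmers(target, k)
    for b in backgrounds:
        unique -= kmers(b, k)
    return unique
```

In practice k would be probe-length (tens of bases) and the background set would span all non-target genomes the assay must discriminate against.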

15.

Background  

Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data.

17.
Merging microfluidics with microarray-based bioassays
Microarray technologies provide powerful tools for biomedical research and medicine, since arrays can be configured to monitor the presence of molecular signatures in a highly parallel fashion and can be targeted at nucleic acids (DNA microarrays), proteins (antibody-based microarrays), or different types of cells. Microfluidics, on the other hand, provides the ability to analyze small volumes (micro-, nano-, or even picoliters) of sample, minimize costly reagent consumption, automate sample preparation, and reduce sample processing time. The marriage of microarray technologies with the emerging field of microfluidics offers a number of advantages, such as reduced reagent cost, shorter hybridization assay times, high-throughput sample processing, and integration and automation of the front-end sample-processing steps. This marriage also faces challenges, however: developing low-cost manufacturing methods for the fluidic chips, providing good interfaces to the macro-world, minimizing non-specific analyte/wall interactions caused by the high surface-to-volume ratio of microfluidics, developing materials that accommodate the optical readout phases of the assay, and fully integrating peripheral (optical and electrical) components with the microfluidics to produce autonomous systems appropriate for point-of-care testing. In this review, we provide an overview of recent advances in coupling DNA, protein, and cell microarrays to microfluidics and discuss improvements required to bring these technologies into biomedical and clinical applications.

18.
Yunsong Qi, Xibei Yang. Genomics 2013, 101(1): 38–48
An important application of gene expression data is classifying samples in a variety of diagnostic fields. However, high dimensionality and a small number of noisy samples pose significant challenges to existing classification methods. Focusing on the problems of overfitting and sensitivity to noise in the classification of microarray data, we propose an interval-valued analysis method based on a rough set technique to select discriminative genes and to use these genes to classify tissue samples. We first select a small subset of genes using an interval-valued rough set that considers the preference-ordered domains of the gene expression data, and then classify test samples into classes according to their degree of similarity. Experiments show that the proposed method reaches high prediction accuracy with a small number of selected genes, and its performance is robust to noise.
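The interval idea can be caricatured as ranking genes by the gap between their per-class expression ranges. This is a crude stand-in for the interval-valued rough-set reduction described above; the data layout and scoring rule are illustrative.

```python
def interval_gap(a, b):
    """Gap between the two classes' [min, max] expression intervals;
    positive exactly when the intervals do not overlap."""
    return max(min(b) - max(a), min(a) - max(b))

def select_genes(expr_a, expr_b, top=10):
    """Rank genes by interval separation between class A and class B
    expression values and keep the `top` most discriminative ones.
    expr_a[g] / expr_b[g] hold gene g's values in each class."""
    scored = sorted(
        ((interval_gap(a, b), g)
         for g, (a, b) in enumerate(zip(expr_a, expr_b))),
        reverse=True)
    return [g for _, g in scored[:top]]
```

Genes with disjoint class intervals score positively and surface first; heavily overlapping (noise-prone) genes fall to the bottom of the ranking.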

20.
DNA microarray-based screening and diagnostic technologies have long promised comprehensive testing capabilities. However, the potential of these powerful tools has been limited by front-end target-specific nucleic acid amplification. Despite the sensitivity and specificity associated with PCR amplification, the inherent bias and limited throughput of this approach constrain the principal benefits of downstream microarray-based applications, especially for pathogen detection. To begin addressing alternative approaches, we investigated four front-end amplification strategies: random primed, isothermal Klenow fragment-based, phi29 DNA polymerase-based, and multiplex PCR. The utility of each amplification strategy was assessed by hybridizing amplicons to microarrays consisting of 70-mer oligonucleotide probes specific for enterohemorrhagic Escherichia coli O157:H7 and by quantitating their sensitivities for the detection of O157:H7 in laboratory and environmental samples. Although nearly identical levels of hybridization specificity were achieved for each method, multiplex PCR was at least 3 orders of magnitude more sensitive than any individual random amplification approach. However, the use of Klenow-plus-Klenow and phi29 polymerase-plus-Klenow tandem random amplification strategies provided better sensitivities than multiplex PCR. In addition, amplification biases among the five genetic loci tested were 2- to 20-fold for the random approaches, in contrast to >4 orders of magnitude for multiplex PCR. The same random amplification strategies were also able to detect all five diagnostic targets in a spiked environmental water sample that contained a 63-fold excess of contaminating DNA. The results presented here underscore the feasibility of using random amplification approaches and begin to systematically address the versatility of these approaches for unbiased pathogen detection from environmental sources.
