
1.

RankGene: identification of diagnostic genes based on expression data (Total citations: 9; self-citations: 0; citations by others: 9)

Su Y Murali TM Pavlovic V Schaffer M Kasif S 《Bioinformatics (Oxford, England)》2003,19(12):1578-1579

RankGene is a program for analyzing gene expression data and computing diagnostic genes based on their predictive power in distinguishing between different types of samples. The program integrates into one system a variety of popular ranking criteria, ranging from the traditional t-statistic to one-dimensional support vector machines. This flexibility makes RankGene a useful tool in gene expression analysis and feature selection.
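To make the idea of ranking genes by a single criterion concrete, here is a minimal sketch of a t-statistic ranking. It is not RankGene's code: the choice of Welch's unequal-variance form, the function names `t_statistic` and `rank_genes`, and the toy gene names and values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Welch's t-statistic between two groups of expression values."""
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    if standard_error == 0:
        return 0.0  # no within-group variation: treat as uninformative here
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_genes(expression, labels):
    """Return gene names sorted by |t|, most discriminative first.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of 0/1 class labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        class0 = [v for v, y in zip(values, labels) if y == 0]
        class1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(t_statistic(class0, class1))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three genes measured in six samples.
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # strongly separated classes
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no separation
    "geneC": [3.0, 3.5, 2.5, 4.0, 4.5, 3.5],  # moderate separation
}
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_genes(expression, labels)
```

Any other per-gene score could be swapped in for `t_statistic` without changing `rank_genes`, which reflects the flexibility the abstract describes.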

2.

Automated identification of reference genes based on RNA-seq data

Rosario Carmona, Macarena Arroyo, María José Jiménez-Quesada, Pedro Seoane, Adoración Zafra, Rafael Larrosa, Juan de Dios Alché, M. Gonzalo Claros 《Biomedical engineering online》2017,16(1):65

Background

Gene expression analyses require appropriate reference genes (RGs) for normalization in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this requirement, but they have been reported to be less invariant than expected; therefore, RGs should be tested and validated for every particular situation. Microarray data have been used to propose new RGs, but they are available for only a limited set of model species and conditions; in contrast, RNA-seq experiments are increasingly frequent and constitute a new source of candidate RGs.

Results

An automated workflow based on mapped NGS reads has been constructed to obtain highly and invariantly expressed RGs, using expression normalized as reads per mapped million and the coefficient of variation. This workflow has been tested with Roche/454 reads from reproductive tissues of olive tree (Olea europaea L.), as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana and three different human cancers (prostate, small-cell lung cancer and lung adenocarcinoma). Candidate RGs have been proposed for each species, and many of them have been previously reported as RGs in the literature. Experimental validation of significant RGs in olive tree is provided to support the algorithm.

Conclusion

Regardless of sequencing technology, number of replicates, and library sizes, when RNA-seq experiments are designed and performed, the same datasets can be analyzed with our workflow to extract suitable RGs for subsequent PCR validation. Moreover, different subsets of experimental conditions can provide different suitable RGs.
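The selection criterion described in the Results (high expression in reads per mapped million, low coefficient of variation) can be sketched as follows. This is an illustrative outline, not the authors' workflow: the thresholds `min_mean_rpm` and `max_cv`, the function names, and the gene names and counts are hypothetical.

```python
from statistics import mean, stdev


def reads_per_million(count, mapped_reads):
    """Normalize a raw read count by the sample's total mapped reads."""
    return count * 1_000_000 / mapped_reads


def candidate_reference_genes(counts, mapped_reads,
                              min_mean_rpm=10.0, max_cv=0.2):
    """Return (gene, mean RPM, CV) tuples, most stable gene first.

    counts:       dict mapping gene -> list of raw counts, one per sample
    mapped_reads: list of total mapped reads, one per sample
    Thresholds are illustrative assumptions; min_mean_rpm must be > 0.
    """
    candidates = []
    for gene, gene_counts in counts.items():
        rpm = [reads_per_million(c, n)
               for c, n in zip(gene_counts, mapped_reads)]
        average = mean(rpm)
        if average < min_mean_rpm:
            continue  # too weakly expressed to serve as a reference
        cv = stdev(rpm) / average  # coefficient of variation
        if cv <= max_cv:
            candidates.append((gene, average, cv))
    return sorted(candidates, key=lambda item: item[2])


# Hypothetical counts for four genes in three samples.
counts = {
    "ACT": [1000, 2050, 1490],   # high and stable
    "UBQ": [5000, 9800, 7600],   # high and stable
    "STRESS1": [50, 900, 300],   # expressed but highly variable
    "RARE": [10, 20, 10],        # stable but barely expressed
}
mapped_reads = [10_000_000, 20_000_000, 15_000_000]
reference_genes = candidate_reference_genes(counts, mapped_reads)
```

Running the same function on a subset of samples may yield a different candidate list, which matches the Conclusion's point that different subsets of conditions can suggest different RGs.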


3.

Analytical approaches to RNA profiling data for the identification of genes enriched in specific cells

Joseph D. Dougherty Eric F. Schmidt Miho Nakajima Nathaniel Heintz 《Nucleic acids research》2010,38(13):4218-4230


4.

Density based pruning for identification of differentially expressed genes from microarray data

Hu J Xu J 《BMC genomics》2010,11(Z2):S3

Motivation

Identification of differentially expressed genes from microarray datasets is one of the most important analyses in microarray data mining. Popular algorithms such as the t-test rank genes based on a single statistic. The false positive rate of these methods can be reduced by considering other features of differentially expressed genes.

Results

We propose a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two-dimensional feature space composed of the average difference of gene expression and the average expression level. A density-based pruning algorithm (DB pruning) is developed to isolate potential differentially expressed genes, which are usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from the Gene Expression Omnibus (GEO) database with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test, rank product, and fold change.

Conclusions

Density based pruning of non-differentially expressed genes is an effective method for enhancing statistical testing based algorithms for identifying differentially expressed genes. It improves t-test, rank product, and fold change by 11% to 50% in the numbers of identified true differentially expressed genes. The source code of DB pruning is freely available on our website http://mleg.cse.sc.edu/degprune
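The abstract does not give the details of DB pruning, so the following is only a rough sketch of the general idea: place each gene in the two-dimensional space of (average difference, average expression) and discard genes that sit in a dense neighborhood. The fixed-radius neighbor count, the parameters `radius` and `max_neighbors`, and the toy data are assumptions, not the published algorithm.

```python
from math import hypot
from statistics import mean


def gene_features(group_a, group_b):
    """Map each gene to (average difference, average expression level)."""
    features = {}
    for gene in group_a:
        mean_a, mean_b = mean(group_a[gene]), mean(group_b[gene])
        features[gene] = (mean_a - mean_b, (mean_a + mean_b) / 2)
    return features


def prune_dense(features, radius=0.5, max_neighbors=1):
    """Keep genes with at most max_neighbors other genes within radius.

    Genes in the dense core are pruned as likely non-differential;
    genes in sparse regions are kept as candidates. Raw (unscaled)
    feature units are used here purely for simplicity.
    """
    kept = []
    for gene, (x, y) in features.items():
        neighbors = sum(
            1
            for other, (ox, oy) in features.items()
            if other != gene and hypot(x - ox, y - oy) <= radius
        )
        if neighbors <= max_neighbors:
            kept.append(gene)
    return kept


# Hypothetical expression values: four unchanged genes and one shifted gene.
group_a = {"g1": [5.0, 5.0], "g2": [5.1, 5.2], "g3": [4.9, 4.8],
           "g4": [5.0, 5.1], "de": [7.0, 7.0]}
group_b = {"g1": [5.0, 5.0], "g2": [5.1, 5.0], "g3": [4.9, 5.0],
           "g4": [5.0, 4.9], "de": [3.0, 3.0]}
candidates = prune_dense(gene_features(group_a, group_b))
```

In a pipeline of the kind the abstract describes, the surviving candidates would then be passed to a conventional test such as the t-test, rank product, or fold change.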


5.

Machine learning of functional class from phenotype data (Total citations: 5; self-citations: 0; citations by others: 5)

Clare A King RD 《Bioinformatics (Oxford, England)》2002,18(1):160-166

MOTIVATION: Mutant phenotype growth experiments are an important novel source of functional genomics data which have received little attention in bioinformatics. We applied supervised machine learning to the problem of using phenotype data to predict the functional class of Open Reading Frames (ORFs) in Saccharomyces cerevisiae. Three sources of data were used: TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces (TRIPLES), European Functional Analysis Network (EUROFAN) and Munich Information Center for Protein Sequences (MIPS). The analysis of the data presented a number of challenges to machine learning: multi-class labels, a large number of sparsely populated classes, the need to learn a set of accurate rules (not a complete classification), and a very large number of missing values. We modified the algorithm C4.5 to deal with these problems. RESULTS: Rules were learnt which are accurate and biologically meaningful. The rules predict the function of 83 ORFs of unknown function at an estimated accuracy of ≥ 80%.

6.

Genomic approaches that aid in the identification of transcription factor target genes

Kirmizis A Farnham PJ 《Experimental biology and medicine (Maywood, N.J.)》2004,229(8):705-721


7.

New approaches to schistosome identification (Total citations: 1; self-citations: 0; citations by others: 1)

Rollinson D Walker TK Simpson AJ 《Parasitology today (Personal ed.)》1986,2(1):24-25


8.

Clustering approaches to identifying gene expression patterns from DNA microarray data

Do JH Choi DK 《Molecules and cells》2008,25(2):279-288


9.

Bayesian neural network approaches to ovarian cancer identification from high-resolution mass spectrometry data

Yu J Chen XW 《Bioinformatics (Oxford, England)》2005,21(Z1):i487-i494


10.

Machine learning approaches for the prediction of signal peptides and other protein sorting signals (Total citations: 10; self-citations: 0; citations by others: 10)

Nielsen H Brunak S von Heijne G 《Protein engineering》1999,12(1):3-9

Prediction of protein sorting signals from the sequence of amino acids has great importance in the field of proteomics today. Recently, the growth of protein databases, combined with machine learning approaches such as neural networks and hidden Markov models, has made it possible to achieve a level of reliability where practical use in, for example, automatic database annotation is feasible. In this review, we concentrate on the present status and future perspectives of SignalP, our neural network-based method for prediction of the most well-known sorting signal: the secretory signal peptide. We discuss the problems associated with the use of SignalP on genomic sequences, showing that signal peptide prediction will improve further if integrated with predictions of start codons and transmembrane helices. As a step towards this goal, a hidden Markov model version of SignalP has been developed, making it possible to discriminate between cleaved signal peptides and uncleaved signal anchors. Furthermore, we show how SignalP can be used to characterize putative signal peptides from an archaeon, Methanococcus jannaschii. Finally, we briefly review a few methods for predicting other protein sorting signals and discuss the future of protein sorting prediction in general.

11.

Bayesian biomarker identification based on marker-expression proteomics data

Bhattacharjee M Botting CH Sillanpää MJ 《Genomics》2008,92(6):384-392

We are studying variable selection in multiple regression models in which molecular markers and/or gene-expression measurements as well as intensity measurements from protein spectra serve as predictors for the outcome variable (i.e., trait or disease state). Finding genetic biomarkers and searching genetic–epidemiological factors can be formulated as a statistical problem of variable selection, in which, from a large set of candidates, a small number of trait-associated predictors are identified. We illustrate our approach by analyzing the data available for chronic fatigue syndrome (CFS). CFS is a complex disease from several aspects, e.g., it is difficult to diagnose and difficult to quantify. To identify biomarkers we used microarray data and SELDI-TOF-based proteomics data. We also analyzed genetic marker information for a large number of SNPs for an overlapping set of individuals. The objectives of the analyses were to identify markers specific to fatigue that are also possibly exclusive to CFS. The use of such models can be motivated, for example, by the search for new biomarkers for the diagnosis and prognosis of cancer and measures of response to therapy. Generally, for this we use Bayesian hierarchical modeling and Markov Chain Monte Carlo computation.

12.

Multiple approaches to data-mining of proteomic data based on statistical and pattern classification methods

Tatay JW Feng X Sobczak N Jiang H Chen CF Kirova R Struble C Wang NJ Tonellato PJ 《Proteomics》2003,3(9):1704-1709

The data-mining challenge presented is composed of two fundamental problems. Problem one is the separation of forty-one subjects into two classifications based on the data produced by the mass spectrometry of protein samples from each subject. Problem two is to find the specific differences between protein expression data of two sets of subjects. In each problem, one group of subjects has a disease, while the other group is nondiseased. Each problem was approached with the intent to introduce a new and potentially useful tool to analyze protein expression from mass spectrometry data. A variety of methodologies, both conventional and nonconventional, were used in the analysis of these problems. The results presented show both overlap and discrepancies. What is important is the breadth of the techniques and the future direction this analysis will create.

13.

Model-based methods for identifying periodically expressed genes based on time course microarray gene expression data (Total citations: 4; self-citations: 0; citations by others: 4)

Luan Y Li H 《Bioinformatics (Oxford, England)》2004,20(3):332-339


14.

Inductive learning approaches to rainfall-runoff modelling

Dawson CW Brown MR Wilby RL 《International journal of neural systems》2000,10(1):43-57

Modelling the rainfall-runoff process is a complex activity, as it is influenced by a number of implicit and explicit factors, for example precipitation distribution, evaporation, transpiration, abstraction, watershed topography, and soil types. However, this kind of forecasting is particularly important as it is used to predict serious flooding, estimate erosion and identify problems associated with low flow. Inductive learning approaches (e.g. decision trees and artificial neural networks) are particularly well suited to problems of this nature as they can often interpret underlying factors (such as seasonal variations) which cannot be modelled by other techniques. In addition, these approaches can easily be trained on the explicit factors (e.g. rainfall) and the implicit factors (e.g. abstraction) that affect river flow. Inductive learning approaches can also be extended to account for new factors that emerge over a period of time. This paper evaluates the application of decision trees and two artificial neural network models (the multilayer perceptron and the radial basis function network) to river flow forecasting in two flood-prone UK catchments using real hydrometric data. Comparisons are made between the performance of these approaches and conventional flood forecasting systems.

15.

Tandem machine learning for the identification of genes regulated by transcription factors

Deendayal Dinakarpandian, Venetia Raheja, Saumil Mehta, Erin G Schuetz, Peter K Rogan 《BMC bioinformatics》2005,6(1):204


16.

An examination of on-line machine learning approaches for pseudo-random generated data

Jia Zhu Chuanhua Xu Zhixu Li Gabriel Fung Xueqin Lin Jin Huang Changqin Huang 《Cluster computing》2016,19(3):1309-1321

A pseudo-random generator is an algorithm that generates, from a truly random seed, a sequence of objects that is not itself truly random. It has been widely used in many applications, such as cryptography and simulations. In this article, we examine currently popular machine learning algorithms together with various on-line algorithms on pseudo-random generated data, in order to find out which machine learning approach is most suitable for on-line prediction on this kind of data. To further improve the prediction performance, we propose a novel sample-weighted algorithm that takes the generalization errors in each iteration into account. We perform an intensive evaluation on real Baccarat data generated by casino machines and random numbers generated by a popular Java program, which are two typical examples of pseudo-random generated data. The experimental results show that the support vector machine and k-nearest neighbors perform better than the other methods on the evaluation data set, both with and without the sample-weighted algorithm.

17.

Functional genomics: learning to think about gene expression data. (Total citations: 2; self-citations: 0; citations by others: 2)

R Brent 《Current biology : CB》1999,9(9):R338-R341

Three recent studies of gene expression patterns in whole cells provide examples of the inferences one can make from this type of information. They also provide examples of the non-traditional types of reasoning we will need to use to make such inferences.

18.

A search engine to identify pathway genes from expression data on multiple organisms

Chunnuan Chen Matthew T Weirauch Corey C Powell Alexander C Zambon Joshua M Stuart 《BMC systems biology》2007,1(1):20-19


19.

Assigning functions to genes: identification of S-phase expressed genes in Leishmania major based on post-transcriptional control elements


Zick A Onn I Bezalel R Margalit H Shlomai J 《Nucleic acids research》2005,33(13):4235-4242


20.

Functional annotation and identification of candidate disease genes by computational analysis of normal tissue gene expression data

Miozzi L Piro RM Rosa F Ala U Silengo L Di Cunto F Provero P 《PloS one》2008,3(6):e2439
