Similar literature
 20 similar documents found.
1.
Non-biological experimental variation or "batch effects" are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
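
To make the idea of adjusting for batch effects concrete, the sketch below applies a plain per-gene location and scale adjustment: each batch is standardized and then mapped back to the grand mean and pooled within-batch spread. This is a deliberately simplified stand-in, not the empirical Bayes method the abstract proposes; it does no shrinkage across genes and ignores biological covariates. The function name `adjust_batches` and the toy data are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def adjust_batches(values, batches):
    """Naive location/scale batch adjustment for ONE gene (illustrative only).

    values  : expression values, one per sample
    batches : batch label for each sample, same length as values
    """
    groups = defaultdict(list)
    for value, batch in zip(values, batches):
        groups[batch].append(value)

    grand_mean = mean(values)
    batch_mean = {b: mean(v) for b, v in groups.items()}
    # Sample variance needs at least two points; treat singletons as zero spread.
    batch_var = {b: variance(v) if len(v) > 1 else 0.0 for b, v in groups.items()}

    # Pooled within-batch standard deviation, weighted by degrees of freedom.
    dof = sum(len(v) - 1 for v in groups.values())
    pooled_sd = (
        sqrt(sum((len(v) - 1) * batch_var[b] for b, v in groups.items()) / dof)
        if dof > 0
        else 0.0
    )

    adjusted = []
    for value, batch in zip(values, batches):
        sd = sqrt(batch_var[batch])
        z = (value - batch_mean[batch]) / sd if sd > 0 else 0.0
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted


if __name__ == "__main__":
    expr = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
    labels = ["A", "A", "A", "B", "B", "B"]
    print(adjust_batches(expr, labels))
```

With very small batches the per-batch mean and variance estimates used here are unstable, which is the situation the abstract's empirical Bayes frameworks are meant to address.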

2.
Publicly available genomic data are a great source of biological knowledge that can be extracted when appropriate data analysis is used. Predicting the biological function of genes is of interest to understand molecular mechanisms of virulence and resistance in pathogens and hosts and is important for drug discovery and disease control. This is commonly done by searching for similar gene expression behavior. Here, we used publicly available Streptococcus pyogenes microarray data obtained during primate infection to identify genes that have a potential influence on virulence, and microarray data from tomato inoculated with Phytophthora infestans to identify genes potentially implicated in resistance processes. This approach goes beyond co-expression analysis. We employed a quasi-likelihood model separated by primate gender/inoculation condition to model median gene expression of known virulence/resistance factors. Based on this model, an influence analysis considering time course measurement was performed to detect genes with atypical expression. This procedure allowed for the detection of genes potentially implicated in the infection process. Finally, we discuss the biological meaning of these results, showing that influence analysis is an efficient and useful alternative for functional gene prediction.

3.

Background

Gene expression microarrays measure the levels of messenger ribonucleic acid (mRNA) in a sample using probe sequences that hybridize with transcribed regions. These probe sequences are designed using a reference genome for the relevant species. However, most model organisms and all humans have genomes that deviate from their reference. These variations, which include single nucleotide polymorphisms, insertions of additional nucleotides, and nucleotide deletions, can affect the microarray’s performance. Genetic experiments comparing individuals bearing different population-associated single nucleotide polymorphisms that intersect microarray probes are therefore subject to systemic bias, as the reduction in binding efficiency due to a technical artifact is confounded with genetic differences between parental strains. This problem has been recognized for some time, and earlier methods of compensation have attempted to identify probes affected by genome variants using statistical models. These methods may require replicate microarray measurement of gene expression in the relevant tissue in inbred parental samples, which are not always available in model organisms and are never available in humans.

Results

By using sequence information for the genomes of organisms under investigation, potentially problematic probes can now be identified a priori. However, there is no published software tool that makes it easy to eliminate these probes from an annotation. I present equalizer, a software package that uses genome variant data to modify annotation files for the commonly used Affymetrix IVT and Gene/Exon platforms. These files can be used by any microarray normalization method for subsequent analysis. I demonstrate how use of equalizer on experiments mapping germline influence on gene expression in a genetic cross between two divergent mouse species and in human samples significantly reduces probe hybridization-induced bias, reducing false positive and false negative findings.

Conclusions

The equalizer package reduces probe hybridization bias from experiments performed on the Affymetrix microarray platform, allowing accurate assessment of germline influence on gene expression.

4.
5.
6.
7.

Background

Modern approaches to treating genetic disorders, cancers and even epidemics rely on a detailed understanding of the underlying gene signaling network. Previous work has used time series microarray data to infer gene signaling networks given a large number of accurate time series samples. Microarray data available for many biological experiments is limited to a small number of arrays with little or no time series guarantees. When several samples are averaged to examine differences in mean value between a diseased and normal state, information from individual samples that could indicate a gene relationship can be lost.

Results

Asynchronous Inference of Regulatory Networks (AIRnet) provides gene signaling network inference using more practical assumptions about the microarray data. By learning correlation patterns for the changes in microarray values from all pairs of samples, accurate network reconstructions can be performed with data that is normally available in microarray experiments.

Conclusions

By focussing on the changes between microarray samples, instead of absolute values, increased information can be gleaned from expression data.
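
The idea of working with changes between pairs of samples rather than absolute values can be sketched as follows. This is not the AIRnet algorithm itself, whose details the abstract does not give; it only illustrates one plausible reading, in which the change between every pair of samples is computed for two genes and those changes are then correlated. The function names and the toy expression vectors are assumptions.

```python
from itertools import combinations
from math import sqrt


def pairwise_changes(expr):
    """Differences expr[j] - expr[i] over all sample pairs with i < j."""
    return [expr[j] - expr[i] for i, j in combinations(range(len(expr)), 2)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def change_correlation(gene_a, gene_b):
    """Correlate the sample-to-sample changes of two genes."""
    return pearson(pairwise_changes(gene_a), pairwise_changes(gene_b))


if __name__ == "__main__":
    gene_a = [1.0, 3.0, 2.0, 5.0]
    gene_b = [2.0, 6.0, 4.0, 10.0]   # moves with gene_a
    gene_c = [-1.0, -3.0, -2.0, -5.0]  # moves against gene_a
    print(change_correlation(gene_a, gene_b))
    print(change_correlation(gene_a, gene_c))
```

Because only differences between samples enter the calculation, no sample ordering or time-series structure is required, which matches the abstract's point about data with little or no time series guarantees.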

8.
limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

9.
Cordyceps sinensis (CS) has been commonly used as herbal medicine and a health supplement in China for over two thousand years. Although previous studies have demonstrated that CS has benefits in immunoregulation and anti-inflammation, the precise mechanism by which CS affects immunomodulation is still unclear. In this study, we exploited duplicate sets of loop-design microarray experiments to examine two different batches of CS and analyze the effects of CS on dendritic cells (DCs) in two physiological stages: the naïve stage and the inflammatory stage. Immature DCs were treated with CS, lipopolysaccharide (LPS), or LPS plus CS (LPS/CS) for two days, and the gene expression profiles were examined using cDNA microarrays. The results of two loop-design microarray experiments showed good intersection rates. The expression level of common genes found in both loop-design microarray experiments was consistent, and the correlation coefficients (Rs) were higher than 0.96. Through intersection analysis of microarray results, we identified 295 intersecting significantly differentially expressed (SDE) genes of the three different treatments (CS, LPS, and LPS/CS), which participated mainly in the adjustment of immune response and the regulation of cell proliferation and death. Genes regulated uniquely by CS treatment were significantly involved in the regulation of focal adhesion pathway, ECM-receptor interaction pathway, and hematopoietic cell lineage pathway. Unique LPS regulated genes were significantly involved in the regulation of Toll-like receptor signaling pathway, systemic lupus erythematosus pathway, and complement and coagulation cascades pathway. Unique LPS/CS regulated genes were significantly involved in the regulation of oxidative phosphorylation pathway. These results could provide useful information in further study of the pharmacological mechanisms of CS.
This study also demonstrates that with a rigorous experimental design, the biological effects of a complex compound can be reliably studied by a complex system like cDNA microarray.

10.
11.

Background  

It is well known that the normalization step of microarray data makes a difference in the downstream analysis. All normalization methods rely on certain assumptions, so differences in results can be traced to different sensitivities to violation of the assumptions. Illustrating the lack of robustness, in a striking spike-in experiment all existing normalization methods fail because of an imbalance between up- and down-regulated genes. This means it is still important to develop a normalization method that is robust against violation of the standard assumptions.

12.
Pathway analysis using random forests classification and regression   Cited by: 3 (self-citations: 0; citations by others: 3)
MOTIVATION: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single gene-based methods neglect interactions among genes and leave room for other novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and have not made use of pathway information. Pathway-based analysis in microarray studies may lead to more informative and relevant knowledge for biological researchers. RESULTS: In this paper, we describe a pathway-based classification and regression method using Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods using several datasets and found that Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information and novel statistical methods, this procedure represents a promising computational strategy in dissecting pathways and can provide biological insight into the study of microarray data. AVAILABILITY: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.

13.
Respirometry consists of measuring the biological oxygen consumption rate under well-defined conditions and has been used for the characterization of countless biological processes. In the field of biotechnology and applied microbiology, several respirometry methods are commonly used for the determination of process parameters. Dynamic and static respirometry, which are based on oxygen measurements with or without continuous aeration, respectively, are the methods most commonly used. In addition to the several respirometry methods, different methods have also been developed to retrieve process parameters from respirometric data. Among them, methods based on model fitting and methods based on the injection of substrate pulses of increasing concentration are commonly used. An important question is then: which respirometry and data interpretation methods should preferably be used? So far, and despite a growing interest in respirometry, relatively little attention has been paid to the comparison between the different methods available. In this work, both static and dynamic respirometry methods and both interpretation methods (model fitting and pulses of increasing concentration) were compared to characterize an autotrophic nitrification process. A total of 60 respirometry experiments were done and exhaustively analysed, including sensitivity and error analyses. According to the results obtained, the substrate affinity constant (K_S) was better determined by static respirometry with pulses of increasing concentration and the maximum oxygen uptake rate (OUR_ex.max) was better determined by dynamic respirometry coupled to a fitting procedure. The best method for combined K_S and OUR_ex.max determination was static respirometry with pulses of increasing concentration.
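
As an illustration of how a substrate affinity constant and a maximum uptake rate might be recovered from pulses of increasing concentration, the sketch below assumes Monod-type saturation kinetics, OUR = OUR_max · S / (K_S + S). The abstract does not state which kinetic model or fitting procedure was used, so both the model and the Hanes-Woolf linearization (S/OUR regressed on S) are assumptions; the function names and the synthetic, noise-free data are also hypothetical.

```python
def monod_rate(s, our_max, k_s):
    """Oxygen uptake rate under assumed Monod kinetics."""
    return our_max * s / (k_s + s)


def fit_hanes_woolf(concentrations, rates):
    """Estimate (our_max, k_s) from (S, OUR) pairs.

    Uses the linear form  S / OUR = S / OUR_max + K_S / OUR_max,
    so slope = 1 / OUR_max and intercept = K_S / OUR_max.
    """
    ys = [s / r for s, r in zip(concentrations, rates)]
    n = len(concentrations)
    mean_s = sum(concentrations) / n
    mean_y = sum(ys) / n
    sxx = sum((s - mean_s) ** 2 for s in concentrations)
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in zip(concentrations, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_s
    return 1.0 / slope, intercept / slope


if __name__ == "__main__":
    # Synthetic pulses of increasing substrate concentration (arbitrary units).
    pulses = [0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
    observed = [monod_rate(s, our_max=10.0, k_s=0.5) for s in pulses]
    our_max_hat, k_s_hat = fit_hanes_woolf(pulses, observed)
    print(our_max_hat, k_s_hat)
```

On real, noisy respirograms a nonlinear least-squares fit would usually be preferred over a linearization, since transforming the data also transforms the error structure.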

14.
15.
The use of parasites as biological tags for discrimination of fish stocks has become a commonly used approach in fisheries management. Metazoan parasite community analysis and anisakid nematode population genetics based on a mitochondrial cytochrome marker were applied in order to assess the usefulness of the two parasitological methods for stock discrimination of beaked redfish Sebastes mentella of three fishing grounds in the North East Atlantic. Multivariate, model-based approaches demonstrated that the metazoan parasite fauna of beaked redfish from East Greenland differed from Tampen, northern North Sea, and Bear Island, Barents Sea. A joint model (latent variable model) was used to estimate the effects of covariates on parasite species and identified four parasite species as the main source of differences among fishing grounds, namely Chondracanthus nodosus, Anisakis simplex s.s., Hysterothylacium aduncum, and Bothriocephalus scorpii. Due to its high abundance and differences between fishing grounds, Anisakis simplex s.s. was considered as a major biological tag for host stock differentiation. Whilst the sole examination of Anisakis simplex s.s. on a population genetic level is only of limited use, anisakid nematodes (in particular, A. simplex s.s.) can serve as biological tags on a parasite community level. This study confirmed the use of multivariate analyses as a tool to evaluate parasite infra-communities and to identify parasite species that might serve as biological tags. The present study suggests that S. mentella in the northern North Sea and Barents Sea is not sub-structured.

16.
17.
Liang Y, Zhang F, Wang J, Joshi T, Wang Y, Xu D. PLoS ONE, 2011, 6(7): e21750

Background

Identifying genes with essential roles in resisting environmental stress rates high in agronomic importance. Although massive DNA microarray gene expression data have been generated for plants, current computational approaches underutilize these data for studying genotype-trait relationships. Some advanced gene identification methods have been explored for human diseases, but typically these methods have not been converted into publicly available software tools and cannot be applied to plants for identifying genes with agronomic traits.

Methodology

In this study, we used 22 sets of Arabidopsis thaliana gene expression data from GEO to predict the key genes involved in water tolerance. We applied an SVM-RFE (Support Vector Machine-Recursive Feature Elimination) feature selection method for the prediction. To address small sample sizes, we developed a modified approach for SVM-RFE by using bootstrapping and leave-one-out cross-validation. We also expanded our study to predict genes involved in water susceptibility.

Conclusions

We analyzed the top 10 genes predicted to be involved in water tolerance. Seven of them are connected to known biological processes in drought resistance. We also analyzed the top 100 genes in terms of their biological functions. Our study shows that the SVM-RFE method is a highly promising method in analyzing plant microarray data for studying genotype-phenotype relationships. The software is freely available with source code at http://ccst.jlu.edu.cn/JCSB/RFET/.

18.
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble, and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results that under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays.
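
The link between Gamma-distributed intensities and Beta-distributed fractional intensities can be checked numerically. The sketch below relies on a standard probability fact: if X ~ Gamma(a) and Y ~ Gamma(b) are independent with a common scale, then X / (X + Y) ~ Beta(a, b), whose mean is a / (a + b). Whether this is exactly the construction used in the paper is an assumption, as are the function name, the shape parameters and the interpretation of X and Y as two channel intensities.

```python
import random


def simulate_fractional_intensities(shape_a, shape_b, n, seed=0):
    """Draw n fractional intensities X / (X + Y) from two Gamma channels.

    X ~ Gamma(shape_a, scale=1) and Y ~ Gamma(shape_b, scale=1) are independent,
    so the returned fractions should follow a Beta(shape_a, shape_b) law.
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n):
        x = rng.gammavariate(shape_a, 1.0)
        y = rng.gammavariate(shape_b, 1.0)
        fractions.append(x / (x + y))
    return fractions


if __name__ == "__main__":
    a, b = 2.0, 3.0
    draws = simulate_fractional_intensities(a, b, n=20000, seed=1)
    empirical_mean = sum(draws) / len(draws)
    print("empirical mean:", empirical_mean)
    print("Beta(a, b) mean:", a / (a + b))
```

A mixture of Beta densities, as described in the abstract, would correspond to drawing the shape parameters themselves from several gene classes; that extension is not shown here.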

19.
20.
Identifying perturbed or dysregulated pathways is critical to understanding the biological processes that change within an experiment. Previous methods identified important pathways that are significantly enriched among differentially expressed genes; however, these methods cannot account for small, coordinated changes in gene expression that amass across a whole pathway. In order to overcome this limitation, we use microarray gene expression data to identify pathway perturbation based on pathway correlation profiles. By identifying the distribution of gene-gene pair correlations within a pathway, we can rank the pathways based on the level of perturbation and dysregulation. We have shown this successfully for differences between two experimental conditions in Escherichia coli and changes within time series data in Saccharomyces cerevisiae, as well as two estrogen receptor response classes of breast cancer. Overall, our method made significant predictions as to the pathway perturbations that are involved in the experimental conditions.
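
A pathway correlation profile of the kind described above can be sketched as the list of Pearson correlations over all gene-gene pairs in a pathway, computed separately for each condition. The perturbation score used here, the mean absolute difference between the two profiles, is a hypothetical choice made for illustration; the abstract does not specify how the authors compare the correlation distributions. Gene names and expression values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_profile(expression, pathway):
    """Correlations for every gene-gene pair in the pathway, in a fixed order."""
    return [pearson(expression[a], expression[b]) for a, b in combinations(pathway, 2)]


def perturbation_score(condition_1, condition_2, pathway):
    """Mean absolute change in pairwise correlation between two conditions."""
    p1 = correlation_profile(condition_1, pathway)
    p2 = correlation_profile(condition_2, pathway)
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)


if __name__ == "__main__":
    pathway = ["g1", "g2", "g3"]
    control = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
    treated = {"g1": [1, 2, 3, 4], "g2": [8, 6, 4, 2], "g3": [4, 3, 2, 1]}
    print(perturbation_score(control, treated, pathway))
```

Ranking pathways then amounts to computing this score for each pathway and sorting. Note that in the toy data no single gene changes its mean expression level between conditions, yet the score is large, which is the kind of coordinated change that enrichment of differentially expressed genes would miss.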
