Similar articles
1.
While meta-analysis provides a powerful tool for analyzing microarray experiments by combining data from multiple studies, it presents unique computational challenges. The Bioconductor package RankProd provides a new and intuitive tool for this purpose in detecting differentially expressed genes under two experimental conditions. The package modifies and extends the rank product method proposed by Breitling et al., [(2004) FEBS Lett., 573, 83-92] to integrate multiple microarray studies from different laboratories and/or platforms. It offers several advantages over t-test based methods and accepts pre-processed expression datasets produced from a wide variety of platforms. The significance of the detection is assessed by a non-parametric permutation test, and the associated P-value and false discovery rate (FDR) are included in the output alongside the genes that are detected by user-defined criteria. A visualization plot is provided to view actual expression levels for each gene with estimated significance measurements. AVAILABILITY: RankProd is available at Bioconductor http://www.bioconductor.org. A web-based interface will soon be available at http://cactus.salk.edu/RankProd
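
The rank product idea with a permutation test can be illustrated by a minimal Python sketch. This is not the RankProd implementation: the function names, the input layout (one row per gene, one log fold change per replicate or study), the tie handling, and the number of permutations are all assumptions made for illustration.

```python
import random
from math import prod

def ranks_descending(values):
    """Rank 1 = largest value (most up-regulated); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def rank_products(fold_changes):
    """fold_changes[g][k] is the log fold change of gene g in replicate k.
    Returns, per gene, the geometric mean of its ranks across replicates."""
    n_genes = len(fold_changes)
    n_reps = len(fold_changes[0])
    per_rep_ranks = [
        ranks_descending([fold_changes[g][k] for g in range(n_genes)])
        for k in range(n_reps)
    ]
    return [
        prod(per_rep_ranks[k][g] for k in range(n_reps)) ** (1.0 / n_reps)
        for g in range(n_genes)
    ]

def permutation_p_values(fold_changes, n_perm=200, seed=0):
    """Estimate, per gene, how often a rank product at least as small
    arises when values are shuffled independently within each replicate."""
    rng = random.Random(seed)
    observed = rank_products(fold_changes)
    n_genes = len(fold_changes)
    n_reps = len(fold_changes[0])
    null = []
    for _ in range(n_perm):
        columns = []
        for k in range(n_reps):
            col = [fold_changes[g][k] for g in range(n_genes)]
            rng.shuffle(col)
            columns.append(col)
        shuffled = [[columns[k][g] for k in range(n_reps)] for g in range(n_genes)]
        null.extend(rank_products(shuffled))
    return [sum(1 for x in null if x <= rp) / len(null) for rp in observed]

# Toy data (hypothetical): gene 0 is consistently the most up-regulated.
toy = [[3.0, 2.5, 2.8], [0.1, -0.2, 0.0], [-1.0, 0.3, -0.5], [0.5, 0.1, 0.2]]
rp = rank_products(toy)
pvals = permutation_p_values(toy)
```

A small rank product means a gene ranks near the top in every replicate; the permutation step asks how often that would happen by chance. An FDR estimate, as reported by the package, would require an additional step that is not sketched here.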

2.
MOTIVATION: Many applications of microarray technology in clinical cancer studies aim at detecting molecular features for refined diagnosis. In this paper, we follow an opposite rationale: we try to identify common molecular features shared by phenotypically distinct types of cancer using a meta-analysis of several microarray studies. We present a novel algorithm to detect that two lists of differentially expressed genes are similar, even if the similarities are not apparent to the eye. The method is based on the ordering of the genes in the lists. RESULTS: In a meta-analysis of five clinical microarray studies we were able to detect significant similarities in five of the ten possible comparisons of ordered gene lists. We included studies in which not a single gene can be significantly associated with outcome. The detection of significant similarities of gene lists from different microarray studies is a novel and promising approach. It has the potential to improve upon specialized cancer studies by exploring the power of several studies in one single analysis. Our method is complementary to previous methods in that it does not rely on strong effects of differential gene expression in a single study but on consistent ones across multiple studies.

3.
pcaMethods is a Bioconductor compliant library for computing principal component analysis (PCA) on incomplete data sets. The results can be analyzed directly or used to estimate missing values to enable the use of missing value sensitive statistical methods. The package was mainly developed with microarray and metabolite data sets in mind, but can be applied to any other incomplete data set as well. AVAILABILITY: http://www.bioconductor.org

4.

Background

One aspect in which RNA sequencing is more valuable than microarray-based methods is the ability to examine the allelic imbalance of the expression of a gene. This process is often a complex task that entails quality control, alignment, and the counting of reads over heterozygous single-nucleotide polymorphisms. Allelic imbalance analysis is subject to technical biases, due to differences in the sequences of the measured alleles. Flexible bioinformatics tools are needed to ease the workflow while retaining as much RNA sequencing information as possible throughout the analysis to detect and address the possible biases.

Results

We present AllelicImbalance, a software program that is designed to detect, manage, and visualize allelic imbalances comprehensively. The purpose of this software is to allow users to pose genetic questions in any RNA sequencing experiment quickly, enhancing the general utility of RNA sequencing. The visualization features can reveal notable, non-trivial allelic imbalance behavior over specific regions, such as exons.

Conclusions

The software provides a complete framework to perform allelic imbalance analyses of aligned RNA sequencing data, from detection to visualization, within the robust and versatile management class, ASEset.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0620-2) contains supplementary material, which is available to authorized users.

5.
SUMMARY: SScore is an R package that facilitates the comparison of gene expression between Affymetrix GeneChips using the S-score algorithm. The S-score algorithm uses probe level data directly to assess differences in gene expression, without requiring a preliminary separate step of probe set expression summary estimation. Therefore, the algorithm avoids introduction of error associated with the expression summary estimation process and has been demonstrated to improve the accuracy of identifying differentially expressed genes. The S-score produces accurate results even when few or no replicates are available. AVAILABILITY: The R package SScore is available from Bioconductor at http://www.bioconductor.org

6.

Background

Many algorithms have been proposed for detecting significant genes in microarray analysis. Several of these approaches are freely available as R packages, but using them for gene expression analysis is often frustrating for non-bioinformaticians. Moreover, only some of these packages offer a complete suite of tools, from initial data import to the final analysis report. Here we present an R/Bioconductor package that implements a hybrid gene selection method along with a set of functions that facilitate a thorough and convenient gene expression profiling analysis.

Results

mAPKL is an open-source R/Bioconductor package that implements the mAP-KL hybrid gene selection method. The advantage of this method is that it selects a small number of gene exemplars while achieving classification results comparable to those of other well-established algorithms on a variety of datasets and dataset sizes. The mAPKL package is accompanied by extra functionalities including (i) solid data import; (ii) data sampling following a user-defined proportion; (iii) preprocessing through several normalization and transformation alternatives; (iv) classification with the aid of SVM and performance evaluation; (v) network analysis of the significant genes (exemplars), including degree centrality, closeness, betweenness, and clustering coefficient, as well as the construction of an edge list table; (vi) gene annotation analysis; (vii) pathway analysis; and (viii) auto-generated analysis reporting.

Conclusions

Users are able to run a thorough gene expression analysis in a timely manner, starting from raw data and concluding with the network characteristics of the selected gene exemplars. Detailed instructions and example data are provided in the R package, which is freely available at Bioconductor under the GPL-2 or later license http://www.bioconductor.org/packages/3.1/bioc/html/mAPKL.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0719-5) contains supplementary material, which is available to authorized users.

7.
This paper reviews the central concepts and implementation of data structures and methods for studying genetics of gene expression with the GGtools package of Bioconductor. Illustration with a HapMap+expression dataset is provided. Availability: Package GGtools is part of Bioconductor 1.9 (http://bioconductor.org). Open source with Artistic License.

8.
Nested effects models (NEMs) are a class of probabilistic models introduced to analyze the effects of gene perturbation screens visible in high-dimensional phenotypes like microarrays or cell morphology. NEMs reverse engineer upstream/downstream relations of cellular signaling cascades. NEMs take as input a set of candidate pathway genes and phenotypic profiles of perturbing these genes. NEMs return a pathway structure explaining the observed perturbation effects. Here, we describe the package nem, an open-source software to efficiently infer NEMs from data. Our software implements several search algorithms for model fitting and is applicable to a wide range of different data types and representations. The methods we present summarize the current state-of-the-art in NEMs. AVAILABILITY: Our software is written in the R language and freely available via the Bioconductor project at http://www.bioconductor.org.

9.

Background  

In microarray studies researchers are often interested in the comparison of relevant quantities between two or more similar experiments, involving different treatments, tissues, or species. Typically each experiment reports measures of significance (e.g. p-values) or other measures that rank its features (e.g. genes). Our objective is to find a list of features that are significant in all experiments, to be further investigated. In this paper we present an R package called sdef that allows the user to quantify the evidence of commonality between the experiments using previously proposed statistical methods based on the ranked lists of p-values. sdef implements two approaches that address this objective. The first is a permutation test of the maximal ratio of observed to expected common features under the hypothesis of independence between the experiments. The second approach, set in a Bayesian framework, is more flexible as it takes into account the uncertainty in the number of genes differentially expressed in each experiment.
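
The first approach, the maximal ratio of observed to expected common features with a permutation test, can be sketched roughly as follows. This is an illustration in Python, not the sdef code: the function names, the threshold grid, and the toy p-values are assumptions, and the package may define the statistic and the permutation scheme differently in detail.

```python
import random

def max_ratio(p1, p2, thresholds):
    """Return (largest observed/expected ratio, threshold where it occurs).
    For a threshold q, a feature is 'common' if both p-values are <= q.
    Under independence the expected count is n1 * n2 / n."""
    n = len(p1)
    best_ratio, best_q = 0.0, None
    for q in thresholds:
        in1 = [p <= q for p in p1]
        in2 = [p <= q for p in p2]
        n1, n2 = sum(in1), sum(in2)
        if n1 == 0 or n2 == 0:
            continue  # ratio undefined when one list is empty
        observed = sum(a and b for a, b in zip(in1, in2))
        expected = n1 * n2 / n
        ratio = observed / expected
        if ratio > best_ratio:
            best_ratio, best_q = ratio, q
    return best_ratio, best_q

def permutation_p_value(p1, p2, thresholds, n_perm=500, seed=0):
    """Shuffle the second list to break any association between the
    experiments and count how often the statistic is at least as large."""
    rng = random.Random(seed)
    observed, _ = max_ratio(p1, p2, thresholds)
    shuffled = list(p2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat, _ = max_ratio(p1, shuffled, thresholds)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical p-values for 8 features in two experiments.
p_exp1 = [0.001, 0.002, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
p_exp2 = [0.003, 0.001, 0.4, 0.9, 0.3, 0.7, 0.6, 0.8]
grid = [0.01, 0.05, 0.5]
ratio, q_star = max_ratio(p_exp1, p_exp2, grid)   # (4.0, 0.01)
p_perm = permutation_p_value(p_exp1, p_exp2, grid)
```

In the toy data both experiments place the same two features below 0.01, so two common features are observed where 2 × 2 / 8 = 0.5 would be expected, giving a ratio of 4.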

10.

Background  

Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary.

11.
MOTIVATION: Functional analyses based on the association of Gene Ontology (GO) terms to genes in a selected gene list are useful bioinformatic tools and the GOstats package has been widely used to perform such computations. In this paper we report significant improvements and extensions such as support for conditional testing. RESULTS: We discuss the capabilities of GOstats, a Bioconductor package written in R, that allows users to test GO terms for over- or under-representation using either a classical hypergeometric test or a conditional hypergeometric test that uses the relationships among GO terms to decorrelate the results. AVAILABILITY: GOstats is available as an R package from the Bioconductor project: http://bioconductor.org
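
The classical hypergeometric test for over-representation mentioned above can be written out in a few lines. The Python sketch below is illustrative only and is not the GOstats API; the function and parameter names are assumptions, and the conditional variant that exploits the GO graph structure is not shown.

```python
from math import comb

def hypergeom_over_p(universe_size, annotated_in_universe,
                     selected_size, annotated_in_selected):
    """P(X >= k) when drawing `selected_size` genes without replacement
    from a universe in which `annotated_in_universe` carry the GO term."""
    N, K = universe_size, annotated_in_universe
    n, k = selected_size, annotated_in_selected
    upper = min(K, n)
    # comb() returns 0 when more unannotated genes are requested than exist,
    # so impossible terms drop out of the sum automatically.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical example: 10 genes in the universe, 3 annotated to the term,
# 3 genes selected, and all 3 selected genes carry the term.
p_over = hypergeom_over_p(10, 3, 3, 3)   # 1 / 120
```

Under-representation would use the lower tail, P(X <= k), computed the same way over `range(0, k + 1)`.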

12.
Many genome-scale studies in molecular biology deliver results in the form of a ranked list of gene names, according to some scoring method. There is always the question of how many top-ranked genes to consider for further analysis, for example, in order to create a diagnostic or predictive gene signature for a disease. This question is usually approached from a statistical point of view, without considering any biological properties of top-ranked genes or how they are related to each other functionally. Here we suggest a new method for selecting a number of genes in a ranked gene list such that this set forms the Optimally Functionally Enriched Network (OFTEN), formed by known physical interactions between genes or their products. The method allows associating a network with the gene list, providing easier interpretation of the results and classifying the genes or proteins according to their position in the resulting network. We demonstrate the method on four breast cancer datasets and show that 1) the resulting gene signatures are more reproducible from one dataset to another compared to standard statistical procedures and 2) the overlap of these signatures has significant prognostic potential. The method is implemented in the BiNoM Cytoscape plugin (http://binom.curie.fr).

13.
A special matrix of amino acid antigenic similarity for computer detection of the potential antigenic proximity of unrelated proteins is proposed. The matrix was built using data on the affinities of amino acid residue interactions between subunits in oligomeric proteins. The diagonal elements of the matrix characterize the recognition of amino acid residues, and the off-diagonal ones represent a relative similarity measure of the specificity of antibody-amino acid residue interactions. Applying the new matrix to protein comparison allows the hydrophilic, potentially immunologically active regions of sequences to be picked out as similar fragments. When the influenza virus hemagglutinin was compared with 116 human proteins, eight fragments were picked out that could not be detected by means of the routinely used MDM78 matrix. The antigenic similarity matrix is proposed for defining forbidden structures when preparing peptide antiviral vaccines.

14.
Statistical tests for detecting gene conversion
Statistical tests for detecting gene conversion are described for a sample of homologous DNA sequences. The tests are based on imbalances in the distribution of segments on which some pair of sequences agrees. The methods automatically control for variable mutation rates along the genome and do not depend on a priori choices of potentially monophyletic subsets of the sample. The tests show strong evidence for multiple intragenic conversion events at two loci in Escherichia coli. The gnd locus in E. coli shows a highly significant excess of maximal segments of length 70-200 bp, which suggests conversion events of that size. The data also indicate that the rate of these short conversion events might be of the order of the neutral mutation rate. There is also evidence for correlated mutation in adjacent codon positions. The same tests applied to a locus in an RNA virus were negative.

15.
BACKGROUND: Document gene normalization is the problem of creating a list of unique identifiers for genes that are mentioned within a document. Automating this process has many potential applications in both information extraction and database curation systems. Here we present two separate solutions to this problem. The first is primarily based on standard pattern matching and information extraction techniques. The second and more novel solution uses a statistical classifier to recognize valid gene matches from a list of known gene synonyms. RESULTS: We compare the results of the two systems, analyze their merits and argue that the classification based system is preferable for many reasons including performance, simplicity and robustness. Our best systems attain a balanced precision and recall in the range of 74%-92%, depending on the organism.

16.

Background

Microbial abundance profiles are applied widely to understand diseases from the aspect of microbial communities. By investigating the abundance associations of species or genes, we can construct molecular ecological networks (MENs). The MENs are often constructed by calculating the Pearson correlation coefficient (PCC) between genes. In this work, we also applied multimodal mutual information (MMI) to construct MENs. The members which drive the concerned MENs are referred to as key drivers.

Results

We proposed a novel method to detect the key drivers. First, we partitioned the MEN into subnetworks. Then we identified the subnetworks most pertinent to the disease by measuring the correlation between the abundance pattern and the delegated phenotype, that is, the variable representing the disease phenotypes. Last, for each identified subnetwork, we detected the key driver by PageRank. We developed a package named KDiamend and applied it to gut and oral microbial data to detect key drivers for Type 2 diabetes (T2D) and rheumatoid arthritis (RA). We detected six T2D-relevant subnetworks, and three of their key drivers are related to the carbohydrate metabolic process. In addition, we detected nine subnetworks related to RA, a disease caused by compromised immune systems. The extracted subnetworks include InterPro matches (IPRs) concerned with immunoglobulin, sporulation, biofilm, flaviviruses, bacteriophage, etc.; the development of biofilms is regarded as one of the drivers of persistent infections.

Conclusion

KDiamend can detect key drivers and offers insights into the development of diseases. The package is freely available at http://www.deepomics.org/pipelines/3DCD6955FEF2E64A/.
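
The last step of the method, picking a key driver within a subnetwork by PageRank, can be illustrated with a short Python sketch. This is not the KDiamend code: the adjacency-list input, the damping factor of 0.85, the unweighted edges, and the function names are assumptions made for the example.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.
    adjacency maps each node to a list of neighbours; every neighbour must
    itself be a key. Undirected edges are listed in both directions."""
    nodes = sorted(adjacency)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if neighbours:
                share = damping * score[v] / len(neighbours)
                for u in neighbours:
                    new[u] += share
            else:
                # Dangling node: spread its score uniformly over all nodes.
                for u in nodes:
                    new[u] += damping * score[v] / n
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score

def key_driver(adjacency):
    """Return the node with the highest PageRank score."""
    scores = pagerank(adjacency)
    return max(scores, key=scores.get)

# Hypothetical subnetwork: member "A" is associated with all other members.
subnetwork = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
driver = key_driver(subnetwork)   # "A"
```

In a real molecular ecological network the edges would presumably carry association strengths (for example PCC or MMI values); weighting the shares by those strengths is a natural extension that is not shown here.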

17.
Complex diseases are multifactorial in nature and can involve multiple loci with gene x gene and gene x environment interactions. Research on methods to uncover the interactions between those genes that confer susceptibility to disease has been extensive, but many of these methods have only been developed for sibling pairs or sibships. In this report, we assess the performance of two methods for finding gene x gene interactions that are applicable to arbitrarily sized pedigrees, one based on correlation in per-family nonparametric linkage scores and another that incorporates candidate loci genotypes as covariates into an affected relative pair linkage analysis. The power and type I error rate of both of these methods were assessed using the simulated Genetic Analysis Workshop 14 data. In general, we found detection of the interacting loci to be a difficult problem, and though we experienced some modest success there is a clear need to continue developing new methods and approaches to the problem.

18.
We present GENECODIS, a web-based tool that integrates different sources of information to search for annotations that frequently co-occur in a set of genes and rank them by statistical significance. The analysis of concurrent annotations provides significant information for the biologic interpretation of high-throughput experiments and may outperform the results of standard methods for the functional analysis of gene lists. GENECODIS is publicly available at .

19.
We propose using a variant of logistic regression (LR) with L2-regularization to fit gene–gene and gene–environment interaction models. Studies have shown that many common diseases are influenced by the interaction of certain genes. LR models with quadratic penalization not only correctly characterize the influential genes along with their interaction structures but also yield additional benefits in handling high-dimensional, discrete factors with a binary response. We illustrate the advantages of using an L2-regularization scheme and compare its performance with that of "multifactor dimensionality reduction" and "FlexTree," two recent tools for identifying gene–gene interactions. Through simulated and real data sets, we demonstrate that our method outperforms other methods in the identification of the interaction structures as well as in prediction accuracy. In addition, we validate the significance of the factors selected through bootstrap analyses.
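
A logistic regression with a quadratic (L2) penalty and an explicit interaction term can be sketched as follows. This Python example is only a minimal illustration under stated assumptions: the binary 0/1 coding of the two factors, the plain gradient-descent fitting, the penalty weight, and all function names are choices made here, not the authors' procedure.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def add_interaction(rows):
    """Append the product x1 * x2 to each two-factor row [x1, x2]."""
    return [[x1, x2, x1 * x2] for x1, x2 in rows]

def predict(w, b, x):
    """Predicted probability of the positive class for feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_l2_logistic(features, labels, lam=0.01, lr=0.5, n_iter=2000):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 by gradient descent.
    The intercept b is left unpenalised."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w = [lam * wj for wj in w]   # gradient of the quadratic penalty
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(w, b, x) - y
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical data: the response is 1 only when both factors are present.
factors = [[0, 0], [0, 1], [1, 0], [1, 1]]
response = [0, 0, 0, 1]
weights, bias = fit_l2_logistic(add_interaction(factors), response)
p_both = predict(weights, bias, [1, 1, 1])
p_one = predict(weights, bias, [0, 1, 0])
```

Because the toy classes are perfectly separable, an unpenalized fit would push the coefficients toward infinity; the quadratic penalty keeps them finite.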

20.
MOTIVATION: Microarrays rapidly generate large quantities of gene expression information, but interpreting such data within a biological context is still relatively complex and laborious. New methods that can identify functionally related genes via shared literature concepts will be useful in addressing these needs. RESULTS: We have developed a novel method that uses implicit literature relationships (concepts related via shared, intermediate concepts) to cluster related genes. Genes are evaluated for implicit connections within a network of biomedical objects (other genes, ontological concepts and diseases) that are connected via their co-occurrences in Medline titles and/or abstracts. On the basis of these implicit relationships, individual gene pairs are scored using a probability-based algorithm. Scores are generated for all pairwise combinations of genes, which are then clustered based on the scores. We applied this method to a test set composed of nine functional groups with known relationships. The method scored highly for all nine groups and significantly better than a benchmark co-occurrence-based method for six groups. We then applied this method to gene sets specific to two previously defined breast tumor subtypes. Analysis of the results recapitulated known biological relationships and identified novel pathway relationships unique to each tumor subtype. We demonstrate that this method provides a valuable new means of identifying and visualizing significantly related genes within gene lists via their implicit relationships in the literature.
