20 similar documents found (search time: 15 ms)
2.
Finding edging genes from microarray data (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION: A set of genes and their expression levels are used to classify disease and normal tissues. Because a microarray measures a massive number of genes, there are a large number of edges dividing the different classes in microarray space. The edging genes (EGs) may be co-regulated genes; they may also lie on the same pathway or be deregulated by the same non-coding RNAs, such as siRNAs or miRNAs. Every gene in an EG set is vital for identifying a tissue's class: a change in the expression of a single EG may shift a tissue from normal to diseased and vice versa. Finding EGs is therefore of biological importance. In this work, we propose an algorithm to find these EGs efficiently. RESULTS: We tested our algorithm on five microarray datasets and compared the results with those of the border-based algorithm, which has been used to find gene groups and subsequently divide different classes of tissues. Our algorithm finds significantly more EGs than the border-based algorithm. Because our algorithm prunes irrelevant patterns at earlier stages, its time and space requirements are much lower than those of the border-based algorithm. AVAILABILITY: The proposed algorithm is implemented in C++ on the Linux platform. The EGs in the five microarray datasets have been computed; the preprocessed datasets and the discovered EGs are available at http://www3.it.deakin.edu.au/~phoebe/microarray.html.
3.
Rapid divergence in expression between duplicate genes inferred from microarray data (total citations: 15; self-citations: 0; citations by others: 15)
For more than 30 years, expression divergence has been considered a major reason for retaining duplicated genes in a genome, but how often and how fast duplicate genes diverge in expression has not been studied at the genomic level. Using yeast microarray data, we show that expression divergence between duplicate genes is significantly correlated with their synonymous divergence (KS), and also with their nonsynonymous divergence (KA) if KA ≤ 0.3. Thus, expression divergence increases with evolutionary time, and KA is initially coupled with expression divergence. More interestingly, a large proportion of duplicate genes have diverged quickly in expression, and the vast majority of gene pairs eventually become divergent in expression. Indeed, more than 40% of gene pairs show expression divergence even when KS ≤ 0.10, and this proportion exceeds 80% for KS > 1.5. Only a small fraction of ancient gene pairs do not show expression divergence.
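The abstract does not say how expression divergence was measured. As a minimal sketch, assuming divergence is taken as 1 - r, where r is the Pearson correlation between two duplicates' expression profiles, the proportion of diverged pairs below a KS cutoff could be tallied as below. The function names, the 1 - r measure, and the divergence threshold are illustrative assumptions, not the authors' method.

```python
import math


def expression_divergence(x, y):
    """Hypothetical divergence measure: 1 - Pearson correlation of two expression profiles.

    Returns 0 for perfectly correlated profiles and 2 for perfectly anti-correlated ones.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation is undefined")
    return 1 - sxy / math.sqrt(sxx * syy)


def fraction_diverged(pairs, ks_max, threshold):
    """Fraction of (ks, divergence) pairs with ks <= ks_max whose divergence exceeds threshold.

    The threshold is an analyst's choice (assumption); the abstract gives none.
    """
    selected = [d for ks, d in pairs if ks <= ks_max]
    if not selected:
        return float("nan")
    return sum(d > threshold for d in selected) / len(selected)
```

Calling `fraction_diverged` with `ks_max=0.10` yields the kind of proportion the abstract reports for young duplicate pairs, though the result depends entirely on the chosen divergence measure and threshold.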
4.
MOTIVATION: Discovery of regulatory motifs in unaligned DNA sequences remains a fundamental problem in computational biology. Two categories of algorithms have been developed to identify common motifs in a set of DNA sequences. The first can be called a 'multiple genes, single species' approach. It assumes that a degenerate motif is embedded in some or all of the otherwise unrelated input sequences, and tries to describe a consensus motif and identify its occurrences. It is often used for co-regulated genes identified through experimental approaches. The second approach can be called 'single gene, multiple species'. It requires orthologous input sequences and tries to identify unusually well-conserved regions by phylogenetic footprinting. Both approaches perform well, but each has limitations. It is tempting to combine knowledge of co-regulation among different genes with conservation among orthologous genes to improve our ability to identify motifs. RESULTS: Building on the Consensus algorithm previously established by our group, we introduce a new algorithm called PhyloCon (Phylogenetic Consensus) that takes into account both conservation among orthologous genes and co-regulation of genes within a species. The algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, and then compares profiles representing non-orthologous sequences. Motifs emerge as common regions in these profiles. We present a novel statistic for comparing profiles of DNA sequences and a greedy approach to searching for common subprofiles. We demonstrate that PhyloCon performs well on both synthetic and biological data. AVAILABILITY: Software is available on request from the authors; see http://ural.wustl.edu/softwares.html.
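PhyloCon's profile-comparison statistic is not given in the abstract. The sketch below only illustrates the underlying data structure: a per-column base-frequency profile built from an alignment of orthologous sequences, plus a naive column-by-column similarity score. Both `build_profile` and `column_similarity` are hypothetical names, and the similarity score is a simple stand-in, not PhyloCon's statistic.

```python
from collections import Counter

BASES = "ACGT"


def build_profile(aligned):
    """Per-column base frequencies for equal-length aligned DNA sequences.

    Gaps and ambiguous characters are ignored when computing frequencies.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned if s[i] in BASES)
        total = sum(counts.values()) or 1  # an all-gap column yields all zeros
        profile.append({b: counts[b] / total for b in BASES})
    return profile


def column_similarity(p, q):
    """Mean dot product of matching columns (naive stand-in score).

    1.0 means both profiles are identical and fully conserved at every column.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have equal length")
    return sum(sum(pc[b] * qc[b] for b in BASES) for pc, qc in zip(p, q)) / len(p)
```

A real profile comparison would also slide one profile along the other and account for background base composition; this sketch compares fixed, equal-length windows only.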
7.
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data (total citations: 3; self-citations: 0; citations by others: 3)
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions (indels), for example by examining their functional consequences on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can use annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal and identified 20 candidate genes, including the causal gene. On a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
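The ‘variants reduction’ protocol is described only as a stepwise exclusion of variants unlikely to be causal. A minimal sketch of that control flow in plain Python follows; it does not call ANNOVAR, and the record fields (`gene`, `effect`, `in_dbsnp`, `in_1000g`) and the three filters are hypothetical examples chosen to fit a rare recessive disease, not ANNOVAR's actual steps.

```python
from collections import Counter

# Hypothetical set of protein-altering effect labels.
PROTEIN_ALTERING = {"nonsynonymous", "stopgain", "frameshift", "splicing"}


def keep_protein_altering(variants):
    return [v for v in variants if v["effect"] in PROTEIN_ALTERING]


def drop_known_polymorphisms(variants):
    # Rare-disease assumption: variants already catalogued in dbSNP or 1000 Genomes are unlikely causal.
    return [v for v in variants if not v["in_dbsnp"] and not v["in_1000g"]]


def keep_recessive_candidates(variants):
    # Recessive-model assumption: keep genes that still carry at least two qualifying variants.
    per_gene = Counter(v["gene"] for v in variants)
    return [v for v in variants if per_gene[v["gene"]] >= 2]


def reduce_variants(variants, steps):
    """Apply filters in order; return the survivors and the count remaining after each step."""
    log = []
    current = list(variants)
    for name, step in steps:
        current = step(current)
        log.append((name, len(current)))
    return current, log


STEPS = [
    ("protein-altering", keep_protein_altering),
    ("not in dbSNP/1000G", drop_known_polymorphisms),
    ("recessive model", keep_recessive_candidates),
]
```

Keeping each filter as a separate function that takes and returns a list makes the per-step counts easy to report, which mirrors the stepwise narrowing the abstract describes.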
8.
Mining co-regulated gene profiles for the detection of functional associations in gene expression data (total citations: 1; self-citations: 0; citations by others: 1)
Gyenesei A, Wagner U, Barkow-Oesterreicher S, Stolte E, Schlapbach R. Bioinformatics (Oxford, England), 2007, 23(15): 1927-1935
MOTIVATION: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expression switches between up- and down-regulation from one condition to another. To discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. Co-regulated gene profiles contain two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. RESULTS: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden from traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for analyzing gene expression data and is competitive with bi-clustering techniques.
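The definition of a co-regulated gene profile can be checked directly on discretized data. As a minimal sketch, assuming each gene's expression has already been reduced to +1 (up) or -1 (down) per condition, the function below tests whether two gene sets form such a profile over a chosen set of conditions. It checks the definition only; it is not the MAP mining algorithm, and `is_coregulated_profile` is a hypothetical name.

```python
def is_coregulated_profile(signs, set_a, set_b, conditions):
    """Check whether set_a and set_b form a co-regulated gene profile.

    signs[gene][condition] is +1 (up) or -1 (down) (discretization is assumed).
    Within each set all genes must share a direction at every condition, and the
    two sets must move in opposite directions; the direction may change between
    conditions.
    """
    if not set_a:
        raise ValueError("set_a must contain at least one gene")
    for c in conditions:
        ref = signs[set_a[0]][c]
        if any(signs[g][c] != ref for g in set_a):
            return False
        if any(signs[g][c] != -ref for g in set_b):
            return False
    return True
```

Because the reference direction is re-read at every condition, a profile such as up/down/up in one set and down/up/down in the other passes, which is exactly the case the abstract says traditional APD methods miss.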
9.
Liefeld T, Reich M, Gould J, Zhang P, Tamayo P, Mesirov JP. Bioinformatics (Oxford, England), 2005, 21(18): 3681-3682
SUMMARY: GeneCruiser is a web service that allows users to annotate their genomic data by mapping microarray feature identifiers to gene identifiers from databases such as UniGene, while providing links to web resources such as the UCSC Genome Browser. It relies on a regularly updated database that retrieves and indexes the mappings between microarray probes and genomic databases. Genes are identified using the Life Sciences Identifier standard. AVAILABILITY: GeneCruiser is freely available as a web service and web application at http://www.genecruiser.org, and through GenePattern, our microarray analysis platform, into which GeneCruiser access has been integrated (http://www.genepattern.org).
10.
Background
One frequent application of microarray experiments is monitoring gene activity in a cell during the cell cycle or cell division. A new challenge in analyzing such experiments is to identify genes that are statistically significantly periodically expressed during the cell cycle. The challenge arises from the large number of genes measured simultaneously, the moderate to small number of measurements per gene taken at different time points, and the high levels of non-normal random noise inherent in the data.
11.
The vast amount of unstructured data emerging from the various genome projects has led to the development of a number of web-based tools designed to annotate genes with biological information. Here we discuss a selection of these tools with regard to their scope, limitations and ease of use.
12.
MOTIVATION: The last few years have seen the development of DNA microarray technology, which allows simultaneous measurement of the expression levels of thousands of genes. While many methods have been developed to analyze such data, most have been visualization-based, and those that yield quantitative conclusions have been diverse and complex. RESULTS: We present two straightforward methods for identifying specific genes whose expression is linked with a phenotype or outcome variable and for systematically predicting sample class membership: (1) a conservative, permutation-based approach to identifying differentially expressed genes; and (2) an augmentation of K-nearest-neighbor pattern classification. Our analyses replicate the quantitative conclusions of Golub et al. (1999; Science, 286, 531-537) on leukemia data, with better classification results, using far simpler methods. With the breast tumor data of Perou et al. (2000; Nature, 406, 747-752), the methods lend rigorous quantitative support to the conclusions of the original paper. In the case of the lymphoma data in Alizadeh et al. (2000; Nature, 403, 503-511), our analyses only partially support the conclusions of the original authors. AVAILABILITY: The software and supplementary information are freely available to researchers at academic and non-profit institutions at http://cc.ucsf.edu/jain/public
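The abstract does not specify the statistic behind its conservative permutation approach. As a generic illustration only, the sketch below runs an exact two-sided permutation test on the difference of group means for a single gene, enumerating every relabeling of the samples. `permutation_p_value` is a hypothetical name; exhaustive enumeration is practical only for small groups, and a real analysis would also need to correct for testing thousands of genes.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in means of one gene.

    Every way of relabeling len(group_a) of the pooled samples as group A is
    enumerated; the p-value is the fraction of relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties with the observed value.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

Counting the observed labeling among the relabelings keeps the p-value from ever reaching zero. For example, with three low and three high samples only the observed split and its mirror image are as extreme, giving 2 of 20 relabelings.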
13.
Modern microarray technology is capable of providing data about the expression of thousands of genes, and even of whole genomes. An important question is how this technology can be used most effectively to unravel the workings of cellular machinery. Here, we propose a method to infer genetic networks on the basis of data from appropriately designed microarray experiments. In addition to identifying the genes that directly affect a specific other gene, this method also estimates the strength of such effects. We discuss both the experimental setup and the theoretical background.
14.
Melina helps users compare the results of four public DNA motif-extraction programs, both to confirm the reliability of each finding and to avoid missing important motifs. It is also useful for optimizing a program's sensitivity by running it with a series of different parameter settings. AVAILABILITY: Melina is available at http://www.hgc.ims.u-tokyo.ac.jp/Melina/.
16.
CIT: identification of differentially expressed clusters of genes from microarray data (total citations: 3; self-citations: 0; citations by others: 3)
Cluster Identification Tool (CIT) is a microarray analysis program that identifies differentially expressed genes. Following division of experimental samples based on a parameter of interest, CIT uses a statistical discrimination metric and permutation analysis to identify clusters of genes or individual genes that best differentiate between the experimental groups. CIT integrates with the freely available CLUSTER and TREEVIEW programs to form a more complete microarray analysis package.
19.
Willbrand K, Radvanyi F, Nadal JP, Thiery JP, Fink TM. Bioinformatics (Oxford, England), 2005, 21(20): 3859-3864
MOTIVATION: We consider any collection of microarrays that can be ordered to form a progression, for example as a function of time, severity of disease or dose of a stimulant. By plotting the expression level of each gene as a function of time, severity or dose, we form an expression series, or curve, for each gene. While most of these curves will exhibit random fluctuations, some will contain a pattern, and these are the genes most likely to be associated with the quantity used to order them. RESULTS: We introduce a method of identifying the pattern, and hence the genes, in microarray expression curves without knowing in advance what kind of pattern to look for. Key to our approach is the sequence of ups and downs formed by pairs of consecutive data points in each curve. As a benchmark, we blindly identified genes from yeast cell cycles without selecting for periodic or any other anticipated behaviour. CONTACT: tmf20@cam.ac.uk SUPPLEMENTARY INFORMATION: The complete versions of Table 2 and Figure 4, as well as other material, can be found at http://www.lps.ens.fr/~willbran/up-down/ or http://www.tcm.phy.cam.ac.uk/~tmf20/up-down/
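The key idea, reducing each curve to its sequence of ups and downs between consecutive points, is simple to express in code. The sketch below, using the hypothetical names `up_down_signature` and `group_by_signature`, maps each gene's ordered series to a string and groups genes that share one. How the authors handle ties and score patterns is not stated, so the '=' tie symbol is an assumption.

```python
from collections import defaultdict


def up_down_signature(series):
    """Encode consecutive steps: 'U' = increase, 'D' = decrease, '=' = tie (assumed)."""
    steps = []
    for prev, curr in zip(series, series[1:]):
        if curr > prev:
            steps.append("U")
        elif curr < prev:
            steps.append("D")
        else:
            steps.append("=")
    return "".join(steps)


def group_by_signature(expression):
    """Group gene names by their up/down signature.

    expression maps gene name -> expression values ordered by time, severity or dose.
    """
    groups = defaultdict(list)
    for gene, series in expression.items():
        groups[up_down_signature(series)].append(gene)
    return dict(groups)
```

For example, `up_down_signature([0.1, 0.5, 0.3, 0.3, 0.9])` gives `"UD=U"`. A signature shared by many more genes than random fluctuation would predict is a candidate pattern, without having specified in advance what shape to look for.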
20.
Jeff W Chou, Tong Zhou, William K Kaufmann, Richard S Paules, Pierre R Bushel. BMC Bioinformatics, 2007, 8(1): 427