Similar literature
 20 similar articles found (search time: 15 ms)
1.
SUMMARY: SeqExpress is a stand-alone desktop application for the identification of relevant genes within collections of microarray or SAGE experiments. A number of analysis, filtering and visualization tools are provided to aid in the selection of groups of genes. If R is installed, the application can use it to provide further analysis. AVAILABILITY: SeqExpress is available at: http://www.seqexpress.com

2.
3.
Biological network analysis can be enhanced by examining the connections between nodes and the rest of the network. For this purpose we have developed GraphletCounter, an open-source software tool for computing graphlet degree signatures that can operate on its own or as a plug-in to the network analysis environment Cytoscape. A unique characteristic of GraphletCounter is its ability to compute the graphlet signatures of network motifs, which can be specified by files generated by the motif-finding tool mfinder. GraphletCounter displays graphlet signatures for visual inspection within Cytoscape, and can output graphlet data for integration with larger workflows. AVAILABILITY AND IMPLEMENTATION: GraphletCounter is implemented in Java. It can be downloaded from the Cytoscape plugin repository, and is also available at http://sonmezsysbio.org/software/graphletcounter.
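To make the idea of a graphlet degree signature concrete, the following is a minimal Python sketch, not GraphletCounter's Java implementation. It assumes an undirected graph and covers only the four orbits of the 2-node and 3-node graphlets (edge, end of an induced 3-node path, middle of an induced 3-node path, triangle). The function names `build_adjacency` and `graphlet_signature` are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def graphlet_signature(adj):
    """Return {node: (orbit0, orbit1, orbit2, orbit3)} for 2- and 3-node graphlets.

    orbit0: number of edges touching the node (its degree)
    orbit1: times the node is an END of an induced 3-node path
    orbit2: times the node is the MIDDLE of an induced 3-node path
    orbit3: number of triangles containing the node
    """
    sig = {}
    for v, nbrs in adj.items():
        # A triangle at v is a pair of neighbours that are themselves adjacent.
        triangles = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        degree = len(nbrs)
        # Neighbour pairs that are NOT adjacent form induced paths centred on v.
        path_middle = degree * (degree - 1) // 2 - triangles
        # Walks v-u-w with w != v; each triangle is counted twice and must be removed.
        path_end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * triangles
        sig[v] = (degree, path_end, path_middle, triangles)
    return sig
```

Calling `graphlet_signature(build_adjacency(edges))` on a list of edge pairs returns one four-element signature per node. Full graphlet signatures, as usually defined, extend this to larger graphlets with many more orbits.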

4.
The non-coding fraction of the human genome, which is approximately 98%, is mainly constituted by repeats. Transpositions, expansions and deletions of these repeat elements contribute to a number of diseases. None of the available databases consolidates information on both tandem and interspersed repeats with the flexibility of FASTA-based homology search with reference to disease genes. Repeats in diseases database (RiDs db) is a web-accessible relational database, which aids analysis of repeats associated with Mendelian disorders. It is a repository of disease genes, which can be searched by the FASTA program or by limited or free-text keywords. Unlike other databases, RiDs db contains the sequences of these genes with access to corresponding information on both interspersed and tandem repeats contained within them, on a unified platform. Comparative analysis of novel or patient sequences with the reference sequences in RiDs db using FASTA search will indicate any change in the structure of repeats associated with a particular disorder. This database also provides links to orthologs in model organisms such as zebrafish, mouse and Drosophila. AVAILABILITY: The database is available for free at http://115.111.90.196/ridsdb/index.php.

5.
The SYSTERS (short for SYSTEmatic Re-Searching) protein sequence cluster set consists of the classification of all sequences from SWISS-PROT and PIR into disjoint protein family clusters and hierarchically into superfamily and subfamily clusters. The cluster set can be searched with a sequence using the SSMAL search tool or a traditional database search tool like BLAST or FASTA. Additionally, a multiple alignment is generated for each cluster and annotated with domain information from the Pfam database of protein domain families. A taxonomic overview of the organisms covered by a cluster is given based on the NCBI taxonomy. The cluster set is available for querying and browsing at http://www.dkfz-heidelberg.de/tbi/services/cluster/systersform

6.
Roundup: a multi-genome repository of orthologs and evolutionary distances   Cited by: 1 (self-citations: 0; citations by others: 1)
SUMMARY: We have created a tool for ortholog and phylogenetic profile retrieval called Roundup. Roundup is backed by a massive repository of orthologs and associated evolutionary distances that was built using the reciprocal smallest distance algorithm, an approach that has been shown to improve upon alternative approaches of ortholog detection, such as reciprocal BLAST. Presently, the Roundup repository contains all possible pair-wise comparisons for over 250 genomes, including 32 Eukaryotes, more than doubling the coverage of any similar resource. The orthologs are accessible through an intuitive web interface that allows searches by genome or gene identifier, presenting results as phylogenetic profiles together with gene and molecular function annotations. Results may be downloaded as phylogenetic matrices for subsequent analysis, including the construction of whole-genome phylogenies based on gene-content data. AVAILABILITY: http://rodeo.med.harvard.edu/tools/roundup.
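The reciprocity check at the heart of a reciprocal smallest distance approach can be sketched as follows. This is a simplified Python illustration, not Roundup's code. It assumes the evolutionary distances between genes of two genomes have already been estimated and are supplied as a dictionary; the similarity search, alignment and distance estimation that produce them are omitted. The function name and input layout are hypothetical.

```python
def reciprocal_smallest_distance(dist):
    """Return sorted (gene_a, gene_b, distance) triples that are mutual closest pairs.

    dist maps (gene_in_genome_A, gene_in_genome_B) -> estimated evolutionary
    distance (assumed precomputed; smaller means more closely related).
    """
    best_for_a = {}  # gene in A -> (closest gene in B, distance)
    best_for_b = {}  # gene in B -> (closest gene in A, distance)
    for (a, b), d in dist.items():
        if a not in best_for_a or d < best_for_a[a][1]:
            best_for_a[a] = (b, d)
        if b not in best_for_b or d < best_for_b[b][1]:
            best_for_b[b] = (a, d)
    # Keep a pair only if each gene is the other's smallest-distance partner.
    return sorted(
        (a, b, d) for a, (b, d) in best_for_a.items() if best_for_b[b][0] == a
    )
```

A pair is reported only when the closest gene in genome B for a given gene in genome A points back to that same gene, which is what distinguishes the reciprocal test from a one-directional best match.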

7.
ArrayExpress: a public database of gene expression data at EBI   Cited by: 3 (self-citations: 0; citations by others: 3)
ArrayExpress is a public repository for microarray-based gene expression data, resulting from the implementation of the MAGE object model to ensure accurate data structuring and the MIAME standard, which defines the annotation requirements. ArrayExpress accepts data as MAGE-ML files for direct submissions or data from MIAMExpress, the MIAME compliant web-based annotation and submission tool of EBI. A team of curators supports the submission process, providing assistance in data annotation. Data retrieval is performed through a dedicated web interface. Relevant results may be exported to ExpressionProfiler, the EBI based expression analysis tool available online (http://www.ebi.ac.uk/arrayexpress).

8.
TPX is a web-based PubMed search enhancement tool that enables faster article searching using analysis and exploration features. These features include identification of relevant biomedical concepts from search results with linkouts to source databases, concept-based article categorization, concept-assisted search and filtering, and query refinement. A distinguishing feature here is the ability to add user-defined concept names and/or concept types for named entity recognition. The tool allows contextual exploration of knowledge sources by providing concept association maps derived from the MEDLINE repository. It also has a full-text search mode that can be configured on request to access local text repositories, incorporating entity co-occurrence search at sentence/paragraph levels. Local text files can also be analyzed on-the-fly. Availability: http://tpx.atc.tcs.com

9.
Microarray technology has become a standard molecular biology tool. Experimental data have been generated on a huge number of organisms, tissue types, treatment conditions and disease states. The Gene Expression Omnibus (GEO; Barrett et al., 2005), developed by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health, is a repository of nearly 140,000 gene expression experiments. The BioConductor project (Gentleman et al., 2004) is an open-source and open-development software project built in the R statistical programming environment (R Development Core Team, 2005) for the analysis and comprehension of genomic data. The tools contained in the BioConductor project represent many state-of-the-art methods for the analysis of microarray and genomics data. We have developed a software tool that allows access to the wealth of information within GEO directly from BioConductor, eliminating many of the formatting and parsing problems that have made such analyses labor-intensive in the past. The software, called GEOquery, effectively establishes a bridge between GEO and BioConductor. Easy access to GEO data from BioConductor will likely lead to new analyses of GEO data using novel and rigorous statistical and bioinformatic tools. Facilitating analyses and meta-analyses of microarray data will increase the efficiency with which biologically important conclusions can be drawn from published genomic data. Availability: GEOquery is available as part of the BioConductor project.

10.
Cluster analysis has proven to be a useful tool for investigating the association structure among genes in a microarray data set. There is a rich literature on cluster analysis and various techniques have been developed. Such analyses heavily depend on an appropriate (dis)similarity measure. In this paper, we introduce a general clustering approach based on the confidence interval inferential methodology, which is applied to gene expression data of microarray experiments. Emphasis is placed on data with low replication (three or five replicates). The proposed method makes more efficient use of the measured data and avoids the subjective choice of a dissimilarity measure. This new methodology, when applied to real data, provides an easy-to-use bioinformatics solution for the cluster analysis of microarray experiments with replicates (see the Appendix). Even though the method is presented under the framework of microarray experiments, it is a general algorithm that can be used to identify clusters in any situation. The method's performance is evaluated using simulated and publicly available data sets. Our results also clearly show that our method is not an extension of the conventional clustering method based on correlation or Euclidean distance.

11.
Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypothesis generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the Map-Reduce framework. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network and, to address the scalability problem, the features from a concept network are extracted using a cluster with the Map-Reduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time periods. A set of heterogeneous features, which covers both topological and semantic features derived from the concept network, has been studied with respect to its impact on the accuracy of the proposed supervised link discovery process. A case study of hypothesis generation based on the proposed method is presented in the paper.
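The abstract does not say which neighborhood features were used. As an illustration only, the sketch below computes two features commonly used in link prediction, the common-neighbor count and the Jaccard coefficient, for a candidate concept pair. The function name, the feature choice and the example concepts are all assumptions, and the Map-Reduce distribution described above is not shown.

```python
def neighborhood_features(adj, u, v):
    """Compute simple neighbourhood features for a candidate link (u, v).

    adj maps each concept to the set of concepts it co-occurs with.
    The two features here are illustrative choices, not the paper's feature set.
    """
    nu = adj.get(u, set())
    nv = adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        # Jaccard coefficient: shared neighbours over all neighbours of either node.
        "jaccard": len(common) / len(union) if union else 0.0,
    }
```

In a supervised setting such as the one described, feature dictionaries like this would be computed for concept pairs in the earlier network snapshot and labeled according to whether the link appears in the later one.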

12.
The identification and screening of charged cluster (CC) residues in proteins is a key analysis for assessing quantitative structure-function correlations in proteins. Here, we present a proteome-wide scan for the occurrence of CCs in 99292 proteins using a new tool, the Finding Clusters Charge in Protein Sequences Program (FCCP). The FCCP has been employed to search for CCs in 35 prokaryotic proteomes (7 psychrophiles, 10 mesophiles, 9 thermophiles and 9 hyperthermophiles). A new repository of 855 CCs has been created. Results showed that the mapped proteins containing positive and negative charge clusters are mostly transmembrane proteins, while the conserved CCs within the same proteome are transposases or involved in DNA binding and integration. Interestingly, negative charge clusters were associated with bacterial growth temperature (p=0.002), acting as a core signature of the proteins. Taken together, the various results provide a consistent picture of these screened CCs in terms of their potential functional roles.
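The abstract does not give FCCP's detection criteria. Purely as an illustration of what scanning a sequence for charge clusters can look like, the sketch below slides a fixed window along a protein sequence and reports windows that contain many residues of the same charge. The residue classes (K and R positive, D and E negative), the window length and the threshold are assumptions, and the function name is hypothetical.

```python
POSITIVE = set("KR")  # assumed positively charged residues
NEGATIVE = set("DE")  # assumed negatively charged residues


def find_charge_clusters(seq, window=10, min_charged=6):
    """Return (start, end, sign) for each window rich in same-sign charged residues.

    Coordinates are 0-based and half-open. window and min_charged are
    illustrative defaults, not values taken from FCCP.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        pos = sum(residue in POSITIVE for residue in segment)
        neg = sum(residue in NEGATIVE for residue in segment)
        if pos >= min_charged:
            hits.append((start, start + window, "+"))
        elif neg >= min_charged:
            hits.append((start, start + window, "-"))
    return hits
```

Overlapping windows are reported separately here; a fuller treatment would merge them into maximal clusters and assess their statistical significance against the overall composition of the sequence.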

13.
CESE     
Cell electrophysiology simulation environment (CESE) is an integrated environment for performing simulations with a variety of electrophysiological models that have Hodgkin-Huxley and Markovian formulations of ionic currents. CESE is written in Java 2 and is readily portable to a number of operating systems. CESE allows execution of single-cell models and modification and clamping of model parameters, as well as data visualisation and analysis using a consistent interface. Model creation for CESE is facilitated by an object-oriented approach and use of an extensive modelling framework. The Web-based model repository is available. AVAILABILITY: CESE and the Web-based model repository are available at http://cese.sourceforge.net/.

14.
The advent of whole genome sequencing has led to an increasing number of proteins with known amino acid sequences. Despite many efforts, the number of proteins with resolved three-dimensional structures is still low. One of the challenging tasks structural biologists face is predicting the interaction of a metal ion with a protein whose structure is unknown. Based on the information available in the Protein Data Bank, a site (METALACTIVE INTERACTION) has been generated which displays information on significant high-preferential and low-preferential combinations of endogenous ligands for 49 metal ions. Users can also gain information about the residues present in the first and second coordination spheres, which play a major role in maintaining the structure and function of metalloproteins in biological systems. In this paper, a novel computational tool (ZINCCLUSTER) is developed, which can predict the zinc-binding sites of proteins even if only the primary sequence is known. The purpose of this tool is to predict the active site cluster of an uncharacterized protein based on its primary sequence or a 3D structure. The tool can predict amino acids interacting with a metal or vice versa. This tool is based on the occurrence of significant triplets, and in testing it showed higher prediction accuracy than other available techniques.

15.
The human papillomavirus (HPV), a common virus that infects the reproductive tract, may lead to malignant changes within the infection area in certain cases and is directly associated with such cancers as cervical cancer, anal cancer, and vaginal cancer. Identification of novel HPV infection related genes can lead to a better understanding of the specific signal pathways and cellular processes related to HPV infection, providing information for the development of more efficient therapies. In this study, several novel HPV infection related genes were predicted by a computational method based on the known genes involved in HPV infection from HPVbase. This method applied the algorithm of random walk with restart (RWR) to a protein-protein interaction (PPI) network. The candidate genes were further filtered by permutation and association tests. These steps eliminated genes occupying special positions in the PPI network and selected key genes with strong associations to known HPV infection related genes based on the interaction confidence and functional similarity obtained from published databases, such as STRING, gene ontology (GO) terms and KEGG pathways. Our study identified 104 novel HPV infection related genes, a number of which were confirmed to relate to the infection processes and complications of HPV infection, as reported in the literature. These results demonstrate the reliability of our method in identifying HPV infection related genes. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
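Random walk with restart is usually written as the iteration p(t+1) = (1 - r) W p(t) + r p(0), where p(0) places equal probability on the seed nodes, W is the column-normalized adjacency matrix and r is the restart probability. The sketch below is a minimal pure-Python version on an unweighted graph. The restart value, the tolerance and the function name are assumptions rather than the study's settings, and the permutation and association filters that follow in the study are not shown.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of each node.

    adj maps every node to the set of its neighbours (undirected, unweighted;
    every neighbour must also appear as a key). seeds is a non-empty collection
    of nodes in adj. restart=0.3 is an illustrative value only.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seed set.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbour.
        for u in nodes:
            if not adj[u]:
                continue
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p
```

Nodes that end up with high probability are close to the seed genes in the network, which is why the scores can be used to rank candidate genes before further filtering.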

16.
MOTIVATION: An important goal of microarray studies is to discover genes that are associated with clinical outcomes, such as disease status and patient survival. While a typical experiment surveys gene expressions on a global scale, there may be only a small number of genes that have significant influence on a clinical outcome. Moreover, expression data have cluster structures and the genes within a cluster have correlated expressions and coordinated functions, but the effects of individual genes in the same cluster may be different. Accordingly, we seek to build statistical models with the following properties. First, the model is sparse in the sense that only a subset of the parameter vector is non-zero. Second, the cluster structures of gene expressions are properly accounted for. RESULTS: For gene expression data without pathway information, we divide genes into clusters using commonly used methods, such as K-means or hierarchical approaches. The optimal number of clusters is determined using the Gap statistic. We propose a clustering threshold gradient descent regularization (CTGDR) method, for simultaneous cluster selection and within cluster gene selection. We apply this method to binary classification and censored survival analysis. Compared to the standard TGDR and other regularization methods, the CTGDR takes into account the cluster structure and carries out feature selection at both the cluster level and within-cluster gene level. We demonstrate the CTGDR on two studies of cancer classification and two studies correlating survival of lymphoma patients with microarray expressions. AVAILABILITY: R code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

17.
18.
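A study of suitable clustering methods for the drought-resistance ecological classification of wheat   Cited by: 5 (self-citations: 2; citations by others: 3)
This study explored clustering methods suitable for the drought-resistance ecological classification of wheat cultivars. Using 21 agronomic traits and 15 winter wheat cultivars (lines), different strategies were adopted at each stage of the cluster analysis, and the resulting classifications were compared on a large scale. The results showed that, in terms of closeness to the experts' empirical classification: among data transformation methods, the raw-data method ranked highest, followed in order by varimax orthogonal rotation based on the ordinary correlation matrix, Promax oblique rotation, and principal component analysis; among similarity measures, Euclidean distance ranked above Mahalanobis distance; among clustering methods, correspondence analysis and fuzzy clustering ranked above the shortest-distance method, the longest-distance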

19.
We develop a statistical tool SNVer for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial-binomial model to test the significance of observed allele frequency against sequencing error. SNVer reports one single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decisions of whether to 'accept or reject the candidates' provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing 300 K loci within an hour. This excellent scalability makes it feasible for analysis of whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
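The hypothesis-testing view of variant calling can be illustrated with a deliberately simplified single-binomial test, which is not SNVer's binomial-binomial model. Under the null hypothesis that every alternate-allele read is a sequencing error, the number of alternate reads at a locus follows a Binomial(depth, error rate) distribution, and the upper-tail probability serves as the P-value. The function name and the example error rate are assumptions.

```python
from math import comb


def binomial_tail_pvalue(alt_reads, depth, error_rate):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    A small value means the observed alternate-allele count is unlikely to be
    explained by sequencing error alone (simplified single-sample test).
    """
    return sum(
        comb(depth, k) * error_rate ** k * (1.0 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )
```

For instance, `binomial_tail_pvalue(8, 30, 0.01)` is far smaller than `binomial_tail_pvalue(1, 30, 0.01)`, reflecting that eight alternate reads out of thirty are much harder to attribute to a 1% error rate than a single one. Because one P-value is produced per locus, a multiple-testing correction can then be applied across all loci examined.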

20.
The analysis of gene expression temporal profiles is a topic of increasing interest in functional genomics. Model-based clustering methods are particularly interesting because they are able to capture the dynamic nature of these data and to identify the optimal number of clusters. We have defined a new Bayesian method that allows us to cope with some important issues that remain unsolved in the currently available approaches: the presence of time dislocations in gene expression, the non-stationarity of the processes generating the data, and the presence of data collected on an irregular temporal grid. Our method, which is based on random walk models, requires only mild a priori assumptions about the nature of the processes generating the data and explicitly models inter-gene variability within each cluster. It has first been validated on simulated datasets and then employed for the analysis of a dataset relative to serum-stimulated fibroblasts. In all cases, the results have been promising, showing that the method can be helpful in functional genomics research.
