Similar literature
 Found 20 similar articles; search took 31 ms
1.
Although protein–RNA interactions (PRIs) are involved in various important cellular processes, compiled data on PRIs are still limited. This contrasts with protein–protein interactions, which have been intensively recorded in public databases and subjected to network level analysis. Here, we introduce PRD, an online database of PRIs, dispersed across several sources, including scientific literature. Currently, over 10,000 interactions have been stored in PRD using PSI-MI 2.5, which is a standard model for describing detailed molecular interactions, with an emphasis on gene level data. Users can browse all recorded interactions and execute flexible keyword searches against the database via a web interface. Our database is not only a reference of PRIs, but will also be a valuable resource for studying characteristics of PRI networks.

Availability

PRD can be freely accessed at http://pri.hgc.jp/

2.
Tomovic A  Oakeley EJ 《PloS one》2008,3(9):e3243

Background

With increasing numbers of crystal structures of protein∶DNA and protein∶protein∶DNA complexes publicly available, it is now possible to extract sufficient structural, physical-chemical and thermodynamic parameters to make general observations and predictions about their interactions. In particular, the properties of macromolecular assemblies of multiple proteins bound to DNA have not previously been investigated in detail.

Methodology/Principal Findings

We have performed computational structural analyses on macromolecular assemblies of multiple proteins bound to DNA using a variety of computational tools: PISA; PROMOTIF; X3DNA; ReadOut; DDNA and DCOMPLEX. Additionally, we have developed and employed an algorithm for approximate collision detection and overlapping volume estimation of two macromolecules. An implementation of this algorithm is available at http://promoterplot.fmi.ch/Collision1/. The results obtained are compared with structural, physical-chemical and thermodynamic parameters from protein∶protein and single protein∶DNA complexes. Many of the interface properties of multiple protein∶DNA complexes were found to be very similar to those observed in binary protein∶DNA and protein∶protein complexes. However, the conformational change of the DNA upon protein binding is significantly greater when multiple proteins bind to it than when single proteins bind. Water-mediated contacts are less important (less abundant) at the interfaces between components of ternary (protein∶protein∶DNA) complexes than at those of binary complexes (protein∶protein and protein∶DNA). The thermodynamic stability of ternary complexes is also higher than that of the binary interactions. Greater specificity and affinity of multiple proteins binding to DNA in comparison with binary protein-DNA interactions were observed. However, protein-protein binding affinities are stronger in complexes without DNA.

Conclusions/Significance

Our results indicate that the interface properties (interface area; number of interface residues/atoms and hydrogen bonds; and the distribution of interface residues, hydrogen bonds, van der Waals contacts and secondary structure motifs) are independent of whether a protein is in a binary or ternary complex with DNA. However, changes in the shape of the DNA reduce the off-rate of the proteins, which greatly enhances the stability and specificity of ternary complexes compared to binary ones.
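
The abstract above mentions approximate collision detection and overlap-volume estimation but does not describe the algorithm. The following Python sketch is therefore only an illustration under stated assumptions, not the authors' method: each macromolecule is modelled as a list of atom spheres `(x, y, z, r)`, a bounding-box test serves as the cheap collision check, and the shared volume is estimated by Monte Carlo sampling. All function names are hypothetical.

```python
import random

def bounding_box(spheres):
    """Axis-aligned box enclosing a list of (x, y, z, r) spheres."""
    lo = [min(s[i] - s[3] for s in spheres) for i in range(3)]
    hi = [max(s[i] + s[3] for s in spheres) for i in range(3)]
    return lo, hi

def inside(point, spheres):
    """True if the point lies within any sphere of the molecule."""
    return any(
        sum((point[i] - s[i]) ** 2 for i in range(3)) <= s[3] ** 2
        for s in spheres
    )

def overlap_volume(mol_a, mol_b, samples=20000, seed=0):
    """Monte Carlo estimate of the volume shared by two sphere sets."""
    (lo_a, hi_a), (lo_b, hi_b) = bounding_box(mol_a), bounding_box(mol_b)
    lo = [max(lo_a[i], lo_b[i]) for i in range(3)]
    hi = [min(hi_a[i], hi_b[i]) for i in range(3)]
    if any(lo[i] >= hi[i] for i in range(3)):
        return 0.0  # bounding boxes are disjoint, so no collision is possible
    box_volume = 1.0
    for i in range(3):
        box_volume *= hi[i] - lo[i]
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p = [rng.uniform(lo[i], hi[i]) for i in range(3)]
        if inside(p, mol_a) and inside(p, mol_b):
            hits += 1
    return box_volume * hits / samples
```

Sampling only inside the intersection of the two bounding boxes keeps the estimate reasonably tight for a fixed number of samples; a grid-based or analytical sphere-intersection approach would be an equally valid choice.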

3.
MOTIVATION: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been suggested, but little attention has been given to the distance measure between patients. Even with the Euclidean metric, including and excluding genes from the analysis leads to different distances between the same objects, and consequently different clustering results. RESULTS: We describe a new clustering algorithm, in which gene selection is used to derive biologically meaningful clusterings of samples by combining expression profiles and functional annotation data. According to gene annotations, candidate gene sets with specific functional characterizations are generated. Each set defines a different distance measure between patients, leading to different clusterings. These clusterings are filtered using a resampling-based significance measure. Significant clusterings are reported together with the underlying gene sets and their functional definition. CONCLUSIONS: Our method reports clusterings defined by biologically focused sets of genes. In annotation-driven clusterings, we have recovered clinically relevant patient subgroups through biologically plausible sets of genes as well as new subgroupings. We conjecture that our method has the potential to reveal so far unknown, clinically relevant classes of patients in an unsupervised manner. AVAILABILITY: We provide the R package adSplit as part of Bioconductor release 1.9 and on http://compdiag.molgen.mpg.de/software.
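
The point that each gene set defines its own distance between patients can be shown with a small Python sketch. It is a toy illustration, not the adSplit implementation: the expression values, patient names and gene names are invented, and the resampling-based significance filter is not modelled.

```python
import math

def euclidean(profile_a, profile_b, genes):
    """Euclidean distance between two expression profiles restricted to `genes`."""
    return math.sqrt(sum((profile_a[g] - profile_b[g]) ** 2 for g in genes))

# Toy expression values (hypothetical), keyed by gene name.
patients = {
    "P1": {"g1": 1.0, "g2": 0.0, "g3": 5.0},
    "P2": {"g1": 1.0, "g2": 0.0, "g3": 0.0},
    "P3": {"g1": 4.0, "g2": 4.0, "g3": 5.0},
}

def nearest(target, genes):
    """Name of the patient closest to `target` under the gene-set-specific distance."""
    others = [p for p in patients if p != target]
    return min(others, key=lambda p: euclidean(patients[target], patients[p], genes))
```

With the gene set `["g1", "g2"]` patient P1 is closest to P2, whereas with `["g3"]` it is closest to P3, so the same patients would be grouped differently depending on which annotated gene set is chosen.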

4.
Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automating the sorting and similarity calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in the results of 1–47 biochemical tests within the Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/.

5.
MOTIVATION: The biologic significance of results obtained through cluster analyses of gene expression data generated in microarray experiments has been demonstrated in many studies. In this article we focus on the development of a clustering procedure based on the concept of Bayesian model-averaging and a precise statistical model of expression data. RESULTS: We developed a clustering procedure based on the Bayesian infinite mixture model and applied it to clustering gene expression profiles. Clusters of genes with similar expression patterns are identified from the posterior distribution of clusterings defined implicitly by the stochastic data-generation model. The posterior distribution of clusterings is estimated by a Gibbs sampler. We summarized the posterior distribution of clusterings by calculating posterior pairwise probabilities of co-expression and used the complete linkage principle to create clusters. This approach has several advantages over usual clustering procedures. The analysis allows for incorporation of a reasonable probabilistic model for generating data. The method does not require specifying the number of clusters, and the resulting optimal clustering is obtained by averaging over models with all possible numbers of clusters. Expression profiles that are not similar to any other profile are automatically detected, the method incorporates experimental replicates, and it can be extended to accommodate missing data. This approach represents a qualitative shift in the model-based cluster analysis of expression data because it allows for incorporation of uncertainties involved in the model selection in the final assessment of confidence in similarities of expression profiles. We also demonstrated the importance of incorporating the information on experimental variability into the clustering model.
AVAILABILITY: The MS Windows(TM) based program implementing the Gibbs sampler and supplemental material is available at http://homepages.uc.edu/~medvedm/BioinformaticsSupplement.htm CONTACT: medvedm@email.uc.edu
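
The step of summarizing sampled clusterings as posterior pairwise probabilities of co-expression can be sketched in Python as follows. This assumes the Gibbs sampler has already produced a list of cluster-label vectors; the sampler itself and the subsequent complete-linkage step are not shown, and the function name is hypothetical.

```python
def coexpression_probabilities(samples):
    """Fraction of sampled clusterings in which each pair of genes shares a cluster.

    `samples` is a list of label lists: samples[s][i] is the cluster label
    assigned to gene i in posterior draw s.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(samples) for c in row] for row in counts]
```

Because only label equality within a draw is used, the summary is unaffected by the arbitrary renumbering of clusters between draws. One minus these probabilities could then serve as the dissimilarity passed to a complete-linkage clustering routine.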

6.
Biclustering extends the traditional clustering techniques by attempting to find (all) subgroups of genes with similar expression patterns under to-be-identified subsets of experimental conditions when applied to gene expression data. Still, the real power of this clustering strategy is yet to be fully realized due to the lack of effective and efficient algorithms for reliably solving the general biclustering problem. We report a QUalitative BIClustering algorithm (QUBIC) that can solve the biclustering problem in a more general form, compared to existing algorithms, through employing a combination of qualitative (or semi-quantitative) measures of gene expression data and a combinatorial optimization technique. One key unique feature of the QUBIC algorithm is that it can identify all statistically significant biclusters including biclusters with the so-called ‘scaling patterns’, a problem considered to be rather challenging; another key unique feature is that the algorithm solves such general biclustering problems very efficiently, capable of solving biclustering problems with tens of thousands of genes under up to thousands of conditions in a few minutes of CPU time on a desktop computer. We have demonstrated a considerably improved biclustering performance by our algorithm compared to the existing algorithms on various benchmark sets and data sets of our own. QUBIC was written in ANSI C and tested using GCC (version 4.1.2) on Linux. Its source code is available at: http://csbl.bmb.uga.edu/~maqin/bicluster. A server version of QUBIC is also available upon request.

7.
Infectious diseases result in millions of deaths each year. Mechanisms of infection have been studied in detail for many pathogens. However, many questions are relatively unexplored. What are the properties of human proteins that interact with pathogens? Do pathogens interact with certain functional classes of human proteins? Which infection mechanisms and pathways are commonly triggered by multiple pathogens? In this paper, to our knowledge, we provide the first study of the landscape of human proteins interacting with pathogens. We integrate human–pathogen protein–protein interactions (PPIs) for 190 pathogen strains from seven public databases. Nearly all of the 10,477 human-pathogen PPIs are for viral systems (98.3%), with the majority belonging to the human–HIV system (77.9%). We find that both viral and bacterial pathogens tend to interact with hubs (proteins with many interacting partners) and bottlenecks (proteins that are central to many paths in the network) in the human PPI network. We construct separate sets of human proteins interacting with bacterial pathogens, viral pathogens, and those interacting with multiple bacteria and with multiple viruses. Gene Ontology functions enriched in these sets reveal a number of processes, such as cell cycle regulation, nuclear transport, and immune response that participate in interactions with different pathogens. Our results provide the first global view of strategies used by pathogens to subvert human cellular processes and infect human cells. Supplementary data accompanying this paper is available at http://staff.vbi.vt.edu/dyermd/publications/dyer2008a.html.
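
The notion of a hub (a protein with many interacting partners) can be made concrete with a short Python sketch. It is a simplified illustration with an invented edge list and hypothetical function names, not the analysis pipeline of the paper; identifying bottlenecks would additionally need a path-based measure such as betweenness centrality, which is not shown.

```python
from collections import defaultdict

def degrees(edges):
    """Number of interaction partners per protein in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(p) for protein, p in partners.items()}

def hubs(edges, top_n):
    """The `top_n` proteins with the most partners (ties broken by name)."""
    deg = degrees(edges)
    ranked = sorted(deg, key=lambda protein: (-deg[protein], protein))
    return ranked[:top_n]
```

Partners are stored in sets so that an interaction reported by more than one database is counted once, which matters when PPIs are merged from several sources.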

8.
Liu X  Liu B  Huang Z  Shi T  Chen Y  Zhang J 《PloS one》2012,7(1):e30938

Background

The molecular network sustained by different types of interactions among proteins is widely manifested as the fundamental driving force of cellular operations. Many biological functions are determined by the crosstalk between proteins rather than by the characteristics of their individual components. Thus, the searches for protein partners in global networks are imperative when attempting to address the principles of biology.

Results

We have developed a web-based tool “Sequence-based Protein Partners Search” (SPPS) to explore interacting partners of proteins, by searching over a large repertoire of proteins across many species. SPPS provides a database containing more than 60,000 protein sequences with annotations and a protein-partner search engine in two modes (Single Query and Multiple Query). Two interacting proteins of human FBXO6 protein have been found using the service in the study. In addition, users can refine potential protein partner hits by using annotations and possible interactive network in the SPPS web server.

Conclusions

SPPS provides a new type of tool to facilitate the identification of direct or indirect protein partners which may guide scientists on the investigation of new signaling pathways. The SPPS server is available to the public at http://mdl.shsmu.edu.cn/SPPS/.

9.
MOTIVATION: Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information out of a particular data set. RESULTS: We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for Saccharomyces cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all biologically equally significant, and that simultaneously clustering genes and conditions performs better than only clustering genes and assuming independent conditions. A collection of distinct equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression. AVAILABILITY: GaneSh, a Java package for coclustering, is available under the terms of the GNU General Public License from our website at http://bioinformatics.psb.ugent.be/software

10.
Genomic experiments (e.g. differential gene expression, single-nucleotide polymorphism association) typically produce ranked lists of genes. We present a simple but powerful approach which uses protein–protein interaction data to detect sub-networks within such ranked lists of genes or proteins. We performed an exhaustive study of network parameters, which allowed us to conclude that the average number of components and the average number of nodes per component are the parameters that best discriminate between real and random networks. A novel aspect that increases the efficiency of this strategy in finding sub-networks is that, in addition to direct connections, connections mediated by intermediate nodes are also considered when building the sub-networks. The possibility of using such intermediate nodes makes this approach more robust to noise. It also overcomes some limitations intrinsic to experimental designs based on differential expression, in which some nodes are invariant across conditions. The proposed approach can also be used for candidate disease-gene prioritization. Here, we demonstrate the usefulness of the approach by means of several case examples that include a differential expression analysis in Fanconi Anemia, a genome-wide association study of bipolar disorder and a genome-scale study of essentiality in cancer genes. An efficient and easy-to-use web interface (available at http://www.babelomics.org) based on HTML5 technologies is also provided to run the algorithm and represent the network.
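
The idea of connecting listed genes through intermediate nodes, and of describing the result by its number of components and nodes per component, can be sketched in Python. This is a minimal illustration under assumptions: only a single intermediate node between two listed genes is considered, the gene names are invented, and the comparison against random networks is not modelled.

```python
def subnetwork(seed_genes, interactions, allow_intermediates=True):
    """Edges among seed genes, optionally bridged by one non-seed intermediate."""
    seeds = set(seed_genes)
    pairs = [(a, b) for a, b in interactions if a != b]  # ignore self-interactions
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    edges = {frozenset((a, b)) for a, b in pairs if a in seeds and b in seeds}
    if allow_intermediates:
        for node, nbrs in neighbours.items():
            linked = nbrs & seeds
            # keep an intermediate only if it bridges at least two seed genes
            if node not in seeds and len(linked) >= 2:
                edges.update(frozenset((node, s)) for s in linked)
    return edges

def components(nodes, edges):
    """Connected components, found by repeated depth-first search."""
    adj = {n: set() for n in nodes}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

In a toy network where G1 and G2 both interact with an unlisted protein X, allowing intermediates merges two isolated genes into one three-node component, which lowers the number of components and raises the average number of nodes per component.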

11.
Genetic linkage maps are indispensable tools in genetic, genomic and breeding studies. As one of the genotyping-by-sequencing methods, RAD-Seq (restriction-site associated DNA sequencing) has gained particular popularity for construction of high-density linkage maps. Current RAD analytical tools are predominantly used for typing codominant markers. However, no genotyping algorithm has been developed for dominant markers (resulting from recognition site disruption). Given their abundance in eukaryotic genomes, utilization of dominant markers would greatly diminish the extensive sequencing effort required for large-scale marker development. In this study, we established, for the first time, a novel statistical framework for de novo dominant genotyping in mapping populations. An integrated package called RADtyping was developed by incorporating both de novo codominant and dominant genotyping algorithms. We demonstrated the superb performance of RADtyping in achieving remarkably high genotyping accuracy based on simulated and real mapping datasets. The RADtyping package is freely available at http://www2.ouc.edu.cn/mollusk/detailen.asp?id=727.

12.
13.
The availability of genomic sequences of many organisms has opened new challenges in many aspects, particularly in terms of genome analysis. Sequence extraction is a vital step and many tools have been developed to solve this issue. These tools are available publicly but have limitations with reference to the sequence extraction, length of the sequence to be extracted, organism specificity and the lack of a user-friendly interface. We have developed a Java-based software package having three modules which can be used independently or sequentially. The tool efficiently extracts sequences from large datasets in a few simple steps. It can efficiently extract multiple sequences of any desired length from a genome of any organism. The results are cross-checked against published data.

Availability

URL 1: http://ww3.comsats.edu.pk/bio/ResearchProjects.aspx
URL 2: http://ww3.comsats.edu.pk/bio/SequenceManeuverer.aspx
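
Coordinate-based sequence extraction of the kind described above can be illustrated with a few lines of Python. The abstract does not specify how the package works internally, so this is only a sketch under assumptions: coordinates are taken to be 1-based and inclusive, the genome is a plain nucleotide string, and the function name is hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def extract(genome, start, end, strand="+"):
    """Return genome[start..end] using 1-based inclusive coordinates.

    For strand "-" the reverse complement of the segment is returned.
    """
    if not (1 <= start <= end <= len(genome)):
        raise ValueError("coordinates fall outside the sequence")
    segment = genome[start - 1:end]
    if strand == "-":
        segment = segment.translate(COMPLEMENT)[::-1]
    return segment
```

For example, `extract("ACGTTGCA", 2, 5)` returns `"CGTT"`, and the same call with `strand="-"` returns `"AACG"`. Checking the coordinates explicitly avoids Python's silent truncation of out-of-range slices.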

14.
Liang Y  Zhang F  Wang J  Joshi T  Wang Y  Xu D 《PloS one》2011,6(7):e21750

Background

Identifying genes with essential roles in resisting environmental stress rates high in agronomic importance. Although massive DNA microarray gene expression data have been generated for plants, current computational approaches underutilize these data for studying genotype-trait relationships. Some advanced gene identification methods have been explored for human diseases, but typically these methods have not been converted into publicly available software tools and cannot be applied to plants for identifying genes with agronomic traits.

Methodology

In this study, we used 22 sets of Arabidopsis thaliana gene expression data from GEO to predict the key genes involved in water tolerance. We applied an SVM-RFE (Support Vector Machine-Recursive Feature Elimination) feature selection method for the prediction. To address small sample sizes, we developed a modified approach for SVM-RFE by using bootstrapping and leave-one-out cross-validation. We also expanded our study to predict genes involved in water susceptibility.

Conclusions

We analyzed the top 10 genes predicted to be involved in water tolerance. Seven of them are connected to known biological processes in drought resistance. We also analyzed the top 100 genes in terms of their biological functions. Our study shows that the SVM-RFE method is a highly promising method in analyzing plant microarray data for studying genotype-phenotype relationships. The software is freely available with source code at http://ccst.jlu.edu.cn/JCSB/RFET/.

15.
16.
17.
A web-based microarray data analysis tool, ArrayOU (freely available at www.bioinformatics.plantbio.ohiou.edu), has been developed at the Ohio University Genomics Facility for the research and education community to analyze Agilent microarray data. Agilent's microarray pipeline has gained in popularity as a result of its ease of use and the low cost of customized arrays. The current version of the ArrayOU pipeline allows users to visualize, analyze, and annotate microarray data from commercially available and customized Agilent expression arrays and is extendable for further implementations.

18.
19.
TP Lu  CY Lee  MH Tsai  YC Chiu  CK Hsiao  LC Lai  EY Chuang 《PloS one》2012,7(8):e42390

Background

Many prediction tools for microRNA (miRNA) targets have been developed, but inconsistent predictions were observed across multiple algorithms, which can make further analysis difficult. Moreover, the nomenclature of human miRNAs changes rapidly. To address these issues, we developed a web-based system, miRSystem, for converting queried miRNAs to the latest annotation and predicting the function of miRNA by integrating miRNA target gene prediction and function/pathway analyses.

Results

First, queried miRNA IDs were converted to the latest annotated version to prevent potential conflicts resulting from multiple aliases. Next, by combining seven algorithms and two validated databases, potential gene targets of miRNAs and their functions were predicted based on the consistency across independent algorithms and observed/expected ratios. Lastly, five pathway databases were included to characterize the enriched pathways of target genes through bootstrap approaches. Based on the enriched pathways of target genes, the functions of queried miRNAs could be predicted.

Conclusions

MiRSystem is a user-friendly tool for predicting the target genes and their associated pathways for many miRNAs simultaneously. The web server and the documentation are freely available at http://mirsystem.cgm.ntu.edu.tw/.
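
Selecting targets by consistency across independent prediction algorithms can be sketched as a simple vote count in Python. This is an illustrative assumption about how such a consensus might be computed, not the miRSystem implementation: the algorithm labels and gene sets below are placeholders, and the observed/expected ratio and pathway enrichment steps are not modelled.

```python
def consensus_targets(predictions, min_votes):
    """Genes predicted by at least `min_votes` of the supplied algorithms.

    `predictions` maps an algorithm name to the set of target genes it
    predicts for one miRNA; the result maps each retained gene to its
    number of supporting algorithms.
    """
    votes = {}
    for genes in predictions.values():
        for gene in genes:
            votes[gene] = votes.get(gene, 0) + 1
    return {gene: n for gene, n in votes.items() if n >= min_votes}

# Placeholder predictions from three hypothetical algorithms.
example = {
    "algoA": {"TP53", "MYC"},
    "algoB": {"TP53"},
    "algoC": {"TP53", "MYC", "KRAS"},
}
```

Calling `consensus_targets(example, 2)` keeps TP53 (three votes) and MYC (two votes) and drops KRAS, which only one algorithm predicts. Raising `min_votes` trades sensitivity for agreement between predictors.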

20.
The innate immune system is an ancient component of host defense. Since innate immunity pathways are well conserved throughout many eukaryotes, immune genes in model animals can be used to putatively identify homologous genes in newly sequenced genomes of non-model organisms. With the initiation of the “i5k” project, which aims to sequence 5,000 insect genomes by 2016, many novel insect genomes will soon become publicly available, yet few annotation resources are currently available for insects. Thus, we developed an online tool called the Insect Innate Immunity Database (IIID) to provide an open access resource for insect immunity and comparative biology research (http://www.vanderbilt.edu/IIID). The database provides users with simple exploratory tools to search the immune repertoires of five insect models (including Nasonia), spanning three orders, for specific immunity genes or genes within a particular immunity pathway. As a proof of principle, we used an initial database with only four insect models to annotate potential immune genes in the parasitoid wasp genus Nasonia. Results specify 306 putative immune genes in the genomes of N. vitripennis and its two sister species N. giraulti and N. longicornis. Of these genes, 146 were not found in previous annotations of Nasonia immunity genes. Combining these newly identified immune genes with those in previous annotations, Nasonia possesses 489 putative immunity genes, the largest immune repertoire found in insects to date. While these computational predictions need to be complemented with functional studies, the IIID database can help initiate and augment annotations of the immune system in the plethora of insect genomes that will soon become available.
