Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Gene expression patterns can reflect gene regulation in human tissues under normal or pathologic conditions. Gene expression profiling data from studies of primary human disease samples are particularly valuable, since these studies often span many years in order to collect patient clinical information and achieve a large sample size. Disease-to-Gene Expression Mapper (DGEM) provides a community resource to access and analyze these data; it currently includes Affymetrix oligonucleotide array datasets for more than 40 human diseases and 1400 samples. The data are normalized to the same scale and stored in a relational database. A statistical-analysis pipeline was implemented to identify genes abnormally expressed in disease tissues or genes whose expression is associated with clinical parameters such as cancer patient survival. Data-mining results can be queried through a web-based interface at http://dgem.dhcp.iupui.edu/. The query tool enables dynamic generation of graphs and tables that are further linked to major gene and pathway resources that connect the data to relevant biology, including Entrez Gene and the Kyoto Encyclopedia of Genes and Genomes (KEGG). In summary, DGEM provides scientists and physicians with a valuable tool to study disease mechanisms, to discover potential disease biomarkers for diagnosis and prognosis, and to identify novel gene targets for drug discovery. The source code is freely available for non-profit use, on request to the authors.

2.
SUMMARY: Dragon Promoter Mapper (DPM) is a tool to model the promoter structure of co-regulated genes using Bayesian network methodology. DPM exploits an exhaustive set of motif features (such as the motif, its strand, the order of motif occurrence and the mutual distance between adjacent motifs) and generates models from the target promoter sequences, which may be used to (1) detect regions in a genomic sequence which are similar to the target promoters or (2) classify other promoters as similar or not to the target promoter group. DPM can also be used for modelling of enhancers and silencers. AVAILABILITY: http://defiant.i2r.a-star.edu.sg/projects/BayesPromoter/ CONTACT: vlad@sanbi.ac.za SUPPLEMENTARY INFORMATION: A manual for using the DPM web server is provided at http://defiant.i2r.a-star.edu.sg/projects/BayesPromoter/html/manual/manual.htm.

3.
Retroviral integration has been implicated in several biomedical applications, including identification of cancer-associated genes and malignant transformation in gene therapy clinical trials. We introduce an efficient and scalable method for fast identification of viral vector integration sites from long-read high-throughput sequencing. Individual sequence reads are masked to remove non-genomic sequence, aligned to the host genome and assembled into contiguous fragments used to pinpoint the position of integration. AVAILABILITY AND IMPLEMENTATION: The method is implemented in a publicly accessible web server platform, SeqMap 2.0, containing analysis tools and both private and shared lab workspaces that facilitate collaboration among researchers. Available at http://seqmap.compbio.iupui.edu/.

4.
5.
The Ribosomal Database Project.   Cited by: 79 (self-citations: 0; citations by others: 79)
The Ribosomal Database Project (RDP) is a curated database that offers ribosome-related data, analysis services, and associated computer programs. The offerings include phylogenetically ordered alignments of ribosomal RNA (rRNA) sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (rdp.life.uiuc.edu), electronic mail (server@rdp.life.uiuc.edu) and gopher (rdpgopher.life.uiuc.edu). The electronic mail server also provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening of newly sequenced rRNAs for chimeras, and automated alignment.

6.
A number of complementary methods have been developed for predicting protein-protein interaction sites. We sought to increase prediction robustness and accuracy by combining results from different predictors, and report here a meta web server, meta-PPISP, that is built on three individual web servers: cons-PPISP (http://pipe.scs.fsu.edu/ppisp.html), Promate (http://bioportal.weizmann.ac.il/promate), and PINUP (http://sparks.informatics.iupui.edu/PINUP/). A linear regression method, using the raw scores of the three servers as input, was trained on a set of 35 nonhomologous proteins. Cross validation showed that meta-PPISP outperforms all three individual servers. At coverages identical to those of the individual methods, the accuracy of meta-PPISP is higher by 4.8 to 18.2 percentage points. Similar improvements in accuracy are also seen on CAPRI and other targets. AVAILABILITY: meta-PPISP can be accessed at http://pipe.scs.fsu.edu/meta-ppisp.html
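
The combination step described above can be illustrated with a minimal sketch. It only shows how a fitted linear model might turn three raw per-residue scores into one meta score; the function names, the weights, the intercept and the 0.5 cutoff are all hypothetical and are not taken from meta-PPISP.

```python
from typing import Sequence


def meta_score(raw_scores: Sequence[float],
               weights: Sequence[float],
               intercept: float = 0.0) -> float:
    """Linear combination of raw predictor scores (one value per server)."""
    if len(raw_scores) != len(weights):
        raise ValueError("need exactly one weight per raw score")
    return intercept + sum(w * s for w, s in zip(weights, raw_scores))


def is_interface_residue(raw_scores: Sequence[float],
                         weights: Sequence[float],
                         intercept: float = 0.0,
                         cutoff: float = 0.5) -> bool:
    """Call a residue an interaction site when the meta score reaches the cutoff."""
    return meta_score(raw_scores, weights, intercept) >= cutoff


# Hypothetical weights for three servers (illustrative values only).
HYPOTHETICAL_WEIGHTS = (0.2, 0.3, 0.5)
```

In practice the weights would come from fitting on a training set, and lowering the cutoff trades accuracy for coverage.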

7.
YODA: selecting signature oligonucleotides   Cited by: 3 (self-citations: 0; citations by others: 3)
MOTIVATION: Selecting oligonucleotide probes for use in microarray design, and other applications requiring signature sequences, involves identifying sequences which will bind strongly to their intended target, while binding only weakly (or preferably, not at all) to non-target sequences which may be present in the hybridization reaction. While many tools to assist in selection of such sequences exist, all the ones we examined lack important oligo design and software features. RESULTS: YODA is an application for assisting biological researchers in selecting signature sequences. It incorporates a custom sequence similarity search to find potential cross-hybridizing non-target sequences. For this task, most oligo design tools rely on BLAST, which is ill suited for it due to an unacceptable risk of false negatives. YODA supports multiple probe design goals including single-genome, multiple-genome, pathogen-host and species/strain-identification. A graphical interface is provided as well as a command-line interface, both of which support many user-controlled parameters. YODA is easy to install and use and runs on Windows, Mac OS X and Linux platforms. AVAILABILITY: Freely available (LGPL) along with source code and additional documentation at http://pathport.vbi.vt.edu/YODA CONTACT: enordber@vbi.vt.edu.

8.
9.
Liu S, Zhang C, Liang S, Zhou Y. Proteins 2007, 68(3):636-645
Recognizing structural similarity without significant sequence identity (called fold recognition) is the key to bridging the gap between the number of known protein sequences and the number of structures solved. Previously, we developed a fold-recognition method called SP(3) which combines sequence-derived sequence profiles, secondary-structure profiles and residue-depth-dependent, structure-derived sequence profiles. The use of residue-depth-dependent profiles made SP(3) one of the best automatic predictors in CASP 6. Because residue depth (RD) and solvent accessible surface area (solvent accessibility) are complementary in describing the exposure of a residue to solvent, we test whether or not incorporation of solvent-accessibility profiles into SP(3) could further increase the accuracy of fold recognition. The resulting method, called SP(4), was tested on the SALIGN benchmark for alignment accuracy and on Lindahl, LiveBench 8 and CASP7 blind prediction for fold recognition sensitivity and model-structure accuracy. For remote homologs, SP(4) is found to consistently improve over SP(3) in the accuracy of sequence alignment and predicted structural models as well as in the sensitivity of fold recognition. Our result suggests that RD and solvent accessibility can be used concurrently for improving the accuracy and sensitivity of fold recognition. The SP(4) server and its local usage package are available at http://sparks.informatics.iupui.edu/SP4.

10.
Short and long disordered regions of proteins have different preferences for different amino acid residues. Different methods often have to be trained to predict them separately. In this study, we developed a single neural-network-based technique called SPINE-D that makes a three-state prediction first (ordered residues and disordered residues in short and long disordered regions) and reduces it into a two-state prediction afterwards. SPINE-D was tested on various sets composed of different combinations of Disprot annotated proteins and proteins directly from the PDB annotated for disorder by missing coordinates in X-ray determined structures. While disorder annotations are different according to Disprot and X-ray approaches, SPINE-D's prediction accuracy and ability to predict disorder are relatively independent of how the method was trained and what type of annotation was employed but strongly depend on the balance in the relative populations of ordered and disordered residues in short and long disordered regions in the test set. With greater than 85% overall specificity for detecting residues in both short and long disordered regions, the residues in long disordered regions are easier to predict at 81% sensitivity in a balanced test dataset with 56.5% ordered residues but more challenging (at 65% sensitivity) in a test dataset with 90% ordered residues. Compared to eleven other methods, SPINE-D yields the highest area under the curve (AUC), the highest Matthews correlation coefficient for residue-based prediction, and the lowest mean square error in predicting disorder contents of proteins for an independent test set with 329 proteins. In particular, SPINE-D is comparable to a meta predictor in predicting disordered residues in long disordered regions and superior in short disordered regions. SPINE-D participated in CASP 9 blind prediction and is one of the top servers according to the official ranking.
In addition, SPINE-D was examined for prediction of functional molecular recognition motifs in several case studies. The server and databases are available at http://sparks.informatics.iupui.edu/.
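
As a rough illustration of two ideas in this abstract, the sketch below collapses a three-state output into a single disorder probability and computes the Matthews correlation coefficient from a confusion matrix. The abstract does not say how SPINE-D performs the reduction; summing the two disordered-state probabilities is an assumption made here, and both function names are hypothetical.

```python
import math


def to_two_state(p_ordered: float, p_short: float, p_long: float) -> float:
    """Collapse three-state probabilities into one disorder probability.

    Assumption: the short- and long-disorder probabilities are simply summed
    and renormalized; the published method may do this differently.
    """
    total = p_ordered + p_short + p_long
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return (p_short + p_long) / total


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is reported as 0 when it is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Unlike raw accuracy, the Matthews coefficient stays informative when ordered residues greatly outnumber disordered ones, which is the imbalance the abstract highlights.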

11.
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10⁻⁴) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server.

12.
DATF: a database of Arabidopsis transcription factors   Cited by: 10 (self-citations: 0; citations by others: 10)
Guo A, He K, Liu D, Bai S, Gu X, Wei L, Luo J. Bioinformatics (Oxford, England) 2005, 21(10):2568-2569

13.
Zuker M. Nucleic acids research 2003, 31(13):3406-3415
The abbreviated name, 'mfold web server', describes a number of closely related software applications available on the World Wide Web (WWW) for the prediction of the secondary structure of single-stranded nucleic acids. The objective of this web server is to provide easy access to RNA and DNA folding and hybridization software to the scientific community at large. By making use of universally available web GUIs (Graphical User Interfaces), the server circumvents the problem of portability of this software. Detailed output, in the form of structure plots with or without reliability information, single strand frequency plots and 'energy dot plots', is available for the folding of single sequences. A variety of 'bulk' servers give less information, but in a shorter time and for up to hundreds of sequences at once. The portal for the mfold web server is http://www.bioinfo.rpi.edu/applications/mfold. This URL will be referred to as 'MFOLDROOT'.

14.
15.
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., the number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also rarely proved successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space have been limited to simple model proteins and have led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from the structural classification of protein domains reproduces the existing structural clusters (63%) mostly as hubs for many nonredundant sequences and reveals novel clusters as islands adopted by only a few sequences. Results further show that there exist a significant number of novel, potentially stable clusters for medium-size or large-size single-domain proteins (in particular, > 100 amino acid residues) that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication for template-based structure prediction of recovering new-fold targets from the critical assessment of structure prediction techniques (CASP) by MLP is also discussed.
Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.

16.
Dor O, Zhou Y. Proteins 2007, 68(1):76-81
Proteins can move freely in three-dimensional space. As a result, their structural properties, such as solvent accessible surface area, backbone dihedral angles, and atomic distances, are continuous variables. However, these properties are often arbitrarily divided into a few classes to facilitate prediction by statistical learning techniques. In this work, we establish an integrated system of neural networks (called Real-SPINE) for real-value prediction and apply the method to predict residue-solvent accessibility and backbone psi dihedral angles of proteins based on information derived from sequences only. Real-SPINE is trained with a large data set of 2640 protein chains, sequence profiles generated from multiple sequence alignment, representative amino-acid properties, a slow learning rate, overfitting protection, and predicted secondary structures. The method optimizes more than 200,000 weights and yields a 10-fold cross-validated Pearson's correlation coefficient (PCC) of 0.74 between predicted and actual solvent accessible surface areas and 0.62 between predicted and actual psi angles. In particular, 90% of 2640 proteins have a PCC value greater than 0.6 between predicted and actual solvent-accessible surface areas. The results of Real-SPINE can be compared with the best reported correlation coefficients of 0.64-0.67 for solvent-accessible surface areas and 0.47 for psi angles. The Real-SPINE server, executable programs, and datasets are freely available at http://sparks.informatics.iupui.edu.
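
The evaluation metric quoted above, Pearson's correlation coefficient between predicted and actual values, can be computed as in the following minimal sketch. The function name and the sample values in the usage note are illustrative and do not come from the Real-SPINE data.

```python
import math
from typing import Sequence


def pearson_cc(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_a)
```

For example, `pearson_cc([10.0, 40.0, 80.0], [12.0, 35.0, 90.0])` would score a made-up set of predicted against measured surface areas for three residues.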

17.
18.
Artificially synthesized short interfering RNAs (siRNAs) are widely used in functional genomics to knock down specific target genes. One ongoing challenge is to guarantee that the siRNA does not elicit off-target effects. Initial reports suggested that siRNAs were highly sequence-specific; however, subsequent data indicate that this is not necessarily the case. It is still uncertain what level of similarity and other rules are required for an off-target effect to be observed, and scoring schemes have not been developed to look beyond simple measures such as the number of mismatches or the number of consecutive matching bases present. We created design rules for predicting the likelihood of a non-specific effect and present a web server that allows the user to check the specificity of a given siRNA in a flexible manner using a combination of methods. The server finds potential off-target matches in the corresponding RefSeq database and ranks them according to a scoring system based on experimental studies of specificity. AVAILABILITY: The server is available at http://informatics-eskitis.griffith.edu.au/SpecificityServer.

19.
Xue B, Dor O, Faraggi E, Zhou Y. Proteins 2008, 72(1):427-433
The backbone structure of a protein is largely determined by the phi and psi torsion angles. Thus, knowing these angles, even approximately, will be very useful for protein-structure prediction. However, in a previous work, a sequence-based, real-value prediction of the psi angle could only achieve a mean absolute error of 54 degrees (83 degrees, 35 degrees, 33 degrees for coil, strand, and helix residues, respectively) between predicted and actual angles. Moreover, a real-value prediction of the phi angle is not yet available. This article employs a neural-network-based approach to improve psi prediction by taking advantage of angle periodicity and applies the new method to the prediction of phi angles. The 10-fold cross-validated mean absolute error for the new method is 38 degrees (58 degrees, 33 degrees, 22 degrees for coil, strand, and helix, respectively) for psi and 25 degrees (35 degrees, 22 degrees, 16 degrees for coil, strand, and helix, respectively) for phi. The real-value prediction is comparable to or more accurate than predictions based on multistate classification of the phi-psi map. More accurate prediction of real-value angles will likely be useful for improving the accuracy of fold recognition and ab initio protein-structure prediction. The Real-SPINE 2.0 server is available on the website http://sparks.informatics.iupui.edu.
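
Because torsion angles wrap around at 360 degrees, a naive difference overstates the error near the boundary (179 and -179 degrees are only 2 degrees apart). The sketch below shows one common way to compute a periodicity-aware mean absolute error. It illustrates the general idea only and is not the specific treatment of periodicity used by Real-SPINE 2.0; the function names are hypothetical.

```python
from typing import Sequence


def angular_error(predicted_deg: float, actual_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(predicted_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angular_error(predicted: Sequence[float],
                                actual: Sequence[float]) -> float:
    """Mean of the periodicity-aware errors over paired angle lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    errors = [angular_error(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)
```

With this definition no single residue can contribute more than 180 degrees of error, whichever side of the wrap-around point the two angles fall on.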

20.
SUMMARY: Each organism has traits that are shared with some, but not all, organisms. Identification of genes needed for a particular trait can be accomplished by a comparative genomics approach using three or more organisms. Genes that occur in organisms without the trait are removed from the set of genes in common among organisms with the trait. To facilitate these comparisons, a web-based server, Procom, was developed to identify the subset of genes that may be needed for a trait. AVAILABILITY: The Procom program is freely available with documentation and examples at http://ural.wustl.edu/~billy/Procom/ CONTACT: billy@ural.wustl.edu.
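
The subtraction logic described above amounts to a set intersection followed by a set difference. The sketch below assumes that genes have already been mapped to shared identifiers (for example, ortholog group names) so that they can be compared across organisms; how Procom establishes that correspondence is not stated in the abstract, and the function name is hypothetical.

```python
from typing import Iterable, Set


def trait_candidate_genes(with_trait: Iterable[Set[str]],
                          without_trait: Iterable[Set[str]]) -> Set[str]:
    """Genes shared by every organism with the trait and absent from all others."""
    trait_sets = [set(genes) for genes in with_trait]
    if not trait_sets:
        return set()
    common = set.intersection(*trait_sets)
    excluded = set().union(*(set(genes) for genes in without_trait))
    return common - excluded


# Illustrative gene identifiers only.
candidates = trait_candidate_genes(
    with_trait=[{"geneA", "geneB", "geneC"}, {"geneA", "geneB", "geneD"}],
    without_trait=[{"geneB", "geneX"}],
)
```

Here `candidates` contains only `geneA`: `geneB` is shared by both trait-bearing organisms but also occurs in an organism without the trait.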
