Similar literature
 20 similar documents found (search time: 31 ms)
1.
CSDBase (http://www.chemie.uni-marburg.de/~csdbase/) is an interactive Internet-embedded research platform providing detailed information on proteins containing the cold shock domain (CSD). It consists of two separate database cores, one dedicated to CSD protein information and one providing a powerful resource for relevant literature, with emphasis on the bacterial cold shock response. In addition to detailed protein information and useful cross links to other web sites, CSDBase contains computer-generated CSD structure models for most CSD-containing protein sequences available in the NCBI non-redundant protein database at the time of CSDBase establishment. These models were calculated on the basis of known crystal and/or NMR structures using SWISS-MODEL and can be downloaded as PDB structure coordinate files for viewing and for manipulation with other software tools. CSDBase will be regularly updated and is organized in a compact form, providing user-friendly interfaces to both database cores that allow for easy data retrieval.

2.
GOBASE: the organelle genome database   Total citations: 3 (self-citations: 1, citations by others: 2)

3.
Learning MHC I--peptide binding   Total citations: 1 (self-citations: 0, citations by others: 1)
MOTIVATION AND RESULTS: Motivated by the ability of a simple threading approach to predict MHC I--peptide binding, we developed a new and improved structure-based model for which parameters can be estimated from additional sources of data about MHC-peptide binding. In addition to the known 3D structures of a small number of MHC-peptide complexes that were used in the original threading approach, we included three other sources of information on peptide-MHC binding: (1) MHC class I sequences; (2) known binding energies for a large number of MHC-peptide complexes; and (3) an even larger binary dataset that contains information about strong binders (epitopes) and non-binders (peptides that have a low affinity for a particular MHC molecule). Our model significantly outperforms the standard threading approach in binding energy prediction. In our approach, which we call adaptive double threading, the parameters of the threading model are learnable, and both MHC and peptide sequences can be threaded onto structures of other alleles. These two properties make our model appropriate for predicting binding for alleles for which very little data (if any) is available beyond just their sequence, including prediction for alleles for which 3D structures are not available. The ability of our model to generalize beyond the MHC types for which training data is available also separates our approach from epitope prediction methods which treat MHC alleles as symbolic types, rather than biological sequences. We used the trained binding energy predictor to study viral infections in 246 HIV patients from the West Australian cohort, and over 1000 sequences in HIV clade B from Los Alamos National Laboratory database, capturing the course of HIV evolution over the last 20 years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV to the human immune system. AVAILABILITY: http://www.research.microsoft.com/~jojic/hlaBinding.html.

4.
Shi L, Zhang Q, Rui W, Lu M, Jing X, Shang T, Tang J. Regulatory Peptides, 2004, 120(1-3): 1-3
The Bioactive Peptide Database (BioPD) is a web-based knowledge base that contains more than 1100 protein sequences from human, mouse and rat, which are putative or known bioactive peptides. In addition to peptide sequences and their annotation, the database also contains gene sequences with annotation, protein interaction and disease data related to the peptides. Each entry has as many references as possible to support the information presented. BioPD consists of six parts: PROTEIN, GENE, DISEASE, LINKS, INTERACTION, and REFERENCE. The database is searchable by keyword, gene and protein name, receptor name, etc. Links to PDB, InterPro, Pfam, OMIM, etc. are provided in each entry. BioPD thus forms an information center for bioactive peptides and serves as a gateway for their exploration. The database can be accessed at http://biopd.bjmu.edu.cn.

5.
Ring theory     
In what follows we demonstrate that the minimum requirement for the formation of a DNA ring is a pair of ordinary (ABC … ABC) or inverted (ABC … C′B′A′) repetitions. DNA fragments that are partly degraded from their ends by a 3′ (or 5′) specific exonuclease such as exonuclease III (or λ exonuclease) produce resected fragments that can only form rings by virtue of ordinary repetitions. Next we analyze how random fragments cut from DNA molecules containing ordinary repetitions would be expected to form rings. Since longer fragments (>5 to 10 μm) cyclize less efficiently than do shorter ones (2 μm), we are led to the view that the chromatid is composed of thousands of distinctive regions, called g-regions, within which characteristic repetitious sequences are clustered in an intermittent or tandem fashion. Mathematical expressions are derived that allow one to measure the length and number of these g-regions from the ring frequency, R, and its dependence on the length of the fragment. The interior organization of the g-regions is considered in terms of two models and their variants: intermittent repetition and tandem repetition. These are depicted in Figure 2. The objective of this effort is to calculate the frequency of rings that can be generated from these two models, and to explain the “shortside fall-off”, that is, the decrease in ring frequency as the fragment length becomes shorter. This could not be due to the stiffness of the DNA double helix and must reflect a distribution of spacing of the repetitious sequences within the g-regions. Mathematical expressions are obtained that allow one to estimate the average values of the repetitive or partly repetitive unit. These estimates may be obtained from the dependence of ring frequency on the extent of resection, and from the dependence of ring frequency on the length of shorter fragments. The mathematical expressions derived here are employed in the previous papers of this group, and lead to the conclusion that the g-regions are composed of tandemly repeating sequences.

6.
Peroxidases (EC 1.11.1.x), which are encoded by small or large multigenic families, are involved in several important physiological and developmental processes. Analyzing their evolution and their distribution among various phyla could help to elucidate the mystery of their extremely widespread and diversified presence in almost all living organisms. PeroxiBase was originally created for the exhaustive collection of class III peroxidase sequences from plants (Bakalovic, N., Passardi, F., et al., 2006. PeroxiBase: a class III plant peroxidase database. Phytochemistry 67, 534-539). Extending the class III peroxidase database to all proteins capable of reducing peroxide molecules appeared necessary. Our database contains haem and non-haem peroxidase sequences originating from annotated or incorrectly annotated sequences deposited in the main repositories such as GenBank or UniProt KnowledgeBase. This new database will provide a global overview of the evolution of the protein families and superfamilies capable of the peroxidase reaction. In this rapidly growing field, there is a need for continual updates and corrections of the peroxidase protein sequences. Given the lack of a unified nomenclature, we also introduced a unique abbreviation for each family of peroxidases. This paper thus aims to report the evolution of the PeroxiBase database, which is freely accessible through a web server (http://peroxibase.isb-sib.ch). In addition to new categories of peroxidases, new specific tools have been created to facilitate query, classification and submission of peroxidase sequences.

7.
The observed frequency of folded rings has been determined as a function of fragment length and degree of resection for DNA from mouse and Necturus. The thermal stability of the ring closure and the kinetics of ring formation have been studied. As seen in the case of Drosophila DNA, mouse and Necturus DNA display a decreasing frequency of folded rings as fragment length increases. We interpret this to mean that repetitious sequences of a given type are clustered into many thousands of characteristic regions, called g-regions. The present paper focuses on the interior organization of g-regions. Variations of two competing models may be entertained: “tandem repetition” and “intermittent repetition”. If the g-regions were composed of exact, tandemly-repeating sequences, all observations could be easily explained. In order to maintain the idea that the g-regions contain repetitious blocks located at regular or irregular intervals, one must suppose that such repetitious blocks are long (>200 nucleotide pairs), not internally repetitious, and represent perhaps 80% of the nucleotides in the g-region. Such a sequence can be thought of as a fractional-tandem repeat. For example: HIJXXXABC … HIJXXXABC … HIJXXX, where the X's stand for nucleotides composing sequences that are unrelated to each other, and the letters (ABC … HIJ) represent nucleotides in the non-internally-repetitive repeating sequence. We feel that debate can now be profitably devoted to the question of whether approximately 80 or 100% of the tandemly-repetitious unit is in fact tandem.

8.
9.
SUMMARY: The Viral Genome DataBase (VGDB) contains detailed information on the genes and predicted protein sequences from 15 completely sequenced genomes of large (>100 kb) viruses (2847 genes). The stored data include DNA sequence, protein sequence, GenBank and user-entered notes, molecular weight (MW), isoelectric point (pI), amino acid content, A + T%, nucleotide frequency, dinucleotide frequency and codon use. The VGDB is a MySQL database with a user-friendly Java GUI. Results of queries can be easily sorted by any of the individual parameters. AVAILABILITY: The software and additional figures and information are available at http://athena.bioc.uvic.ca/genomes/index.html .
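As a rough illustration of three of the per-gene statistics listed in the VGDB abstract (A + T%, nucleotide frequency and dinucleotide frequency), the following Python sketch computes them for a single DNA string. The function name `composition_stats` and the use of overlapping dinucleotides are assumptions made for illustration only; the abstract does not say how VGDB itself derives these values.

```python
from collections import Counter


def composition_stats(dna: str) -> dict:
    """Return A+T percentage plus nucleotide and dinucleotide frequencies."""
    seq = dna.upper()
    if not seq:
        raise ValueError("empty sequence")
    nucleotides = Counter(seq)
    # Assumption: dinucleotides are counted with overlap, i.e. positions
    # (0,1), (1,2), ... so a sequence of length n yields n - 1 pairs.
    dinucleotides = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_pairs = max(len(seq) - 1, 1)
    at_percent = 100.0 * (nucleotides["A"] + nucleotides["T"]) / len(seq)
    return {
        "at_percent": at_percent,
        "nucleotide_freq": {b: c / len(seq) for b, c in nucleotides.items()},
        "dinucleotide_freq": {d: c / n_pairs for d, c in dinucleotides.items()},
    }


stats = composition_stats("ATGCATTA")
print(stats["at_percent"])
```

Codon use could be tallied in the same way by stepping through the sequence in non-overlapping triplets from the start codon.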

10.
The Nef protein from human or simian immunodeficiency virus enhances viral replication, downregulates immune cell receptors, and activates multiple host cell signaling pathways. Conformational information about full-length Nef has been difficult to obtain as the full-length protein is not readily amenable to NMR or X-ray crystallography due to aggregation at high concentrations. As an alternative, full-length HIV and SIV Nef were probed with hydrogen exchange mass spectrometry, a method compatible with the low concentration requirements of Nef. The results showed that HIV Nef contains a solvent-protected core, as previously demonstrated with both NMR and X-ray crystallography. SIV Nef, for which there is no structural information, had a similar protected core, although it was more flexible and dynamic than its HIV counterpart. Many of the regions outside the core in both SIV and HIV Nef were highly solvent exposed. However, limited protection from exchange was observed in both N- and C-terminal regions, suggesting the presence of structured elements. Protection from exchange was also observed in a large loop emanating from the core that was deleted for NMR and X-ray analysis. These data show that while the majority of Nef was highly solvent exposed, regions outside the core may have structural attributes which may contribute to Nef functions known to map to these regions.

11.
MOTIVATION: Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. METHODS: This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. 
AVAILABILITY: Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.

12.
13.
OWL--a non-redundant composite protein sequence database.   Total citations: 5 (self-citations: 1, citations by others: 4)
A comprehensive, non-redundant composite protein sequence database is described. The database, OWL, is an amalgam of data from six publicly-available primary sources, and is generated using strict redundancy criteria. The database is updated monthly and its size has increased almost eight-fold in the last six years: the current version contains > 76,000 entries. For added flexibility, OWL is distributed with a tailor-made query language, together with a number of programs for database exploration, information retrieval and sequence analysis, which together form an integrated database and software resource for protein sequences.

14.
15.
MOTIVATION: Aligning multiple proteins based on sequence information alone is challenging if sequence identity is low or there is a significant degree of structural divergence. We present a novel algorithm (SATCHMO) that is designed to address this challenge. SATCHMO simultaneously constructs a tree and a set of multiple sequence alignments, one for each internal node of the tree. The alignment at a given node contains all sequences within its sub-tree, and predicts which positions in those sequences are alignable and which are not. Aligned regions therefore typically get shorter on a path from a leaf to the root as sequences diverge in structure. Current methods either regard all positions as alignable (e.g. ClustalW), or align only those positions believed to be homologous across all sequences (e.g. profile HMM methods); by contrast SATCHMO makes different predictions of alignable regions in different subgroups. SATCHMO generates profile hidden Markov models at each node; these are used to determine branching order, to align sequences and to predict structurally alignable regions. RESULTS: In experiments on the BAliBASE benchmark alignment database, SATCHMO is shown to perform comparably to ClustalW and the UCSC SAM HMM software. Results using SATCHMO to identify protein domains are demonstrated on potassium channels, with implications for the mechanism by which tumor necrosis factor alpha affects potassium current. AVAILABILITY: The software is available for download from http://www.drive5.com/lobster/index.htm

16.
17.
18.
Next‐generation sequencing technologies are extensively used in the field of molecular microbial ecology to describe taxonomic composition and to infer functionality of microbial communities. In particular, the so‐called barcode or metagenetic applications that are based on PCR amplicon library sequencing are very popular at present. One of the problems in using the data from these libraries is analysing read quality and removing (trimming) low‐quality segments while retaining sufficient information for subsequent analyses (e.g. taxonomic assignment). Here, we present StreamingTrim, a DNA read trimming software, written in Java, with which researchers can analyse the quality of DNA sequences in fastq files and search for low‐quality zones in a very conservative way. This software has been developed with the aim of providing a tool capable of trimming amplicon library data while retaining as much taxonomic information as possible. It is equipped with a graphical user interface for user‐friendly operation. Moreover, from a computational point of view, StreamingTrim reads and analyses sequences one by one from an input fastq file, without keeping anything in memory, which allows the computation to be run on a normal desktop PC or even a laptop. Trimmed sequences are saved in an output file, and a statistics summary is displayed that contains the mean and standard deviation of the length and quality of the whole sequence file. Compiled software, a manual and example data sets are available under the BSD‐2‐Clause License at the GitHub repository at https://github.com/GiBacci/StreamingTrim/ .
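The one-read-at-a-time processing described in this abstract can be sketched in Python as follows. This is not StreamingTrim's actual algorithm, which the abstract does not specify: the trimming rule (cut at the first base below a Phred threshold), the Phred+33 encoding, the four-line record layout, and the names `trim_read` and `stream_trim` are all assumptions for illustration. The running mean and standard deviation of read length use Welford's online update, so no reads are held in memory.

```python
import io
import math


def trim_read(seq, qual, min_q=20, offset=33):
    """Cut the read at the first base whose Phred score is below min_q."""
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:  # assumed Phred+33 encoding
            return seq[:i], qual[:i]
    return seq, qual


def stream_trim(fastq_in, fastq_out, min_q=20):
    """Trim reads one by one; return (count, mean length, std of length)."""
    n, mean, m2 = 0, 0.0, 0.0
    while True:
        header = fastq_in.readline().rstrip("\n")
        if not header:  # end of input
            break
        seq = fastq_in.readline().rstrip("\n")
        fastq_in.readline()  # skip the '+' separator line
        qual = fastq_in.readline().rstrip("\n")
        seq, qual = trim_read(seq, qual, min_q)
        fastq_out.write(f"{header}\n{seq}\n+\n{qual}\n")
        # Welford's online update: only three running values are kept.
        n += 1
        delta = len(seq) - mean
        mean += delta / n
        m2 += delta * (len(seq) - mean)
    std = math.sqrt(m2 / n) if n else 0.0  # population standard deviation
    return n, mean, std


reads = io.StringIO("@r1\nACGTAC\n+\nIIII#I\n@r2\nACGTAC\n+\nIIIIII\n")
trimmed = io.StringIO()
summary = stream_trim(reads, trimmed)
print(summary)
```

In the toy input, the `#` character encodes a Phred score of 2, so the first read is cut to four bases while the second is kept whole. Real files would be passed as opened file handles in place of the `io.StringIO` objects.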

19.
Histone Sequence Database: new histone fold family members.   Total citations: 2 (self-citations: 0, citations by others: 2)
Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih.gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

20.
Translating the huge amount of genomic and biochemical data into new drugs is a costly and challenging task. Historically, there has been comparatively little focus on linking the biochemical and chemical worlds. To address this need, we have developed ChEMBL, an online resource of small-molecule SAR (structure-activity relationship) data, which can be used to support chemical biology, lead discovery and target selection in drug discovery. The database contains the abstracted structures, properties and biological activities for over 700000 distinct compounds and more than 3 million bioactivity records abstracted from over 40000 publications. Additional public domain resources can be readily integrated into the same data model (e.g. PubChem BioAssay data). The compounds in ChEMBL are largely extracted from the primary medicinal chemistry literature, and are therefore usually 'drug-like' or 'lead-like' small molecules with full experimental context. The data cover a significant fraction of the discovery of modern drugs, and are useful in a wide range of drug design and discovery tasks. In addition to the compound data, ChEMBL also contains information for over 8000 protein, cell line and whole-organism 'targets', with over 4000 of those being proteins linked to their underlying genes. The database is searchable both chemically, using an interactive compound sketch tool, protein sequences, family hierarchies, SMILES strings, compound research codes and key words, and biologically, using a variety of gene identifiers, protein sequence similarity and protein families. The information retrieved can then be readily filtered and downloaded into various formats. ChEMBL can be accessed online at https://www.ebi.ac.uk/chembldb.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号