Similar articles
 20 similar articles found
1.

Background  

In research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure in the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s), and to calculate binding-site residues from that complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated from one complex structure usually do not reveal all binding sites on the protein. Researchers therefore need to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. In particular, combining binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice.

2.

Background  

Databases for sequence, annotation, or microarray experiment data are extremely beneficial to the research community, as they centrally gather information from experiments performed by different scientists. However, data from different sources develop their full capacity only when combined. The idea of a data warehouse directly addresses this problem and solves it by integrating all required data into one single database; many data warehouses are therefore already available in genetics. For the model legume Medicago truncatula, there is currently no single data warehouse that integrates all freely available gene sequences, the corresponding gene expression data, and annotation information. Thus, we created the data warehouse TRUNCATULIX, an integrative database of Medicago truncatula sequence and expression data.

3.
Aims The Cape Peninsula is a small area (471 km²) situated on the south-westernmost tip of the Core Cape Subregion (CCR) of South Africa. Within the Cape Peninsula, Fabaceae are the third most species-rich plant family (162 species) and they have the second highest number of endemic species after the Ericaceae. However, legumes are not the dominant taxa in the vegetation. They tend to show patchy distributions within the landscape and different species assemblages usually occupy particular niches at any given locality. The present study undertook to establish if edaphic factors influence legume species distribution in the Cape Peninsula and to determine the key indicator species for the different assemblages. Methods Soils from 27 legume sites, spanning all major geological substrates of the Cape Peninsula, were analysed for 31 chemical and physical properties. Legume species present at each site were recorded and a presence/absence matrix was generated. Cluster analysis and discriminant function analysis (DFA) were run to group the sites based on overall similarity in edaphic characteristics and to identify the soil parameters contributing towards discriminating the groups. Canonical correspondence analysis (CCA) was used to test for a correlation between legume species compositions and edaphic factors. The strength of the association between legume species and site groupings based on edaphic properties was assessed using indicator species analysis. Important findings Based on similarity in overall soil characteristics, the sites formed three clusters: one comprising sites of sandstone geology, one with dune sand sites and the third cluster comprising sites of both shale and granite geologies (hereafter referred to as soil types). The DFA confirmed the distinctness of these clusters and the CCA showed a significant correlation between legume species composition and edaphic factors. The key edaphic parameters were clay content, iron (Fe), potassium (K), sulphur (S) and zinc (Zn). These findings reveal that the Cape Peninsula is edaphically heterogeneous and edaphically distinct habitats contain discrete legume species assemblages that can be distinguished by unique indicator species. Furthermore, multiple soil parameters, rather than a single parameter, are involved. Therefore, edaphic factors play a significant role in driving the distribution of legume species in the Cape Peninsula and discrete legume species assemblages occupy distinct habitats.
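The presence/absence matrix mentioned in the Methods can be illustrated with a minimal Python sketch. The site and species names below are hypothetical placeholders, not data from the study, and the function name `presence_absence_matrix` is an assumption introduced only for this example.

```python
def presence_absence_matrix(records):
    """Build a site-by-species 0/1 matrix from {site: [species, ...]} records."""
    # Columns: every species seen at any site, in sorted order.
    species = sorted({sp for found in records.values() for sp in found})
    # Rows: sites in sorted order.
    sites = sorted(records)
    matrix = [[1 if sp in records[site] else 0 for sp in species] for site in sites]
    return sites, species, matrix


# Hypothetical survey records (placeholders, not data from the study).
records = {
    "site_A": ["species_1", "species_2"],
    "site_B": ["species_2"],
    "site_C": ["species_1", "species_3"],
}

sites, species, matrix = presence_absence_matrix(records)
for site, row in zip(sites, matrix):
    print(site, row)
```

A matrix of this shape is the usual input to ordination or indicator species analyses; the statistical steps themselves (cluster analysis, DFA, CCA) are not reproduced here.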

4.
Research on avian blood parasites has seen a remarkable increase since the introduction of polymerase chain reaction-based methods for parasite identification. New data are revealing complex multihost-multiparasite systems which are difficult to understand without good knowledge of the host range and geographical distribution of the parasite lineages. However, such information is currently difficult to obtain from the literature, or from general repositories such as GenBank, mainly because (i) different research groups use different parasite lineage names, (ii) GenBank entries frequently refer only to the first host and locality at which each parasite was sampled, and (iii) different researchers use different gene fragments to identify parasite lineages. We propose a unified database of avian blood parasites of the genera Plasmodium, Haemoproteus and Leucocytozoon identified by a partial region of their cytochrome b sequences. The database uses a standardized nomenclature to remove synonymy, and concentrates all available information about each parasite in a public reference site, thereby facilitating access for all researchers. Initial data include a list of host species and localities, as well as genetic markers that can be used for phylogenetic analyses. The database is free to download and will be regularly updated by the authors. Prior to publication of new lineages, we encourage researchers to assign names to match the existing database. We anticipate that the value of the database as a source for determining host range and geographical distribution of the parasites will grow with its size and substantially enhance the understanding of this remarkably diverse group of parasites.

5.
The Virtual Hybridization approach predicts the most probable hybridization sites across a target nucleic acid of known sequence, including both perfect and mismatched pairings. Potential hybridization sites, having a user-defined minimum number of bases that are paired with the oligonucleotide probe, are first identified. Free energy values are then evaluated for each potential hybridization site; a site whose calculated free energy is equal to or more negative than a user-defined free energy cut-off value is considered a site of high probability of hybridization. The Universal Fingerprinting Chip Applications Server contains the software for visualizing predicted hybridization patterns, which yields a simulated hybridization fingerprint that can be compared with experimentally derived fingerprints or with a virtual fingerprint arising from a different sample. AVAILABILITY: The database is available for free at http://bioinformatica.homelinux.org/UFCVH/
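The two-stage filter described in this abstract (a minimum number of paired bases, then a free energy cut-off) can be sketched roughly as follows. This is only an illustration under stated assumptions: the per-pair energies and the mismatch penalty are made-up placeholder values rather than a real thermodynamic model, and the function names `score_site` and `find_sites` are hypothetical, not part of the published software.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Placeholder energies in kcal/mol (assumed values, for illustration only).
GC_PAIR_ENERGY = -2.0
AT_PAIR_ENERGY = -1.0
MISMATCH_PENALTY = 0.5


def score_site(probe, window):
    """Return (paired_bases, toy_free_energy) for a probe against one target window.

    Both strings are written 5'->3'. The probe binds antiparallel, so probe
    base i is compared with window base n-1-i.
    """
    n = len(probe)
    paired = 0
    energy = 0.0
    for i, base in enumerate(probe):
        if COMPLEMENT[base] == window[n - 1 - i]:
            paired += 1
            energy += GC_PAIR_ENERGY if base in "GC" else AT_PAIR_ENERGY
        else:
            energy += MISMATCH_PENALTY
    return paired, energy


def find_sites(probe, target, min_paired, energy_cutoff):
    """Stage 1: keep windows with at least min_paired paired bases.
    Stage 2: keep those whose toy free energy is <= energy_cutoff."""
    n = len(probe)
    sites = []
    for start in range(len(target) - n + 1):
        paired, energy = score_site(probe, target[start:start + n])
        if paired < min_paired:
            continue
        if energy <= energy_cutoff:
            sites.append((start, paired, energy))
    return sites


# Hypothetical probe and target; position 2 of the target is a perfect site.
print(find_sites("AAGC", "TTGCTTAA", min_paired=3, energy_cutoff=-4.0))
```

A realistic implementation would replace the placeholder energies with a proper duplex stability model; the sketch only shows how the two user-defined thresholds interact.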

6.
The effectiveness of any proteomics database search depends on the theoretical candidate information contained in the protein database. Unfortunately, candidate entries from protein databases such as UniProt rarely contain all the post-translational modifications (PTMs), disulfide bonds, or endogenous cleavages of interest to researchers. These omissions can limit discovery of novel and biologically important proteoforms. Conversely, searching for a specific proteoform becomes a computationally difficult task for heavily modified proteins. Both situations require updates to the database through user-annotated entries. Unfortunately, manually creating properly formatted UniProt Extensible Markup Language (XML) files is tedious and prone to errors. ProSight Annotator solves these issues by providing a graphical interface for adding user-defined features to UniProt-formatted XML files for better informed proteoform searches. It can be downloaded from http://prosightannotator.northwestern.edu.

7.
8.
9.
MOTIVATION: High-throughput technologies create the need to mine large amounts of gene annotations from diverse databanks, and to integrate the resulting data. Most databanks can be interrogated only via the Web, for a single gene at a time, and query results are generally available only in HTML format. Although some databanks provide batch retrieval of data via FTP, this requires expertise and resources for locally reimplementing the databank. RESULTS: We developed MyWEST, a tool aimed at researchers without extensive informatics skills or resources, which exploits user-defined templates to easily mine selected annotations from different Web-interfaced databanks, and aggregates and structures results in an automatically updated database. Using microarray results from a model system of retinoic acid-induced differentiation, MyWEST effectively gathered relevant annotations from various biomolecular databanks, highlighted significant biological characteristics and supported a global approach to the understanding of complex cellular mechanisms. AVAILABILITY: MyWEST is freely available for non-profit use at http://www.medinfopoli.polimi.it/MyWEST/

10.
We report here the release of a web-based tool (MDDNA) to study and model the fine structural details of DNA on the basis of data extracted from a set of molecular dynamics (MD) trajectories of DNA sequences involving all the unique tetranucleotides. The dynamic web interface can be employed to analyze the first neighbor sequence context effects on the 10 unique dinucleotide steps of DNA. Functionality is included to build all-atom models of any user-defined sequence based on the MD results. The backend of this interface is a relational database storing the conformational details of DNA obtained in 39 different MD simulation trajectories comprising all the 136 unique tetranucleotide steps. Examples of the use of this data to predict DNA structures are included. Availability: http://humphry.chem.wesleyan.edu:8080/MDDNA. Supplementary information: Supplementary data including color figures are available at Bioinformatics online.

11.
12.
Protein phosphorylation, one of the most important protein post-translational modifications, is involved in various biological processes, and the identification of phosphorylated peptides (phosphopeptides) and their corresponding phosphorylation sites (phosphosites) will facilitate the understanding of the molecular mechanism and function of phosphorylation. Mass spectrometry (MS) provides a high-throughput technology that enables the identification of large numbers of phosphosites. PhoPepMass is designed to assist human phosphopeptide identification from MS data based on a specific database of phosphopeptide masses and a multivariate hypergeometric matching algorithm. It contains 244,915 phosphosites from several public sources. Moreover, the accurate masses of peptides and fragments with phosphosites were calculated. It is the first database that provides a systematic resource for the query of phosphosites on peptides and their corresponding masses. This allows researchers to search for proteins with reported phosphosites, to browse detailed phosphopeptide and fragment information, to match masses from MS analyses to the corresponding phosphopeptides within a defined threshold, and to compare proprietary phosphopeptide discovery results with results from previous studies. Additionally, database search software was created and a "two-stage search strategy" is suggested to identify phosphopeptides from tandem mass spectra of proteomics data. We expect PhoPepMass to be a useful tool and a source of reference for proteomics researchers. PhoPepMass is available at https://www.scbit.org/phopepmass/index.html.
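As a rough illustration of matching observed masses to phosphopeptide masses within a defined threshold, the sketch below computes neutral monoisotopic peptide masses and adds one HPO3 group (about 79.96633 Da) per phosphosite. It is not the multivariate hypergeometric algorithm used by PhoPepMass; the function names, the ppm tolerance and the candidate peptides are assumptions made for this example only.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H2O added for the peptide termini
PHOSPHO = 79.96633  # HPO3 added per phosphorylated residue


def phosphopeptide_mass(sequence, n_phospho):
    """Neutral monoisotopic mass of a peptide carrying n_phospho phosphate groups."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_phospho * PHOSPHO


def match_masses(observed, candidates, tol_ppm):
    """Return (observed_mass, sequence, n_phospho) for every candidate whose
    theoretical mass lies within tol_ppm of an observed mass."""
    hits = []
    for mass in observed:
        for sequence, n_phospho in candidates:
            theoretical = phosphopeptide_mass(sequence, n_phospho)
            if abs(mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((mass, sequence, n_phospho))
    return hits


# Hypothetical candidates and a hypothetical observed neutral mass.
candidates = [("GS", 1), ("GA", 0)]
print(match_masses([242.0304], candidates, tol_ppm=10))
```

In practice the observed values would first be converted from m/z to neutral mass for the relevant charge state; that step is omitted here.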

13.
WebACT--an online companion for the Artemis Comparison Tool   (cited 4 times: 0 self-citations, 4 citations by others)
SUMMARY: WebACT is an online resource which enables the rapid provision of simultaneous BLAST comparisons between up to five genomic sequences in a format amenable to visualization with the well-known Artemis Comparison Tool (ACT). Comparisons can be generated on the fly using sequences directly retrieved via EMBL database queries, or by entering or uploading user sequences. Furthermore, pre-computed comparisons are available between all publicly available, completed prokaryotic genomes and plasmids currently contained within the Genome Reviews database (372 sequences, representing 175 different species). The system is designed to minimize the volume of downloaded data and maximize performance. Genome sequences, annotation and pre-computed comparisons are stored in a relational database allowing flexible querying based on user-defined sequence regions, from whole genome to a defined region flanking a specified gene. Comparison and sequence files, whether computed online or retrieved from the database of pre-computed genome comparisons, can be viewed online using ACT and are available for download. AVAILABILITY: Freely accessible at http://www.webact.org. SUPPLEMENTARY INFORMATION: User guide and worked examples are available at http://www.webact.org/WebACT/docs.

14.
15.
16.
ProTherm 2.0 is the second release of the Thermodynamic Database for Proteins and Mutants, which includes numerical data for several thermodynamic parameters, structural information, experimental methods and conditions, and functional and literature information. The present release contains >5500 entries, an approximately 67% increase over the previous version. In addition, we have included information about reversibility of data, details about buffer and ion concentrations and the surrounding residues in space for all mutants. A WWW interface enables users to search data based on various conditions with different sorting options for outputs. Further, ProTherm has links with other structural and literature databases, and the mutation sites and surrounding residues are automatically mapped on the structures and can be directly viewed through 3DinSight developed in our laboratory. The ProTherm database is freely available through the WWW at http://www.rtc.riken.go.jp/protherm.html

17.
18.
Actinobase is a relational database of the molecular diversity, phylogeny and biocatalytic potential of haloalkaliphilic actinomycetes. The main objective of this database is to provide easy access to a range of information and to support data storage, comparison and analysis, while reducing data redundancy and the costs of data entry, storage and retrieval, and improving data security. Information on habitat, cell morphology, Gram reaction, biochemical characterization and molecular features should help researchers understand the identification and stress adaptation of existing and new candidates belonging to the salt-tolerant alkaliphilic actinomycetes. The user-friendly PHP front end makes it possible to add nucleotide and protein sequences of reported entries, which directly helps researchers obtain the required details. Analysis of the genus-wise status of the salt-tolerant alkaliphilic actinomycetes indicated 6 different genera among the 40 classified entries. The results indicate a widespread occurrence of salt-tolerant alkaliphilic actinomycetes belonging to diverse taxonomic positions. On ClustalW/X multiple sequence alignment of the alkaline protease gene sequences, different clusters emerged among the groups. The narrow-search and limit options of the database provided comparable information. AVAILABILITY: Entries and information on actinomycetes in the database are publicly accessible, free of charge, at http://www.actinobase.in.

19.
Next-Generation Sequencing (NGS) technologies have dramatically revolutionised research in many fields of genetics. The ability to sequence many individuals from one or multiple populations at a genomic scale has greatly enhanced population genetics studies and made it a data-driven discipline. Recently, researchers have proposed statistical modelling to address genotyping uncertainty associated with NGS data. However, an ongoing debate is whether it is more beneficial to increase the number of sequenced individuals or the per-sample sequencing depth for estimating genetic variation. Through extensive simulations, I assessed the accuracy of estimating nucleotide diversity, detecting polymorphic sites, and predicting population structure under different experimental scenarios. Results show that the greatest accuracy for estimating population genetics parameters is achieved by employing a large sample size, even when single individuals are sequenced at low depth. Under some circumstances, the minimum sequencing depth for obtaining accurate estimates of allele frequencies and identifying polymorphic sites is , where both alleles are more likely to have been sequenced. On the other hand, inferences of population structure are more accurate at very large sample sizes, even with extremely low sequencing depth. This all points to the conclusion that under various experimental scenarios, in cost-limited population genetics studies, large sample sizes at low sequencing depth are desirable to achieve high accuracy. These findings will help researchers design their experimental set-ups and guide further investigation on the effect of protocol design for genetic research.
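The remark about both alleles being sequenced can be made concrete with a simple calculation. The sketch below assumes a heterozygous site in a diploid individual, reads that sample each allele independently with probability 1/2, and no sequencing errors; under those assumptions the probability that both alleles appear at least once among d reads is 1 - 2(1/2)^d. This is an illustrative model only; it does not reproduce the simulations or the depth threshold reported in the abstract, and the function name is hypothetical.

```python
def prob_both_alleles_sampled(depth):
    """Probability that d reads at a heterozygous diploid site include both alleles.

    Assumes each read samples either allele with probability 1/2,
    independently, with no sequencing errors.
    """
    if depth < 2:
        return 0.0  # fewer than two reads can never show both alleles
    # Complement of "all reads carry allele A" or "all reads carry allele B".
    return 1.0 - 2.0 * 0.5 ** depth


for depth in range(1, 9):
    print(depth, prob_both_alleles_sampled(depth))
```

The probability rises quickly with depth, which is one intuition for why very low per-sample depth makes individual genotype calls uncertain even when population-level estimates remain usable.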

20.
Microsatellites are widely distributed throughout nearly all genomes and have been extensively exploited as powerful genetic markers for diverse applications because of their high polymorphism. Their length variations are involved in gene regulation and are implicated in numerous genetic diseases, including cancers. Although much effort has been devoted to microsatellite database construction, existing microsatellite databases still have drawbacks, such as a limited number of species, unfriendly export formats, missing marker development, a lack of compound microsatellites and an absence of gene annotation, which seriously restrict downstream analysis. To overcome these limitations, we developed PSMD (Pan-Species Microsatellite Database, http://big.cdu.edu.cn/psmd/ ) as a web-based database that helps researchers easily identify microsatellites, exploit reliable molecular markers and compare microsatellite distribution patterns on a genome-wide scale. In the current release, PSMD comprises 678,106,741 perfect microsatellites and 43,848,943 compound microsatellites from 18,408 organisms, covering almost all species with available genomic data. In addition to an interactive browsing interface, PSMD offers a flexible filter function that lets users quickly retrieve the desired microsatellites from large data sets. PSMD allows users to export GFF3-formatted files and CSV-formatted statistical files for downstream analysis. We also implemented an online tool for analysing the occurrence of microsatellites with user-defined parameters. Furthermore, Primer3 was embedded to help users design high-quality primers with customizable settings. To our knowledge, PSMD is the most extensive resource of its kind and is likely to be adopted by scientists engaged in biological, medical, environmental and agricultural research.
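To show what identifying perfect microsatellites with user-defined parameters can look like, here is a minimal regular-expression sketch in Python. It is not PSMD's implementation: the function name, the default motif length limit and the minimum repeat count are assumptions for illustration, and compound microsatellites are not handled.

```python
import re


def find_perfect_microsatellites(sequence, max_motif_len=6, min_repeats=4):
    """Find perfect tandem repeats of motifs 1..max_motif_len bases long.

    Returns a list of (motif, start, end, repeat_count) with 0-based,
    end-exclusive coordinates.
    """
    # Lazy quantifier: the shortest motif that satisfies the repeat count wins.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = (match.end() - match.start()) // len(motif)
        hits.append((motif, match.start(), match.end(), repeats))
    return hits


# Hypothetical sequence containing one (AC)5 repeat.
print(find_perfect_microsatellites("TTGACACACACACGT"))
```

Real tools typically apply different minimum repeat counts per motif length; a single `min_repeats` value is used here only to keep the sketch short.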
