1.

GMEnzy: A Genetically Modified Enzybiotic Database

Hongyu Wu Jinjiang Huang Hairong Lu Guodong Li Qingshan Huang 《PloS one》2014,9(8)

GMEs are genetically modified enzybiotics created through molecular engineering approaches to deal with the increasing prevalence of antibiotic resistance. We present a fully manually curated database, GMEnzy, which focuses on GMEs and their design strategies, production and purification methods, and biological activity data. GMEnzy collects and integrates all available GMEs and their related information into one web-based database. Currently GMEnzy holds 186 GMEs from published literature. The GMEnzy interface is easy to use, and allows users to rapidly retrieve data according to desired search criteria. GMEnzy’s construction will increase the efficiency and convenience of improving these bioactive proteins for specific requirements, and will expand the arsenal available to researchers for controlling drug-resistant pathogens. This database will prove valuable for researchers interested in studies of genetically modified enzybiotics. GMEnzy is freely available on the Web at http://biotechlab.fudan.edu.cn/database/gmenzy/.

2.

MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

Jiyuan Hu Tengfei Li Zidi Xiu Hong Zhang 《PloS one》2015,10(8)

Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and no existing SNP caller produces p-values for calling SNPs in a frequentist framework. To fill this gap, we develop a new method, MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the involved parameter is very close to the boundary of the parametric space, so the standard large-sample property is not suitable for evaluating the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control the false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/.
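The two steps described in this abstract (a p-value from a zero-inflated mixture, then a multiple-testing correction) can be illustrated with a minimal Python sketch. It is not the MAFsnp implementation. The continuous part is assumed here to be a scaled chi-square with one degree of freedom, and the correction is assumed to be Benjamini-Hochberg; the abstract specifies neither. The names `mixture_p_value`, `benjamini_hochberg`, `pi0`, and `scale` are hypothetical.

```python
import math


def mixture_p_value(t, pi0, scale=1.0):
    """P(T >= t) when T is 0 with probability pi0 and otherwise
    follows scale * chi-square(1). Both choices are assumptions."""
    if t <= 0:
        return 1.0
    # Survival function of chi-square(1) at x is erfc(sqrt(x / 2)).
    return (1.0 - pi0) * math.erfc(math.sqrt(t / (2.0 * scale)))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    stats = [0.0, 3.84, 10.0, 0.5]
    raw = [mixture_p_value(t, pi0=0.5) for t in stats]
    print(benjamini_hochberg(raw))
```

Sites whose adjusted p-value falls below the chosen FDR level would be called as SNPs under these assumptions.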

3.

ThioFinder: A Web-Based Tool for the Identification of Thiopeptide Gene Clusters in DNA Sequences

Jing Li Xudong Qu Xinyi He Lian Duan Guojun Wu Dexi Bi Zixin Deng Wen Liu Hong-Yu Ou 《PloS one》2012,7(9)

Thiopeptides are a growing class of sulfur-rich, highly modified heterocyclic peptides that are mainly active against Gram-positive bacteria, including various drug-resistant pathogens. Recent studies also reveal that many thiopeptides inhibit the proliferation of human cancer cells, further expanding their potential for clinical use. Thiopeptide biosynthesis shares a common paradigm, featuring a ribosomally synthesized precursor peptide and conserved posttranslational modifications, to afford a characteristic core system, but differs in tailoring to furnish individual members. Identification of new thiopeptide gene clusters, by taking advantage of the increasing amount of DNA sequence information from bacteria, may facilitate new thiopeptide discovery and enrichment of the unique biosynthetic elements to produce novel drug leads by applying the principle of combinatorial biosynthesis. In this study, we have developed a web-based tool, ThioFinder, to rapidly identify thiopeptide biosynthetic gene clusters from DNA sequences using a profile Hidden Markov Model approach. Fifty-four new putative thiopeptide biosynthetic gene clusters were found in the sequenced bacterial genomes of previously unknown producing microorganisms. ThioFinder is fully supported by an open-access database, ThioBase, which contains information on the 99 known thiopeptides regarding the chemical structure, biological activity, producing organism, and biosynthetic gene (cluster), along with the associated genome if available. The ThioFinder website offers researchers a unique resource and great flexibility for sequence analysis of thiopeptide biosynthetic gene clusters. ThioFinder is freely available at http://db-mml.sjtu.edu.cn/ThioFinder/.

4.

DBGC: A Database of Human Gastric Cancer

Chao Wang Jun Zhang Mingdeng Cai Zhenggang Zhu Wenjie Gu Yingyan Yu Xiaoyan Zhang 《PloS one》2015,10(11)

5.

iDPF-PseRAAAC: A Web-Server for Identifying the Defensin Peptide Family and Subfamily Using Pseudo Reduced Amino Acid Alphabet Composition

Yongchun Zuo Yang Lv Zhuying Wei Lei Yang Guangpeng Li Guoliang Fan 《PloS one》2015,10(12)

Defensins, one of the most abundant classes of antimicrobial peptides, are an essential part of the innate immunity that has evolved in most living organisms, from lower organisms to humans. To identify specific defensins as interesting antifungal leads, in this study we constructed a more rigorous benchmark dataset and developed the iDPF-PseRAAAC server to predict the defensin family and subfamily. Using reduced dipeptide compositions, the overall accuracy of the proposed method increased to 95.10% for the defensin family and 98.39% for the vertebrate subfamily, which is higher than the accuracy of other methods. The jackknife test shows that an improvement of more than 4% was obtained compared with the previous method. A free online server was further established for the convenience of most experimental scientists at http://wlxy.imu.edu.cn/college/biostation/fuwu/iDPF-PseRAAAC/index.asp. A friendly guide is provided to describe how to use the web server. We anticipate that iDPF-PseRAAAC may become a useful high-throughput tool for both basic research and drug design.
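A reduced dipeptide composition of the kind mentioned in this abstract can be sketched as follows: each residue is mapped to a cluster label, and the frequencies of adjacent label pairs form the feature vector. The five-group clustering below is purely hypothetical and chosen for illustration; the abstract does not state which reduced alphabet the server uses. The function name is also an assumption.

```python
from itertools import product

# Hypothetical 5-cluster reduction of the 20 standard amino acids.
REDUCED_GROUPS = {
    "a": "AVLIMC",  # aliphatic / sulfur-containing
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar uncharged
    "c": "KRDE",    # charged
    "g": "GP",      # special backbone
}
RESIDUE_TO_GROUP = {
    aa: label for label, members in REDUCED_GROUPS.items() for aa in members
}


def reduced_dipeptide_composition(sequence):
    """Return the frequency of each reduced-alphabet dipeptide.

    Raises KeyError on non-standard residues and ValueError on
    sequences shorter than two residues.
    """
    if len(sequence) < 2:
        raise ValueError("sequence must contain at least two residues")
    reduced = [RESIDUE_TO_GROUP[aa] for aa in sequence.upper()]
    labels = sorted(REDUCED_GROUPS)
    counts = {a + b: 0 for a, b in product(labels, repeat=2)}
    for first, second in zip(reduced, reduced[1:]):
        counts[first + second] += 1
    total = len(reduced) - 1
    return {pair: n / total for pair, n in counts.items()}


if __name__ == "__main__":
    features = reduced_dipeptide_composition("GFGCPRAVCK")
    print({k: round(v, 3) for k, v in features.items() if v > 0})
```

With five clusters the vector has 25 entries instead of the 400 needed for the full alphabet, which is the main appeal of alphabet reduction for small benchmark datasets.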

6.

CEG: a database of essential gene clusters

Yuan-Nong Ye Zhi-Gang Hua Jian Huang Nini Rao Feng-Biao Guo 《BMC genomics》2013,14(1)

7.

PaGenBase: A Pattern Gene Database for the Global and Dynamic Understanding of Gene Function

Jian-Bo Pan Shi-Chang Hu Dan Shi Mei-Chun Cai Yin-Bo Li Quan Zou Zhi-Liang Ji 《PloS one》2013,8(12)

Pattern genes are a group of genes that have a modularized expression behavior under serial physiological conditions. The identification of pattern genes will provide a path toward a global and dynamic understanding of gene functions and their roles in particular biological processes or events, such as development and pathogenesis. In this study, we present PaGenBase, a novel repository for the collection of tissue- and time-specific pattern genes, including specific genes, selective genes, housekeeping genes and repressed genes. The PaGenBase database is now freely accessible at http://bioinf.xmu.edu.cn/PaGenBase/. In the current version (PaGenBase 1.0), the database contains 906,599 pattern genes derived from the literature or from data mining of more than 1,145,277 gene expression profiles in 1,062 distinct samples collected from 11 model organisms. Four statistical parameters were used to quantitatively evaluate the pattern genes. Moreover, three methods (quick search, advanced search and browse) were designed for rapid and customized data retrieval. The potential applications of PaGenBase are also briefly described. In summary, PaGenBase will serve as a resource for the global and dynamic understanding of gene function and will facilitate high-level investigations in a variety of fields, including the study of development, pathogenesis and novel drug discovery.

8.

Enhancing protein-vitamin binding residues prediction by multiple heterogeneous subspace SVMs ensemble

Dong-Jun Yu Jun Hu Hui Yan Xi-Bei Yang Jing-Yu Yang Hong-Bin Shen 《BMC bioinformatics》2014,15(1)

Background

Vitamins are typical ligands that play critical roles in various metabolic processes. The accurate identification of the vitamin-binding residues solely based on a protein sequence is of significant importance for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated.

Results

In this paper, a new predictor called TargetVita is designed and implemented for predicting protein-vitamin binding residues using protein sequences. In TargetVita, features derived from the position-specific scoring matrix (PSSM), predicted protein secondary structure, and vitamin binding propensity are combined to form the original feature space; then, several feature subspaces are selected by performing different feature selection methods. Finally, based on the selected feature subspaces, heterogeneous SVMs are trained and then ensembled for performing prediction.

Conclusions

The experimental results obtained with four separate vitamin-binding benchmark datasets demonstrate that the proposed TargetVita is superior to the state-of-the-art vitamin-specific predictor, and an average improvement of 10% in terms of the Matthews correlation coefficient (MCC) was achieved over independent validation tests. The TargetVita web server and the datasets used are freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetVita or http://www.csbio.sjtu.edu.cn/bioinf/TargetVita.
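For readers unfamiliar with the metric quoted here, the Matthews correlation coefficient is computed from the four confusion-matrix counts. The sketch below follows the standard definition; it is not TargetVita code, and the function and argument names are illustrative.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Toy counts for binding (positive) vs non-binding residues.
    print(round(matthews_corrcoef(tp=90, tn=80, fp=20, fn=10), 4))
```

MCC ranges from -1 to 1 and stays informative when binding residues are far rarer than non-binding ones, which is typical for residue-level prediction.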

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-297) contains supplementary material, which is available to authorized users.

9.

TEPITOPEpan: extending TEPITOPE for peptide binding prediction covering over 700 HLA-DR molecules

Zhang L Chen Y Wong HS Zhou S Mamitsuka H Zhu S 《PloS one》2012,7(2):e30483

Motivation

Accurate identification of peptides binding to specific Major Histocompatibility Complex Class II (MHC-II) molecules is of great importance for elucidating the underlying mechanism of immune recognition, as well as for developing effective epitope-based vaccines and promising immunotherapies for many severe diseases. Due to the extreme polymorphism of MHC-II alleles and the high cost of biochemical experiments, the development of computational methods for accurate prediction of binding peptides of MHC-II molecules, particularly for those with few or no experimental data, has become a topic of increasing interest. TEPITOPE is a widely used computational approach because of its good interpretability and relatively high performance. However, TEPITOPE can be applied to only 51 out of over 700 known HLA-DR molecules.

Method

We have developed a new method, called TEPITOPEpan, by extrapolating from the binding specificities of HLA-DR molecules characterized by TEPITOPE to those uncharacterized. First, each HLA-DR binding pocket is represented by amino acid residues that have close contact with the corresponding peptide binding core residues. Then the pocket similarity between two HLA-DR molecules is calculated as the sequence similarity of the residues. Finally, for an uncharacterized HLA-DR molecule, the binding specificity of each pocket is computed as a weighted average of pocket binding specificities over HLA-DR molecules characterized by TEPITOPE.
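The three steps of the method can be sketched in Python as a similarity-weighted average of pocket profiles. This is an illustration under assumptions, not the published implementation: pocket similarity is taken to be the fraction of identical residues (the abstract says only "sequence similarity"), the weights are the raw similarities, and all names and toy values are hypothetical.

```python
def pocket_similarity(residues_a, residues_b):
    """Fraction of identical positions between two pocket residue strings
    (an assumed stand-in for the sequence similarity used in the paper)."""
    if len(residues_a) != len(residues_b):
        raise ValueError("pocket residue strings must have equal length")
    matches = sum(a == b for a, b in zip(residues_a, residues_b))
    return matches / len(residues_a)


def extrapolate_pocket_profile(query_residues, characterized):
    """Weighted average of known pocket profiles.

    characterized: list of (pocket_residues, profile) pairs, where each
    profile maps a peptide amino acid to a binding score.
    """
    weights = [
        pocket_similarity(query_residues, residues)
        for residues, _ in characterized
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("query pocket shares no residues with any known pocket")
    amino_acids = characterized[0][1].keys()
    return {
        aa: sum(w * profile[aa] for w, (_, profile) in zip(weights, characterized))
        / total
        for aa in amino_acids
    }


if __name__ == "__main__":
    known = [
        ("ERVW", {"A": 1.0, "L": 0.0}),  # toy profile for one characterized pocket
        ("ERQY", {"A": 0.0, "L": 2.0}),  # toy profile for another
    ]
    print(extrapolate_pocket_profile("ERVW", known))
```

Repeating this for every pocket of an uncharacterized molecule would yield a full scoring matrix under these assumptions.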

Result

The performance of TEPITOPEpan has been extensively evaluated using various data sets from different viewpoints: predicting MHC binding peptides, identifying HLA ligands and T-cell epitopes and recognizing binding cores. Among the four state-of-the-art competing pan-specific methods, for predicting binding specificities of unknown HLA-DR molecules, TEPITOPEpan was roughly the second best method next to NETMHCIIpan-2.0. Additionally, TEPITOPEpan achieved the best performance in recognizing binding cores. We further analyzed the motifs detected by TEPITOPEpan, examining the corresponding literature of immunology. Its online server and PSSMs therein are available at http://www.biokdd.fudan.edu.cn/Service/TEPITOPEpan/.

10.

SNP@lincTFBS: An Integrated Database of Polymorphisms in Human LincRNA Transcription Factor Binding Sites

Shangwei Ning Zuxianglan Zhao Jingrun Ye Peng Wang Hui Zhi Ronghong Li Tingting Wang Jianjian Wang Lihua Wang Xia Li 《PloS one》2014,9(7)

11.

DsTRD: Danshen Transcriptional Resource Database

Yuxuan Shao Jiabo Wei Fangli Wu Haihua Zhang Dongfeng Yang Zongsuo Liang Weibo Jin 《PloS one》2016,11(2)

12.

Genome-Scale Screening of Drug-Target Associations Relevant to Ki Using a Chemogenomics Approach

Dong-Sheng Cao Yi-Zeng Liang Zhe Deng Qian-Nan Hu Min He Qing-Song Xu Guang-Hua Zhou Liu-Xia Zhang Zi-xin Deng Shao Liu 《PloS one》2013,8(4)

The identification of interactions between drugs and target proteins plays a key role in genomic drug discovery. In the present study, the quantitative binding affinities of drug-target pairs are differentiated as a measurement to define whether a drug interacts with a protein or not, and then a chemogenomics framework using an unbiased set of general integrated features and random forest (RF) is employed to construct a predictive model which can accurately classify drug-target pairs. The predictability of the model is further investigated and validated by several independent validation sets. The built model is used to predict drug-target associations, some of which were confirmed by comparison with experimental data from public biological resources. A drug-target interaction network with high-confidence drug-target pairs was also reconstructed. This network provides further insight into the action of drugs and targets. Finally, a web-based server called PreDPI-K_i was developed to predict drug-target interactions for drug discovery. In addition to providing a high-confidence list of drug-target associations to guide subsequent experimental investigation, these results also contribute to the understanding of drug-target interactions. We can also see that quantitative information on drug-target associations could greatly promote the development of more accurate models. The PreDPI-K_i server is freely available via: http://sdd.whu.edu.cn/dpiki.

13.

PBEAM: A parallel implementation of BEAM for genome-wide inference of epistatic interactions

Tao Peng Pufeng Du Yanda Li 《Bioinformation》2009,3(8):349-351

The software tool PBEAM provides a parallel implementation of BEAM, which is the first algorithm for large-scale epistatic interaction mapping, including genome-wide studies with hundreds of thousands of markers. BEAM describes markers and their interactions with a Bayesian partitioning model and computes the posterior probability of each marker set via Markov Chain Monte Carlo (MCMC). PBEAM takes advantage of simulating multiple Markov chains simultaneously. This design can reduce execution time roughly n-fold when n CPUs are available. The implementation of PBEAM is based on MPI libraries.
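The idea of running several independent Markov chains and pooling their samples can be sketched in Python. This is a toy illustration only: the target is a standard normal rather than BEAM's Bayesian partitioning model, and the chains are dispatched through a pluggable `map_fn` instead of MPI. All names here are hypothetical.

```python
import math
import random


def run_chain(seed, n_steps=5000, step_size=1.0):
    """Random-walk Metropolis chain targeting a standard normal (toy target)."""
    rng = random.Random(seed)  # independent, reproducible stream per chain
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        # Log acceptance ratio for N(0, 1) with a symmetric proposal.
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


def run_chains(seeds, map_fn=map):
    """Run one chain per seed and pool the samples.

    map_fn defaults to the sequential built-in map; a parallel map with
    the same call shape can be substituted to spread chains over workers.
    """
    chains = list(map_fn(run_chain, seeds))
    pooled = [s for chain in chains for s in chain]
    return sum(pooled) / len(pooled), chains


if __name__ == "__main__":
    mean, chains = run_chains([1, 2, 3, 4])
    print(len(chains), round(mean, 3))
```

Because the chains share no state, passing something like `concurrent.futures.ProcessPoolExecutor().map` as `map_fn` would distribute them across processes, which mirrors the near n-fold speedup the abstract describes for n CPUs.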

Availability

PBEAM is available for download at http://bioinfo.au.tsinghua.edu.cn/pbeam/

14.

ncRNAimprint: A comprehensive database of mammalian imprinted noncoding RNAs

Ying Zhang Dao-Gang Guan Jian-Hua Yang Peng Shao Hui Zhou Liang-Hu Qu 《RNA (New York, N.Y.)》2010,16(10):1889-1901

15.

PIGD: a database for intronless genes in the Poaceae

Hanwei Yan Cuiping Jiang Xiaoyu Li Lei Sheng Qing Dong Xiaojian Peng Qian Li Yang Zhao Haiyang Jiang Beijiu Cheng 《BMC genomics》2014,15(1)

16.

BGD: A Database of Bat Genomes

Jianfei Fang Xuan Wang Shuo Mu Shuyi Zhang Dong Dong 《PloS one》2015,10(6)

Bats account for ~20% of mammalian species, and are the only mammals with true powered flight. Because of their specialized phenotypic traits, much research has been devoted to examining the evolution of bats. Although some whole-genome sequences of bats have been assembled and annotated, a uniform resource for the annotated bat genomes is still unavailable. To make the extensive data associated with the bat genomes accessible to the general biological community, we established a Bat Genome Database (BGD). BGD is an open-access, web-available portal that integrates available data on bat genomes and genes. It hosts data from six bat species, including two megabats and four microbats. Users can query the gene annotations using an efficient search engine, and BGD offers browsable tracks of bat genomes. Furthermore, an easy-to-use phylogenetic analysis tool is provided to facilitate online phylogenetic study of genes. To the best of our knowledge, BGD is the first database of bat genomes. It will extend our understanding of bat evolution and benefit the analysis of bat sequences. BGD is freely available at: http://donglab.ecnu.edu.cn/databases/BatGenome/.

17.

mUbiSiDa: A Comprehensive Database for Protein Ubiquitination Sites in Mammals

Tong Chen Tao Zhou Bing He Haiyan Yu Xuejiang Guo Xiaofeng Song Jiahao Sha 《PloS one》2014,9(1)

Motivation

Protein ubiquitination is an important post-translational modification in which ubiquitin is attached to specific lysine (K) residues in target proteins, and it plays important regulatory roles in many cell processes. Recent studies indicate that abnormal protein ubiquitination has been implicated in many diseases through degradation of many key regulatory proteins, including tumor suppressors, oncoproteins, and cell cycle regulators. Detailed information on protein ubiquitination sites is useful for scientists investigating the mechanisms of many cell activities and related diseases.

Results

In this study we established mUbiSiDa, a mammalian Ubiquitination Site Database, which provides the scientific community with a comprehensive, high-quality, freely accessible resource of mammalian protein ubiquitination sites. In mUbiSiDa, we deposited about 35,494 experimentally validated ubiquitinated proteins with 110,976 ubiquitination sites from five species. mUbiSiDa also provides a BLAST function to predict novel protein ubiquitination sites in other species by searching the query sequence against the sequences deposited in mUbiSiDa. mUbiSiDa was designed to be a widely used tool for biologists and biomedical researchers, with a user-friendly interface, and to facilitate further research on protein ubiquitination, biological networks and functional proteomics. The mUbiSiDa database is freely available at http://reprod.njmu.edu.cn/mUbiSiDa.

18.

TMREC: A Database of Transcription Factor and MiRNA Regulatory Cascades in Human Diseases

Shuyuan Wang Wei Li Baofeng Lian Xinyi Liu Yan Zhang Enyu Dai Xuexin Yu Fanlin Meng Wei Jiang Xia Li 《PloS one》2015,10(5)

19.

DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures

Xu-Cheng Yin Chun Yang Wei-Yi Pei Haixia Man Jun Zhang Erik Learned-Miller Hong Yu 《PloS one》2015,10(5)

Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.

20.

Fast protein structure comparison through effective representation learning with contrastive graph neural networks

Chunqiu Xia Shi-Hao Feng Ying Xia Xiaoyong Pan Hong-Bin Shen 《PLoS computational biology》2022,18(3)
