Similar Literature
20 similar documents found.
1.
Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT performs excellently at uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.
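As a rough illustration of the two steps the abstract describes (combining several models' per-position predictions, then transferring domain-level calls to a specific protein), here is a minimal sketch. It assumes a simple soft-voting average with a hypothetical 0.5 cutoff and a precomputed mapping from domain positions to residue indices in one protein; it is not dSPRINT's actual ensemble or mapping procedure, and the function names are hypothetical.

```python
from statistics import mean


def ensemble_binding_positions(model_scores, threshold=0.5):
    """Soft-voting ensemble over per-position binding probabilities.

    model_scores[m][i] is model m's probability that domain position i binds
    the ligand type of interest (hypothetical input format). Returns the
    averaged scores and the indices of positions whose average reaches the
    threshold.
    """
    n_positions = len(model_scores[0])
    if any(len(scores) != n_positions for scores in model_scores):
        raise ValueError("all models must score the same domain positions")
    averaged = [mean(scores[i] for scores in model_scores) for i in range(n_positions)]
    calls = [i for i, p in enumerate(averaged) if p >= threshold]
    return averaged, calls


def transfer_to_sequence(domain_calls, position_to_residue):
    """Map called domain positions to residue indices in one protein.

    position_to_residue maps a domain position to the aligned residue index,
    or None where the protein has a gap at that position (assumed to come
    from a prior domain-to-sequence alignment).
    """
    return sorted(position_to_residue[i] for i in domain_calls
                  if position_to_residue.get(i) is not None)
```

For instance, with three models scoring three domain positions, the positions whose average probability reaches the cutoff are called, and only those aligned to an actual residue are carried over to the protein.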

2.
A major challenge in cancer genomics is uncovering genes with an active role in tumorigenesis from a potentially large pool of mutated genes across patient samples. Here we focus on the interactions that proteins make with nucleic acids, small molecules, ions and peptides, and show that residues within proteins that are involved in these interactions are more frequently affected by mutations observed in large-scale cancer genomic data than are other residues. We leverage this observation to predict genes that play a functionally important role in cancers by introducing a computational pipeline (http://canbind.princeton.edu) for mapping large-scale cancer exome data across patients onto protein structures, and automatically extracting proteins with an enriched number of mutations affecting their nucleic acid, small molecule, ion or peptide binding sites. Using this computational approach, we show that many previously known genes implicated in cancers are enriched in mutations within the binding sites of their encoded proteins. By focusing on functionally relevant portions of proteins, specifically those known to be involved in molecular interactions, our approach is particularly well suited to detect infrequent mutations that may nonetheless be important in cancer, and should aid in expanding our functional understanding of the genomic landscape of cancer.
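The central observation, that binding-site residues carry more mutations than expected, can be checked per protein with a simple statistical test. The sketch below uses a one-sided binomial test, assuming that under the null hypothesis mutations fall uniformly across a protein's residues. The abstract does not state which statistic the pipeline uses, so this test and the function name are illustrative choices only.

```python
from math import comb


def binding_site_enrichment(n_mutations, n_in_sites, site_fraction):
    """One-sided binomial test for excess mutations in binding sites.

    site_fraction: fraction of the protein's residues that are binding-site
    residues. Under the null, each mutation independently hits a binding
    site with that probability. Returns (expected count, P(X >= n_in_sites)).
    """
    if not 0 <= n_in_sites <= n_mutations:
        raise ValueError("n_in_sites must be between 0 and n_mutations")
    p = site_fraction
    tail = sum(comb(n_mutations, k) * p ** k * (1 - p) ** (n_mutations - k)
               for k in range(n_in_sites, n_mutations + 1))
    return n_mutations * p, min(tail, 1.0)  # clamp tiny floating-point overshoot
```

A small tail probability for a protein with, say, 6 of 10 mutations in binding sites that make up only 10% of its residues would flag it as a candidate; across many genes, the p-values would still need multiple-testing correction.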

3.
4.
5.
6.
7.
Efficient and accurate quantitation of metabolites from LC-MS data has become an important topic. Here we present an automated tool, called iMet-Q (intelligent Metabolomic Quantitation), for label-free metabolomics quantitation from high-throughput MS1 data. By performing peak detection and peak alignment, iMet-Q provides a summary of quantitation results and reports ion abundance at both the replicate level and the sample level. Furthermore, it reports the charge states and isotope ratios of detected metabolite peaks to facilitate metabolite identification. An in-house standard mixture and a public Arabidopsis metabolome data set were analyzed with iMet-Q. Three public quantitation tools, XCMS, MetAlign, and MZmine 2, were used for performance comparison. In the mixture data set, seven standard metabolites were detected by all four quantitation tools, and for these iMet-Q had a smaller quantitation error (12%) than the other tools in both the profile and centroid data sets. Our tool also correctly determined the charge states of the seven standard metabolites. Searching the mass values of these standard metabolites against the Human Metabolome Database yielded a total of 183 metabolite candidates; using the isotope ratios calculated by iMet-Q, 49% (89 of 183) of these candidates were filtered out. In the public Arabidopsis data set, reported with two internal standards and 167 elucidated metabolites, iMet-Q detected all of the peaks corresponding to the internal standards and the 167 metabolites. Meanwhile, our tool showed small abundance variation (≤0.19) when quantifying the two internal standards and higher abundance correlation (≥0.92) when quantifying the 167 metabolites. iMet-Q provides user-friendly interfaces and is publicly available for download at http://ms.iis.sinica.edu.tw/comics/Software_iMet-Q.html.
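To illustrate how charge states and isotope ratios can support identification, here is a minimal sketch. It relies on two standard facts: adjacent isotope peaks are spaced by about 1.003355/z in m/z, and with carbon alone the M+1/M ratio is about n·p/(1 − p) for n carbons and 13C abundance p. The carbon-only model, the tolerances, and the candidate format are simplifying assumptions, not iMet-Q's algorithm.

```python
C13_SPACING = 1.003355  # Da; mass difference between 13C and 12C
C13_ABUNDANCE = 0.0107  # approximate natural abundance of 13C


def infer_charge(mz_mono, mz_next, max_charge=4, tol=0.01):
    """Infer charge z from the m/z spacing between the monoisotopic peak and
    the next isotope peak, which is about 1.003355 / z."""
    spacing = mz_next - mz_mono
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_SPACING / z) <= tol:
            return z
    return None  # spacing matches no charge state up to max_charge


def carbon_only_m1_ratio(n_carbon):
    """Approximate M+1/M intensity ratio from 13C alone (ignores H, N, O, S):
    P(exactly one 13C) / P(no 13C) = n * p / (1 - p)."""
    p = C13_ABUNDANCE
    return n_carbon * p / (1 - p)


def filter_candidates(observed_ratio, candidates, rel_tol=0.2):
    """Keep candidates (name -> carbon count) whose predicted M+1/M ratio is
    within rel_tol (relative) of the observed ratio."""
    return [name for name, n_c in candidates.items()
            if abs(carbon_only_m1_ratio(n_c) - observed_ratio) <= rel_tol * observed_ratio]
```

A database search by mass alone can return candidates with very different carbon counts; comparing predicted and observed M+1/M ratios is one way such candidates could be pruned.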

8.
9.
Recent studies have revealed that microRNAs (miRNAs), a class of small non-coding RNAs, down-regulate their mRNA targets, an effect that plays an important role in various biological processes. Many studies have been devoted to predicting miRNA-target interactions, and these indicate that the interactions may be functional only in specific tissues, depending on the characteristics of the miRNA. No systematic method has yet been established to investigate the correlation between miRNA-target interactions and tissue specificity using microarray data. In this study, we propose a method for identifying the tissues that support miRNA-target interactions, based on experimentally validated miRNA-target interactions. The tissue-specificity results obtained by our method agree with experimental results reported in the literature.

Availability and Implementation

Our analysis results are available at http://tsmti.mbc.nctu.edu.tw/ and http://www.stat.nctu.edu.tw/hwang/tsmti.html.
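The abstract does not describe the statistics behind its tissue-support calls. As a hedged illustration of the kind of signal microarray data can provide, the sketch below computes the Pearson correlation between a miRNA's and its target's expression across tissues (repression would suggest a negative value) and applies a purely hypothetical heuristic that flags tissues where the miRNA is above its mean level while the target is below its mean level. This is not the authors' method.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two expression profiles measured in the same tissues."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        raise ValueError("constant expression profile")
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def supporting_tissues(tissues, mirna_expr, target_expr):
    """Hypothetical heuristic: tissues where the miRNA is above its mean level
    while the target is below its mean level, consistent with repression."""
    m_mean, t_mean = mean(mirna_expr), mean(target_expr)
    return [t for t, m, g in zip(tissues, mirna_expr, target_expr)
            if m > m_mean and g < t_mean]
```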

10.
11.
Detection of remote sequence homology is essential for the accurate inference of protein structure, function and evolution. The most sensitive detection methods compare the evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison based on the combination of ‘vertical’ MSA context (substitution constraints at individual sequence positions) and ‘horizontal’ context (patterns of residue content at multiple positions). Based on a simple and tractable profile methodology and primitive measures of the similarity of horizontal MSA patterns, the method achieves homology-detection quality comparable to that of a more complex, advanced method employing hidden Markov models (HMMs) and secondary structure (SS) prediction. Adding SS information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure and function prediction is illustrated by the detection of subtle homology between evolutionarily distant yet structurally similar protein domains. ProCAIn, relevant databases and tools can be downloaded from http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.
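The ‘vertical’ side of profile comparison can be illustrated with a standard log-odds column score, log Σ_a p(a)q(a)/b(a), where p and q are the residue distributions of two profile columns and b is the background distribution. This is the kind of column score used in HMM-HMM comparison (for example, by HHsearch). It is a swapped-in technique: PROCAIN's own scoring, including its ‘horizontal’ context term, is not reproduced here, and summing over ungapped diagonals is a deliberate simplification.

```python
from math import log


def column_score(p, q, background):
    """Log-odds similarity of two profile columns: log(sum_a p(a) q(a) / b(a)).

    p, q: dicts residue -> frequency; background: dict residue -> b(a) > 0.
    """
    s = sum(p.get(a, 0.0) * q.get(a, 0.0) / b for a, b in background.items())
    return log(s) if s > 0 else float("-inf")


def best_ungapped_score(profile1, profile2, background):
    """Best total column score over all ungapped diagonal offsets
    (no gaps, no local trimming of the overlap)."""
    best = float("-inf")
    for offset in range(-(len(profile2) - 1), len(profile1)):
        pairs = [(profile1[i], profile2[i - offset])
                 for i in range(len(profile1)) if 0 <= i - offset < len(profile2)]
        total = sum(column_score(p, q, background) for p, q in pairs)
        best = max(best, total)
    return best
```

A realistic comparison would use pseudocount-smoothed frequencies (so no column score is −∞) and a gapped local alignment rather than fixed diagonals.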

12.
As ever larger and more complex biological systems are modeled in silico, approximating physiological lipid bilayers with simple planar models becomes increasingly unrealistic. In order to build accurate large-scale models of subcellular environments, models of lipid membranes with carefully considered, biologically relevant curvature will be essential. In the current work, we present a multi-scale utility called LipidWrapper capable of creating curved membrane models with geometries derived from various sources, both experimental and theoretical. To demonstrate its utility, we use LipidWrapper to examine an important mechanism of influenza virulence. A copy of the program can be downloaded free of charge under the terms of the open-source FreeBSD License from http://nbcr.ucsd.edu/lipidwrapper. LipidWrapper has been tested on all major computer operating systems.
This is a PLOS Computational Biology Software Article

13.
DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, the complexity of the data, and the lack of comprehensive analytical tools are major challenges that keep this technology from being widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low-coverage data and appropriately handles biological replicates. BSmooth is open-source software and can be downloaded from http://rafalab.jhsph.edu/bsmooth.
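The reason smoothing helps with low-coverage data is that neighboring CpGs tend to have similar methylation levels, so each site can borrow strength from nearby sites. The sketch below is a simplified stand-in for that idea: a coverage-weighted tricube-kernel average of per-CpG methylation proportions within a hypothetical 1,000-bp bandwidth. It is not BSmooth's local-likelihood fit, and the quadratic loop is only meant for small examples.

```python
def smooth_methylation(positions, methylated, total, bandwidth=1000):
    """Coverage-weighted tricube-kernel estimate of methylation at each CpG.

    positions: CpG coordinates; methylated/total: read counts per CpG.
    CpGs with zero coverage contribute nothing; a site with no covered
    neighbors within the bandwidth gets NaN.
    """
    smoothed = []
    for x0 in positions:
        num = den = 0.0
        for x, m, n in zip(positions, methylated, total):
            d = abs(x - x0) / bandwidth
            if d < 1 and n > 0:
                w = (1 - d ** 3) ** 3 * n  # tricube kernel times read coverage
                num += w * (m / n)
                den += w
        smoothed.append(num / den if den > 0 else float("nan"))
    return smoothed
```

Weighting by coverage lets well-covered CpGs dominate the local estimate, while the kernel keeps distant sites from influencing it.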

14.
Louisa A. Stark. Genetics, 2015, 200(3): 679-680
The Genetics Society of America's Elizabeth W. Jones Award for Excellence in Education recognizes significant and sustained impact on genetics education. The 2015 awardee, Louisa Stark, has had a major impact on global access to genetics education through her work as director of the University of Utah Genetic Science Learning Center. The Center's Learn.Genetics and Teach.Genetics websites are the most widely used online genetics education resources in the world. In 2014, they were visited by 18 million students, educators, scientists, and members of the public. With over 60 million page views annually, Learn.Genetics is among the most-used sites on the Web.

15.
Identification of key metabolites for complex diseases is a challenging task in today's medicine and biology. A particular disease is usually caused by the alteration of a series of functionally related metabolites that have a global influence on the metabolic network. Moreover, metabolites in the same metabolic pathway are often associated with the same or similar diseases. Based on these functional relationships between metabolites in the context of metabolic pathways, we present a pathway-based random walk method called PROFANCY for prioritizing candidate disease metabolites. Our strategy not only takes advantage of the global functional relationships between metabolites but also fully exploits the functionally modular nature of metabolic networks. Our approach successfully prioritized known metabolites for 71 diseases, with an AUC value of 0.895. We also assessed the performance of PROFANCY on 16 disease classes and found that 4 classes achieved an AUC value over 0.95. To investigate the robustness of PROFANCY, we repeated all the analyses in two metabolic networks and obtained similar results. We then applied our approach to Alzheimer's disease (AD) and found that a top-ranked candidate was potentially related to AD but had not been reported previously. Furthermore, our method can be applied to prioritize metabolites from metabolomic profiles of prostate cancer: PROFANCY identified prostate cancer-related metabolites that are supported by the literature but were not found to be significantly differential by traditional differential analysis. We also developed a freely accessible web-based and R-based tool at http://bioinfo.hrbmu.edu.cn/PROFANCY.
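Random-walk prioritization ranks every node in a network by how often a walker that keeps restarting from known disease metabolites visits it. Here is a minimal sketch of a generic random walk with restart on an unweighted, undirected metabolite network, assuming a hypothetical restart probability of 0.7. It omits the pathway-based weighting that distinguishes PROFANCY.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Rank nodes by the steady-state probability of a walk restarting at seeds.

    adjacency: dict node -> set of neighbor nodes (undirected; every neighbor
    must also be a key). seeds: known disease metabolites; the restart vector
    is uniform over them.
    """
    nodes = sorted(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            nbrs = adjacency[u]
            if nbrs:
                # column-normalized transition: spread u's mass evenly over neighbors
                share = (1 - restart) * p[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return sorted(nodes, key=lambda v: p[v], reverse=True), p
```

On a simple chain of metabolites with one seed at the end, scores fall off with distance from the seed. Candidates that are well connected to several seeds rise in the ranking even if they are not direct neighbors.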

16.
Aggregatibacter actinomycetemcomitans is a major etiological agent of periodontitis. Here we report the complete genome sequence of serotype c strain D11S-1, which was recovered from the subgingival plaque of a patient diagnosed with generalized aggressive periodontitis.
Aggregatibacter actinomycetemcomitans is a major etiologic agent of human periodontal disease, in particular aggressive periodontitis (12). The natural population of A. actinomycetemcomitans is clonal (7). Six A. actinomycetemcomitans serotypes are distinguished based on the structural and serological characteristics of the O antigen of LPS (6, 7). Three of the serotypes (a, b, and c) comprise >80% of all strains, and each serotype represents a distinct clonal lineage (1, 6, 7). Serotype c strain D11S-1 was cultured from a subgingival plaque sample of a patient diagnosed with generalized aggressive periodontitis. The complete genome sequence of the strain was determined by 454 pyrosequencing (10) at 25× coverage. Assembly was performed using the Newbler assembler (454, Branford, CT) and generated 199 large contigs, with 99.3% of the bases having a quality score of 40 or above. The contigs were aligned with the genome of the sequenced serotype b strain HK1651 (http://www.genome.ou.edu/act.html) using in-house software. The putative gaps between contigs were then closed by primer walking and by sequencing PCR products spanning the gaps. The final genome assembly was further confirmed by comparing an in silico NcoI restriction map with the experimental map generated by optical mapping (8). The genome structure of strain D11S-1 was compared to that of the sequenced strain HK1651 using the program MAUVE (2, 3). Automated annotation was performed using a protocol similar to the annotation engine service at The Institute for Genomic Research/J. Craig Venter Institute, with some local modifications. Briefly, protein-coding genes were identified using Glimmer3 (4). Each protein sequence was then annotated by comparison with the GenBank nonredundant protein database. BLAST-Extend-Repraze was applied to the predicted genes to identify genes that might have been truncated by a frameshift mutation or premature stop codon. tRNA and rRNA genes were identified using tRNAScan-SE (9) and a similarity search against our in-house RNA database, respectively.
The D11S-1 circular genome contains 2,105,764 nucleotides, with a GC content of 44.55%, 2,134 predicted coding sequences, and 54 tRNA and 19 rRNA genes (see additional data at http://expression.washington.edu/bumgarnerlab/publications.php). The distribution of predicted genes across functional categories was similar between D11S-1 and HK1651 (http://expression.washington.edu/bumgarnerlab/publications.php). One hundred six and 86 coding sequences were unique to strains D11S-1 and HK1651, respectively (http://expression.washington.edu/bumgarnerlab/publications.php). Genomic islands were identified based on the annotations for strain HK1651 and on manual inspection of contiguous D11S-1-specific DNA regions with G+C bias (http://expression.washington.edu/bumgarnerlab/publications.php). Among the 12 identified genomic islands, 5 (B, C, D, E, and G: the cytolethal distending toxin gene cluster, tight adherence gene cluster, O-antigen biosynthesis and transport gene cluster, leukotoxin gene cluster, and lipooligosaccharide biosynthesis enzyme gene, respectively) correspond to islands 2 to 5 and 8 of strain HK1651 (http://www.oralgen.lanl.gov/) (5). Island F (∼5 kb) is homologous to a portion of the 12.5-kb island 7 in HK1651. Five genomic islands (H to L) are unique to strain D11S-1. The remaining island (A) corresponds to a fusion of genomic islands 1 and 6 of strain HK1651. The genome of D11S-1 is largely syntenic with that of the sequenced serotype b strain HK1651 but contains several large-scale genomic rearrangements.
Strain D11S-1 harbors a 43-kb bacteriophage and two plasmids of 31 and 23 kb (http://expression.washington.edu/bumgarnerlab/publications.php). Excluding an ∼9-kb region of low homology, the phage showed >90% nucleotide sequence identity with AaΦ23 (11). A 49-bp attB site (11) was identified at coordinates 2,024,825 to 2,024,873. The location of the inserted phage was identified in the optical map of strain D11S-1 and further confirmed by PCR amplification and sequencing of the regions flanking the insertion site. A closed circular form of the phage was also detected in strain D11S-1 by PCR analysis of the phage ends. The 23-kb plasmid is homologous to pVT745 (92% nucleotide identity). The 31-kb plasmid is novel; it shares significant homology over short regions (<2 kb) with Haemophilus influenzae biotype aegyptius plasmid pF1947 and other plasmids.
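Two of the sequence-level checks mentioned above, the GC content and the in silico NcoI restriction map used to validate the assembly against optical mapping, are easy to illustrate. NcoI recognizes CCATGG and cuts after the first C (C^CATGG); the site is its own reverse complement, so scanning one strand finds all sites. This is a minimal sketch on toy sequences, not the authors' pipeline; on the real assembly it would run on the full circular sequence.

```python
import re

NCOI_SITE = "CCATGG"  # NcoI recognition site, cut as C^CATGG (palindromic)


def gc_content(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def circular_fragments(seq, site=NCOI_SITE, cut_offset=1):
    """Fragment lengths from complete digestion of a circular sequence.

    Cut positions are site start + cut_offset. Sites spanning the origin are
    found by scanning the sequence extended with its first len(site) - 1 bases.
    """
    seq = seq.upper()
    n = len(seq)
    extended = seq + seq[:len(site) - 1]
    # zero-width lookahead so overlapping sites are all reported
    cuts = sorted({(m.start() + cut_offset) % n
                   for m in re.finditer(f"(?={site})", extended)})
    if not cuts:
        return [n]  # uncut circle stays one molecule of full length
    # distance from each cut to the next, wrapping around the origin;
    # a single cut linearizes the circle into one full-length fragment
    return [(cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n for i in range(len(cuts))]
```

The ordered fragment lengths from such a digest are what would be compared with the fragment sizes measured on the optical map.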

17.
18.
Tumor samples are typically heterogeneous, containing admixture by normal, non-cancerous cells and one or more subpopulations of cancerous cells. Whole-genome sequencing of a tumor sample yields reads from this mixture, but does not directly reveal the cell of origin for each read. We introduce THetA (Tumor Heterogeneity Analysis), an algorithm that infers the most likely collection of genomes and their proportions in a sample, for the case where copy number aberrations distinguish subpopulations. THetA successfully estimates normal admixture and recovers clonal and subclonal copy number aberrations in real and simulated sequencing data. THetA is available at http://compbio.cs.brown.edu/software/.
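The mixture idea behind estimating normal admixture can be sketched in a simplified form. Assume a normal copy number of 2, one tumor population with known copy number c_j in each interval j of length L_j, and normal fraction mu. The expected share of reads in interval j is then proportional to L_j(2·mu + (1 - mu)·c_j), and mu can be chosen to maximize a multinomial likelihood over a grid. THetA itself also infers the copy numbers and can handle multiple tumor subpopulations; the fixed copy numbers, single tumor population, and grid search here are simplifying assumptions.

```python
from math import log


def expected_read_fractions(lengths, tumor_cn, normal_fraction, normal_cn=2):
    """Expected share of reads per interval for a normal + one-tumor mixture:
    interval j contributes length_j * (mu * normal_cn + (1 - mu) * tumor_cn_j)."""
    weights = [length * (normal_fraction * normal_cn + (1 - normal_fraction) * cn)
               for length, cn in zip(lengths, tumor_cn)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(read_counts, fractions):
    """Multinomial log-likelihood of observed read counts, up to a constant."""
    total = 0.0
    for reads, frac in zip(read_counts, fractions):
        if reads > 0:
            if frac <= 0:
                return float("-inf")  # reads observed where none are expected
            total += reads * log(frac)
    return total


def estimate_normal_fraction(read_counts, lengths, tumor_cn, steps=1000):
    """Grid search over mu in [0, 1] for the maximum-likelihood normal admixture."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda mu: log_likelihood(
        read_counts, expected_read_fractions(lengths, tumor_cn, mu)))
```

For example, two equal-length intervals with tumor copy numbers 2 and 4, observed with 400 and 600 reads, are best explained by a 50% normal admixture.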

19.
Although comparison of RNA-protein interaction profiles across different conditions has become increasingly important for understanding the function of RNA-binding proteins (RBPs), few computational approaches have been developed for quantitative comparison of CLIP-seq datasets. Here, we present an easy-to-use command-line tool, dCLIP, for quantitative comparative analysis of CLIP-seq data. The two-stage method implemented in dCLIP, comprising a modified MA normalization method and a hidden Markov model, is shown to effectively identify differential binding regions of RBPs in four CLIP-seq datasets generated by the HITS-CLIP, iCLIP and PAR-CLIP protocols. dCLIP is freely available at http://qbrc.swmed.edu/software/.
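MA normalization compares two conditions on log scale: for each region, M = log2(count1) - log2(count2) and A = (log2(count1) + log2(count2))/2. A systematic trend of M with A, for example from different library depths, is then removed. The sketch below shows plain MA normalization with a least-squares linear fit and a hypothetical pseudocount of 1; dCLIP's "modified" version differs in details the abstract does not give.

```python
from math import log2


def ma_normalize(counts1, counts2, pseudocount=1.0):
    """Return M values corrected by a least-squares linear fit of M on A."""
    a_vals, m_vals = [], []
    for x, y in zip(counts1, counts2):
        lx, ly = log2(x + pseudocount), log2(y + pseudocount)
        m_vals.append(lx - ly)        # log fold change between conditions
        a_vals.append((lx + ly) / 2)  # average log intensity
    n = len(m_vals)
    mean_a, mean_m = sum(a_vals) / n, sum(m_vals) / n
    sxx = sum((a - mean_a) ** 2 for a in a_vals)
    sxy = sum((a - mean_a) * (m - mean_m) for a, m in zip(a_vals, m_vals))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_m - slope * mean_a
    # residual M after removing the fitted intensity-dependent trend
    return [m - (intercept + slope * a) for a, m in zip(a_vals, m_vals)]
```

After this correction, regions whose M values remain far from zero are candidate differential binding regions, which a downstream model (an HMM in dCLIP's case) can then segment.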

20.

Background

Vitamins are typical ligands that play critical roles in various metabolic processes. The accurate identification of the vitamin-binding residues solely based on a protein sequence is of significant importance for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated.

Results

In this paper, a new predictor called TargetVita is designed and implemented for predicting protein-vitamin binding residues from protein sequences. In TargetVita, features derived from the position-specific scoring matrix (PSSM), predicted protein secondary structure, and vitamin-binding propensity are combined to form the original feature space; several feature subspaces are then selected by applying different feature-selection methods. Finally, heterogeneous SVMs are trained on the selected feature subspaces and ensembled to perform the prediction.

Conclusions

The experimental results obtained on four separate vitamin-binding benchmark datasets demonstrate that TargetVita is superior to the state-of-the-art vitamin-specific predictor, with an average improvement of 10% in the Matthews correlation coefficient (MCC) in independent validation tests. The TargetVita web server and the datasets used are freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetVita or http://www.csbio.sjtu.edu.cn/bioinf/TargetVita.
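The evaluation metric can be made concrete. MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) is well suited to binding-residue prediction because binding residues are a small minority class. The sketch below computes MCC and also combines several classifiers' 0/1 outputs by majority vote. The voting rule is an assumption for illustration; the abstract does not say how TargetVita's SVMs are ensembled.

```python
from math import sqrt


def majority_vote(predictions):
    """Combine 0/1 per-residue predictions from several classifiers by majority
    vote (ties go to the non-binding class)."""
    n_models = len(predictions)
    return [1 if 2 * sum(col) > n_models else 0 for col in zip(*predictions)]


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = binding residue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). Unlike accuracy, it cannot be inflated by predicting every residue as non-binding.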

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-297) contains supplementary material, which is available to authorized users.

