Similar literature
1.
SUMMARY: The FAF-Drugs2 server is a web application that prepares chemical compound libraries prior to virtual screening or that assists hit selection/lead optimization before chemical synthesis or ordering. The FAF-Drugs2 web server is an enhanced version of the FAF-Drugs2 package that now includes Pan Assay Interference Compounds detection. This online toolkit has been designed through a user-centered approach with emphasis on user-friendliness. It is a unique online tool for preparing large compound libraries with in-house or user-defined filtering parameters. AVAILABILITY: The FAF-Drugs2 server is freely available at http://bioserv.rpbs.univ-paris-diderot.fr/FAF-Drugs/.

2.
3.
A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions. CRAC is available at http://crac.gforge.inria.fr.
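
As a rough illustration of what a k-mer profile of a single read can look like, and not CRAC's actual algorithm, the sketch below splits a read into overlapping k-mers and counts how often each one occurs in a toy reference. The function names, the toy sequences and the choice k = 4 are all assumptions made for this example.

```python
from collections import Counter

def kmers(seq, k):
    """Return the overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_profile(read, reference, k):
    """For each k-mer of the read, count its occurrences in the reference."""
    ref_counts = Counter(kmers(reference, k))
    return [ref_counts[km] for km in kmers(read, k)]

# Toy data (hypothetical): the read diverges from the reference near its end.
reference = "ACGTACGTTTGACCA"
read = "ACGTTAGA"
profile = kmer_profile(read, reference, 4)
```

In this toy case the counts drop to zero once the k-mers overlap the divergent position, which is the kind of local break in a profile that a real tool would then have to interpret.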

4.
Sequence simulators play an important role in phylogenetics. Simulated data has many applications, such as evaluating the performance of different methods, hypothesis testing with parametric bootstraps, and, more recently, generating data for training machine-learning applications. Many sequence simulation programmes exist, but the most feature-rich programmes tend to be rather slow, and the fastest programmes tend to be feature-poor. Here, we introduce AliSim, a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 h and 1.3 GB RAM to simulate alignments with one million sequences or sites, whereas popular software Seq-Gen, Dawg, and INDELible require 2–5 h and 50–500 GB of RAM. We provide AliSim as an extension of the IQ-TREE software version 2.2, freely available at www.iqtree.org, and a comprehensive user tutorial at http://www.iqtree.org/doc/AliSim.
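
To make the "probability matrix" idea concrete, here is a minimal sketch that evolves a DNA sequence along one branch using closed-form transition probabilities. It assumes the Jukes-Cantor (JC69) model, chosen here only because its probabilities have a simple formula; the function names are hypothetical, and AliSim's adaptive switching between rate-matrix and probability-matrix simulation is not reproduced.

```python
import math
import random

BASES = "ACGT"

def jc69_probs(t):
    """JC69 probabilities after branch length t (expected substitutions per site).

    Returns (probability of keeping the same base,
             probability of ending at one specific different base).
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def evolve(parent, t, rng):
    """Sample a child sequence site by site from the transition probabilities."""
    p_same, p_diff = jc69_probs(t)
    child = []
    for base in parent:
        weights = [p_same if b == base else p_diff for b in BASES]
        child.append(rng.choices(BASES, weights=weights)[0])
    return "".join(child)

rng = random.Random(0)  # fixed seed so the run is repeatable
child = evolve("ACGTACGT", 0.1, rng)
```

Each site costs one weighted draw regardless of how many substitutions the branch implies, whereas a rate-matrix simulation samples individual substitution events along the branch.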

5.
Selenoproteins are proteins containing an uncommon amino acid selenocysteine (Sec). Sec is inserted by a specific translational machinery that recognizes a stem-loop structure, the SECIS element, at the 3′ UTR of selenoprotein genes and recodes a UGA codon within the coding sequence. As UGA is normally a translational stop signal, selenoproteins are generally misannotated and designated tools have to be developed for this class of proteins. Here, we present two new computational methods for selenoprotein identification and analysis, which we provide publicly through the web servers at http://gladyshevlab.org/SelenoproteinPredictionServer or http://seblastian.crg.es. SECISearch3 replaces its predecessor SECISearch as a tool for prediction of eukaryotic SECIS elements. Seblastian is a new method for selenoprotein gene detection that uses SECISearch3 and then predicts selenoprotein sequences encoded upstream of SECIS elements. Seblastian is able to both identify known selenoproteins and predict new selenoproteins. By applying these tools to diverse eukaryotic genomes, we provide a ranked list of newly predicted selenoproteins together with their annotated cysteine-containing homologues. An analysis of a representative candidate belonging to the AhpC family shows how the use of Sec in this protein evolved in bacterial and eukaryotic lineages.

6.
7.
Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/~aniba/alexsys.

8.
A database providing information on mosquito specimens (Arthropoda: Diptera: Culicidae) collected in French Guiana is presented. Field collections were initiated in 2013 under the auspices of the CEnter for the study of Biodiversity in Amazonia (CEBA: http://www.labexceba.fr/en/). This study is part of an ongoing process aiming to understand the distribution of mosquitoes, including vector species, across French Guiana. Occurrences are recorded after each collecting trip in a database managed by the laboratory Evolution et Diversité Biologique (EDB), Toulouse, France. The dataset is updated monthly and is available online. Voucher specimens and their associated DNA are stored at the laboratory Ecologie des Forêts de Guyane (Ecofog), Kourou, French Guiana. The latest version of the dataset is accessible through EDB’s Integrated Publication Toolkit at http://130.120.204.55:8080/ipt/resource.do?r=mosquitoes_of_french_guiana or through the Global Biodiversity Information Facility data portal at http://www.gbif.org/dataset/5a8aa2ad-261c-4f61-a98e-26dd752fe1c5. It can also be viewed through the Guyanensis platform at http://guyanensis.ups-tlse.fr.

9.
Designed peptides that bind to major histocompatibility protein I (MHC-I) allomorphs bear the promise of representing epitopes that stimulate a desired immune response. A rigorous bioinformatical exploration of sequence patterns hidden in peptides that bind to the mouse MHC-I allomorph H-2Kb is presented. We exemplify and validate these motif findings by systematically dissecting the epitope SIINFEKL and analyzing the resulting fragments for their binding potential to H-2Kb in a thermal denaturation assay. The results demonstrate that only fragments exclusively retaining the carboxy- or amino-terminus of the reference peptide exhibit significant binding potential, with the N-terminal pentapeptide SIINF as shortest ligand. This study demonstrates that sophisticated machine-learning algorithms excel at extracting fine-grained patterns from peptide sequence data and predicting MHC-I binding peptides, thereby considerably extending existing linear prediction models and providing a fresh view on the computer-based molecular design of future synthetic vaccines. The server for prediction is available at http://modlab-cadd.ethz.ch (SLiDER tool, MHC-I version 2012).

10.
Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively grasps the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA consistently yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.

11.
Detection of remote sequence homology is essential for the accurate inference of protein structure, function and evolution. The most sensitive detection methods involve the comparison of evolutionary patterns reflected in multiple sequence alignments (MSAs) of protein families. We present PROCAIN, a new method for MSA comparison based on the combination of ‘vertical’ MSA context (substitution constraints at individual sequence positions) and ‘horizontal’ context (patterns of residue content at multiple positions). Based on a simple and tractable profile methodology and primitive measures for the similarity of horizontal MSA patterns, the method achieves a quality of homology detection comparable to that of a more complex advanced method employing hidden Markov models (HMMs) and secondary structure (SS) prediction. Adding SS information further improves PROCAIN performance beyond the capabilities of current state-of-the-art tools. The potential value of the method for structure/function predictions is illustrated by the detection of subtle homology between evolutionarily distant yet structurally similar protein domains. ProCAIn, relevant databases and tools can be downloaded from: http://prodata.swmed.edu/procain/download. The web server can be accessed at http://prodata.swmed.edu/procain/procain.php.

12.
Whole-genome sequencing of Mauritian cynomolgus macaques reveals novel candidate loci for controlling simian immunodeficiency virus replication. See related Research, http://genomebiology.com/2014/15/11/478

13.
Disulfide bridges strongly constrain the native structure of many proteins and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches for this prediction problem adopt the following pipeline: first they enrich the primary sequence with structural annotations, second they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three-step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, among which three are novel in the context of disulfide bridge prediction. So as to be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. The combination of structural annotations with these possible encodings leads to a large number of possible feature functions. In order to identify a minimal subset of relevant feature functions among those, we propose an efficient and interpretable feature function selection scheme, designed so as to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees. Our results indicate that the use of only the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) is sufficient to construct a high performance disulfide pattern predictor and that extremely randomized trees reach a disulfide pattern prediction accuracy of on the benchmark dataset SPX, which corresponds to improvement over the state of the art.
A web application is available at http://m24.giga.ulg.ac.be:81/x3CysBridges.
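
The last step of the pipeline described above, turning pairwise bonding probabilities into a connectivity pattern, can be sketched as follows. This is an exhaustive search over all perfect pairings, which is only practical for a handful of cysteines and is not the matching algorithm used by the authors. It also assumes that every cysteine is bonded and that the number of cysteines is even. The function name and the probability values are hypothetical.

```python
def best_pairing(cysteines, prob):
    """Return (total_score, pairs) for the highest-scoring perfect pairing.

    cysteines: list of cysteine positions (even length assumed).
    prob: dict mapping frozenset({i, j}) to a predicted bonding probability.
    """
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    # Pair the first cysteine with each candidate partner, then recurse.
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        score, pairs = best_pairing(remaining, prob)
        score += prob[frozenset((first, partner))]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs

# Hypothetical classifier outputs for cysteines at positions 3, 14, 27 and 40.
prob = {
    frozenset((3, 14)): 0.2, frozenset((3, 27)): 0.9, frozenset((3, 40)): 0.1,
    frozenset((14, 27)): 0.3, frozenset((14, 40)): 0.8, frozenset((27, 40)): 0.4,
}
score, pairs = best_pairing([3, 14, 27, 40], prob)
```

With these toy values the pattern 3-27, 14-40 wins because its summed probability is higher than that of either alternative pairing.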

14.
Despite the importance of clathrin-mediated endocytosis (CME) for cell biology, it is unclear if all components of the machinery have been discovered and many regulatory aspects remain poorly understood. Here, using Saccharomyces cerevisiae and a fluorescence microscopy screening approach we identify previously unknown regulatory factors of the endocytic machinery. We further studied the top scoring protein identified in the screen, Ubx3, a member of the conserved ubiquitin regulatory X (UBX) protein family. In vivo and in vitro approaches demonstrate that Ubx3 is a new coat component. Ubx3-GFP has typical endocytic coat protein dynamics with a patch lifetime of 45 ± 3 sec. Ubx3 contains a W-box that mediates physical interaction with clathrin and Ubx3-GFP patch lifetime depends on clathrin. Deletion of the UBX3 gene caused defects in the uptake of Lucifer Yellow and the methionine transporter Mup1 demonstrating that Ubx3 is needed for efficient endocytosis. Further, the UBX domain is required both for localization and function of Ubx3 at endocytic sites. Mechanistically, Ubx3 regulates dynamics and patch lifetime of the early arriving protein Ede1 but not later arriving coat proteins or actin assembly. Conversely, Ede1 regulates the patch lifetime of Ubx3. Ubx3 likely regulates CME via the AAA-ATPase Cdc48, a ubiquitin-editing complex. Our results uncovered new components of the CME machinery that regulate this fundamental process.

15.
Although comparison of RNA-protein interaction profiles across different conditions has become increasingly important to understanding the function of RNA-binding proteins (RBPs), few computational approaches have been developed for quantitative comparison of CLIP-seq datasets. Here, we present an easy-to-use command line tool, dCLIP, for quantitative CLIP-seq comparative analysis. The two-stage method implemented in dCLIP, including a modified MA normalization method and a hidden Markov model, is shown to be able to effectively identify differential binding regions of RBPs in four CLIP-seq datasets, generated by HITS-CLIP, iCLIP and PAR-CLIP protocols. dCLIP is freely available at http://qbrc.swmed.edu/software/.
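
For readers unfamiliar with MA values, the sketch below computes the standard M (log ratio) and A (mean log intensity) for a pair of read counts from two conditions. It shows only the textbook transform that MA normalization starts from; dCLIP's modification of the method is not reproduced, and the function name and the pseudocount of 1 are assumptions of this example.

```python
import math

def ma_values(count_a, count_b, pseudocount=1.0):
    """Return (M, A) for one region's counts in conditions A and B.

    M = log2(a) - log2(b) is the log fold change;
    A = (log2(a) + log2(b)) / 2 is the average log abundance.
    The pseudocount avoids taking the logarithm of zero.
    """
    log_a = math.log2(count_a + pseudocount)
    log_b = math.log2(count_b + pseudocount)
    return log_a - log_b, 0.5 * (log_a + log_b)

# Hypothetical counts: 7 reads in condition A, 1 read in condition B.
m, a = ma_values(7, 1)
```

Normalization then typically fits the trend of M against A across regions and subtracts it, so that regions without differential binding are centered on M = 0.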

16.
Multiple-trait association mapping, in which multiple traits are used simultaneously in the identification of genetic variants affecting those traits, has recently attracted interest. One class of approaches for this problem builds on classical variance component methodology, utilizing a multitrait version of a linear mixed model. These approaches both increase power and provide insights into the genetic architecture of multiple traits. In particular, it is possible to estimate the genetic correlation, which is a measure of the portion of the total correlation between traits that is due to additive genetic effects. Unfortunately, the practical utility of these methods is limited since they are computationally intractable for large sample sizes. In this article, we introduce a reformulation of the multiple-trait association mapping approach by defining the matrix-variate linear mixed model. Our approach reduces the computational time necessary to perform maximum-likelihood inference in a multiple-trait model by utilizing a data transformation. By utilizing a well-studied human cohort, we show that our approach provides more than a 10-fold speedup, making multiple-trait association feasible in a large population cohort on the genome-wide scale. We take advantage of the efficiency of our approach to analyze gene expression data. By decomposing gene coexpression into a genetic and environmental component, we show that our method provides fundamental insights into the nature of coexpressed genes. An implementation of this method is available at http://genetics.cs.ucla.edu/mvLMM.

17.
Prokaryotic proteins are regulated by pupylation, a type of post-translational modification that contributes to cellular function in bacterial organisms. In the pupylation process, tagging with the prokaryotic ubiquitin-like protein (Pup) is functionally analogous to ubiquitination, marking target proteins for proteasomal degradation. To date, several experimental methods have been developed to identify pupylated proteins and their pupylation sites, but these experimental methods are generally laborious and costly. Therefore, computational methods that can accurately predict potential pupylation sites based on protein sequence information are highly desirable. In this paper, a novel predictor termed pbPUP has been developed for accurate prediction of pupylation sites. In particular, a sophisticated sequence encoding scheme [i.e. the profile-based composition of k-spaced amino acid pairs (pbCKSAAP)] is used to represent the sequence patterns and evolutionary information of the sequence fragments surrounding pupylation sites. Then, a Support Vector Machine (SVM) classifier is trained using the pbCKSAAP encoding scheme. The final pbPUP predictor achieves an AUC value of 0.849 in 10-fold cross-validation tests and outperforms other existing predictors on a comprehensive independent test dataset. The proposed method is anticipated to be a helpful computational resource for the prediction of pupylation sites. The web server and curated datasets in this study are freely available at http://protein.cau.edu.cn/pbPUP/.
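
The encoding named above builds on the composition of k-spaced amino acid pairs (CKSAAP). The sketch below computes the plain, sequence-based version for a single fragment: for a gap size k it counts every residue pair separated by exactly k positions and normalizes by the number of pairs counted. The profile-based variant used by pbPUP replaces raw residues with evolutionary profile information and is not reproduced here; the function name and the example fragment are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def cksaap(fragment, k):
    """Composition of residue pairs separated by k positions in fragment.

    Returns a dict with one entry for each of the 400 ordered pairs.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(fragment) - k - 1):
        pair = fragment[i] + fragment[i + k + 1]
        if pair in counts:  # skip gaps and non-standard residues
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}

# Toy fragment with k = 1: the pairs are A.A, K.K and A.A.
features = cksaap("AKAKA", 1)
```

In practice the vectors for several values of k are concatenated, so a fragment yields 400 features per gap size before being passed to a classifier such as an SVM.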

18.
The protein databank (PDB) contains high quality structural data for computational structural biology investigations. We have earlier described a fast tool (the decomp_pdb tool) for identifying and marking missing atoms and residues in PDB files. The tool also automatically decomposes PDB entries into separate files describing ligands and polypeptide chains. Here, we describe a web interface named DECOMP for the tool. Our program correctly identifies multi-monomer ligands, and the server also offers the preprocessed ligand-protein decomposition of the complete PDB for downloading (up to 5 GB in size). Availability: http://decomp.pitgroup.org

19.
The kinetochore (centromeric DNA and associated protein complex) is essential for faithful chromosome segregation and maintenance of genome stability. Here we report that an evolutionarily conserved protein, Pat1, is a structural component of the Saccharomyces cerevisiae kinetochore and associates with centromeres in an NDC10-dependent manner. Consistent with a role for Pat1 in kinetochore structure and function, a deletion of PAT1 results in a delay in sister chromatid separation, errors in chromosome segregation, and defects in the structural integrity of centromeric chromatin. Pat1 is involved in topological regulation of minichromosomes, as altered patterns of DNA supercoiling were observed in pat1Δ cells. Studies with pat1 alleles uncovered an evolutionarily conserved region within the central domain of Pat1 that is required for its association with centromeres, sister chromatid separation, and faithful chromosome segregation. Taken together, our data have uncovered a novel role for Pat1 in maintaining the structural integrity of centromeric chromatin to facilitate faithful chromosome segregation and proper kinetochore function.

20.
We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with a computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.
