Similar articles
 20 similar articles found (search time: 93 ms)
1.
Motivation: The nucleotide sequencing process produces not only the sequence of nucleotides, but also associated quality values. Quality values provide valuable information, but are primarily used only for trimming sequences and generally ignored in subsequent analyses.
Results: This article describes how the scoring schemes of standard alignment algorithms can be modified to take into account quality values to produce improved alignments and statistically more accurate scores. A prototype implementation is also provided, and used to post-process a set of BLAST results. Quality-adjusted alignment is a natural extension of standard alignment methods, and can be implemented with only a small constant factor performance penalty. The method can also be applied to related methods including heuristic search algorithms like BLAST and FASTA.
Availability: Software is available at http://malde.org/~ketil/qaa.
Contact: ketil.malde{at}imr.no
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Limsoon Wong
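
To make the idea of a quality-aware scoring scheme concrete, here is a minimal sketch. It is not the scheme from the article: it assumes a simple match/mismatch score and a uniform error model in which a miscalled base is equally likely to be any of the other three nucleotides. The function names (`phred_to_error`, `quality_adjusted_score`) and the score values are hypothetical. The only standard ingredient is the Phred relation p = 10^(-Q/10).

```python
BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality value Q into an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_score(a, b, match=1.0, mismatch=-1.0):
    """Plain match/mismatch score (assumed values, chosen for illustration)."""
    return match if a == b else mismatch


def quality_adjusted_score(called, quality, ref):
    """Expected score of aligning a called base against a reference base.

    Assumption: with probability 1 - p the call is correct; otherwise the
    true base is one of the three other nucleotides with probability p / 3.
    """
    p = phred_to_error(quality)
    score = (1 - p) * base_score(called, ref)
    for other in BASES:
        if other != called:
            score += (p / 3) * base_score(other, ref)
    return score


if __name__ == "__main__":
    # A confident match scores close to the plain match score ...
    print(quality_adjusted_score("A", 40, "A"))
    # ... while a low-quality mismatch is penalised less than a confident one.
    print(quality_adjusted_score("A", 5, "C"), quality_adjusted_score("A", 40, "C"))
```

Under these assumptions, low-quality positions contribute scores closer to zero, so they influence the alignment less than confident calls do.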

2.
Motivation: Reliable structural modelling of protein–protein complexes has widespread application, from drug design to advancing our knowledge of protein interactions and function. This work addresses three important issues in protein–protein docking: implementing backbone flexibility, incorporating prior indications from experiment and bioinformatics, and providing public access via a server. 3D-Garden (Global And Restrained Docking Exploration Nexus), our benchmarked and server-ready flexible docking system, allows sophisticated programming of surface patches by the user via a facet representation of the interactors’ molecular surfaces (generated with the marching cubes algorithm). Flexibility is implemented as a weighted exhaustive conformer search for each clashing pair of molecular branches in a set of 5000 models filtered from around 340 000 initially.
Results: In a non-global assessment, carried out strictly according to the protocols for number of models considered and model quality of the Critical Assessment of Protein Interactions (CAPRI) experiment, over the widely-used Benchmark 2.0 of 84 complexes, 3D-Garden identifies a set of ten models containing an acceptable or better model in 29/45 test cases, including one with large conformational change. In 19/45 cases an acceptable or better model is ranked first or second out of 340 000 candidates.
Availability: http://www.sbg.bio.ic.ac.uk/3dgarden (server)
Contact: v.lesk{at}ic.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Burkhard Rost

3.
Motivation: Recent improvements in high-throughput Mass Spectrometry (MS) technology have expedited genome-wide discovery of protein–protein interactions by providing a capability of detecting protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data is challenging. Advances are required in developing robust and seamlessly integrated procedures for assessment of protein–protein interaction affinities, mathematical representation of protein interaction networks, discovery of protein complexes and evaluation of their biological relevance.
Results: A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses interaction affinity between two proteins based on similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess biological relevance of each found complex. On Saccharomyces cerevisiae pull-down data, the framework outperformed other more complicated schemes by at least 10% in F1-measure and identified 610 protein complexes with high functional homogeneity based on the enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes brought forward hypotheses on the causes of false identifications. Namely, co-purification of different protein complexes as mediated by a common non-protein molecule, such as DNA, might be a source of false positives. Protein identification bias in pull-down technology, such as the hydrophilic bias, could result in false negatives.
Contact: samatovan{at}ornl.gov
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Jonathan Wren
Present address: Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
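
The pipeline outlined in this abstract (pairwise similarity of co-purification patterns, a thresholded network, then graph-based complex detection) can be sketched as follows. Every specific choice here is an assumption made for illustration: Jaccard similarity stands in for the article's affinity measure, a fixed cutoff stands in for the knowledge-guided threshold, and connected components stand in for the graph-theoretical complex detection. The protein and bait identifiers are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of bait identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(purifications, threshold):
    """Link two proteins when their co-purification patterns are similar enough.

    purifications maps each protein to the set of pull-downs it appeared in.
    """
    edges = set()
    for p, q in combinations(sorted(purifications), 2):
        if jaccard(purifications[p], purifications[q]) >= threshold:
            edges.add((p, q))
    return edges


def connected_components(nodes, edges):
    """Return the connected components of the network as a list of sets."""
    adjacency = {n: set() for n in nodes}
    for p, q in edges:
        adjacency[p].add(q)
        adjacency[q].add(p)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        components.append(component)
    return components


if __name__ == "__main__":
    # Hypothetical pull-down data: protein -> baits that co-purified it.
    data = {
        "P1": {"b1", "b2", "b3"},
        "P2": {"b1", "b2", "b3"},
        "P3": {"b1", "b2"},
        "P4": {"b7"},
        "P5": {"b7", "b8"},
    }
    edges = build_network(data, threshold=0.5)
    print(connected_components(data.keys(), edges))
```

In practice, the choice of similarity measure and threshold largely determines which complexes emerge, which is presumably why the article treats threshold selection as a separate, knowledge-guided step.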

4.
Summary: We developed an interactive gene ontology (GO) browser named GOTreePlus that superimposes annotation information over GO structures. It can facilitate the identification of important GO terms through interactive visualization of them in the GO structure. The interactive pie chart summarizing an annotation distribution for a selected GO term provides users with a succinct context-sensitive overview of their experimental results. We tested our GOTreePlus using a proteome profiling dataset obtained on differentiation of retinal pigment epithelial cells where 399 proteins were quantified.
Availability: http://bioinformatics.cnmcresearch.org/GOTreePlus/
Contact: jseo{at}cnmcresearch.org
Associate Editor: John Quackenbush

5.
Motivation: The quest for high-throughput proteomics has revealed a number of challenges in recent years. Whilst substantial improvements in automated protein separation with liquid chromatography and mass spectrometry (LC/MS), aka ‘shotgun’ proteomics, have been achieved, large-scale open initiatives such as the Human Proteome Organization (HUPO) Brain Proteome Project have shown that maximal proteome coverage is only possible when LC/MS is complemented by 2D gel electrophoresis (2-DE) studies. Moreover, both separation methods require automated alignment and differential analysis to relieve the bioinformatics bottleneck and so make high-throughput protein biomarker discovery a reality. The purpose of this article is to describe a fully automatic image alignment framework for the integration of 2-DE into a high-throughput differential expression proteomics pipeline.
Results: The proposed method is based on robust automated image normalization (RAIN) to circumvent the drawbacks of traditional approaches. These use symbolic representation at the very early stages of the analysis, which introduces persistent errors due to inaccuracies in modelling and alignment. In RAIN, a third-order volume-invariant B-spline model is incorporated into a multi-resolution schema to correct for geometric and expression inhomogeneity at multiple scales. The normalized images can then be compared directly in the image domain for quantitative differential analysis. Through evaluation against an existing state-of-the-art method on real and synthetically warped 2D gels, the proposed analysis framework demonstrates substantial improvements in matching accuracy and differential sensitivity. High-throughput analysis is established through an accelerated GPGPU (general purpose computation on graphics cards) implementation.
Availability: Supplementary material, software and images used in the validation are available at http://www.proteomegrid.org/rain/
Contact: g.z.yang{at}imperial.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: David Rocke

6.
7.
Motivation: High-throughput experimental and computational methods are generating a wealth of protein–protein interaction data for a variety of organisms. However, data produced by current state-of-the-art methods include many false positives, which can hinder the analyses needed to derive biological insights. One way to address this problem is to assign confidence scores that reflect the reliability and biological significance of each interaction. Most previously described scoring methods use a set of likely true positives to train a model to score all interactions in a dataset. A single positive training set, however, may be biased and not representative of true interaction space.
Results: We demonstrate a method to score protein interactions by utilizing multiple independent sets of training positives to reduce the potential bias inherent in using a single training set. We used a set of benchmark yeast protein interactions to show that our approach outperforms other scoring methods. Our approach can also score interactions across data types, which makes it more widely applicable than many previously proposed methods. We applied the method to protein interaction data from both Drosophila melanogaster and Homo sapiens. Independent evaluations show that the resulting confidence scores accurately reflect the biological significance of the interactions.
Contact: rfinley{at}wayne.edu
Supplementary information: Supplementary data are available at Bioinformatics Online.
Associate Editor: Burkhard Rost

8.
Motivation: High-density DNA microarrays provide us with useful tools for analyzing DNA and RNA comprehensively. However, the background signal caused by the non-specific binding (NSB) between probe and target makes it difficult to obtain accurate measurements. To remove the background signal, there is a set of background probes on Affymetrix Exon arrays to represent the amount of non-specific signals, and an accurate estimation of non-specific signals using these background probes is desirable for improvement of microarray analyses.
Results: We developed a thermodynamic model of NSB on short nucleotide microarrays in which the NSBs are modeled by duplex formation of probes and multiple hypothetical targets. We fitted the observed signal intensities of the background probes with those expected by the model to obtain the model parameters. As a result, we found that the presented model can improve the accuracy of prediction of non-specific signals in comparison with previously proposed methods. This result will provide a useful method to correct for the background signal in oligonucleotide microarray analysis.
Availability: The software is implemented in the R language and can be downloaded from our website (http://www-shimizu.ist.osaka-u.ac.jp/shimizu_lab/MSNS/).
Contact: furusawa{at}ist.osaka-u.ac.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Trey Ideker

9.
Summary: Cross-mapping of gene and protein identifiers between different databases is a tedious and time-consuming task. To overcome this, we developed CRONOS, a cross-reference server that contains entries from five mammalian organisms presented by major gene and protein information resources. Sequence similarity analysis of the mapped entries shows that the cross-references are highly accurate. In total, up to 18 different identifier types can be used for identification of cross-references. The quality of the mapping could be improved substantially by exclusion of ambiguous gene and protein names which were manually validated. Organism-specific lists of ambiguous terms, which are valuable for a variety of bioinformatics applications like text mining, are available for download.
Availability: CRONOS is freely available to non-commercial users at http://mips.gsf.de/genre/proj/cronos/index.html, web services are available at http://mips.gsf.de/CronosWSService/CronosWS?wsdl.
Contact: brigitte.waegele{at}helmholtz-muenchen.de
Supplementary information: Supplementary data are available at Bioinformatics online. The online Supplementary Material contains all figures and tables referenced by this article.
Associate Editor: Martin Bishop

10.
The ability to rank proteins by their likely success in crystallization is useful in current Structural Biology efforts and in particular in high-throughput Structural Genomics initiatives. We present ParCrys, a Parzen Window approach to estimate a protein's propensity to produce diffraction-quality crystals. The Protein Data Bank (PDB) provided training data whilst the databases TargetDB and PepcDB were used to define feature selection data as well as test data independent of feature selection and training. ParCrys outperforms the OB-Score, SECRET and CRYSTALP on the data examined, with accuracy and Matthews correlation coefficient values of 79.1% and 0.582, respectively (74.0% and 0.227, respectively, on data with a ‘real-world’ ratio of positive:negative examples). ParCrys predictions and associated data are available from www.compbio.dundee.ac.uk/parcrys.
Contact: geoff{at}compbio.dundee.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: John Quackenbush
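
For readers unfamiliar with Parzen window estimation, the sketch below shows the general technique on toy data. It is not the ParCrys implementation: the Gaussian kernel, the bandwidth `h`, the equal class priors and the two-dimensional feature vectors are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, center, h):
    """Unnormalised Gaussian window of bandwidth h centred on a training point."""
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-squared_distance / (2 * h * h))


def parzen_density(x, samples, h):
    """Average kernel response over the training samples.

    The normalisation constant is omitted because it cancels in the ratio
    below when both classes share the same bandwidth and dimensionality.
    """
    return sum(gaussian_kernel(x, s, h) for s in samples) / len(samples)


def propensity(x, positives, negatives, h=1.0):
    """Estimated probability that x belongs to the positive class (equal priors assumed)."""
    p = parzen_density(x, positives, h)
    n = parzen_density(x, negatives, h)
    return p / (p + n) if p + n > 0 else 0.5


if __name__ == "__main__":
    # Hypothetical feature vectors for crystallizable / non-crystallizable proteins.
    positives = [(0.0, 0.0), (0.5, 0.2)]
    negatives = [(3.0, 3.0), (3.5, 2.8)]
    print(propensity((0.1, 0.1), positives, negatives))
    print(propensity((3.2, 3.0), positives, negatives))
```

The bandwidth controls how far each training example's influence reaches; a value that is too small memorises the training set, while one that is too large blurs the two classes together.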

11.
A multivariate test of association (total citations: 1; self-citations: 0; citations by others: 1)
Summary: Although genetic association studies often test multiple, related phenotypes, few formal multivariate tests of association are available. We describe a test of association that can be efficiently applied to large population-based designs.
Availability: A C++ implementation can be obtained from the authors.
Contact: manuel.ferreira{at}qimr.edu.au
Supplementary information: Supplementary figures are available at Bioinformatics online.
Associate Editor: Alex Bateman

12.
Motivation: Genomic methylation analysis is useful for typing bacteria that have a high number of expressed type II methyltransferases. Methyltransferases are usually committed to Restriction and Modification (R-M) systems, in which the restriction endonuclease imposes high pressure on the expression of the cognate methyltransferase, which hinders R-M system loss. Conventional clustering methods do not reflect this tendency. An algorithm was developed for dendrogram construction reflecting the propensity for conservation of Type II R-M systems.
Results: The new algorithm was applied to 52 Helicobacter pylori strains from different geographical regions and compared with conventional clustering methods. The algorithm works by first grouping strains that share a common minimum set of R-M systems and then gradually adding strains according to the number of R-M systems acquired. Dendrograms revealed a cluster of African strains, which suggests that R-M systems have been present in the H. pylori genome since its human host migrated from Africa.
Availability: The software files are available at http://www.ff.ul.pt/paginas/jvitor/Bioinformatics/MCRM_algorithm.zip
Contact: filipavale{at}fe.ucp.pt
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Martin Bishop
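
The abstract gives only a one-sentence outline of the algorithm, so the following is a loose, hypothetical reading of that outline rather than the published method. It assumes each strain is described by the set of R-M systems it carries, takes the "common minimum set" to be the systems shared by all strains, and then orders strains by how many systems they carry beyond that core. Strain and system names are invented.

```python
def order_by_acquired_systems(strains):
    """Group strains by the number of R-M systems carried beyond the shared core.

    strains maps a strain name to the set of R-M systems it carries.
    Returns the shared core set and a list of strain groups, ordered from
    the fewest to the most additional systems.
    """
    core = set.intersection(*strains.values())
    groups = {}
    for name, systems in strains.items():
        groups.setdefault(len(systems - core), []).append(name)
    return core, [sorted(groups[extra]) for extra in sorted(groups)]


if __name__ == "__main__":
    # Hypothetical strains and R-M systems.
    strains = {
        "s1": {"A", "B"},
        "s2": {"A", "B"},
        "s3": {"A", "B", "C"},
        "s4": {"A", "B", "C", "D"},
    }
    core, levels = order_by_acquired_systems(strains)
    print(core, levels)
```

A real dendrogram construction would also need a rule for attaching each group to the existing tree, which this sketch leaves out.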

13.
14.
MMG: a probabilistic tool to identify submodules of metabolic pathways (total citations: 1; self-citations: 0; citations by others: 1)
Motivation: A fundamental task in systems biology is the identification of groups of genes that are involved in the cellular response to particular signals. At its simplest level, this often reduces to identifying biological quantities (mRNA abundance, enzyme concentrations, etc.) which are differentially expressed in two different conditions. Popular approaches involve using t-test statistics, based on modelling the data as arising from a mixture distribution. A common assumption of these approaches is that the data are independent and identically distributed; however, biological quantities are usually related through a complex (weighted) network of interactions, and often the more pertinent question is which subnetworks are differentially expressed, rather than which genes. Furthermore, in many interesting cases (such as high-throughput proteomics and metabolomics), only very partial observations are available, resulting in the need for efficient imputation techniques.
Results: We introduce Mixture Model on Graphs (MMG), a novel probabilistic model to identify differentially expressed submodules of biological networks and pathways. The method can easily incorporate information about weights in the network, is robust against missing data and can be easily generalized to directed networks. We propose an efficient sampling strategy to infer posterior probabilities of differential expression, as well as posterior probabilities over the model parameters. We assess our method on artificial data demonstrating significant improvements over standard mixture model clustering. Analysis of our model results on quantitative high-throughput proteomic data leads to the identification of biologically significant subnetworks, as well as the prediction of the expression level of a number of enzymes, some of which are then verified experimentally.
Availability: MATLAB code is available from http://www.dcs.shef.ac.uk/~guido/software.html
Contact: guido{at}dcs.shef.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Jonathan Wren

15.
Model-based deconvolution of genome-wide DNA binding (total citations: 1; self-citations: 0; citations by others: 1)
Motivation: Chromatin immunoprecipitation followed by hybridization to a genomic tiling microarray (ChIP-chip) is a routinely used protocol for localizing the genomic targets of DNA-binding proteins. The resolution to which binding sites in this assay can be identified is commonly considered to be limited by two factors: (1) the resolution at which the genomic targets are tiled in the microarray and (2) the large and variable lengths of the immunoprecipitated DNA fragments.
Results: We have developed a generative model of binding sites in ChIP-chip data and an approach, MeDiChI, for efficiently and robustly learning that model from diverse data sets. We have evaluated MeDiChI's performance using simulated data, as well as on several diverse ChIP-chip data sets collected on widely different tiling array platforms for two different organisms (Saccharomyces cerevisiae and Halobacterium salinarium NRC-1). We find that MeDiChI accurately predicts binding locations to a resolution greater than that of the probe spacing, even for overlapping peaks, and can increase the effective resolution of tiling array data by a factor of 5x or better. Moreover, the method's performance on simulated data provides insights into effectively optimizing the experimental design for increased binding site localization accuracy and efficacy.
Availability: MeDiChI is available as an open-source R package, including all data, from http://baliga.systemsbiology.net/medichi.
Contact: dreiss{at}systemsbiology.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Martin Bishop

16.
GENOME: a rapid coalescent-based whole genome simulator (total citations: 1; self-citations: 0; citations by others: 1)
Summary: GENOME proposes a rapid coalescent-based approach to simulate whole genome data. In addition to features of standard coalescent simulators, the program allows for recombination rates to vary along the genome and for flexible population histories. Within small regions, we have evaluated samples simulated by GENOME to verify that GENOME provides the expected LD patterns and frequency spectra. The program can be used to study the sampling properties of any statistic for a whole genome study.
Availability: The program and C++ source code are available online at http://www.sph.umich.edu/csg/liang/genome/
Contact: lianglim{at}umich.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Martin Bishop

17.
Motivation: In recent years, several methods have been proposed for determining metabolic pathways in an automated way based on network topology. The aim of this work is to analyse these methods by tackling a concrete example relevant in biochemistry. It concerns the question whether even-chain fatty acids, being the most important constituents of lipids, can be converted into sugars at steady state. It was proved five decades ago that this conversion using the Krebs cycle is impossible unless the enzymes of the glyoxylate shunt (or alternative bypasses) are present in the system. Using this example, we can compare the various methods in pathway analysis.
Results: Elementary modes analysis (EMA) of a set of enzymes corresponding to the Krebs cycle, glycolysis and gluconeogenesis supports the scientific evidence showing that there is no pathway capable of converting acetyl-CoA to glucose at steady state. This conversion is possible after the addition of isocitrate lyase and malate synthase (forming the glyoxylate shunt) to the system. Dealing with the same example, we compare EMA with two tools based on graph theory available online, PathFinding and Pathway Hunter Tool. These automated network generating tools do not succeed in predicting the conversions known from experiment. They sometimes generate unbalanced paths and reveal problems identifying side metabolites that are not responsible for the carbon net flux. This shows that, for metabolic pathway analysis, it is important to consider the topology (including bimolecular reactions) and stoichiometry of metabolic systems, as is done in EMA.
Contact: ldpf{at}minet.uni-jena.de; schuster{at}minet.uni-jena.de
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Alfonso Valencia
Received on July 24, 2008; revised on September 18, 2008; accepted on September 18, 2008
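
The steady-state requirement that separates stoichiometric methods such as EMA from purely topological path searches can be stated as N·v = 0, where N is the stoichiometric matrix of internal metabolites and v is a flux vector. The sketch below checks this condition on a deliberately tiny, made-up network (uptake of A, conversion of A to B, export of B). It does not model the Krebs cycle or the glyoxylate shunt, and it does not compute elementary modes; the function names are hypothetical.

```python
def net_production(stoichiometry, fluxes):
    """Net production rate of each internal metabolite, i.e. the product N * v."""
    return [
        sum(coefficient * flux for coefficient, flux in zip(row, fluxes))
        for row in stoichiometry
    ]


def is_steady_state(stoichiometry, fluxes, tol=1e-9):
    """True when no internal metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(stoichiometry, fluxes))


if __name__ == "__main__":
    # Toy network (hypothetical): R1: -> A, R2: A -> B, R3: B ->
    # Rows are internal metabolites A and B; columns are reactions R1..R3.
    N = [
        [1, -1, 0],
        [0, 1, -1],
    ]
    print(is_steady_state(N, [1, 1, 1]))   # balanced route through the network
    print(net_production(N, [1, 1, 0]))    # B accumulates when export is switched off
```

A path that looks connected in a graph can still fail this balance check, which is the kind of unbalanced route the abstract reports for the graph-based tools.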

18.
Summary: Malaria, one of the world's most common diseases, is caused by the intracellular protozoan parasite known as Plasmodium. Recently, with the arrival of several malaria parasite genomes, we established an integrated system named PlasmoGF for comparative genomics and phylogenetic analysis of Plasmodium gene families. Gene families were clustered using the Markov Cluster algorithm implemented in the TribeMCL program and could be searched using keywords, gene-family information, domain composition, Gene Ontology and BLAST. Moreover, a number of useful bioinformatics tools were implemented to facilitate the analysis of these putative Plasmodium gene families, including gene retrieval, annotation, sequence alignment, phylogeny construction and visualization. In the current version, PlasmoGF contained 8980 sets of gene families derived from six malaria parasite genomes: Plasmodium falciparum, P. berghei, P. knowlesi, P. chabaudi, P. vivax and P. yoelii. The availability of such a highly integrated system would be of great interest for the community of researchers working on malaria parasite phylogenomics.
Availability: PlasmoGF is freely available at http://bioinformatics.zj.cn/pgf/
Contact: xiaokunli{at}163.net; baoqy{at}genomics.org.cn; fuz3{at}psu.edu
Associate Editor: Jonathan Wren
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

19.
20.
Summary: FAMHAP is established software for haplotype association analysis of nuclear families. We have released a major update that comprises various new features for case-control data. Furthermore, we provide an additional program, runFamhap, that allows users to start the same method repeatedly for varying sets of genetic markers. In addition, a platform-independent graphical user interface (GUI) was developed to simplify the usage of both FAMHAP and runFamhap. The runFamhap program greatly facilitates the application of FAMHAP to genome-wide association studies (GWAS) and supports flexible genome-wide haplotype analysis. As an example, we describe application to HapMap data.
Availability: The software is available at http://famhap.meb.uni-bonn.de
Contact: herold{at}imbie.meb.uni-bonn.de; becker{at}imbie.meb.uni-bonn.de
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Alex Bateman
