Similar articles
20 similar articles found (search time: 15 ms)
1.
Motivation: Genomic methylation analysis is useful for typing bacteria that have a high number of expressed type II methyltransferases. Methyltransferases are usually committed to Restriction and Modification (R-M) systems, in which the restriction endonuclease imposes high pressure on the expression of the cognate methyltransferase that hinders R-M system loss. Conventional clustering methods do not reflect this tendency. An algorithm was developed for dendrogram construction reflecting the propensity for conservation of R-M Type II systems. Results: The new algorithm was applied to 52 Helicobacter pylori strains from different geographical regions and compared with conventional clustering methods. The algorithm works by first grouping strains that share a common minimum set of R-M systems and then gradually adding strains according to the number of R-M systems acquired. Dendrograms revealed a cluster of African strains, which suggests that R-M systems have been present in the H. pylori genome since its human host migrated from Africa. Availability: The software files are available at http://www.ff.ul.pt/paginas/jvitor/Bioinformatics/MCRM_algorithm.zip Contact: filipavale{at}fe.ucp.pt Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Martin Bishop

2.
Many genome-wide assays involve the generation of a subset (or representation) of the genome following restriction enzyme digestion. The use of enzymes sensitive to cytosine methylation allows high-throughput analysis of this epigenetic regulatory process. We show that the use of a dual-adapter approach allows us to generate genomic representations that include fragments of <200bp in size, previously not possible when using the standard approach of using a single adapter. By expanding the representation to smaller fragments using HpaII or MspI, we increase the representation by these isoschizomers to more than 1.32 million loci in the human genome, representing 98.5% of CpG islands and 91.1% of refSeq promoters. This advance allows the development of a new, high-resolution version of our HpaII-tiny fragment Enrichment by Ligation-mediated PCR (HELP) assay to study cytosine methylation. We also show that the MspI representation generates information about copy-number variation, that the assay can be used on as little as 10ng of DNA and that massively parallel sequencing can be used as an alternative to microarrays to read the output of the assay, making this a powerful discovery platform for studies of genomic and epigenomic abnormalities.

3.
GENOME: a rapid coalescent-based whole genome simulator   (Total citations: 1; self-citations: 0; citations by others: 1)
Summary: GENOME proposes a rapid coalescent-based approach to simulate whole genome data. In addition to features of standard coalescent simulators, the program allows for recombination rates to vary along the genome and for flexible population histories. Within small regions, we have evaluated samples simulated by GENOME to verify that GENOME provides the expected LD patterns and frequency spectra. The program can be used to study the sampling properties of any statistic for a whole genome study. Availability: The program and C++ source code are available online at http://www.sph.umich.edu/csg/liang/genome/ Contact: lianglim{at}umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Martin Bishop

4.
Model-based deconvolution of genome-wide DNA binding   (Total citations: 1; self-citations: 0; citations by others: 1)
Motivation: Chromatin immunoprecipitation followed by hybridization to a genomic tiling microarray (ChIP-chip) is a routinely used protocol for localizing the genomic targets of DNA-binding proteins. The resolution to which binding sites in this assay can be identified is commonly considered to be limited by two factors: (1) the resolution at which the genomic targets are tiled in the microarray and (2) the large and variable lengths of the immunoprecipitated DNA fragments. Results: We have developed a generative model of binding sites in ChIP-chip data and an approach, MeDiChI, for efficiently and robustly learning that model from diverse data sets. We have evaluated MeDiChI's performance using simulated data, as well as on several diverse ChIP-chip data sets collected on widely different tiling array platforms for two different organisms (Saccharomyces cerevisiae and Halobacterium salinarium NRC-1). We find that MeDiChI accurately predicts binding locations to a resolution greater than that of the probe spacing, even for overlapping peaks, and can increase the effective resolution of tiling array data by a factor of 5x or better. Moreover, the method's performance on simulated data provides insights into effectively optimizing the experimental design for increased binding site localization accuracy and efficacy. Availability: MeDiChI is available as an open-source R package, including all data, from http://baliga.systemsbiology.net/medichi. Contact: dreiss{at}systemsbiology.org Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Martin Bishop

5.
Motivation: High-throughput experimental and computational methods are generating a wealth of protein–protein interaction data for a variety of organisms. However, data produced by current state-of-the-art methods include many false positives, which can hinder the analyses needed to derive biological insights. One way to address this problem is to assign confidence scores that reflect the reliability and biological significance of each interaction. Most previously described scoring methods use a set of likely true positives to train a model to score all interactions in a dataset. A single positive training set, however, may be biased and not representative of true interaction space. Results: We demonstrate a method to score protein interactions by utilizing multiple independent sets of training positives to reduce the potential bias inherent in using a single training set. We used a set of benchmark yeast protein interactions to show that our approach outperforms other scoring methods. Our approach can also score interactions across data types, which makes it more widely applicable than many previously proposed methods. We applied the method to protein interaction data from both Drosophila melanogaster and Homo sapiens. Independent evaluations show that the resulting confidence scores accurately reflect the biological significance of the interactions. Contact: rfinley{at}wayne.edu Supplementary information: Supplementary data are available at Bioinformatics Online. Associate Editor: Burkhard Rost

6.
Motivation: We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Results: Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment. Contact: yuanji{at}mdanderson.org Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Chris Stoeckert
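
As an illustration of the Bayesian FDR step mentioned in this abstract, the sketch below shows one common form of the rule: given a posterior probability that each null hypothesis is true, reject the largest set of hypotheses whose average posterior null probability stays at or below a target level. This is an assumption about the general technique, not the authors' implementation; the function name `bayesian_fdr_reject`, the `alpha` parameter and the example probabilities are hypothetical.

```python
def bayesian_fdr_reject(post_null, alpha=0.05):
    """Return indices of rejected hypotheses under a simple Bayesian FDR rule.

    post_null[i] is the (assumed, precomputed) posterior probability that
    null hypothesis i is true. Hypotheses are added in order of increasing
    posterior null probability while the running mean of those probabilities
    (the estimated Bayesian FDR of the rejection set) stays <= alpha.
    """
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    rejected = []
    running_sum = 0.0
    for k, i in enumerate(order, start=1):
        running_sum += post_null[i]
        # The running mean is non-decreasing because values are sorted,
        # so we can stop at the first violation.
        if running_sum / k > alpha:
            break
        rejected.append(i)
    return rejected


if __name__ == "__main__":
    # Hypothetical posterior null probabilities for five tests.
    probs = [0.01, 0.90, 0.03, 0.20, 0.05]
    print(sorted(bayesian_fdr_reject(probs, alpha=0.10)))
```

Because the probabilities are visited in ascending order, the estimated FDR of the rejection set can only grow as hypotheses are added, which is why a single pass suffices.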

7.
Motivation: The success of genome sequencing has resulted in many protein sequences without functional annotation. We present ConFunc, an automated Gene Ontology (GO)-based protein function prediction approach, which uses conserved residues to generate sequence profiles to infer function. ConFunc splits sets of sequences identified by PSI-BLAST into sub-alignments according to their GO annotations. Conserved residues are identified for each GO term sub-alignment for which a position specific scoring matrix is generated. This combination of steps produces a set of feature (GO annotation) derived profiles from which protein function is predicted. Results: We assess the ability of ConFunc, BLAST and PSI-BLAST to predict protein function in the twilight zone of sequence similarity. ConFunc significantly outperforms BLAST and PSI-BLAST, obtaining levels of recall and precision that are not obtained by either method and maximum precision 24% greater than BLAST. Further, for a large test set of sequences with homologues of low sequence identity, at high levels of precision, ConFunc obtains recall six times greater than BLAST. These results demonstrate the potential for ConFunc to form part of an automated genomics annotation pipeline. Availability: http://www.sbg.bio.ic.ac.uk/confunc Contact: m.sternberg{at}imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Dmitrij Frishman

8.
9.
Motivation: Coexpression networks have recently emerged as a novel holistic approach to microarray data analysis and interpretation. Choosing an appropriate cutoff threshold, above which a gene–gene interaction is considered as relevant, is a critical task in most network-centric applications, especially when two or more networks are being compared. Results: We demonstrate that the performance of traditional approaches, which are based on a pre-defined cutoff or significance level, can vary drastically depending on the type of data and application. Therefore, we introduce a systematic procedure for estimating a cutoff threshold of coexpression networks directly from their topological properties. Both synthetic and real datasets show clear benefits of our data-driven approach under various practical circumstances. In particular, the procedure provides a robust estimate of individual degree distributions, even from multiple microarray studies performed with different array platforms or experimental designs, which can be used to discriminate the corresponding phenotypes. Application to human T helper cell differentiation process provides useful insights into the components and interactions controlling this process, many of which would have remained unidentified on the basis of expression change alone. Moreover, several human–mouse orthologs showed conserved topological changes in both systems, suggesting their potential importance in the differentiation process. Contact: laliel{at}utu.fi Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: David Rocke

10.
Motivation: In searching for differentially expressed (DE) genes in microarray data, we often observe a fraction of the genes to have unequal variability between groups. This is not an issue in large samples, where a valid test exists that uses individual variances separately. The problem arises in the small-sample setting, where the approximately valid Welch test lacks sensitivity, while the more sensitive moderated t-test assumes equal variance. Methods: We introduce a moderated Welch test (MWT) that allows unequal variance between groups. It is based on (i) weighting of pooled and unpooled standard errors and (ii) improved estimation of the gene-level variance that exploits the information from across the genes. Results: When a non-trivial proportion of genes has unequal variability, false discovery rate (FDR) estimates based on the standard t and moderated t-tests are often too optimistic, while the standard Welch test has low sensitivity. The MWT is shown to (i) perform better than the standard t, the standard Welch and the moderated t-tests when the variances are unequal between groups and (ii) perform similarly to the moderated t, and better than the standard t and Welch tests when the group variances are equal. These results mean that MWT is more reliable than other existing tests over a wider range of data conditions. Availability: R package to perform MWT is available at http://www.meb.ki.se/~yudpaw Contact: yudi.pawitan{at}ki.se Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Martin Bishop
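
For orientation, the two classical statistics that this abstract contrasts, the pooled-variance t statistic and the unpooled Welch statistic, can be sketched as follows. This is a minimal illustration of the standard textbook formulas only; it is not the authors' moderated Welch test, whose weighting and cross-gene variance estimation are not specified in the abstract. The function names `pooled_t` and `welch_t` and the example data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Standard two-sample t statistic, assuming equal group variances."""
    n1, n2 = len(x), len(y)
    # Pooled sample variance, weighted by degrees of freedom.
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(s2 * (1.0 / n1 + 1.0 / n2))


def welch_t(x, y):
    """Welch statistic and Welch-Satterthwaite degrees of freedom.

    Uses each group's own variance, so it stays approximately valid
    when the group variances differ.
    """
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical expression values for one gene in two groups.
    group_a = [1.0, 2.0, 3.0]
    group_b = [2.0, 4.0, 6.0]
    print(pooled_t(group_a, group_b))
    print(welch_t(group_a, group_b))
```

With equal group sizes the two statistics coincide and only the degrees of freedom differ, which is one way to see why the Welch test loses sensitivity in small samples.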

11.
12.
Motivation: The quest for high-throughput proteomics has revealed a number of challenges in recent years. Whilst substantial improvements in automated protein separation with liquid chromatography and mass spectrometry (LC/MS), aka ‘shotgun’ proteomics, have been achieved, large-scale open initiatives such as the Human Proteome Organization (HUPO) Brain Proteome Project have shown that maximal proteome coverage is only possible when LC/MS is complemented by 2D gel electrophoresis (2-DE) studies. Moreover, both separation methods require automated alignment and differential analysis to relieve the bioinformatics bottleneck and so make high-throughput protein biomarker discovery a reality. The purpose of this article is to describe a fully automatic image alignment framework for the integration of 2-DE into a high-throughput differential expression proteomics pipeline. Results: The proposed method is based on robust automated image normalization (RAIN) to circumvent the drawbacks of traditional approaches. These use symbolic representation at the very early stages of the analysis, which introduces persistent errors due to inaccuracies in modelling and alignment. In RAIN, a third-order volume-invariant B-spline model is incorporated into a multi-resolution schema to correct for geometric and expression inhomogeneity at multiple scales. The normalized images can then be compared directly in the image domain for quantitative differential analysis. Through evaluation against an existing state-of-the-art method on real and synthetically warped 2D gels, the proposed analysis framework demonstrates substantial improvements in matching accuracy and differential sensitivity. High-throughput analysis is established through an accelerated GPGPU (general purpose computation on graphics cards) implementation. Availability: Supplementary material, software and images used in the validation are available at http://www.proteomegrid.org/rain/ Contact: g.z.yang{at}imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: David Rocke

13.
Motivation: After 10 years of investigation, the folding mechanisms of β-hairpins are still under debate. Experiments strongly support the zip-out pathway, while most simulations prefer the hydrophobic collapse model (including middle-out and zip-in pathways). In this article, we show that all pathways can occur during the folding of β-hairpins but with different probabilities. The zip-out pathway is the most probable one. This is in agreement with the experimental results. We came to our conclusions from 38 100-ns room-temperature all-atom molecular dynamics simulations of the β-hairpin trpzip2. Our results may help to clarify the inconsistencies in the current pictures of β-hairpin folding mechanisms. Contact: yxiao{at}mail.hust.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Anna Tramontano

14.
Motivation: Mass spectrometry data are subject to considerable noise. Good noise models are required for proper detection and quantification of peptides. We have characterized noise in both quadrupole time-of-flight (Q-TOF) and ion trap data, and have constructed models for the noise. Results: We find that the noise in Q-TOF data from Applied Biosystems QSTAR fits well to a combination of multinomial and Poisson models with detector dead-time correction. In comparison, ion trap noise from Agilent MSD-Trap-SL is larger than the Q-TOF noise and is proportional to Poisson noise. We then demonstrate that the noise model can be used to improve deisotoping for peptide detection, by estimating appropriate cutoffs of the goodness of fit parameter at prescribed error rates. The noise models also have implications in noise reduction, retention time alignment and significance testing for biomarker discovery. Contact: pdu{at}us.ibm.com Supplementary information: Supplementary data are available at Bioinformatics Online. Associate Editor: Olga Troyanskaya

15.
The ability to rank proteins by their likely success in crystallization is useful in current Structural Biology efforts and in particular in high-throughput Structural Genomics initiatives. We present ParCrys, a Parzen Window approach to estimate a protein's propensity to produce diffraction-quality crystals. The Protein Data Bank (PDB) provided training data whilst the databases TargetDB and PepcDB were used to define feature selection data as well as test data independent of feature selection and training. ParCrys outperforms the OB-Score, SECRET and CRYSTALP on the data examined, with accuracy and Matthews correlation coefficient values of 79.1% and 0.582, respectively (74.0% and 0.227, respectively, on data with a ‘real-world’ ratio of positive:negative examples). ParCrys predictions and associated data are available from www.compbio.dundee.ac.uk/parcrys. Contact: geoff{at}compbio.dundee.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: John Quackenbush
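
The general Parzen window idea named in this abstract can be sketched as below: estimate a class-conditional density by averaging a kernel centred on each training example, then assign a query to the class with the higher estimated density. This is a one-dimensional toy with a Gaussian kernel, assumed purely for illustration; the features, kernel, bandwidth and decision rule used by ParCrys are not given in the abstract, and the names `parzen_density`, `more_likely_positive` and the example values are hypothetical.

```python
from math import exp, pi, sqrt


def parzen_density(x, samples, h):
    """Gaussian-kernel Parzen window density estimate at x.

    samples: 1-D training values for one class; h: window width (bandwidth).
    """
    norm = 1.0 / (len(samples) * h * sqrt(2.0 * pi))
    return norm * sum(exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def more_likely_positive(x, positives, negatives, h=1.0):
    """True if the estimated density at x is at least as high for the positive class."""
    return parzen_density(x, positives, h) >= parzen_density(x, negatives, h)


if __name__ == "__main__":
    # Hypothetical 1-D feature values for the two classes.
    crystallized = [0.8, 1.0, 1.2]
    not_crystallized = [3.8, 4.0, 4.5]
    print(more_likely_positive(1.1, crystallized, not_crystallized))
    print(more_likely_positive(4.2, crystallized, not_crystallized))
```

A smaller `h` makes the estimate follow individual training points more closely, while a larger `h` smooths it; in practice the bandwidth would be chosen on held-out data.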

16.
17.
Motivation: Inferring population structures using genetic data sampled from a group of individuals is a challenging task. Many methods either consider a fixed population number or ignore the correlation between populations. As a result, they can lose sensitivity and specificity in detecting subtle stratifications. In addition, when a large number of genetic markers are used, many existing algorithms perform rather inefficiently. Result: We propose a new Bayesian method to infer population structures using multiple unlinked single nucleotide polymorphisms (SNPs). Our approach explicitly considers the population correlation through a tree hierarchy, and treats the population number as a random variable. Using both simulated and real datasets of worldwide samples, we demonstrate that an incorporated tree can consistently improve the power in detecting subtle population stratifications. A tree-based model often involves a large number of unknown parameters, and the corresponding estimation procedure can be highly inefficient. We further implement a partition method to analytically integrate out all nuisance parameters in the tree. As a result, our method can analyze large SNP datasets with significantly improved convergence rate. Availability: http://www.stat.psu.edu/~yuzhang/tips.tar Contact: yuzhang{at}stat.psu.edu Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Keith Crandall

18.
Summary: Using literature databases one can find not only known and true relations between processes but also less studied, non-obvious associations. The main problem with discovering this type of relevant biological information is ‘selection’. The ability to distinguish between a true correlation (e.g. between different types of biological processes) and random chance that this correlation is statistically significant is crucial for any bio-medical research, literature mining being no exception. This problem is especially visible when searching for information which has not been studied and described in many publications. Therefore, a novel bio-linguistic statistical method is required, capable of ‘selecting’ true correlations, even when they are low-frequency associations. In this article, we present such a statistical approach based on Z-score and implemented in a web-based application ‘e-LiSe’. Availability: The software is available at http://miron.ibb.waw.pl/elise/ Contact: piotr{at}ibb.waw.pl Supplementary information: Supplementary materials are available at http://miron.ibb.waw.pl/elise/supplementary/ Associate Editor: Alfonso Valencia

19.
Summary: FAMHAP is an established software for haplotype association analysis of nuclear families. We have released a major update that comprises various new features for case-control data. Furthermore, we provide an additional program runFamhap that allows users to start the same method repeatedly for varying sets of genetic markers. In addition, a platform-independent graphical user interface (GUI) was developed to simplify the usage of both FAMHAP and runFamhap. The runFamhap program greatly facilitates the application of FAMHAP to genome-wide association studies (GWAS) and supports flexible genome-wide haplotype analysis. As an example, we describe application to HapMap data. Availability: The software is available at http://famhap.meb.uni-bonn.de Contact: herold{at}imbie.meb.uni-bonn.de; becker{at}imbie.meb.uni-bonn.de Supplementary information: Supplementary data are available at Bioinformatics online. Associate Editor: Alex Bateman

20.
DNA methylation increases throughout Arabidopsis development   (Total citations: 9; self-citations: 0; citations by others: 9)
We used amplified fragment length polymorphisms (AFLP) to analyze the stability of DNA methylation throughout Arabidopsis development. AFLP can detect genome-wide changes in cytosine methylation produced by DNA demethylation agents, such as 5-azacytidine, or specific mutations at the DDM1 locus. In both cases, cytosine demethylation is associated with a general increase in the presence of amplified fragments. Using this approach, we followed DNA methylation at methylation sensitive restriction sites throughout Arabidopsis development. The results show a progressive DNA methylation trend from cotyledons to vegetative organs to reproductive organs.
