Similar articles
 20 similar articles found (search time: 31 ms)
1.
Motivation: What forces maintain transposable elements (TEs) in genomes and populations is one of the main questions to understand the dynamics of these elements, but the exact nature of these forces is still a matter of speculation. To test theoretical models of TE population dynamics, we need many data on the genomic distributions of various elements. These data are now accumulating for the species Drosophila melanogaster, but they are scattered in the literature.
Results: The knowledge base DROSOPOSON thus brings together: (1) data available on Drosophila chromosomal localizations of TE insertions and on features of the polytene chromosomes (DNA content, recombination rate, breakpoints, etc.); (2) statistical methods aimed at analysing the distribution of the TE insertions along the chromosomes. In this paper, we present the structure of the base, the data and the statistical methods. Theoretical models of containment of TE copy number in Drosophila can thus be tested.
Availability: All the program sources, knowledge base schemes and data are available through anonymous ftp at biom3.univ-lyonl.fr (directory: pub/drosoposon).
Contact: E-mail: hoogland{at}biomserv.univ-lyonl.fr

2.
Summary: This work presents two independent approaches for a seamless integration of computational grids with the bioinformatics workflow suite Taverna. These are supported by a unique relational database to link applications with grid resources and present those as workflow elements. A web portal facilitates its collaborative maintenance. The first approach implements a gateway service to handle authentication certificates and all communication with the grid. It reads the database to spawn web services for workflow elements which are in turn used by Taverna. The second approach lets Taverna communicate with the grid on its own, by means of a newly developed plug-in. It reads the database and executes the needed tasks directly on the grid. While the gateway service is non-intrusive, the plug-in has technical advantages, e.g. by allowing data to remain on the grid while being passed between workflow elements.
Availability: http://grid.inb.uni-luebeck.de/
Contact: bayer{at}inb.uni-luebeck.de
Associate Editor: Alfonso Valencia

3.
Motivation: Genomes contain biologically significant information that extends beyond that encoded in genes. Some of this information relates to various short dispersed repeats distributed throughout the genome. The goal of this work was to combine tools for detection of statistically significant dispersed repeats in DNA sequences with tools to aid development of hypotheses regarding their possible physiological functions in an easy-to-use web-based environment.
Results: Ab Initio Motif Identification Environment (AIMIE) was designed to facilitate investigations of dispersed sequence motifs in prokaryotic genomes. We used AIMIE to analyze the Escherichia coli and Haemophilus influenzae genomes in order to demonstrate the utility of the new environment. AIMIE detected repeated extragenic palindrome (REP) elements, CRISPR repeats, uptake signal sequences, intergenic dyad sequences and several other over-represented sequence motifs. Distributional patterns of these motifs were analyzed using the tools included in AIMIE.
Availability: AIMIE and the related software can be accessed at our web site http://www.cmbl.uga.edu/software.html.
Contact: mrazek{at}uga.edu
Associate Editor: Alex Bateman

4.
When proper statistical procedures were employed, the empirical Cell Quota model of Droop (J. Mar. Biol. Assoc. UK, 48, 689–733, 1968; J. Phycol., 9, 264–272, 1973) proved a better fit to 20 out of 21 data sets (of conserved nutrients) than did the power law-based Chemical Reaction model of Baird (J. Plankton Res., 21, 85–126, 1999).
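
For orientation, the Droop Cell Quota model is commonly written as mu = mu_max * (1 - Q_min / Q), where Q is the cellular nutrient quota and Q_min the minimum (subsistence) quota. The sketch below is an illustrative aside, not code from the cited papers; the function and parameter names are assumptions chosen for this example.

```python
def droop_growth_rate(quota, min_quota, mu_max):
    """Specific growth rate under the Droop cell quota form.

    quota     -- cellular nutrient quota Q (hypothetical units, e.g. mol per cell)
    min_quota -- subsistence quota Q_min, at or below which growth stops
    mu_max    -- growth rate approached as Q becomes very large
    """
    if min_quota <= 0:
        raise ValueError("min_quota must be positive")
    if quota <= min_quota:
        # No growth when the cell holds no more than the subsistence quota.
        return 0.0
    return mu_max * (1.0 - min_quota / quota)


if __name__ == "__main__":
    # Growth rises towards mu_max as the quota increases.
    for q in (1.0, 2.0, 4.0, 8.0):
        print(q, droop_growth_rate(q, min_quota=1.0, mu_max=1.0))
```

With a quota of twice the subsistence level, this form gives half of mu_max.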

5.
Accurate anchoring alignment of divergent sequences   Total citations: 1, self-citations: 0, citations by others: 1

6.
Motivation: High-density DNA microarrays provide us with useful tools for analyzing DNA and RNA comprehensively. However, the background signal caused by the non-specific binding (NSB) between probe and target makes it difficult to obtain accurate measurements. To remove the background signal, there is a set of background probes on Affymetrix Exon arrays to represent the amount of non-specific signals, and an accurate estimation of non-specific signals using these background probes is desirable for improvement of microarray analyses.
Results: We developed a thermodynamic model of NSB on short nucleotide microarrays in which the NSBs are modeled by duplex formation of probes and multiple hypothetical targets. We fitted the observed signal intensities of the background probes with those expected by the model to obtain the model parameters. As a result, we found that the presented model can improve the accuracy of prediction of non-specific signals in comparison with previously proposed methods. This result will provide a useful method to correct for the background signal in oligonucleotide microarray analysis.
Availability: The software is implemented in the R language and can be downloaded from our website (http://www-shimizu.ist.osaka-u.ac.jp/shimizu_lab/MSNS/).
Contact: furusawa{at}ist.osaka-u.ac.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Trey Ideker

7.
Summary: TOPALi v2 simplifies and automates the use of several methods for the evolutionary analysis of multiple sequence alignments. Jobs are submitted from a Java graphical user interface as TOPALi web services to either run remotely on high-performance computing clusters or locally (with multiple cores supported). Methods available include model selection and phylogenetic tree estimation using the Bayesian inference and maximum likelihood (ML) approaches, in addition to recombination detection methods. The optimal substitution model can be selected for protein or nucleic acid (standard, or protein-coding using a codon position model) data using accurate statistical criteria derived from ML co-estimation of the tree and the substitution model. Phylogenetic software available includes PhyML, RAxML and MrBayes.
Availability: Freely downloadable from http://www.topali.org for Windows, Mac OS X, Linux and Solaris.
Contact: iain.milne{at}scri.ac.uk
Associate Editor: Martin Bishop

8.
Motivation: Loss of heterozygosity (LOH) is one of the most important mechanisms in tumor evolution. LOH can be detected from the genotypes of the tumor samples with or without paired normal samples. In paired sample cases, LOH detection for informative single nucleotide polymorphisms (SNPs) is straightforward if there is no genotyping error. But genotyping errors are always unavoidable, and there are about 70% non-informative SNPs whose LOH status can only be inferred from the neighboring informative SNPs.
Results: This article presents a novel LOH inference and segmentation algorithm based on the conditional random pattern (CRP) model. The new model explicitly considers the distance between two neighboring SNPs, as well as the genotyping error rate and the heterozygous rate. This new method is tested on the simulated and real data of the Affymetrix Human Mapping 500K SNP arrays. The experimental results show that the CRP method outperforms the conventional methods based on the hidden Markov model (HMM).
Availability: Software is available upon request.
Contact: xzhou{at}tmhs.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Alex Bateman

9.
Summary: Low-complexity, repetitive protein sequences with a limited amino acid palette are abundant in nature, and many of them play an important role in the structure and function of certain types of proteins. However, such repetitive sequences often do not have rigidly defined motifs. Consequently, the identification of these low-complexity repetitive elements has proven challenging for existing pattern-matching algorithms. Here we introduce a new web-tool SubSeqer (http://compsysbio.org/subseqer/) which uses graphical visualization methods borrowed from protein interaction studies to identify and characterize repetitive elements in low-complexity sequences. Given their abundance, we suggest that SubSeqer represents a valuable resource for the study of typically neglected low-complexity sequences.
Contact: jparkin{at}sickkids.ca
Associate Editor: Limsoon Wong

10.
MMG: a probabilistic tool to identify submodules of metabolic pathways   Total citations: 1, self-citations: 0, citations by others: 1
Motivation: A fundamental task in systems biology is the identification of groups of genes that are involved in the cellular response to particular signals. At its simplest level, this often reduces to identifying biological quantities (mRNA abundance, enzyme concentrations, etc.) which are differentially expressed in two different conditions. Popular approaches involve using t-test statistics, based on modelling the data as arising from a mixture distribution. A common assumption of these approaches is that the data are independent and identically distributed; however, biological quantities are usually related through a complex (weighted) network of interactions, and often the more pertinent question is which subnetworks are differentially expressed, rather than which genes. Furthermore, in many interesting cases (such as high-throughput proteomics and metabolomics), only very partial observations are available, resulting in the need for efficient imputation techniques.
Results: We introduce Mixture Model on Graphs (MMG), a novel probabilistic model to identify differentially expressed submodules of biological networks and pathways. The method can easily incorporate information about weights in the network, is robust against missing data and can be easily generalized to directed networks. We propose an efficient sampling strategy to infer posterior probabilities of differential expression, as well as posterior probabilities over the model parameters. We assess our method on artificial data demonstrating significant improvements over standard mixture model clustering. Analysis of our model results on quantitative high-throughput proteomic data leads to the identification of biologically significant subnetworks, as well as the prediction of the expression level of a number of enzymes, some of which are then verified experimentally.
Availability: MATLAB code is available from http://www.dcs.shef.ac.uk/~guido/software.html
Contact: guido{at}dcs.shef.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Jonathan Wren

11.
Motivation: We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool.
Results: Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
Contact: yuanji{at}mdanderson.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Chris Stoeckert
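
The abstract does not spell out its Bayesian FDR rule, so the following is only a minimal sketch of one common variant, not the authors' implementation: hypotheses are rejected in order of increasing posterior null probability for as long as the mean posterior null probability among the rejected set stays at or below a target level. The function name, argument names and example values are all assumptions made for illustration.

```python
def bayesian_fdr_reject(null_probs, alpha):
    """Return indices of hypotheses rejected under a simple Bayesian FDR rule.

    null_probs -- posterior probabilities that each null hypothesis is true
                  (assumed to come from some fitted model; hypothetical input)
    alpha      -- target bound on the expected proportion of false rejections
    """
    # Visit hypotheses from least to most likely to be null.
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    rejected = []
    running_sum = 0.0
    for i in order:
        # Mean posterior null probability if hypothesis i were also rejected.
        candidate_mean = (running_sum + null_probs[i]) / (len(rejected) + 1)
        if candidate_mean > alpha:
            # The mean can only grow from here on, so stop.
            break
        running_sum += null_probs[i]
        rejected.append(i)
    return rejected


if __name__ == "__main__":
    example = [0.01, 0.9, 0.05, 0.2]  # made-up posterior null probabilities
    print(bayesian_fdr_reject(example, alpha=0.1))
```

Because the probabilities are visited in ascending order, the running mean never decreases, which is why the loop can stop at the first violation.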

12.
13.
14.
Flynn and Martin-Jézéquel (J. Plankton Res., 22, 447–472, 2000) derived a mechanistic model for nitrogen and silicon physiology of diatoms. During their analysis, they compared the output of this model with that of the co-nutrient model of Davidson and Gurney (J. Plankton Res., 21, 839–858, 1999). They highlighted some discrepancies between the predictions of the two models, which occurred subsequent to exhaustion of the yield-limiting nutrient, and suggested that the co-nutrient model contained technical inaccuracies in its output. Here it is shown that by simply modifying the numerical values of two of the parameters of the co-nutrient model, while retaining exactly the same model structure, it is possible to produce similar dynamics to those exhibited by the model of Flynn and Martin-Jézéquel.

15.
Motivation: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first, and most critical step, is the ascertainment of a biologically sound model to be optimized. Many models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution.
Results: This article examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that there are relatively few unique haplotypes, but not always the least possible, for the datasets with known solutions. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and the number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, most of which are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties for haplotype inference models. At a higher level, and with broad applicability, this article illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
Contact: weixiong.zhang{at}wustl.edu
Associate Editor: Alex Bateman

16.
17.
Model-based deconvolution of genome-wide DNA binding   Total citations: 1, self-citations: 0, citations by others: 1
Motivation: Chromatin immunoprecipitation followed by hybridization to a genomic tiling microarray (ChIP-chip) is a routinely used protocol for localizing the genomic targets of DNA-binding proteins. The resolution to which binding sites in this assay can be identified is commonly considered to be limited by two factors: (1) the resolution at which the genomic targets are tiled in the microarray and (2) the large and variable lengths of the immunoprecipitated DNA fragments.
Results: We have developed a generative model of binding sites in ChIP-chip data and an approach, MeDiChI, for efficiently and robustly learning that model from diverse data sets. We have evaluated MeDiChI's performance using simulated data, as well as on several diverse ChIP-chip data sets collected on widely different tiling array platforms for two different organisms (Saccharomyces cerevisiae and Halobacterium salinarium NRC-1). We find that MeDiChI accurately predicts binding locations to a resolution greater than that of the probe spacing, even for overlapping peaks, and can increase the effective resolution of tiling array data by a factor of 5x or better. Moreover, the method's performance on simulated data provides insights into effectively optimizing the experimental design for increased binding site localization accuracy and efficacy.
Availability: MeDiChI is available as an open-source R package, including all data, from http://baliga.systemsbiology.net/medichi.
Contact: dreiss{at}systemsbiology.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Martin Bishop

18.
Motivation: Mass spectrometry data are subject to considerable noise. Good noise models are required for proper detection and quantification of peptides. We have characterized noise in both quadrupole time-of-flight (Q-TOF) and ion trap data, and have constructed models for the noise.
Results: We find that the noise in Q-TOF data from Applied Biosystems QSTAR fits well to a combination of multinomial and Poisson model with detector dead-time correction. In comparison, ion trap noise from Agilent MSD-Trap-SL is larger than the Q-TOF noise and is proportional to Poisson noise. We then demonstrate that the noise model can be used to improve deisotoping for peptide detection, by estimating appropriate cutoffs of the goodness of fit parameter at prescribed error rates. The noise models also have implications in noise reduction, retention time alignment and significance testing for biomarker discovery.
Contact: pdu{at}us.ibm.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Olga Troyanskaya

19.
Motivation: Inferring population structures using genetic data sampled from a group of individuals is a challenging task. Many methods either consider a fixed population number or ignore the correlation between populations. As a result, they can lose sensitivity and specificity in detecting subtle stratifications. In addition, when a large number of genetic markers are used, many existing algorithms perform rather inefficiently.
Result: We propose a new Bayesian method to infer population structures using multiple unlinked single nucleotide polymorphisms (SNPs). Our approach explicitly considers the population correlation through a tree hierarchy, and treats the population number as a random variable. Using both simulated and real datasets of worldwide samples, we demonstrate that an incorporated tree can consistently improve the power in detecting subtle population stratifications. A tree-based model often involves a large number of unknown parameters, and the corresponding estimation procedure can be highly inefficient. We further implement a partition method to analytically integrate out all nuisance parameters in the tree. As a result, our method can analyze large SNP datasets with significantly improved convergence rate.
Availability: http://www.stat.psu.edu/~yuzhang/tips.tar
Contact: yuzhang{at}stat.psu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Keith Crandall

20.
Motivation: Reliable structural modelling of protein–protein complexes has widespread application, from drug design to advancing our knowledge of protein interactions and function. This work addresses three important issues in protein–protein docking: implementing backbone flexibility, incorporating prior indications from experiment and bioinformatics, and providing public access via a server. 3D-Garden (Global And Restrained Docking Exploration Nexus), our benchmarked and server-ready flexible docking system, allows sophisticated programming of surface patches by the user via a facet representation of the interactors’ molecular surfaces (generated with the marching cubes algorithm). Flexibility is implemented as a weighted exhaustive conformer search for each clashing pair of molecular branches in a set of 5000 models filtered from around 340 000 initially.
Results: In a non-global assessment, carried out strictly according to the protocols for number of models considered and model quality of the Critical Assessment of Protein Interactions (CAPRI) experiment, over the widely-used Benchmark 2.0 of 84 complexes, 3D-Garden identifies a set of ten models containing an acceptable or better model in 29/45 test cases, including one with large conformational change. In 19/45 cases an acceptable or better model is ranked first or second out of 340 000 candidates.
Availability: http://www.sbg.bio.ic.ac.uk/3dgarden (server)
Contact: v.lesk{at}ic.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Burkhard Rost
