
1.

Taxonomic markup language: applying XML to systematic data (Total citations: 1; self-citations: 0; citations by others: 1)

Gilmour R 《Bioinformatics (Oxford, England)》2000,16(4):406-407


2.

CellML2SBML: conversion of CellML into SBML

Schilstra MJ Li L Matthews J Finney A Hucka M Le Novère N 《Bioinformatics (Oxford, England)》2006,22(8):1018-1020

CellML and SBML are XML-based languages for storage and exchange of molecular biological and physiological reaction models. They use very similar subsets of MathML to specify the mathematical aspects of the models. CellML2SBML is implemented as a suite of XSLT stylesheets that, when applied consecutively, convert models expressed in CellML into SBML without significant loss of information. The converter is based on the most recent stable versions of the languages (CellML version 1.1; SBML Level 2 Version 1), and the XSLT used in the stylesheets adheres to the XSLT version 1.0 specification. Of all 306 models in the CellML repository in April 2005, CellML2SBML converted 91% automatically into SBML. Minor manual changes to the unit definitions in the originals raised the percentage of successful conversions to 96%. Availability: http://sbml.org/software/cellml2sbml/. Supplementary information: Instructions for use and further documentation are available at http://sbml.org/software/cellml2sbml/

3.

A phylogenetic Gibbs sampler that yields centroid solutions for cis-regulatory site prediction (Total citations: 2; self-citations: 0; citations by others: 2)

Newberg LA Thompson WA Conlan S Smith TM McCue LA Lawrence CE 《Bioinformatics (Oxford, England)》2007,23(14):1718-1727

MOTIVATION: Identification of functionally conserved regulatory elements in sequence data from closely related organisms is becoming feasible, due to the rapid growth of public sequence databases. Closely related organisms are most likely to have common regulatory motifs; however, the recent speciation of such organisms results in a high degree of correlation in their genome sequences, confounding the detection of functional elements. Additionally, alignment algorithms that use optimization techniques are limited to the detection of a single alignment that may not be representative. Comparative-genomics studies must be able to address the phylogenetic correlation in the data and efficiently explore the alignment space, in order to make specific and biologically relevant predictions. RESULTS: We describe here a Gibbs sampler that employs a full phylogenetic model and reports an ensemble centroid solution. We describe regulatory motif detection using both simulated and real data, and demonstrate that this approach achieves improved specificity, sensitivity, and positive predictive value over non-phylogenetic algorithms, and over phylogenetic algorithms that report a maximum likelihood solution. AVAILABILITY: The software is freely available at http://bayesweb.wadsworth.org/gibbs/gibbs.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.

An integrated software system for analyzing ChIP-chip and ChIP-seq data (Total citations: 1; self-citations: 0; citations by others: 1)

Ji H Jiang H Ma W Johnson DS Myers RM Wong WH 《Nature biotechnology》2008,26(11):1293-1300


5.

PASS: A Proteomics Alternative Splicing Screening Pipeline

Peng Wu Lingling Pu Bingnan Deng Yingying Li Zhaoli Chen Weili Liu 《Proteomics》2019,19(13)


6.

Finding motifs in the twilight zone (Total citations: 8; self-citations: 0; citations by others: 8)

Keich U Pevzner PA 《Bioinformatics (Oxford, England)》2002,18(10):1374-1381


7.

Discovering motifs in ranked lists of DNA sequences


Eden E Lipson D Yogev S Yakhini Z 《PLoS computational biology》2007,3(3):e39


8.

Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands (Total citations: 4; self-citations: 0; citations by others: 4)

Vernikos GS Parkhill J 《Bioinformatics (Oxford, England)》2006,22(18):2196-2203

MOTIVATION: There is a growing literature on the detection of Horizontal Gene Transfer (HGT) events by means of parametric, non-comparative methods. Such approaches rely only on sequence information and utilize different low and high order indices to capture compositional deviation from the genome backbone; the superiority of the latter over the former has been shown elsewhere. However, even high order k-mers may be poor estimators of HGT when insufficient information is available, e.g. in short sliding windows. Most of the current HGT prediction methods require pre-existing annotation, which may restrict their application to newly sequenced genomes. RESULTS: We introduce a novel computational method, Interpolated Variable Order Motifs (IVOMs), which exploits compositional biases using variable order motif distributions and captures the local composition of a sequence more reliably than fixed-order methods. For optimal localization of the boundaries of each predicted region, a second order, two-state hidden Markov model (HMM) is implemented in a change-point detection framework. We applied the IVOM approach to the genome of Salmonella enterica serovar Typhi CT18, a well-studied prokaryote in terms of HGT events, and we show that the IVOMs outperform state-of-the-art low and high order motif methods, predicting not only the already characterized Salmonella Pathogenicity Islands (SPI-1 to SPI-10) but also three novel SPIs (SPI-15, SPI-16, SPI-17) and other HGT events. AVAILABILITY: The software is available under a GPL license as a standalone application at http://www.sanger.ac.uk/Software/analysis/alien_hunter CONTACT: gsv@sanger.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
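The fixed-order baseline that this abstract contrasts IVOMs against can be illustrated with a short sketch. It is not the IVOM method and not the alien_hunter code: it scores each sliding window by the Kullback-Leibler divergence between its k-mer frequencies and those of the whole sequence. The function names, the pseudocount, the window and step sizes, and the toy sequence are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_freqs(seq, k, pseudocount=1.0):
    """Smoothed frequencies of all 4**k DNA k-mers (assumes an ACGT-only sequence)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p)


def window_scores(genome, k=2, window=20, step=10):
    """Score each sliding window by its compositional deviation from the whole genome."""
    background = kmer_freqs(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        local = kmer_freqs(genome[start:start + window], k)
        scores.append((start, kl_divergence(local, background)))
    return scores


# Toy sequence (hypothetical): an AT-rich backbone with a GC-rich insert at 60..80.
genome = "AT" * 30 + "GC" * 10 + "AT" * 30
scores = window_scores(genome)
top_start, top_score = max(scores, key=lambda s: s[1])
```

With a fixed k and short windows, many k-mers are never observed and the estimate leans heavily on the pseudocount, which is the weakness the abstract attributes to fixed-order methods.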

9.

Computational sequence analysis revisited: new databases, software tools, and the research opportunities they engender.

M S Boguski 《Journal of lipid research》1992,33(7):957-974

The increasing quantity and complexity of sequences and structural data for proteins and nucleic acids create both problems and opportunities for biomedical researchers. Fortunately, a new generation of practical computer tools for data analysis and integrated information retrieval is emerging. Recent developments in fast database searching, multiple sequence alignment, and molecular modeling are discussed, and windows-based, mouse-driven software for CD-ROM and network information retrieval is described. Each method is illustrated with a practical example pertinent to lipid research. In particular, the connection among cholesteryl ester transfer protein, bactericidal permeability-increasing protein, and lipopolysaccharide-binding proteins is determined; novel repetitive sequence motifs in mammalian farnesyltransferase subunits and related yeast prenyltransferases are derived; biochemical insights from a three-dimensional model of human apolipoprotein D based on two insect lipocalins are discussed; the relationship between apolipoprotein D and gross cystic disease fluid protein from human breast is reviewed; and prospects for modeling apolipoprotein E-related proteins are described. In addition, information on a number of general and special-purpose sequence, motif, and structural databases is included.

10.

Discriminative motif finding for predicting protein subcellular localization

Lin TH Murphy RF Bar-Joseph Z 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(2):441-451

Many methods have been described to predict the subcellular location of proteins from sequence information. However, most of these methods either rely on global sequence properties or use a set of known protein targeting motifs to predict protein localization. Here, we develop and test a novel method that identifies potential targeting motifs using a discriminative approach based on hidden Markov models (discriminative HMMs). These models search for motifs that are present in a compartment but absent in other, nearby, compartments by utilizing a hierarchical structure that mimics the protein sorting mechanism. We show that both discriminative motif finding and the hierarchical structure improve localization prediction on a benchmark data set of yeast proteins. The motifs identified can be mapped to known targeting motifs and they are more conserved than the average protein sequence. Using our motif-based predictions, we can identify potential annotation errors in public databases for the location of some of the proteins. A software implementation and the data set described in this paper are available from http://murphylab.web.cmu.edu/software/2009_TCBB_motif/.

11.

Identifying novel sequence variants of RNA 3D motifs

Craig L. Zirbel James Roll Blake A. Sweeney Anton I. Petrov Meg Pirrung Neocles B. Leontis 《Nucleic acids research》2015,43(15):7504-7520

Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download.

12.

Logos: a modular bayesian model for de novo motif detection

Xing EP Wu W Jordan MI Karp RM 《Journal of bioinformatics and computational biology》2004,2(1):127-154

The complexity of the global organization and internal structure of motifs in higher eukaryotic organisms raises significant challenges for motif detection techniques. To achieve successful de novo motif detection, it is necessary to model the complex dependencies within and among motifs and to incorporate biological prior knowledge. In this paper, we present LOGOS, an integrated LOcal and GlObal motif Sequence model for biopolymer sequences, which provides a principled framework for developing, modularizing, extending and computing expressive motif models for complex biopolymer sequence analysis. LOGOS consists of two interacting submodels: HMDM, a local alignment model capturing biological prior knowledge and positional dependency within the motif local structure; and HMM, a global motif distribution model modeling frequencies and dependencies of motif occurrences. Model parameters can be fit using training motifs within an empirical Bayesian framework. A variational EM algorithm is developed for de novo motif detection. LOGOS improves over existing models that ignore biological priors and dependencies in motif structures and motif occurrences, and demonstrates superior performance on both semi-realistic test data and cis-regulatory sequences from yeast and Drosophila genomes with regard to sensitivity, specificity, flexibility and extensibility.

13.

FIMO: scanning for occurrences of a given motif

Grant CE Bailey TL Noble WS 《Bioinformatics (Oxford, England)》2011,27(7):1017-1018

A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with sequence motifs. Critical to nearly any motif-based sequence analysis pipeline is the ability to scan a sequence database for occurrences of a given motif described by a position-specific frequency matrix. RESULTS: We describe Find Individual Motif Occurrences (FIMO), a software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices. The program computes a log-likelihood ratio score for each position in a given sequence database, uses established dynamic programming methods to convert this score to a P-value and then applies false discovery rate analysis to estimate a q-value for each position in the given sequence. FIMO provides output in a variety of formats, including HTML, XML and several Santa Cruz Genome Browser formats. The program is efficient, allowing for the scanning of DNA sequences at a rate of 3.5 Mb/s on a single CPU. Availability and Implementation: FIMO is part of the MEME Suite software toolkit. A web server and source code are available at http://meme.sdsc.edu.
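The scanning step described in this abstract can be illustrated with a minimal sketch. It is not FIMO's implementation: it only turns a position-specific frequency matrix into log-odds scores and sums them over each window, and it leaves out the P-value and q-value conversion. The function names, the pseudocount, the uniform background and the three-column example motif are assumptions made for illustration.

```python
import math


def log_odds_matrix(freqs, background, pseudocount=0.01):
    """Convert per-position letter frequencies into log2 likelihood-ratio scores."""
    matrix = []
    for column in freqs:
        scores = {}
        for base, bg in background.items():
            # Smooth toward the background so zero frequencies stay finite.
            p = (column.get(base, 0.0) + pseudocount * bg) / (1.0 + pseudocount)
            scores[base] = math.log2(p / bg)
        matrix.append(scores)
    return matrix


def scan(sequence, matrix):
    """Return (start, score) for every window of motif width (assumes ACGT only)."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


# Hypothetical three-column motif that strongly prefers "TAT".
motif = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pssm = log_odds_matrix(motif, uniform)
hits = scan("GGTATCC", pssm)
best_start, best_score = max(hits, key=lambda h: h[1])
```

A positive score means the window is more likely under the motif model than under the background; ranking hits by a raw score threshold is a simplification of the statistical treatment the abstract describes.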

14.

Mclip: motif detection based on cliques of gapped local profile-to-profile alignments

Frickey T Weiller G 《Bioinformatics (Oxford, England)》2007,23(4):502-503

A multitude of motif-finding tools have been published, which can generally be assigned to one of three classes: expectation-maximization, Gibbs-sampling or enumeration. Irrespective of this grouping, most motif detection tools only take into account similarities across ungapped sequence regions, possibly causing short motifs located peripherally and at varying distances from a 'core' motif to be missed. We present a new method, adding to the set of expectation-maximization approaches, that permits the use of gapped alignments for motif elucidation. Availability: The program is available for download from: http://bioinfoserver.rsbs.anu.edu.au/downloads/mclip.jar. Supplementary information: http://bioinfoserver.rsbs.anu.edu.au/utils/mclip/info.php.

15.

bgc: Software for Bayesian estimation of genomic clines

Z. Gompert C. A. Buerkle 《Molecular ecology resources》2012,12(6):1168-1176

Introgression in admixed populations can be used to identify candidate loci that might underlie adaptation or reproductive isolation. The Bayesian genomic cline model provides a framework for quantifying variable introgression in admixed populations and identifying regions of the genome with extreme introgression that are potentially associated with variation in fitness. Here we describe the bgc software, which uses Markov chain Monte Carlo to estimate the joint posterior probability distribution of the parameters in the Bayesian genomic cline model and designate outlier loci. This software can be used with next-generation sequence data, accounts for uncertainty in genotypic state, and can incorporate information from linked loci on a genetic map. Output from the analysis is written to an HDF5 file for efficient storage and manipulation. This software is written in C++. The source code, software manual, compilation instructions and example data sets are available under the GNU Public License at http://sites.google.com/site/bgcsoftware/.

16.

Sequence-based classification using discriminatory motif feature selection

Xiong H Capurso D Sen S Segal MR 《PloS one》2011,6(11):e27382

Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. 
A Python pipeline implementing the approach is available at http://www.epibiostat.ucsf.edu/biostat/sen/dmfs/.
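The two ideas this abstract contrasts, exhaustive k-mer feature generation and a three-way data partition, can be sketched briefly. This is not the authors' pipeline: the function names, the split fractions and the fixed seed are assumptions made for illustration, and no motif finder or classifier is included.

```python
import random
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every k-mer: the exhaustive feature set, which grows as 4**k for DNA."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_counts(sequence, k):
    """Count overlapping k-mer occurrences in one sequence."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def three_way_split(items, fractions=(0.2, 0.6, 0.2), seed=0):
    """Shuffle and split into discovery, training and validation partitions.

    The fractions are hypothetical; the abstract does not state the sizes used.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_disc = int(len(shuffled) * fractions[0])
    n_train = int(len(shuffled) * fractions[1])
    discovery = shuffled[:n_disc]
    training = shuffled[n_disc:n_disc + n_train]
    validation = shuffled[n_disc + n_train:]
    return discovery, training, validation


features = all_kmers(3)
example_counts = kmer_counts("ACGACG", 3)
discovery, training, validation = three_way_split(range(10))
```

Under the scheme the abstract describes, a discriminatory motif finder would run on the discovery partition only, so the features it selects are chosen without reference to the training or validation sequences.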

17.

Learning cellular sorting pathways using protein interactions and sequence motifs

Lin TH Bar-Joseph Z Murphy RF 《Journal of computational biology》2011,18(11):1709-1722

Proper subcellular localization is critical for proteins to perform their roles in cellular functions. Proteins are transported by different cellular sorting pathways, some of which take a protein through several intermediate locations until reaching its final destination. The pathway a protein is transported through is determined by carrier proteins that bind to specific sequence motifs. In this article, we present a new method that integrates protein interaction and sequence motif data to model how proteins are sorted through these sorting pathways. We use a hidden Markov model (HMM) to represent protein sorting pathways. The model is able to determine intermediate sorting states and to assign carrier proteins and motifs to the sorting pathways. In simulation studies, we show that the method can accurately recover an underlying sorting model. Using data for yeast, we show that our model leads to accurate prediction of subcellular localization. We also show that the pathways learned by our model recover many known sorting pathways and correctly assign proteins to the path they utilize. The learned model identified new pathways and their putative carriers and motifs and these may represent novel protein sorting mechanisms. Supplementary results and software implementation are available from http://murphylab.web.cmu.edu/software/2010_RECOMB_pathways/.

18.

A feature-based approach to modeling protein-DNA interactions

Sharon E Lubliner S Segal E 《PLoS computational biology》2008,4(8):e1000154


19.

STAR: an algorithm to Search for Tandem Approximate Repeats

Delgrange O Rivals E 《Bioinformatics (Oxford, England)》2004,20(16):2812-2820


20.

VisRD--visual recombination detection

Forslund K Huson DH Moulton V 《Bioinformatics (Oxford, England)》2004,20(18):3654-3655

SUMMARY: VisRD, a program for visual recombination detection in a sequence alignment, is presented. VisRD is written in Java and is designed to complement the multi-purpose phylogenetic software package SplitsTree4. AVAILABILITY: The software is freely available from http://www.lcb.uu.se/~vmoulton/software/visrd/