Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Summary: BLISS 2.0 is a web-based application for identifying conserved regulatory modules in distantly related orthologous sequences. Unlike existing approaches, it performs the cross-genome comparison at the binding site level. Experimental results on simulated and real world data indicate that BLISS 2.0 can identify conserved regulatory modules from sequences with little overall similarity at the DNA sequence level.
Availability: http://www.blisstool.org/
Contact: leizhou{at}ufl.edu
Associate Editor: Olga Troyanskaya

2.
Post-processing of BLAST results using databases of clustered sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
Motivation: When evaluating the results of a sequence similarity search, there are many situations where it can be useful to determine whether sequences appearing in the results share some distinguishing characteristic. Such dependencies between database entries are often not readily identifiable, but can yield important new insights into the biological function of a gene or protein.
Results: We have developed a program called CBLAST that sorts the results of a BLAST sequence similarity search according to sequence membership in user-defined ‘clusters’ of sequences. To demonstrate the utility of this application, we have constructed two cluster databases. The first describes clusters of nucleotide sequences representing the same gene, as documented in the UNIGENE database, and the second describes clusters of protein sequences which are members of the protein families documented in the PROSITE database. Cluster databases and the CBLAST post-processor provide an efficient mechanism for identifying and exploring relationships and dependencies between new sequences and database entries.
Availability: The software described in this article is available free of charge from the EBI software archive at ftp://ftp.ebi.ac.uk/pub/software/unix.
Contact: E-mail: rainer_fuchs@glaxowellcome.com
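
The sorting step described above can be illustrated with a minimal sketch. It is not CBLAST itself: the hit tuples, the `cluster_of` mapping, the "unclustered" label and the rule of ordering clusters by their best hit are all assumptions made for illustration.

```python
from collections import defaultdict


def group_hits_by_cluster(hits, cluster_of):
    """Group (sequence_id, score) hits by cluster, best-scoring cluster first.

    `cluster_of` is a hypothetical mapping from sequence ID to cluster name;
    hits without an entry are collected under "unclustered".
    """
    groups = defaultdict(list)
    for seq_id, score in hits:
        groups[cluster_of.get(seq_id, "unclustered")].append((seq_id, score))
    # Within each cluster, list the highest-scoring hit first.
    for members in groups.values():
        members.sort(key=lambda hit: hit[1], reverse=True)
    # Order the clusters by the score of their best hit.
    return sorted(groups.items(), key=lambda item: item[1][0][1], reverse=True)


# Made-up search results and cluster assignments.
hits = [("seqA", 310.0), ("seqB", 120.0), ("seqC", 250.0), ("seqD", 95.0)]
cluster_of = {"seqA": "cluster1", "seqC": "cluster1", "seqB": "cluster2"}
grouped = group_hits_by_cluster(hits, cluster_of)
for cluster, members in grouped:
    print(cluster, members)
```

Grouping before ranking keeps related hits together, so a cluster that appears several times in the results is visible at a glance instead of being scattered through a score-ordered list.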

3.
Summary: Cross-mapping of gene and protein identifiers between different databases is a tedious and time-consuming task. To overcome this, we developed CRONOS, a cross-reference server that contains entries from five mammalian organisms presented by major gene and protein information resources. Sequence similarity analysis of the mapped entries shows that the cross-references are highly accurate. In total, up to 18 different identifier types can be used for identification of cross-references. The quality of the mapping could be improved substantially by exclusion of ambiguous gene and protein names which were manually validated. Organism-specific lists of ambiguous terms, which are valuable for a variety of bioinformatics applications like text mining, are available for download.
Availability: CRONOS is freely available to non-commercial users at http://mips.gsf.de/genre/proj/cronos/index.html, web services are available at http://mips.gsf.de/CronosWSService/CronosWS?wsdl.
Contact: brigitte.waegele{at}helmholtz-muenchen.de
Supplementary information: Supplementary data are available at Bioinformatics online. The online Supplementary Material contains all figures and tables referenced by this article.
Associate Editor: Martin Bishop

4.
Motivation: The nucleotide sequencing process produces not only the sequence of nucleotides, but also associated quality values. Quality values provide valuable information, but are primarily used only for trimming sequences and generally ignored in subsequent analyses.
Results: This article describes how the scoring schemes of standard alignment algorithms can be modified to take into account quality values to produce improved alignments and statistically more accurate scores. A prototype implementation is also provided, and used to post-process a set of BLAST results. Quality-adjusted alignment is a natural extension of standard alignment methods, and can be implemented with only a small constant factor performance penalty. The method can also be applied to related methods including heuristic search algorithms like BLAST and FASTA.
Availability: Software is available at http://malde.org/~ketil/qaa.
Contact: ketil.malde{at}imr.no
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Limsoon Wong
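
One way to picture a quality-aware scoring scheme is the sketch below. It is an illustration under stated assumptions, not the scheme from the article: it assumes Phred-scaled quality values (error probability 10^(-Q/10)), a simple match/mismatch model, and that a miscalled base is equally likely to be any of the other three nucleotides. The function names and default scores are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality value Q into a base-calling error probability."""
    return 10 ** (-q / 10)


def quality_adjusted_score(read_base, ref_base, q, match=1.0, mismatch=-1.0):
    """Expected match/mismatch score of one aligned base pair given quality q.

    Assumption: with probability p the called base is wrong and the true base
    is one of the three other nucleotides, each with probability p / 3.
    """
    p = phred_to_error_prob(q)
    if read_base == ref_base:
        # Correct call scores a match; any miscall implies a true mismatch.
        return (1 - p) * match + p * mismatch
    # Correct call scores a mismatch; one of the three possible true bases
    # would actually match the reference.
    return (1 - p) * mismatch + p * (match / 3 + 2 * mismatch / 3)


# A low-quality match is worth less than a high-quality one, and a
# low-quality mismatch is penalised less than a high-quality one.
print(quality_adjusted_score("A", "A", 10))
print(quality_adjusted_score("A", "A", 40))
print(quality_adjusted_score("A", "C", 10))
print(quality_adjusted_score("A", "C", 40))
```

Because the adjustment is a per-position arithmetic correction, it leaves the dynamic programming itself unchanged, which is in keeping with the small constant-factor overhead reported in the abstract.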

5.
Motivation: The success of genome sequencing has resulted in many protein sequences without functional annotation. We present ConFunc, an automated Gene Ontology (GO)-based protein function prediction approach, which uses conserved residues to generate sequence profiles to infer function. ConFunc splits sets of sequences identified by PSI-BLAST into sub-alignments according to their GO annotations. Conserved residues are identified for each GO term sub-alignment for which a position specific scoring matrix is generated. This combination of steps produces a set of feature (GO annotation) derived profiles from which protein function is predicted.
Results: We assess the ability of ConFunc, BLAST and PSI-BLAST to predict protein function in the twilight zone of sequence similarity. ConFunc significantly outperforms BLAST and PSI-BLAST, obtaining levels of recall and precision that are not obtained by either method and maximum precision 24% greater than BLAST. Further, for a large test set of sequences with homologues of low sequence identity, at high levels of precision, ConFunc obtains recall six times greater than BLAST. These results demonstrate the potential for ConFunc to form part of an automated genomics annotation pipeline.
Availability: http://www.sbg.bio.ic.ac.uk/confunc
Contact: m.sternberg{at}imperial.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Dmitrij Frishman
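
The first two steps of the approach (splitting an alignment by GO annotation, then finding conserved positions in each sub-alignment) can be sketched as follows. This is a simplified illustration, not ConFunc: the input dictionaries and toy sequences are invented, conservation is taken to mean strict identity in a column, and the later step of building a position specific scoring matrix is omitted.

```python
from collections import defaultdict


def conserved_columns_by_go(aligned, annotations):
    """Return, for each GO term, the alignment columns that are fully conserved.

    aligned:     hypothetical mapping of sequence ID -> aligned sequence
                 (all sequences of equal length, '-' for gaps)
    annotations: hypothetical mapping of sequence ID -> set of GO terms
    """
    # Step 1: split the alignment into one sub-alignment per GO term.
    sub_alignments = defaultdict(list)
    for seq_id, seq in aligned.items():
        for term in annotations.get(seq_id, ()):
            sub_alignments[term].append(seq)

    # Step 2: a column counts as conserved if every sequence in the
    # sub-alignment carries the same non-gap residue there.
    conserved = {}
    for term, seqs in sub_alignments.items():
        conserved[term] = [
            index
            for index, column in enumerate(zip(*seqs))
            if len(set(column)) == 1 and column[0] != "-"
        ]
    return conserved


# Toy alignment with made-up annotations.
aligned = {"p1": "MKVLA", "p2": "MKILA", "p3": "MRVLG"}
annotations = {
    "p1": {"GO:0003824"},
    "p2": {"GO:0003824"},
    "p3": {"GO:0005515"},
}
print(conserved_columns_by_go(aligned, annotations))
```

A sequence annotated with several GO terms contributes to several sub-alignments, which is why the inner loop iterates over terms rather than assigning each sequence to a single group.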

6.
7.
Motivation: Genomes contain biologically significant information that extends beyond that encoded in genes. Some of this information relates to various short dispersed repeats distributed throughout the genome. The goal of this work was to combine tools for detection of statistically significant dispersed repeats in DNA sequences with tools to aid development of hypotheses regarding their possible physiological functions in an easy-to-use web-based environment.
Results: Ab Initio Motif Identification Environment (AIMIE) was designed to facilitate investigations of dispersed sequence motifs in prokaryotic genomes. We used AIMIE to analyze the Escherichia coli and Haemophilus influenzae genomes in order to demonstrate the utility of the new environment. AIMIE detected repeated extragenic palindrome (REP) elements, CRISPR repeats, uptake signal sequences, intergenic dyad sequences and several other over-represented sequence motifs. Distributional patterns of these motifs were analyzed using the tools included in AIMIE.
Availability: AIMIE and the related software can be accessed at our web site http://www.cmbl.uga.edu/software.html.
Contact: mrazek{at}uga.edu
Associate Editor: Alex Bateman

8.
Summary: DeconMSn accurately determines the monoisotopic mass and charge state of parent ions from high-resolution tandem mass spectrometry data, offering significant improvement for LTQ_FT and LTQ_Orbitrap instruments over the commercially delivered Thermo Fisher Scientific's extract_msn tool. Optimal parent ion mass tolerance values can be determined using accurate mass information, thus improving peptide identifications for high-mass measurement accuracy experiments. For low-resolution data from LCQ and LTQ instruments, DeconMSn incorporates a support-vector-machine-based charge detection algorithm that identifies the most likely charge of a parent species through peak characteristics of its fragmentation pattern.
Availability: http://ncrr.pnl.gov/software/ or http://www.proteomicsresource.org/
Contact: rds{at}pnl.gov
Supplementary information: PowerPoint presentation/Poster on http://ncrr.pnl.gov/software/.
Associate Editor: Alfonso Valencia

9.
Motivation: We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms.
Results: Applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the 2 033 311 detected regions of sequence variation. In 33 269 out of 460 373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with a false positive rate of 12%.
Availability: The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
Contact: gdenisov{at}jcvi.org
Associate Editor: John Quackenbush

10.
11.
12.
Summary: Automated analysis of flow cytometry (FCM) data is essential for it to become successful as a high throughput technology. We believe that the principles of Trellis graphics can be adapted to provide useful visualizations that can aid such automation. In this article, we describe the R/Bioconductor package flowViz that implements such visualizations.
Availability: flowViz is available as an R package from the Bioconductor project: http://bioconductor.org
Contact: dsarkar{at}fhcrc.org
Associate Editor: Olga Troyanskaya

13.
14.
Motivation: As the use of microarrays in human studies continues to increase, stringent quality assurance is necessary to ensure accurate experimental interpretation. We present a formal approach for microarray quality assessment that is based on dimension reduction of established measures of signal and noise components of expression followed by parametric multivariate outlier testing.
Results: We applied our approach to several data resources. First, as a negative control, we found that the Affymetrix and Illumina contributions to MAQC data were free from outliers at a nominal outlier flagging rate of α = 0.01. Second, we created a tunable framework for artificially corrupting intensity data from the Affymetrix Latin Square spike-in experiment to allow investigation of sensitivity and specificity of quality assurance (QA) criteria. Third, we applied the procedure to 507 Affymetrix microarray GeneChips processed with RNA from human peripheral blood samples. We show that exclusion of arrays by this approach substantially increases inferential power, or the ability to detect differential expression, in large clinical studies.
Availability: http://bioconductor.org/packages/2.3/bioc/html/arrayMvout.html and http://bioconductor.org/packages/2.3/bioc/html/affyContam.html affyContam (credentials: readonly/readonly)
Contact: aasare{at}immunetolerance.org; stvjc{at}channing.harvard.edu
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Trey Ideker

15.
GENOME: a rapid coalescent-based whole genome simulator   Cited by: 1 (self-citations: 0, citations by others: 1)
Summary: GENOME proposes a rapid coalescent-based approach to simulate whole genome data. In addition to features of standard coalescent simulators, the program allows for recombination rates to vary along the genome and for flexible population histories. Within small regions, we have evaluated samples simulated by GENOME to verify that GENOME provides the expected LD patterns and frequency spectra. The program can be used to study the sampling properties of any statistic for a whole genome study.
Availability: The program and C++ source code are available online at http://www.sph.umich.edu/csg/liang/genome/
Contact: lianglim{at}umich.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Martin Bishop
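
For readers unfamiliar with coalescent simulation, the sketch below draws the waiting times between coalescence events under the standard neutral coalescent. It is far simpler than GENOME: it assumes a constant population size, no recombination, and time measured in the usual coalescent units, so that with k lineages remaining the waiting time is exponential with rate k(k-1)/2. The function name and the seed are arbitrary choices.

```python
import random


def coalescent_waiting_times(n_samples, rng):
    """Draw successive coalescence waiting times for a sample of n lineages.

    With k lineages left, any of the k*(k-1)/2 pairs may coalesce, so the
    waiting time to the next event is exponential with that rate.
    """
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


rng = random.Random(42)  # fixed seed so the run is reproducible
times = coalescent_waiting_times(10, rng)
tmrca = sum(times)  # time to the most recent common ancestor of the sample
print(len(times), tmrca)
```

The early waiting times are short because many lineage pairs are available to coalesce; most of the tree height typically comes from the last few events.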

16.
Motivation: Understanding the complexity in gene–phenotype relationship is vital for revealing the genetic basis of common diseases. Recent studies on the basis of human interactome and phenome not only uncover prevalent phenotypic overlap and genetic overlap between diseases, but also reveal a modular organization of the genetic landscape of human diseases, providing new opportunities to reduce the complexity in dissecting the gene–phenotype association.
Results: We provide systematic and quantitative evidence that phenotypic overlap implies genetic overlap. With these results, we perform the first heterogeneous alignment of human interactome and phenome via a network alignment technique and identify 39 disease families with corresponding causative gene networks. Finally, we propose AlignPI, an alignment-based framework to predict disease genes, and identify plausible candidates for 70 diseases. Our method scales well to the whole genome, as demonstrated by prioritizing 6154 genes across 37 chromosome regions for Crohn's disease (CD). Results are consistent with a recent meta-analysis of genome-wide association studies for CD.
Availability: Bi-modules and disease gene predictions are freely available at the URL http://bioinfo.au.tsinghua.edu.cn/alignpi/
Contact: ruijiang{at}tsinghua.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: Trey Ideker

17.
Motivation: High-density DNA microarrays provide us with useful tools for analyzing DNA and RNA comprehensively. However, the background signal caused by the non-specific binding (NSB) between probe and target makes it difficult to obtain accurate measurements. To remove the background signal, there is a set of background probes on Affymetrix Exon arrays to represent the amount of non-specific signals, and an accurate estimation of non-specific signals using these background probes is desirable for improvement of microarray analyses.
Results: We developed a thermodynamic model of NSB on short nucleotide microarrays in which the NSBs are modeled by duplex formation of probes and multiple hypothetical targets. We fitted the observed signal intensities of the background probes with those expected by the model to obtain the model parameters. As a result, we found that the presented model can improve the accuracy of prediction of non-specific signals in comparison with previously proposed methods. This result will provide a useful method to correct for the background signal in oligonucleotide microarray analysis.
Availability: The software is implemented in the R language and can be downloaded from our website (http://www-shimizu.ist.osaka-u.ac.jp/shimizu_lab/MSNS/).
Contact: furusawa{at}ist.osaka-u.ac.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Associate Editor: Trey Ideker

18.
Accurate anchoring alignment of divergent sequences   Cited by: 1 (self-citations: 0, citations by others: 1)

19.
Summary: Selection of optimal biomarkers for the identification of different operational taxonomic units (OTUs) may be a hard and tedious task, especially when phylogenetic trees for multiple genes need to be compared. With TaxonGap we present a novel and easy-to-handle software tool that allows visual comparison of the discriminative power of multiple biomarkers for a set of OTUs. The compact graphical output allows for easy comparison and selection of individual biomarkers.
Availability: Graphical User Interface; Executable JAVA archive file, source code, supplementary information and sample files can be downloaded from the website: http://www.kermit.ugent.be/taxongap
Contact: Bram.Slabbinck{at}UGent.be
Associate Editor: John Quackenbush

20.
Summary: ROBIN is a web server for analyzing genome rearrangement of block-interchanges between two chromosomal genomes. It takes two or more linear/circular chromosomes as its input, and computes the number of minimum block-interchange rearrangements between any two input chromosomes for transforming one chromosome into another and also determines an optimal scenario taking this number of rearrangements. The input can be either bacterial-size sequence data or landmark-order data. If the input is sequence data, ROBIN will automatically search for the identical landmarks that are the homologous/conserved regions shared by all the input sequences.
Availability: ROBIN is freely accessible at http://genome.life.nctu.edu.tw/ROBIN
Contact: cllu{at}mail.nctu.edu.tw
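
To make the notion of a minimum block-interchange count concrete, the sketch below computes it for landmark-order data on a single linear chromosome. It is not ROBIN's algorithm, which the abstract does not describe: it assumes each landmark 1..n occurs exactly once, handles the linear case only, and uses the cycle-graph formula attributed to Christie, distance = (n + 1 - c) / 2, where c is the number of alternating cycles. The function name is hypothetical.

```python
def block_interchange_distance(perm):
    """Minimum number of block-interchanges needed to sort a linear permutation.

    perm must contain each of the landmarks 1..n exactly once. The order is
    framed with 0 and n + 1, and the alternating cycles of the resulting
    cycle graph are counted.
    """
    n = len(perm)
    if sorted(perm) != list(range(1, n + 1)):
        raise ValueError("perm must be a permutation of 1..n")

    extended = [0] + list(perm) + [n + 1]
    position = {value: index for index, value in enumerate(extended)}

    # From landmark i, step to i + 1 (its target neighbour), then to the
    # landmark currently sitting immediately left of i + 1. This map is a
    # bijection on 0..n, and its cycles are the alternating cycles.
    seen = set()
    cycles = 0
    for start in range(n + 1):
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:
            seen.add(node)
            node = extended[position[node + 1] - 1]

    return (n + 1 - cycles) // 2


# Swapping the blocks [3, 4] and [1, 2] sorts this order in one step.
print(block_interchange_distance([3, 4, 1, 2]))
```

An already sorted order yields n + 1 cycles of length one and therefore distance zero; each block-interchange can raise the cycle count by at most two, which is where the division by two comes from.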
