
1.

BpMatch: an efficient algorithm for a segmental analysis of genomic sequences

Felicioli C Marangoni R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(4):1120-1127

Here, we propose BpMatch: an algorithm that, working on a suitably modified suffix-tree data structure, computes quickly and efficiently the coverage of a source sequence S on a target sequence T, taking into account direct and reverse segments, which may overlap. When using BpMatch, the operator must define a priori the minimum length l of a segment and the minimum number of occurrences minRep, so that only segments longer than l and having a number of occurrences greater than minRep are considered significant. BpMatch outputs the significant segments found and the computed segment-based distance. In the worst case, assuming the alphabet dimension d is a constant, the time required by BpMatch to calculate the coverage is O(l^2 n). On average, by setting l ≥ 2 log_d(n), the time required to calculate the coverage is only O(n). Thanks to the minRep parameter, BpMatch can also be used to perform a self-covering: covering a sequence using segments coming from itself while avoiding the trivial solution of a single segment coincident with the whole sequence. The result of the self-covering approach is a spectral representation of the repeats contained in the sequence. BpMatch is freely available at www.sourceforge.net/projects/bpmatch.
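To make the notion of coverage concrete, the following is a minimal Python sketch, not the BpMatch implementation. It swaps the suffix tree for a plain set of fixed-length words, treats "reverse" as simple string reversal, counts segments of length at least `min_len`, and ignores minRep; all of these are simplifying assumptions, and the function names are hypothetical. The second helper only evaluates the threshold l ≥ 2 log_d(n) quoted in the abstract, reading it as a base-d logarithm.

```python
import math


def coverage(source, target, min_len):
    """Fraction of target positions covered by words of length min_len
    that also occur in source or in the reversed source (assumption:
    'reverse' means plain reversal, not reverse complement)."""
    words = set()
    for strand in (source, source[::-1]):
        for i in range(len(strand) - min_len + 1):
            words.add(strand[i:i + min_len])

    covered = [False] * len(target)
    for j in range(len(target) - min_len + 1):
        if target[j:j + min_len] in words:
            # Mark every position of the matching window as covered.
            for p in range(j, j + min_len):
                covered[p] = True
    return sum(covered) / len(target) if target else 0.0


def min_length_for_linear_average(n, alphabet_size=4):
    """Smallest integer l with l >= 2 * log_d(n), the abstract's condition
    for O(n) average time (d = alphabet_size)."""
    return math.ceil(2 * math.log(n, alphabet_size))
```

Any segment of length at least `min_len` shared with the source is a union of shared words of length `min_len`, so marking matching windows is enough for this simplified definition. For a DNA sequence of one million bases, `min_length_for_linear_average(10**6)` gives a minimum segment length of 20.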

2.

Computational detection of prokaryotic core promoters in genomic sequences

Kim KB Sim JS 《Journal of microbiology (Seoul, Korea)》2005,43(5):411-416


3.

EGID: an ensemble algorithm for improved genomic island detection in genomic sequences

Che D Hasan MS Wang H Fazekas J Huang J Liu Q 《Bioinformation》2011,7(6):311-314

Genomic islands (GIs) are genomic regions that were originally transferred from other organisms. The detection of genomic islands in genomes can lead to many applications in industrial, medical and environmental contexts. Existing computational tools for GI detection suffer from either low recall or low precision, leaving room for improvement. In this paper, we report the development of our Ensemble algorithm for Genomic Island Detection (EGID). EGID takes the prediction results of existing computational tools, filters them, and generates consensus prediction results. Performance comparisons between our ensemble algorithm and existing programs have shown that our ensemble algorithm is better than any other program. EGID was implemented in Java, and was compiled and executed on Linux operating systems. EGID is freely available at http://www5.esu.edu/cpsc/bioinfo/software/EGID.
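The abstract does not say how EGID filters and combines predictions, so the following Python sketch only illustrates one generic way a consensus over interval predictions could be formed: simple position-wise voting. The function name, the half-open interval convention, and the `min_votes` threshold are all assumptions made for illustration, not EGID's actual procedure.

```python
def consensus_regions(predictions, min_votes):
    """Return regions supported by at least min_votes tools.

    predictions: one list of half-open (start, end) intervals per tool.
    Assumption: intervals from the same tool do not overlap each other.
    """
    events = []
    for tool_intervals in predictions:
        for start, end in tool_intervals:
            events.append((start, 1))   # a tool starts voting here
            events.append((end, -1))    # a tool stops voting here
    events.sort()  # at equal positions, -1 sorts before +1

    regions = []
    depth = 0
    region_start = None
    for pos, delta in events:
        prev = depth
        depth += delta
        if prev < min_votes <= depth:
            region_start = pos
        elif depth < min_votes <= prev:
            if regions and regions[-1][1] == region_start:
                # Touching regions are merged into one.
                regions[-1] = (regions[-1][0], pos)
            else:
                regions.append((region_start, pos))
    return regions
```

Raising `min_votes` trades recall for precision: requiring agreement from more tools keeps fewer, better-supported regions.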

4.

A space-efficient algorithm for aligning large genomic sequences

Morgenstern B 《Bioinformatics (Oxford, England)》2000,16(10):948-949

SUMMARY: In the segment-by-segment approach to sequence alignment, pairwise and multiple alignments are generated by comparing gap-free segments of the sequences under study. This method is particularly efficient in detecting local homologies, and it has been used to identify functional regions in large genomic sequences. Herein, an algorithm is outlined that calculates optimal pairwise segment-by-segment alignments in essentially linear space. AVAILABILITY: The program is available at the Bielefeld Bioinformatics Server (BiBiServ) at http://bibiserv.techfak.uni-bielefeld.de/dialign/

5.

FASTPAT: a fast and efficient algorithm for string searching in DNA sequences

Nicola Prunella; Liuni Sabino; Attimonelli Marcella; Pasole Graziano 《Bioinformatics (Oxford, England)》1993,9(5):541-545

A new string searching algorithm is presented aimed at searching for the occurrence of character patterns in longer character texts. The algorithm, specifically designed for nucleic acid sequence data, is essentially derived from the Boyer-Moore method (Comm. ACM, 20, 762-772, 1977). Both pattern and text data are compressed so that the natural 4-letter alphabet of nucleic acid sequences is considerably enlarged. The string search starts from the last character of the pattern and proceeds in large jumps through the text to be searched. The data compression and searching algorithm allows one to avoid searching for patterns not present in the text as well as to inspect, for each pattern, all text characters until the exact match with the text is found. These considerations are supported by empirical evidence and comparisons with other methods.
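The right-to-left comparison and the jumps described above can be illustrated with the Horspool simplification of Boyer-Moore. This Python sketch is not FASTPAT: it omits the compression step that enlarges the alphabet, and the function name is hypothetical.

```python
def horspool_search(text, pattern):
    """Return all start positions of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Shift table: distance from the last occurrence of each character
    # (excluding the final pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    hits = []
    pos = 0
    while pos <= n - m:
        # Compare from the last character of the pattern backwards.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Jump according to the text character under the pattern's last slot;
        # characters absent from the pattern allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

With a 4-letter alphabet almost every character occurs in the pattern, so full-length jumps are rare. This is consistent with the abstract's point that compressing the data to enlarge the alphabet makes the jumps larger.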

6.

DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis

Mohammed MH Dutta A Bose T Chadaram S Mande SS 《Bioinformatics (Oxford, England)》2012,28(19):2527-2529


7.

CONSORF: a consensus prediction system for prokaryotic coding sequences

Kang S Yang SJ Kim S Bhak J 《Bioinformatics (Oxford, England)》2007,23(22):3088-3090

CONSORF is a fully automatic high-accuracy identification system that provides consensus prokaryotic CDS information. It first predicts the CDSs supported by consensus alignments. The alignments are derived from multiple genome-to-proteome comparisons with other prokaryotes using the FASTX program. Then, it fills the empty genomic regions with the CDSs supported by consensus ab initio predictions. From those consensus results, CONSORF provides prediction reliability scores, predicted frame-shifts, alternative start sites and best pair-wise match information against other prokaryotes. These results are easily accessed from a website.

8.

A fast algorithm for joint reconstruction of ancestral amino acid sequences

Pupko T Pe'er I Shamir R Graur D 《Molecular biology and evolution》2000,17(6):890-896


9.

CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

Rachid Ounit Steve Wanamaker Timothy J Close Stefano Lonardi 《BMC genomics》2015,16(1)


10.

Prophage Finder: a prophage loci prediction tool for prokaryotic genome sequences (total citations: 4; self-citations: 0; citations by others: 4)

Bose M Barber RD 《In silico biology》2006,6(3):223-227

Prophage loci often remain under-annotated or even unrecognized in prokaryotic genome sequencing projects. A PHP application, Prophage Finder, has been developed and implemented to predict prophage loci, based upon clusters of phage-related gene products encoded within DNA sequences. This application provides results detailing several facets of these clusters to facilitate rapid prediction and analysis of prophage sequences. Prophage Finder was tested using previously annotated prokaryotic genomic sequences with manually curated prophage loci as benchmarks. Additional analyses from Prophage Finder searches of several draft prokaryotic genome sequences are available through the Web site (http://bioinformatics.uwp.edu/~phage/DOEResults.php) to illustrate the potential of this application.

11.

A fast algorithm for genome-wide analysis of proteins with repeated sequences.

M Pellegrini E M Marcotte T O Yeates 《Proteins》1999,35(4):440-446

We present a fast algorithm to search for repeating fragments within protein sequences. The technique is based on an extension of the Smith-Waterman algorithm that allows the calculation of sub-optimal alignments of a sequence against itself. We are able to estimate the statistical significance of all sub-optimal alignment scores. We also rapidly determine the length of the repeating fragment and the number of times it is found in a sequence. The technique is applied to sequences in the Swissprot database, and to 16 complete genomes. We find that eukaryotic proteins contain more internal repeats than those of prokaryotic and archaeal organisms. The finding that 18% of yeast sequences and 28% of the known human sequences contain detectable repeats emphasizes the importance of internal duplication in protein evolution.

12.

Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences

Liu B Gibbons T Ghodsi M Treangen T Pop M 《BMC genomics》2011,12(Z2):S4


13.

Sequence of a transcribed Physarum genomic DNA fragment containing a cluster of different U-RNA sequences.


M E Curran D S Sullivan E A Arn H B Skinner M W Retter D S Adams 《Nucleic acids research》1988,16(20):9867


14.

Statistical measures of the structure of genomic sequences: entropy, complexity, and position information

Orlov YL Te Boekhorst R Abnizova II 《Journal of bioinformatics and computational biology》2006,4(2):523-536


15.

EMILiO: a fast algorithm for genome-scale strain design

Yang L Cluett WR Mahadevan R 《Metabolic engineering》2011,13(3):272-281

Systems-level design of cell metabolism is becoming increasingly important for renewable production of fuels, chemicals, and drugs. Computational models are improving in the accuracy and scope of predictions, but are also growing in complexity. Consequently, efficient and scalable algorithms are increasingly important for strain design. Previous algorithms helped to consolidate the utility of computational modeling in this field. To meet intensifying demands for high-performance strains, both the number and variety of genetic manipulations involved in strain construction are increasing. Existing algorithms have experienced combinatorial increases in computational complexity when applied toward the design of such complex strains. Here, we present EMILiO, a new algorithm that increases the scope of strain design to include reactions with individually optimized fluxes. Unlike existing approaches that would experience an explosion in complexity to solve this problem, we efficiently generated numerous alternate strain designs producing succinate, l-glutamate and l-serine. This was enabled by successive linear programming, a technique new to the area of computational strain design.

16.

Turbo Tree: a fast algorithm for minimal trees

Penny David; Hendy Michael D. 《Bioinformatics (Oxford, England)》1987,3(3):183-187

A branch and bound algorithm is described for searching rapidly for minimal length trees from biological data. The algorithm adds characters one at a time, rather than adding taxa, as in previous branch and bound methods. The algorithm has been programmed and is available from the authors. A worked example is given with 33 characters and 15 taxa. About 8 x 10¹² binary trees are possible with 15 taxa but the branch and bound program finds the minimal tree in <5 min on an IBM PC. Received on January 15, 1987; accepted on February 23, 1987

17.

GlycoPP: a webserver for prediction of N- and O-glycosites in prokaryotic protein sequences

JS Chauhan AH Bhat GP Raghava A Rao 《PloS one》2012,7(7):e40155

Glycosylation is one of the most abundant post-translational modifications (PTMs) required for various structure/function modulations of proteins in a living cell. Although elucidated only recently in prokaryotes, this type of PTM is present across all three domains of life. In prokaryotes, two types of protein-glycan linkage are more widespread, namely N-linked, where a glycan moiety is attached to the amide group of Asn, and O-linked, where a glycan moiety is attached to the hydroxyl group of Ser/Thr/Tyr. Because of their biologically ubiquitous nature, significance, and technology applications, the study of prokaryotic glycoproteins is a fast emerging area of research. Here we describe new Support Vector Machine (SVM) based algorithms (models) developed for predicting glycosylated residues (glycosites) with high accuracy in prokaryotic protein sequences. The models are based on binary profile of patterns, composition profile of patterns, and position-specific scoring matrix profile of patterns as training features. The study employs an extensive dataset of 107 N-linked and 116 O-linked glycosites extracted from 59 experimentally characterized glycoproteins of prokaryotes. This dataset includes validated N-glycosites from phyla Crenarchaeota, Euryarchaeota (domain Archaea), Proteobacteria (domain Bacteria) and validated O-glycosites from phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria (domain Bacteria). In view of the current understanding that glycosylation occurs on folded proteins in bacteria, hybrid models have been developed using information on predicted secondary structures and accessible surface area in various combinations with training features. Using these models, N-glycosites and O-glycosites could be predicted with an accuracy of 82.71% (MCC 0.65) and 73.71% (MCC 0.48), respectively. An evaluation of the best performing models with 28 independent prokaryotic glycoproteins confirms the suitability of these models in predicting N- and O-glycosites in potential glycoproteins from the aforementioned organisms, with reasonably high confidence. A web server, GlycoPP, implementing these models is freely available at http://www.imtech.res.in/raghava/glycopp/.

18.

leBIBIQBPP: a set of databases and a webtool for automatic phylogenetic analysis of prokaryotic sequences

Jean-Pierre Flandrois Guy Perrière Manolo Gouy 《BMC bioinformatics》2015,16(1)


19.

A fast word search algorithm for the representation of sequence similarity in genomic DNA.


C Lefèvre J E Ikeda 《Nucleic acids research》1994,22(3):404-411

Representation of sequence similarity by dot matrix plots is a method widely used for comparing biological sequences. The user is presented with an overall view of similarity between two sequences. Computation of this plot has been reconsidered here. An improvement is proposed through the preprocessing of the data into an automaton recognizing the word structure of a sequence. The main advantage of this approach is to systematically eliminate the repetitions during word comparison. Simple heuristics are also considered to greatly speed up pattern matching. As a result, large sequences are handled very efficiently. This is illustrated by a comparison of large genomic DNA. The algorithm has been implemented in an interactive application on a microcomputer.
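As a rough illustration of word-based dot plot computation, the Python sketch below preprocesses one sequence into a word index so that each word of the other sequence is looked up once instead of being compared against every position. It uses a hash table rather than the automaton described in the abstract, so it is a stand-in technique; the function name and the fixed `word_len` parameter are assumptions.

```python
from collections import defaultdict


def word_dots(seq_a, seq_b, word_len):
    """Return (i, j) pairs where the word of length word_len starting at
    seq_a[i] equals the word starting at seq_b[j]."""
    # Preprocess seq_a: map each word to all of its start positions.
    index = defaultdict(list)
    for i in range(len(seq_a) - word_len + 1):
        index[seq_a[i:i + word_len]].append(i)

    # Scan seq_b once; each lookup yields every matching position in seq_a.
    dots = []
    for j in range(len(seq_b) - word_len + 1):
        for i in index.get(seq_b[j:j + word_len], ()):
            dots.append((i, j))
    return dots
```

Each returned pair is one dot of the plot, with `i` on the axis of the first sequence and `j` on the axis of the second.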

20.

NoDe: a fast error-correction algorithm for pyrosequencing amplicon reads

Mohamed Mysara Natalie Leys Jeroen Raes Pieter Monsieurs 《BMC bioinformatics》2015,16(1)

Background

The popularity of new sequencing technologies has led to an explosion of possible applications, including new approaches in biodiversity studies. However, each of these sequencing technologies suffers from sequencing errors originating from different factors. For 16S rRNA metagenomics studies, the 454 pyrosequencing technology is one of the most frequently used platforms, but sequencing errors still lead to important data analysis issues (e.g. in clustering in taxonomic units and biodiversity estimation). Moreover, retaining a higher portion of the sequencing data by preserving as much of the read length as possible, while maintaining the error rate within an acceptable range, will have important consequences at the level of taxonomic precision.

Results

The new error correction algorithm proposed in this work, NoDe (Noise Detector), is trained to identify those positions in 454 sequencing reads that are likely to have an error, and subsequently clusters those error-prone reads with correct reads, resulting in an error-free representative read. A benchmarking study with other denoising algorithms shows that NoDe can detect up to 75% more errors in a large scale mock community dataset, and does so at a low computational cost compared to the second best algorithm considered in this study. The positive effect of NoDe in 16S rRNA studies was confirmed by the beneficial effect on the precision of the clustering of pyrosequencing reads in operational taxonomic units.

Conclusions

NoDe was shown to be a computationally efficient denoising algorithm for pyrosequencing reads, producing the lowest error rates in an extensive benchmarking study with other denoising algorithms.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0520-5) contains supplementary material, which is available to authorized users.