Similar literature
20 similar documents found (search time: 31 ms)
1.

Background  

The BLAST algorithm compares biological sequences to one another in order to determine shared motifs and common ancestry. However, the comparison of all non-redundant (NR) sequences against all other NR sequences is a computationally intensive task. We developed NBLAST as a cluster computer implementation of the BLAST family of sequence comparison programs for the purpose of generating pre-computed BLAST alignments and neighbour lists of NR sequences.

2.

Background  

Homology is a crucial concept in comparative genomics. BLAST is probably the most widely used algorithm for homology detection in comparative genomics. Usually a stringent score cutoff is applied to distinguish putative homologs from possible false-positive hits. As a consequence, some BLAST hits that are in fact homologous are discarded.

3.

Background  

BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other compute-intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need for high-end computers.

4.
BLAST+: architecture and applications   Cited by: 5 (self-citations: 0, citations by others: 5)

Background  

Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications.

5.

Background  

BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming.

6.

Background

Sequencing of EST and BAC end datasets is no longer limited to large research groups. Drops in per-base pricing have made high throughput sequencing accessible to individual investigators. However, there are few options available which provide a free and user-friendly solution to the BLAST result storage and data mining needs of biologists.

7.
8.

Background

Large-scale sequence studies requiring BLAST-based analysis produce huge amounts of data to be parsed. BLAST parsers are available, but they are often missing some important features, such as keeping all information from the raw BLAST output, allowing direct access to single results, and performing logical operations over them.

Findings

We implemented BlaSTorage, a Python package that parses multi BLAST results and returns them in a purpose-built object-database format. Unlike other BLAST parsers, BlaSTorage retains and stores all parts of BLAST results, including alignments, without loss of information; a complete API allows access to all the data components.

Conclusions

BlaSTorage runs at a speed comparable to that of more basic parsers written in compiled languages such as C++, and it can be easily integrated into web applications or software pipelines.

9.

Background  

Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature of the variants identified. Due to the differences in typing schemes, imposed by the diversity of genes targeted, creating these databases has typically required the writing of one-off code to link the database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences.

10.
Motivation

BLAST programs are very efficient in finding similarities between sequences. However, for large datasets such as ESTs, the information must be extracted manually from the batch BLAST output. This can be time-consuming, insufficient, and inaccurate. A parser application would therefore be extremely useful for extracting information from BLAST outputs.

Results

We have developed a Java application, Batch Blast Extractor, with a user-friendly graphical interface to extract information from BLAST output. The application generates a tab-delimited text file that can be easily imported into any statistical package such as Excel or SPSS for further analysis. For each BLAST hit, the program obtains and saves the essential features from the BLAST output file that would allow further analysis. The program was written in Java and is therefore OS independent. It works on both Windows and Linux with Java 1.4 and higher. It is freely available from: http://mcbc.usm.edu/BatchBlastExtractor/
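The extraction step described above can be illustrated with a short Python sketch. It is not the Batch Blast Extractor implementation (which is a Java GUI tool). It assumes the hits are already in the standard 12-column tab-delimited format produced by BLAST+ with `-outfmt 6`, and the function name `best_hits`, the E-value cutoff, and the example rows are hypothetical.

```python
import csv
import io

# Column names of the standard 12-column tabular BLAST+ output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query_id: hit}, keeping the highest-bitscore hit per query.

    Hits whose E-value exceeds max_evalue (an illustrative cutoff) are dropped.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up example rows: two hits for q1, one weak hit for q2.
EXAMPLE = (
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370\n"
    "q1\ts2\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-30\t120\n"
    "q2\ts3\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t25.1\n"
)
hits = best_hits(io.StringIO(EXAMPLE))
for query, hit in hits.items():
    print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

Keeping only one hit per query is a design choice made for brevity; a tool meant for downstream statistics would more likely keep every hit and leave the filtering to the user.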


11.

Background  

The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way of constructing datasets suitable for phylogenetic analysis, because this task depends intimately on the scientific question we want to address. Moreover, database-mining software such as BLAST, which is routinely used to search for homologous sequences, is not specifically optimized for this task.

12.

Background  

The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python.

13.

Background  

Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA.

14.

Background  

Homology is a key concept in both evolutionary biology and genomics. Detection of homology is crucial in fields like the functional annotation of protein sequences and the identification of taxon-specific genes. Basic homology searches are still frequently performed by pairwise search methods such as BLAST. Vast improvements have been made in the identification of homologous proteins by using more advanced methods that use sequence profiles. However, additional improvement could be made by exploiting sources of genomic information other than the primary sequence or tertiary structure.

15.

Background  

Classification of bacteria within the genus Brucella has been difficult, due in part to considerable genomic homogeneity between the different species and biovars, in spite of clear differences in phenotypes. Therefore, many different methods have been used to assess Brucella taxonomy. In the current work, we examine 32 sequenced genomes from the genus Brucella, representing the six classical species as well as more recently described species, using bioinformatic methods. Comparisons were made at the level of genomic DNA using oligonucleotide-based methods (Markov chain based genomic signatures, genomic codon and amino acid frequency based comparisons) and at the level of proteomes (all-against-all BLAST protein comparisons and pan-genomic analyses).

16.

Background

BLAST is a commonly used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch.
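The idea of turning the matches from one search round into a PSSM for the next round can be sketched as follows. This is a deliberately simplified illustration and not the PSI-BLAST algorithm: it assumes the matches are already aligned without gaps, uses a uniform background frequency and a fixed pseudocount, and omits the sequence weighting that real implementations apply. The names `build_pssm` and `score` and the example sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length sequences.

    Assumptions of this sketch: ungapped alignment, uniform background
    frequency (1/20), and a fixed additive pseudocount per residue.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum the per-position log-odds scores of seq against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Made-up "round i" matches; the resulting PSSM would drive "round i + 1".
matches = ["MKV", "MKL", "MRV"]
pssm = build_pssm(matches)
print(score(pssm, "MKV"), score(pssm, "AAA"))
```

A sequence resembling the matches scores higher than an unrelated one, which is what lets a later round pick up homologs that a single query sequence would miss.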

Results

We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI's Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST.

Conclusions

DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the "Protein BLAST" link at http://blast.ncbi.nlm.nih.gov.

Reviewers

This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.

17.

Background  

Fast seed-based alignment heuristics such as BLAST and BLAT have become indispensable tools in comparative genomics for all studies aiming at the evolutionary relations of proteins, genes, and non-coding RNAs. This is true in particular for the large mammalian genomes. The sensitivity and specificity of these tools, however, crucially depend on parameters such as seed sizes or maximum expectation values. In settings that require high sensitivity, the number of short local match fragments easily becomes intractable. Fragment chaining is then a powerful means to quickly connect, score, and rank the fragments to improve the specificity.
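Fragment chaining can be sketched with a simple quadratic-time dynamic program. The sketch below is a generic colinear chaining routine, not the method of the article summarized above: it assumes each fragment carries query and subject intervals plus a score, requires chained fragments to be strictly non-overlapping and in the same order on both sequences, and applies no gap penalty. The names `Fragment` and `best_chain` and the example fragments are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float


def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Return (total score, fragments) of the highest-scoring colinear chain.

    Fragment j may precede fragment i only if it ends before i starts on
    both the query and the subject. No gap penalty is applied (assumption).
    """
    frags = sorted(fragments, key=lambda f: (f.q_end, f.s_end))
    n = len(frags)
    if n == 0:
        return 0.0, []
    best = [f.score for f in frags]  # best chain score ending at fragment i
    prev = [-1] * n                  # predecessor index for traceback
    for i in range(n):
        for j in range(i):
            colinear = (frags[j].q_end < frags[i].q_start
                        and frags[j].s_end < frags[i].s_start)
            if colinear and best[j] + frags[i].score > best[i]:
                best[i] = best[j] + frags[i].score
                prev[i] = j
    end = max(range(n), key=best.__getitem__)
    chain = []
    i = end
    while i != -1:
        chain.append(frags[i])
        i = prev[i]
    chain.reverse()
    return best[end], chain


# Made-up fragments: a, b and d are colinear; c overlaps a on the query.
a = Fragment(1, 10, 1, 10, 5.0)
b = Fragment(12, 20, 15, 25, 4.0)
c = Fragment(5, 30, 40, 60, 6.0)
d = Fragment(35, 40, 65, 70, 3.0)
total, chain = best_chain([c, d, a, b])
print(total, chain)
```

The quadratic loop is adequate for a handful of fragments; the very large fragment sets mentioned above are the case where faster chaining algorithms based on sorted data structures become necessary.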

18.

Aims

The bioflocculant production potential of an actinobacterium isolated from a freshwater environment was evaluated, and the bioflocculant was characterized.

Methods and Results

16S rDNA nucleotide sequencing and BLAST analysis were used to identify the actinobacterium, and fermentation conditions and nutritional requirements were evaluated for optimal bioflocculant production. Chemical analyses, FTIR, 1H NMR spectrometry and SEM imaging of the purified bioflocculant were carried out. The 16S rDNA nucleotide sequence showed 93% similarity to three Cellulomonas species (strain 794, Cellulomonas flavigena DSM 20109 and Cellulomonas flavigena NCIMB 8073), and the sequence was deposited in GenBank as Cellulomonas sp. Okoh (accession number HQ537132). Bioflocculant was optimally produced at an initial pH 7, incubation temperature 30°C, agitation speed of 160 rpm and an inoculum size of 2% (vol/vol) of cell density 1·5 × 10cfu ml⁻¹. Glucose (88·09% flocculating activity; yield: 4·04 ± 0·33 g l⁻¹), (NH4)2NO3 (82·74% flocculating activity; yield: 4·47 ± 0·55 g l⁻¹) and MgCl2 (90·40% flocculating activity; yield: 4·41 g l⁻¹) were the preferred nutritional sources. Bioflocculant chemical analyses showed carbohydrate, protein and uronic acids in the proportion of 28·9, 19·3 and 18·7% in CPB and 31·4, 18·7 and 32·1% in PPB, respectively. FTIR and 1H NMR indicated the presence of carboxyl, hydroxyl and amino groups, amongst others, typical of glycosaminoglycan. SEM imaging revealed horizontal pleats of closely packed membranous sheets.

Conclusion

Cellulomonas sp. produces bioflocculant predominantly composed of glycosaminoglycan polysaccharides with high flocculation activity.

Significance and Impact of the Study

High flocculation activity suggests suitability for industrial applications; hence, it may serve to replace the hazardous flocculant used in water treatment.

19.

Background  

DNA methylation plays an important role in development and tumorigenesis by epigenetic modification and silencing of critical genes. The development of PCR-based methylation assays on bisulphite modified DNA heralded a breakthrough in speed and sensitivity for gene methylation analysis. Despite this technological advancement, these approaches require a cumbersome gene by gene primer design and experimental validation. Bisulphite DNA modification results in sequence alterations (all unmethylated cytosines are converted into uracils) and a general sequence complexity reduction as cytosines become underrepresented. Consequently, standard BLAST sequence homology searches cannot be applied to search for specific methylation primers.
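The sequence alteration described above can be illustrated with a minimal in-silico conversion. This is only a sketch, not the primer-search method of the article: it assumes methylation is given as a set of 0-based positions, and the function name `bisulphite_convert` and the example sequence are hypothetical. The optional `as_read` flag writes T instead of U, reflecting that uracil is read as thymine after PCR amplification.

```python
def bisulphite_convert(seq, methylated=frozenset(), as_read=False):
    """Simulate bisulphite treatment of a single DNA strand.

    Every cytosine whose 0-based position is not in `methylated` becomes
    uracil ("U"), or thymine ("T") when as_read is True. Methylated
    cytosines and all other bases are left unchanged.
    """
    target = "T" if as_read else "U"
    converted = []
    for position, base in enumerate(seq.upper()):
        if base == "C" and position not in methylated:
            converted.append(target)
        else:
            converted.append(base)
    return "".join(converted)


# Made-up example: only the cytosine at position 1 is methylated.
original = "ACGTCCGA"
treated = bisulphite_convert(original, methylated={1})
as_read = bisulphite_convert(original, methylated={1}, as_read=True)
print(treated)   # ACGTUUGA
print(as_read)   # ACGTTTGA
print(original.count("C"), as_read.count("C"))  # cytosines before and after
```

The drop in cytosine count in the converted sequence is the complexity reduction that makes an ordinary similarity search against the unconverted genome unsuitable for methylation-specific primers.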

20.
Genomic taxonomy of vibrios   Cited by: 1 (self-citations: 0, citations by others: 1)

Background  

Vibrio taxonomy has been based on a polyphasic approach. In this study, we retrieve useful taxonomic information (i.e., data that can be used to distinguish different taxonomic levels, such as species and genera) from 32 genome sequences of different vibrio species. We use a variety of tools to explore the taxonomic relationship between the sequenced genomes, including Multilocus Sequence Analysis (MLSA), supertrees, Average Amino Acid Identity (AAI), genomic signatures, and Genome BLAST atlases. Our aim is to analyse the usefulness of these tools for species identification in vibrios.
