Similar literature
 Found 20 similar articles (search time: 502 ms)
1.
SOAP: short oligonucleotide alignment program   Total citations: 22 (self-citations: 0; citations by others: 22)
SUMMARY: We have developed a program SOAP for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge amounts of short reads generated by parallel sequencing using the new generation Illumina-Solexa sequencing technology. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multi-threaded parallel computing, and has a batch module for multiple query sets. AVAILABILITY: http://soap.genomics.org.cn.

2.
SUMMARY: MuSeqBox is a program to parse BLAST output and store attributes of BLAST hits in tabular form. The user can apply a number of selection criteria to filter out hits with particular attributes. MuSeqBox provides a powerful annotation tool for large sets of query sequences that are simultaneously compared against a database with any of the standard stand-alone or network-client BLAST programs. We discuss such application to the problem of annotation and analysis of EST collections. AVAILABILITY: The program was written in standard C++ and is freely available to noncommercial users by request from the authors. The program is also available over the web at http://bioinformatics.iastate.edu/bioinformatics2go/mb/MuSeqBox.html.

3.
SUMMARY: Palmitoylation is an important post-translational lipid modification of proteins. Unlike prenylation and myristoylation, palmitoylation is a reversible covalent modification, allowing for dynamic regulation of multiple complex cellular systems. However, in vivo or in vitro identification of palmitoylation sites is usually time-consuming and labor-intensive. So in silico predictions could help to narrow down the possible palmitoylation sites, which can be used to guide further experimental design. Previous studies suggested that there is no unique canonical motif for palmitoylation sites, so we hypothesize that the bona fide pattern might be compromised by heterogeneity of multiple structural determinants with different features. Based on this hypothesis, we partition the known palmitoylation sites into three clusters and score the similarity between the query peptide and the training ones based on the BLOSUM62 matrix. We have implemented a computer program for palmitoylation site prediction, the Clustering and Scoring Strategy for Palmitoylation Sites Prediction (CSS-Palm) system, and found that the program's prediction performance is encouraging, with highly positive Jack-Knife validation results (sensitivity 82.16% and specificity 83.17% for cut-off score 2.6). Our analyses indicate that CSS-Palm could provide a powerful and effective tool for studies of palmitoylation sites. AVAILABILITY: CSS-Palm is implemented in PHP/PERL+MySQL and can be freely accessed at http://bioinformatics.lcd-ustc.org/css_palm/ CONTACT: yaoxb@ustc.edu.cn; xuyn@bmb.uga.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Price MN, Dehal PS, Arkin AP. PLoS ONE 2008, 3(10): e3589

Background

All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding.

Methodology/Principal Findings

We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database (“NR”), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query.

Conclusions/Significance

FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast.

5.
BioParser     
The widely used programs BLAST (in this article, 'BLAST' includes both the National Center for Biotechnology Information [NCBI] BLAST and the Washington University version WU BLAST) and FASTA for similarity searches in nucleotide and protein databases usually result in copious output. However, when large query sets are used, human inspection rapidly becomes impractical. BioParser is a Perl program for parsing BLAST and FASTA reports. Making extensive use of the BioPerl toolkit, the program filters, stores and returns components of these reports in either ASCII or HTML format. BioParser is also capable of automatically feeding a local MySQL database with the parsed information, allowing subsequent filtering of hits and/or alignments with specific attributes. For this reason, BioParser is a valuable tool for large-scale similarity analyses by improving the access to the information present in BLAST or FASTA reports, facilitating extraction of useful information of large sets of sequence alignments, and allowing for easy handling and processing of the data. AVAILABILITY: BioParser is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 2.0 license terms (http://creativecommons.org/licenses/by-nc-nd/2.0/) and is available upon request. Additional information can be found at the BioParser website (http://www.dbbm.fiocruz.br/BioParser.html).

6.
SUMMARY: BLAST is a widely used alignment tool for detecting matches between a query sequence and entries in nucleotide sequence databases. Matches (high-scoring pairs, HSPs) are assigned a score based on alignment length and quality and, by default, are reported with the top-scoring matches listed first. For certain types of searches, however, this method of reporting is not optimal. This is particularly true when searching a genome sequence with a query that was derived from the same genome, or a closely related one. If the genome is complex and the assembly is far from complete, correct matches are often relegated to low positions in the results, where they may be easily overlooked. To rectify this problem, we developed TruMatch, a program that parses standard BLAST outputs and identifies HSPs that involve query segments with unique matches to the assembly. Candidates for bona fide matches between a query sequence and a genome assembly are listed at the top of the TruMatch output. AVAILABILITY: TruMatch is written in Perl and is freely available to non-commercial users via web download at the URL: http://genome.kbrin.uky.edu/fungi_tel/TruMatch/
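The idea of promoting HSPs whose query segment matches only once can be sketched roughly as follows. This is not TruMatch's actual algorithm, which the abstract does not describe in detail. The sketch assumes BLAST tabular output in the standard 12-column layout, and treats a query segment as "unique" when its query interval overlaps no other HSP from the same query. The function names and that overlap rule are hypothetical choices made for illustration.

```python
from collections import defaultdict


def parse_hsps(lines):
    """Parse 12-column BLAST tabular rows into simple dicts."""
    hsps = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        # Columns 7 and 8 are the query start and end (1-based).
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        hsps.append({
            "query": fields[0],
            "subject": fields[1],
            "qstart": qstart,
            "qend": qend,
            "bitscore": float(fields[11]),
        })
    return hsps


def unique_hsps(hsps):
    """Keep HSPs whose query interval overlaps no other HSP of the same query.

    Assumption: "unique" means no overlap at all; a real tool would
    probably tolerate small overlaps.
    """
    by_query = defaultdict(list)
    for hsp in hsps:
        by_query[hsp["query"]].append(hsp)

    unique = []
    for group in by_query.values():
        for hsp in group:
            overlaps = any(
                other is not hsp
                and other["qstart"] <= hsp["qend"]
                and hsp["qstart"] <= other["qend"]
                for other in group
            )
            if not overlaps:
                unique.append(hsp)
    # List the strongest unique candidates first.
    return sorted(unique, key=lambda h: h["bitscore"], reverse=True)
```

With this rule, two high-scoring HSPs covering the same query region (for example, a repeat) are both set aside, while a lower-scoring HSP covering a region that matches nowhere else is reported.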

7.
8.
An algorithm for automatic clustering of database protein sequences from the Bothrops jararacussu venom gland, according to sequence similarities of their domains, is described. The program was written in the C and Perl languages. The algorithm compares a domain with each ORF protein sequence in the database. Each nucleotide FASTA sequence generates six ORFs. As a result, the user has a list containing all sequences found in a specific domain and a display of the sequence, domain and number of hits. The algorithm lists only the sequences that present a minimum similarity of 30 hits and the best alignment. This limit was considered appropriate. The algorithm is available on the Internet (www.compbionet.org.br/cgi-domains/homesnake) and can quickly and accurately organize large databases into classes.
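The step in which each nucleotide sequence yields six ORFs can be illustrated with a six-frame translation, assuming the abstract means the three forward and three reverse-complement reading frames. The sketch below is not the authors' C/Perl code. It uses the standard genetic code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate in frame 0; unknown codons become 'X', a partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return the six translations: forward frames 0-2, then reverse frames 0-2."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]
```

Each of the six protein strings could then be compared against a domain; that comparison step is not shown here because the abstract does not specify it.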

9.
DNA chips have proven to be effective tools in detecting gene expression levels. Compared with DNA chips using complementary DNA as probes, oligonucleotide microarrays using oligonucleotides as probes have attracted great attention because of their well-known advantages. The design of gene-specific probes for each target is essential to the development of oligonucleotide microarrays. We have previously reported the development of a probe design software termed Mprobe 1.0. Here, we present a new version of this software, termed Mprobe 2.0. Several new features are included in Mprobe 2.0. Firstly, a Paradox-based sequence database management system has been developed and integrated into the software, which consequently allows interoperability with sequences in GenBank, EMBL, and FASTA formats. Secondly, in contrast to setting a fixed threshold for the secondary structure of probes in Mprobe 1.0 and other related software, Mprobe 2.0 employs a different method. After parameters such as GC type, probe melting temperature and GC content have been evaluated, candidate probes are sorted by free energy from high to low, followed by specificity analysis. Thirdly, Mprobe 2.0 provides users with substantial parameter options in the visual mode. Mprobe 2.0 offers an easier interface for users to manage sequences annotated in different formats and design the optimal probes for oligonucleotide microarrays and other applications. AVAILABILITY: The program is free for non-commercial users and can be downloaded from the web page http://www.biosun.org.cn/mprobe/ CONTACT: Wuju Li (wujuli@yahoo.com or liwj@nic.bmi.ac.cn).
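The filter-then-sort step described above can be sketched as follows. This is only an illustration, not Mprobe's implementation. The melting temperature uses the Wallace rule (2 °C per A/T and 4 °C per G/C), a rough approximation swapped in here because the abstract does not name its method. Free-energy values are assumed to be supplied by an external folding tool, and the threshold ranges are hypothetical.

```python
def gc_content(probe):
    """Fraction of G and C bases in the probe (0.0 to 1.0)."""
    probe = probe.upper()
    return (probe.count("G") + probe.count("C")) / len(probe)


def wallace_tm(probe):
    """Rough melting temperature in degrees C by the Wallace rule."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


def rank_probes(candidates, gc_range=(0.4, 0.6), tm_range=(50, 70)):
    """Filter (sequence, free_energy) pairs, then sort by free energy, high to low.

    gc_range and tm_range are illustrative defaults, not Mprobe's settings.
    """
    kept = [
        (seq, free_energy)
        for seq, free_energy in candidates
        if gc_range[0] <= gc_content(seq) <= gc_range[1]
        and tm_range[0] <= wallace_tm(seq) <= tm_range[1]
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Sorting from high to low free energy puts the probes least likely to form stable secondary structure first, which matches the ordering the abstract describes; specificity analysis would follow as a separate step.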

10.
TMCompare is an alignment and visualization tool for comparing sequence information for membrane proteins contained in SWISS-PROT entries with structural information contained in PDB files. The program can be used for: detection of breaks in the alpha-helical structure of transmembrane regions; examination of differences in coverage between PDB and SWISS-PROT files; examination of annotation differences between PDB files and associated SWISS-PROT files; examination and comparison of assigned PDB alpha helix regions and assigned SWISS-PROT transmembrane regions in linear sequence (one-letter code) format; examination of these differences in 3D using the CHIME plugin; and analysis of the alpha and non-alpha content of transmembrane regions. AVAILABILITY: TMCompare is available for use through selection of a query protein via the internet (http://www.membraneproteins.org/TMCompare) CONTACT: tmcompare@membraneproteins.org

11.
12.
We present a web service that automatically assigns sequences to homologous gene families from a set of databases. After identification of the gene family most similar to the query sequence, the sequence is added to the whole alignment and the phylogenetic tree of the family is rebuilt. Thus, the phylogenetic position of the query sequence in its gene family can be easily identified. AVAILABILITY: http://pbil.univ-lyon1.fr/software/HoSeqI/.

13.
The current study aimed to investigate the possible role of the heparanase protein (HPSE-1; Entrez Pubmed ref|NP_001092010.1|, heparanase isoform 1 preproprotein [Homo sapiens]) in evolution by studying the phylogenetic relationships and divergence of the HPSE-1 gene using computational methods. Heparanase (HPSE) protein sequences from various species were retrieved from the GenBank database and compared by sequence alignment. Multiple sequence alignment was performed using Clustal-W with default settings, and phylogenetic trees for the gene were built using the neighbor-joining method as in BLAST version 2.2.26+. A total of 112 BLAST hits were found for the heparanase query sequence, and these hits showed a putative conserved domain of the Glyco_hydro_79n superfamily. We then narrowed down the search by manually deleting the proteins that were not HPSE-1. The remaining sequences were then subjected to phylogenetic analyses using the PhyML and TreeDyn software. Our study indicated that HPSE-1 is a conserved protein in the classes Mammalia, Aves, Amphibia, Actinopterygii and Insecta, emphasizing its importance in the physiology of cell membranes. The occurrence of this gene with conserved sites throughout evolution strengthens the case for the role of the HPSE-1 gene and helps in better understanding the biochemical processes that may lead to cancer.

14.
15.
DoriC: a database of oriC regions in bacterial genomes   Total citations: 1 (self-citations: 0; citations by others: 1)
Replication origins (oriCs) of bacterial genomes currently available in GenBank have been predicted by using a systematic method comprising Z-curve analysis of nucleotide distribution asymmetry, DnaA box distribution, genes adjacent to candidate oriCs and phylogenetic relationships. These oriCs are organized into a MySQL database, DoriC, which provides extensive information and graphical views of the oriC regions. In addition, users can BLAST a query sequence or even a whole genome against DoriC to find a homologous one. DoriC will be updated regularly; the latest version is DoriC 1.8, in which the oriCs of 425 genomes (468 chromosomes) are identified. AVAILABILITY: DoriC can be accessed from http://tubic.tju.edu.cn/doric/. SUPPLEMENTARY INFORMATION: Supplementary data are available at http://tubic.tju.edu.cn/doric/supplementary.htm.
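The Z-curve transform used as the first line of evidence can be sketched as below. It maps a DNA sequence to three cumulative components: purine versus pyrimidine (x), amino versus keto (y), and weak versus strong hydrogen bonding (z). Only the transform is shown; how DoriC combines it with DnaA boxes, adjacent genes and phylogeny is not described in the abstract, and the function name is hypothetical.

```python
def z_curve(seq):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) for each position.

    x_n = (A + G) - (C + T)   purine vs. pyrimidine
    y_n = (A + C) - (G + T)   amino vs. keto
    z_n = (A + T) - (G + C)   weak vs. strong hydrogen bonds
    where A, C, G, T are counts over the first n bases.
    Ambiguous bases (e.g. N) leave the counts unchanged.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base in counts:
            counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points
```

On a whole chromosome, extreme values of such cumulative components are commonly inspected as candidate positions of compositional asymmetry switches; the exact criteria used for oriC prediction are an assumption left outside this sketch.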

16.
SUMMARY: TargetFinder is a PC/Windows program for interactive selection of effective antisense oligonucleotides (AOs) based on mRNA accessible site tagging (MAST) and secondary structures of the target mRNA. To make the MAST result intuitive, both the alignment result and the tag frequency profile are illustrated. As a theoretical reference, the secondary structure and single-strand probability profile of the target mRNA are also presented. All of these sequences and profiles are displayed in aligned mode, which facilitates identification of the accessible sites in the target mRNA. A graphical, user-friendly interface makes TargetFinder a useful tool in AO target site selection. AVAILABILITY: The software is freely available at http://www.bioit.org.cn/ao/targetfinder.htm CONTACT: sqwang@nic.bmi.ac.cn.

17.
SUMMARY: NdPASA is a web server specifically designed to optimize sequence alignment between distantly related proteins. The program integrates structure information of the template sequence into a global alignment algorithm by employing neighbor-dependent propensities of amino acids as a unique parameter for alignment. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. NdPASA is most effective in aligning homologous proteins sharing a low percentage of sequence identity. The server is designed to aid homologous protein structure modeling. A PSI-BLAST search engine was implemented to help users identify template candidates that are most appropriate for modeling the query sequences.

18.
SUMMARY: ESTminer is a collection of programs that use expressed sequence tag (EST) data from inbred genomes to identify unique genes within gene families. The algorithm utilizes Cap3 to perform an initial clustering of related EST sequences to produce a consensus sequence of a gene family. These consensus sequences are then used to collect all ESTs in the original EST library that are related using BLAST. A redundancy based criterion is applied to each EST to identify reliable unique gene-sequences. Using a highly inbred genome as a source of ESTs eliminates the necessity of computing covariance on each polymorphism to identify alleles of the same gene, thus making this algorithm more streamlined than other alternatives which must computationally attempt to distinguish genes from alleles. AVAILABILITY: The programs were written in PERL and are freely available at http://www.soybase.org/publication_data/Nelson/ESTminer/ESTminer.html CONTACT: nelsonrt@iastate.edu SUPPLEMENTARY INFORMATION: Figures and dataset can be obtained from: http://www.soybase.org/publication_data/Nelson/ESTminer/ESTminer.html.

19.
Wee1-like protein kinase (Wee1) is a tyrosine kinase that regulates the G2 checkpoint and prevents entry into mitosis in response to DNA damage. Based on a series of signaling pathways initiated by Wee1, Wee1 has been recognized as a potential target for cancer therapy. To discover potent Wee1 inhibitors with novel scaffolds, a ligand-based pharmacophore model was built based on 101 known Wee1 inhibitors. The best pharmacophore model, AADRRR.340, with good partial least squares (PLS) statistics (R² = 0.9212, Q² = 0.7457), was then selected and validated. The validated model was used as a three-dimensional (3D) search query for virtual screening of databases. The filtered molecules were further analyzed and refined by Lipinski’s rule of 5; multiple docking procedures (high throughput virtual screening (HTVS), standard precision (SP), genetic optimization for ligand docking (GOLD), extra precision (XP), and unique quantum polarized ligand docking (QPLD)); absorption, distribution, metabolism, excretion, and toxicity (ADMET) screening; and binding free energy calculations with the Prime/molecular mechanics generalized Born surface area (MM-GBSA) method. Eight leads were identified as potential Wee1 inhibitors, and a 50 ns molecular dynamics (MD) simulation was carried out for the top four inhibitors to predict the stability of the ligand–protein complexes. Molecular mechanics Poisson–Boltzmann surface area (MM-PBSA) energies based on the MD simulation and the per-residue energy contributions to the binding energy were calculated. In the end, three hits with good stability and affinity for the protein were identified.

Communicated by Ramaswamy H. Sarma
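The Lipinski rule-of-5 filter mentioned in the screening cascade can be sketched as a simple check on four precomputed descriptors. The sketch assumes the descriptors (molecular weight, logP, hydrogen-bond donors and acceptors) come from an external cheminformatics tool. The function names and the choice to tolerate one violation are assumptions; the abstract does not state the threshold the authors applied.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four rule-of-5 limits a molecule exceeds."""
    rules = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(rules)


def passes_rule_of_five(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """Return True if the molecule has at most max_violations violations.

    max_violations=1 is a common convention, used here as an assumption.
    """
    violations = lipinski_violations(mol_weight, logp, h_donors, h_acceptors)
    return violations <= max_violations
```

In a cascade like the one described, such a cheap filter would typically run before the docking stages, so that expensive calculations are spent only on drug-like candidates.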


20.