Similar documents
10 similar documents found
1.
Protein sequence database search programs may be evaluated both for their retrieval accuracy (the ability to separate meaningful from chance similarities) and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.
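The abstract does not give the formula for the unified measure. One standard way to merge two essentially independent p-values (for example, one from the alignment score and one from a compositional-similarity test) is Fisher's method, which for two p-values has the closed form x * (1 - ln x) with x = p1 * p2. The sketch below assumes this combination; the function name is hypothetical, and it is not presented as the exact formula used in the modified BLAST program.

```python
import math


def fisher_combine_two(p1: float, p2: float) -> float:
    """Combine two independent p-values with Fisher's method (assumed scheme).

    Under the null, -2 * ln(p1 * p2) follows a chi-square distribution with
    4 degrees of freedom; its survival function reduces to x * (1 - ln x),
    where x = p1 * p2.
    """
    if not (0.0 < p1 <= 1.0 and 0.0 < p2 <= 1.0):
        raise ValueError("p-values must lie in (0, 1]")
    x = p1 * p2
    return x * (1.0 - math.log(x))


# Hypothetical alignment p-value 0.01 combined with a compositional p-value 0.5.
print(fisher_combine_two(0.01, 0.5))
```

Because the combined value depends on both inputs, strong compositional agreement (a small second p-value) can rescue a borderline alignment, which is the kind of evidence the abstract says simple composition corrections discard.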

2.
3.
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
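The abstract does not spell out the new triggering criterion. As it is commonly described for gapped BLAST, it is a "two-hit" rule: an ungapped extension starts only when two non-overlapping word hits fall on the same diagonal within a window of A residues. The sketch below assumes that rule, with a word length of 3 and A = 40 as assumed defaults; scoring hits against the threshold T and the extension itself are omitted, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def two_hit_triggers(
    hits: Iterable[Tuple[int, int]],
    word_len: int = 3,   # assumed word length
    window: int = 40,    # assumed two-hit window A
) -> List[Tuple[int, int]]:
    """Return the word hits that would trigger an extension under a two-hit rule.

    hits: (query_pos, subject_pos) pairs, assumed sorted by subject_pos.
    A hit triggers when the previous hit on the same diagonal
    (subject_pos - query_pos) does not overlap it and starts fewer than
    `window` residues earlier.
    """
    last_on_diag: Dict[int, int] = {}  # diagonal -> subject_pos of latest hit
    triggers: List[Tuple[int, int]] = []
    for q, s in hits:
        diag = s - q
        prev = last_on_diag.get(diag)
        if prev is not None:
            dist = s - prev
            if dist < word_len:
                # Overlaps the previous hit: ignore it, keep the earlier anchor.
                continue
            if dist < window:
                triggers.append((q, s))
        last_on_diag[diag] = s
    return triggers


print(two_hit_triggers([(0, 10), (5, 15), (100, 110)]))
```

Requiring a second hit on the same diagonal lets the threshold for single word hits be lowered while still starting far fewer extensions, which is consistent with the speed-up the abstract reports.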

4.
The PSI-BLAST algorithm has been acknowledged as one of the most powerful tools for detecting remote evolutionary relationships from sequence information alone. This has been demonstrated by its ability to recognize remote structural homologues and by the fact that it achieves the greatest coverage in the annotation of a complete genome. Although recognizing the correct fold of a sequence is of major importance, the accuracy of the alignment is crucial for successfully modeling one sequence on the structure of its remote homologue. Here we assess the accuracy of PSI-BLAST alignments on a stringent database of 123 structurally similar, sequence-dissimilar pairs of proteins by comparing them to the alignments defined on a structural basis. Each protein sequence is compared to a nonredundant database of protein sequences by PSI-BLAST. Whenever a pair member detects its pair-mate, the positions that are aligned in both the sequence-based and the structural alignments are determined, and the alignment sensitivity is expressed as the percentage of these positions out of the structural alignment. Fifty-two sequences detected their pair-mates (for 16 pairs, detection was bi-directional, succeeding when either pair member was used as the query). The average percentage of correctly aligned residues per structural alignment was 43.5 ± 2.2%. Other properties of the alignments were also examined, such as sensitivity versus specificity and the change in these parameters over consecutive iterations. Notably, alignment sensitivity improves over consecutive iterations, reaching an average of 50.9 ± 2.5% within the five iterations tested in the current study.
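The sensitivity measure defined above can be written as a short set computation. The sketch below assumes that each alignment is represented as a set of aligned residue-index pairs (i in one protein, j in the other); the function name and the toy alignments are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (residue index in protein A, residue index in protein B)


def alignment_sensitivity(sequence_aln: Set[Pair], structural_aln: Set[Pair]) -> float:
    """Percentage of structurally aligned residue pairs that the sequence
    alignment reproduces exactly."""
    if not structural_aln:
        raise ValueError("structural alignment is empty")
    shared = sequence_aln & structural_aln
    return 100.0 * len(shared) / len(structural_aln)


# Toy example: 2 of the 4 structural pairs are reproduced -> 50.0
structural = {(1, 1), (2, 2), (3, 4), (4, 5)}
sequence = {(1, 1), (2, 2), (3, 3)}
print(alignment_sensitivity(sequence, structural))
```

A pair counts only if both indices agree, so a sequence alignment that is shifted by one residue relative to the structural alignment scores zero over that region.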

5.
We discuss several aspects of load balancing database search jobs in a distributed computing environment, such as a Linux cluster. Load balancing is a technique for making the most of multiple computational resources, and it is particularly relevant in environments where those resources are heavily used. The particular case of the Sequest program is considered here, but the general methodology should apply to any similar database search program. We show how the runtimes of Sequest searches of tandem mass spectral data can be predicted from profiles of previous representative searches, and how this information can be used to balance the load better for novel data. A well-known heuristic load-balancing method is shown to be applicable to this problem, and its performance is analyzed for a variety of search parameters.
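The abstract names the load-balancing method only as "a well-known heuristic". One common choice for assigning jobs with predicted runtimes to identical nodes is Longest Processing Time first (LPT). The sketch below assumes LPT, so it may not be the method the authors analyzed; the job names and runtimes are hypothetical stand-ins for predictions derived from earlier search profiles.

```python
import heapq
from typing import Dict, List, Tuple


def lpt_schedule(predicted: Dict[str, float], n_nodes: int) -> Tuple[List[List[str]], float]:
    """Longest-Processing-Time-first greedy assignment (assumed heuristic).

    Jobs are taken in order of decreasing predicted runtime, and each is placed
    on the node with the smallest current load. Returns the per-node job lists
    and the makespan (the largest node load).
    """
    if n_nodes < 1:
        raise ValueError("need at least one node")
    heap = [(0.0, i) for i in range(n_nodes)]  # (current load, node index)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(n_nodes)]
    for job, runtime in sorted(predicted.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(job)
        heapq.heappush(heap, (load + runtime, node))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical predicted runtimes (minutes) for four search jobs on two nodes.
print(lpt_schedule({"a": 4, "b": 3, "c": 3, "d": 2}, 2))
```

LPT needs only an estimate of each job's runtime, which is why accurate runtime prediction from earlier searches is the key ingredient.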

6.
MOTIVATION: The deluge of biological information from different genomic initiatives and the rapid advancement of biotechnologies have made bioinformatics tools an integral part of modern biology. Among the widely used sequence alignment tools, BLAST and PSI-BLAST are arguably the most popular. PSI-BLAST, which uses an iterative search strategy based on a position-specific score matrix (PSSM) profile, is more sensitive than BLAST in detecting weak homologies, which makes it suitable for remote homolog detection. Many refinements have been made to improve PSI-BLAST, and its computational efficiency and high specificity have been much touted. Nevertheless, corruption of its profile through the incorporation of false-positive sequences remains a major challenge. RESULTS: We have developed a simple and elegant approach to resolve the problem of model corruption in PSI-BLAST searches. We hypothesized that combining results from the first (least-corrupted) profile with results from later (most sensitive) iterations of PSI-BLAST provides a better discriminator between true and false hits. Accordingly, we have derived a formula that uses the E-values from these two PSI-BLAST iterations to obtain a figure of merit for rank-ordering the hits. Our verification results, based on a 'gold-standard' test set, indicate that this figure of merit does indeed separate true positives from false positives better than PSI-BLAST E-values. Perhaps most notably, the strategy is simple and straightforward to implement.
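The abstract does not state the formula, so the sketch below uses a purely hypothetical figure of merit, the sum of log10 E-values from the first and the last iteration, only to illustrate the idea of ranking hits on two iterations at once. It is not the authors' formula. Hits missing from the first iteration receive an assumed E-value cap (`missing_e`), and all names and numbers are illustrative.

```python
import math
from typing import Dict, List


def rank_hits(
    e_first: Dict[str, float],
    e_last: Dict[str, float],
    missing_e: float = 10.0,  # assumed E-value for hits absent from iteration 1
) -> List[str]:
    """Rank hits by a HYPOTHETICAL figure of merit combining two E-values.

    merit = log10(E_first) + log10(E_last); smaller (more negative) is better.
    This is an illustration only, not the formula from the paper.
    """
    def merit(hit: str) -> float:
        ef = max(e_first.get(hit, missing_e), 1e-300)  # guard against log10(0)
        el = max(e_last[hit], 1e-300)
        return math.log10(ef) + math.log10(el)

    return sorted(e_last, key=merit)


first = {"A": 1e-10, "B": 5.0}
last = {"A": 1e-20, "B": 1e-25, "C": 1e-5}
print(rank_hits(first, last))
```

By its last-iteration E-value alone, hit B would rank first. Once its weak first-iteration E-value is included, B drops below A, which is the kind of demotion of possible profile-corruption hits the abstract describes.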

7.
We have designed hidden Markov models (HMMs) of structurally conserved repeats that, based on pairwise comparisons, are not conserved at the sequence level. To model secondary structure features, these HMMs assign higher transition probabilities to insert or delete states within sequence regions predicted to form loops. The HMMs were optimized using a sampling procedure based on the degree of statistical uncertainty associated with the parameter estimates. A PSI-BLAST search initialized with a checkpoint-recovered profile derived from simulated sequences emitted by such an HMM can reveal distant structural relationships with, in certain instances, substantially greater sensitivity than a normal PSI-BLAST search. This is illustrated using two examples involving DNA- and RNA-associated proteins with structurally conserved repeats. In the first example, a putative sliding DNA clamp protein was detected in the thermophilic bacterium Thermotoga maritima. This protein appears to have arisen from a duplicated β-clamp gene that then acquired features of a PCNA-like clamp, perhaps to perform a PCNA-related function in association with one or more of the many archaeal-like proteins present in this organism. In the second example, β-propeller domains were predicted in the large subunit of UV-damaged DNA-binding protein and in related proteins, including the large subunit of cleavage-polyadenylation specificity factor, the yeast Rse1p and human SAP130 pre-mRNA splicing factors, and the fission yeast Rik1p gene-silencing protein.
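The emission step can be illustrated with a deliberately simplified, profile-HMM-like sampler: columns predicted to lie in loops get a higher probability of being skipped (deletion) and of being followed by extra residues (insertion). This is an assumption-laden sketch, not the authors' optimized models. It uses a single indel probability per column instead of separate M/I/D transition parameters, and the probabilities and names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def emit_sequence(
    column_probs: Sequence[Dict[str, float]],
    in_loop: Sequence[bool],
    rng: random.Random,
    p_indel_core: float = 0.02,  # assumed indel probability outside loops
    p_indel_loop: float = 0.15,  # assumed (higher) indel probability in loops
) -> str:
    """Sample one sequence from a simplified, profile-HMM-like model.

    column_probs[i]: residue -> emission probability for match column i.
    in_loop[i]: True if column i is predicted to lie in a loop; such columns
    use the larger p_indel_loop both for skipping the column (deletion) and
    for emitting extra residues after it (insertion).
    """
    out: List[str] = []
    for probs, loop in zip(column_probs, in_loop):
        p_indel = p_indel_loop if loop else p_indel_core
        if rng.random() >= p_indel:
            # Match state: draw a residue from this column's profile.
            residues = list(probs)
            weights = [probs[r] for r in residues]
            out.append(rng.choices(residues, weights=weights)[0])
        # Insert state: a geometric number of uniformly drawn residues.
        while rng.random() < p_indel:
            out.append(rng.choice(AMINO_ACIDS))
    return "".join(out)


rng = random.Random(0)
cols = [{"A": 0.7, "G": 0.3}] * 4 + [{"P": 0.5, "G": 0.5}] * 3
loops = [False] * 4 + [True] * 3
print([emit_sequence(cols, loops, rng) for _ in range(3)])
```

Many such sampled sequences could then be aligned and used to seed a PSI-BLAST profile; that checkpoint step from the abstract is outside this sketch.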

8.
To elucidate the role of high mass accuracy in mass spectrometric peptide mapping and database searching, selected proteins were subjected to tryptic digestion, and the resulting mixtures were analyzed by electrospray ionization on a 7-tesla Fourier transform mass spectrometer with a mass accuracy of 1 ppm. Two extreme cases were examined in detail: equine apomyoglobin, which digested easily and gave very few spurious masses, and bovine alpha-lactalbumin, which, under the conditions used, gave many spurious masses. The effectiveness of accurate mass measurements in minimizing false protein matches was examined by varying the mass error allowed in the search over a wide range (2-500 ppm). For the "clean" data obtained from apomyoglobin, very few masses were needed to return valid protein matches, and the mass error allowed in the search had little effect up to 500 ppm. In the case of alpha-lactalbumin, however, more mass values were needed, and lower allowed mass errors increased the search specificity. Mass errors below 30 ppm were particularly useful in eliminating false protein matches when few mass values were used in the search. Collision-induced dissociation of an unassigned peak in the alpha-lactalbumin digest provided sufficient data to identify the peak unambiguously as a fragment of alpha-lactalbumin and to eliminate a large number of spurious proteins found in the peptide mass search. The results show that even with a relatively high mass error (0.8 Da for mass differences between singly charged product ions), collision-induced dissociation can help identify proteins in cases where unfavorable digest conditions or modifications render digest peaks unidentifiable by a simple mass-mapping search.
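The effect of the allowed mass error can be illustrated with a minimal peptide-mass matcher. The sketch below assumes that theoretical peptide masses per candidate protein are already computed (the masses in the usage example are made up, not data from the study). For each protein, it counts how many observed masses fall within a ppm tolerance of one of its peptides. The function names are hypothetical.

```python
from typing import Dict, List


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def count_matches(
    observed_masses: List[float],
    protein_peptides: Dict[str, List[float]],
    tol_ppm: float,
) -> Dict[str, int]:
    """For each candidate protein, count the observed masses that lie within
    tol_ppm of at least one of its theoretical peptide masses."""
    counts: Dict[str, int] = {}
    for protein, peptides in protein_peptides.items():
        counts[protein] = sum(
            1 for m in observed_masses
            if any(abs(ppm_error(m, p)) <= tol_ppm for p in peptides)
        )
    return counts


observed = [1000.0005, 1500.0300]
candidates = {"P1": [1000.0, 1500.0], "P2": [1000.02, 1499.97]}
print(count_matches(observed, candidates, tol_ppm=2))   # tight tolerance
print(count_matches(observed, candidates, tol_ppm=50))  # loose tolerance
```

At 2 ppm only P1 matches, whereas at 50 ppm the unrelated P2 ties with it. This mirrors the abstract's point that low allowed mass errors help eliminate false protein matches when few masses are available.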

9.
DbClustal addresses the important problem of automatically constructing a multiple alignment of the top-scoring full-length sequences detected by a database homology search. By combining the advantages of both local and global alignment algorithms in a single system, DbClustal is able to provide accurate global alignments of highly divergent, complex sequence sets. Local alignment information is incorporated into a ClustalW global alignment in the form of a list of anchor points between pairs of sequences. The method is demonstrated using anchors supplied by the Blast post-processing program Ballast. The speed and reliability of DbClustal have been demonstrated on the recently annotated Pyrococcus abyssi proteome, where the proportion of alignments containing totally misaligned sequences was reduced from 20% to <2%. A website has been set up that offers BlastP database searches with automatic alignment of the top hits by DbClustal.
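The abstract says only that local alignment information enters ClustalW as a list of anchor points between pairs of sequences. A simple way to illustrate anchoring in a pairwise global alignment is to force each anchored segment to be aligned without gaps and to run Needleman-Wunsch only on the stretches between anchors. The sketch below assumes that hard-constraint scheme with illustrative scoring (match +1, mismatch -1, linear gap -2). DbClustal's actual integration into ClustalW may treat anchors differently, and the function names are hypothetical.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Global pairwise alignment with a linear gap penalty."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(a: str, b: str,
                   anchors: List[Tuple[int, int, int]]) -> Tuple[str, str]:
    """Global alignment in which each anchor (start_a, start_b, length) is
    aligned gap-free; the stretches between anchors are aligned independently."""
    pa = pb = 0
    parts_a: List[str] = []
    parts_b: List[str] = []
    for sa, sb, length in sorted(anchors):
        if sa < pa or sb < pb:
            raise ValueError("anchors must be non-overlapping and increasing in both sequences")
        ga, gb = needleman_wunsch(a[pa:sa], b[pb:sb])
        parts_a += [ga, a[sa:sa + length]]
        parts_b += [gb, b[sb:sb + length]]
        pa, pb = sa + length, sb + length
    ga, gb = needleman_wunsch(a[pa:], b[pb:])
    parts_a.append(ga)
    parts_b.append(gb)
    return "".join(parts_a), "".join(parts_b)


print(anchored_align("XXABCYY", "ZABCW", [(2, 1, 3)]))
```

Splitting at anchors also shrinks the dynamic-programming work to the unanchored stretches, one reason anchor-based schemes can stay fast on large, divergent sets.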

10.
We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odds substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then show that BlosumR can be used with the BlastP algorithm to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single-nucleotide log-odds substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is open-source freeware available from www.tcoffee.org/blastr.html.
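The recoding step can be sketched as a mapping from the 16 possible dinucleotides to 16 amino-acid letters, so that a protein search engine can read an RNA sequence. The sketch below assumes overlapping dinucleotides and an arbitrary letter assignment. Both are assumptions; BlastR's actual alphabet and the BlosumR scores are not reproduced here, and the names are hypothetical.

```python
from itertools import product
from typing import Dict

NUCLEOTIDES = "ACGU"
# HYPOTHETICAL assignment of the 16 dinucleotides to 16 amino-acid letters;
# this is not BlastR's actual alphabet.
_LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER: Dict[str, str] = {
    x + y: letter
    for (x, y), letter in zip(product(NUCLEOTIDES, repeat=2), _LETTERS)
}


def recode_rna(seq: str) -> str:
    """Recode an RNA sequence as a protein-like string of overlapping
    dinucleotides (a sequence of length n yields n - 1 letters).

    DNA input is accepted (T is converted to U); other characters raise KeyError.
    """
    seq = seq.upper().replace("T", "U")
    return "".join(DINUC_TO_LETTER[seq[i:i + 2]] for i in range(len(seq) - 1))


print(recode_rna("ACGU"))  # dinucleotides AC, CG, GU
```

Because each letter encodes a nucleotide together with its neighbor, a 16 x 16 substitution matrix over these letters can score neighboring-base context, which a single-nucleotide matrix cannot.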
