Similar documents
1.
This paper outlines a method for gnomonic projection of a molecular surface and a novel application of it to the problem of surface comparison. Semiregular arrays of points are generated by icosahedral tessellation. The surface may be the accessible surface or a chemical parameter surface such as the molecular electrostatic potential. Gnomonic projection retains the 3D characteristics of the inspection surface. Comparison of two surfaces can be achieved by statistical assessment of the pattern match. The method opens the gateway to an optimized search for pattern matches on the surfaces of dissimilar molecular structures.
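The abstract does not give the projection formulas, so the following is only a generic sketch of the two ingredients it names: the 12 vertices of a regular icosahedron (the starting point of an icosahedral tessellation; finer arrays would subdivide its faces, which is not shown) and a gnomonic projection of unit-sphere points onto the plane tangent at a chosen center. The function names `icosahedron_vertices` and `gnomonic` are hypothetical.

```python
import numpy as np


def icosahedron_vertices() -> np.ndarray:
    """The 12 vertices of a regular icosahedron, projected onto the unit sphere."""
    phi = (1 + np.sqrt(5)) / 2  # golden ratio
    verts = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            # cyclic permutations of (0, +-1, +-phi)
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    v = np.array(verts)
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def gnomonic(points: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Project points on the unit sphere onto the plane tangent at `center`.

    Great circles map to straight lines; only the hemisphere facing
    `center` can be projected.
    """
    c = center / np.linalg.norm(center)
    # Orthonormal basis (e1, e2) spanning the tangent plane.
    helper = np.array([1.0, 0.0, 0.0]) if abs(c[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(c, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(c, e1)
    pts = points / np.linalg.norm(points, axis=1, keepdims=True)
    cos_angle = pts @ c
    if np.any(cos_angle <= 0):
        raise ValueError("points must lie in the hemisphere facing `center`")
    scaled = pts / cos_angle[:, None]  # where each ray from the origin meets the plane
    return np.column_stack([scaled @ e1, scaled @ e2])
```

For example, `v = icosahedron_vertices(); z = np.array([0.0, 0.0, 1.0]); gnomonic(v[v @ z > 0], z)` gives planar coordinates for the vertices in the upper hemisphere. A point at angle θ from the center lands at distance tan θ from the origin, which is why the projection becomes strongly distorted near the hemisphere's edge.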

2.
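Sequence alignment is an important task in bioinformatics: it can reveal functional, structural, and evolutionary information in biological sequences. The biological meaning of an alignment depends closely on the chosen scoring function, i.e. the scores and penalties for matches, mismatches, insertions and deletions, and gaps. We introduce parametric sequence alignment, which treats the optimal alignment as a function of the weights and penalties and so systematically shows how the choice of parameters affects the optimal alignment. We then apply it to RNA sequence alignment and analyze how different parameter choices affect the results. Finally, we discuss applications of parametric alignment algorithms and directions for future development.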
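A brute-force illustration of the idea, assuming plain global (Needleman-Wunsch) alignment with a linear gap penalty: true parametric alignment algorithms compute the regions of parameter space in which a given alignment stays optimal, whereas this sketch simply re-solves the alignment at a few parameter values and reports the match/mismatch/gap counts of the optimum, so one can see where the optimal alignment changes. The function name `nw_counts` is hypothetical.

```python
def nw_counts(a: str, b: str, match: float, mismatch: float, gap: float):
    """Return (score, n_match, n_mismatch, n_gap) of one optimal global alignment.

    Ties in score are broken by tuple comparison (more matches preferred).
    """
    n, m = len(a), len(b)
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0.0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i * gap, 0, 0, i)  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = (j * gap, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s, mt, mm, g = dp[i - 1][j - 1]
            if a[i - 1] == b[j - 1]:
                diag = (s + match, mt + 1, mm, g)
            else:
                diag = (s + mismatch, mt, mm + 1, g)
            s, mt, mm, g = dp[i - 1][j]
            up = (s + gap, mt, mm, g + 1)
            s, mt, mm, g = dp[i][j - 1]
            left = (s + gap, mt, mm, g + 1)
            dp[i][j] = max(diag, up, left)
    return dp[n][m]


# Sweep the gap penalty and watch the composition of the optimum change.
for gap in (-0.5, -1.0, -2.0, -4.0):
    print(gap, nw_counts("GACGGATTAG", "GATCGGAATAG", 1.0, -1.0, gap))
```

Because the score of any fixed alignment is linear in (match, mismatch, gap), the optimal score is a piecewise-linear function of the parameters; the count vector reported here is exactly what determines the slope of each piece.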

3.
A recent study of partial matches in the Arizona offender database of DNA profiles has revealed a large number of nine- and ten-locus matches. I use simple models that incorporate the product rule, population substructure, and relatedness to predict the expected number of matches in large databases. I find that there is a relatively narrow window of parameter values that can plausibly describe the Arizona results. Further research could help determine whether the Arizona samples are congruent with some of the models presented here or whether the fundamental assumptions used to predict these match frequencies require adjustment.
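As a rough sketch of the product-rule part only (no population substructure or relatedness), assuming loci are independent and each locus l has a known probability p_l that two unrelated profiles match there: the number of loci matched by a random pair is Poisson-binomial, and the expected number of pairs in a database of N profiles that match at k or more loci is N(N-1)/2 times that tail probability. The per-locus probabilities and database size below are hypothetical placeholders, not values from the study.

```python
def prob_at_least(per_locus: list[float], k: int) -> float:
    """P(at least k loci match) for independent loci with match probabilities per_locus."""
    dist = [1.0]  # dist[c] = P(exactly c matching loci among those processed so far)
    for p in per_locus:
        new = [0.0] * (len(dist) + 1)
        for c, q in enumerate(dist):
            new[c] += q * (1 - p)  # this locus does not match
            new[c + 1] += q * p    # this locus matches
        dist = new
    return sum(dist[k:])


def expected_pairs(n_profiles: int, per_locus: list[float], k: int) -> float:
    """Expected number of profile pairs matching at >= k loci (product rule only)."""
    return n_profiles * (n_profiles - 1) / 2 * prob_at_least(per_locus, k)


# Hypothetical inputs: 13 loci, each with match probability 0.075, 60,000 profiles.
loci = [0.075] * 13
for k in (9, 10, 11):
    print(k, expected_pairs(60_000, loci, k))
```

The quadratic growth in the number of pairs is what makes many nine- and ten-locus matches unsurprising in a large database; substructure and relatedness would raise the effective p_l for some pairs.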

4.
In this article, we propose a new method for computing rare maximal exact matches between multiple sequences. A rare match between k sequences S(1), ..., S(k) is a string that occurs at most t(i) times in the sequence S(i), where the t(i) > 0 are user-defined thresholds. First, the suffix tree of one of the sequences (the reference sequence) is built, and then the other sequences are matched separately against this suffix tree. Second, the resulting pairwise exact matches are combined into multiple exact matches. A clever implementation of this method yields a very fast and space-efficient program. This program can be applied to several comparative genomics tasks, such as the identification of synteny blocks between whole genomes.
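To make the rarity condition concrete, here is a naive check of the definition (not the suffix-tree method described above, which avoids rescanning the sequences): a candidate string is rare if its occurrence count in each S(i) is at most t(i). Requiring at least one occurrence in every sequence is our assumption about what "a match between k sequences" means, and maximality is not checked. Function names are hypothetical.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def is_rare_match(word: str, seqs: list[str], thresholds: list[int]) -> bool:
    """True if word occurs in every S(i), and at most t(i) times in S(i).

    Requiring >= 1 occurrence per sequence is an assumption; checking that the
    match is maximal (cannot be extended left or right) is not attempted.
    """
    return all(1 <= count_occurrences(s, word) <= t for s, t in zip(seqs, thresholds))
```

For instance, "AC" occurs once in "ACGT" and twice in "TTACAC", so it is rare for thresholds (1, 2) but not for (1, 1).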

5.
Mastering seeds for genomic size nucleotide BLAST searches
One of the most common activities in bioinformatics is the search for similar sequences. These searches are usually carried out with programs from the NCBI BLAST family. Because most searches are routinely performed with default parameters, an important question is how reliable the results obtained with the default parameter values are, i.e. what fraction of potential matches these searches retrieve. Our primary focus is on the initial hit parameter, also known as the seed or word, used by NCBI BLASTn, MegaBLAST and other similar programs in searches for similar nucleotide sequences. We show that using the default values for the initial hit parameter can greatly reduce the proportion of potentially similar sequences that are retrieved. We also show how the hit probability of different seeds varies with the minimum length and similarity of the sequences to be retrieved, and describe methods that help in determining appropriate seeds. The experimental results described in this paper illustrate situations in which these methods are most applicable and also show the relationship between the various BLAST parameters.
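The dependence of hit probability on length, similarity and word size can be sketched with a simple model: assume an ungapped alignment of L columns in which each column is an identity independently with probability p, and count a hit when a contiguous word of W identities occurs. This is a textbook simplification, not necessarily the model used in the paper (it ignores spaced seeds, gaps and BLAST's two-hit options). A dynamic program over the length of the current run of identities gives the exact probability under this model.

```python
def hit_probability(length: int, similarity: float, word: int) -> float:
    """P(an ungapped alignment of `length` columns contains >= `word` consecutive
    identities), assuming each column is identical independently with
    probability `similarity`."""
    # state[r] = P(no hit so far and the current run of identities has length r)
    state = [1.0] + [0.0] * (word - 1)
    hit = 0.0
    for _ in range(length):
        new_state = [0.0] * word
        for run, prob in enumerate(state):
            new_state[0] += prob * (1.0 - similarity)  # a mismatch resets the run
            if run + 1 == word:
                hit += prob * similarity  # run reaches the word size: seed hit
            else:
                new_state[run + 1] += prob * similarity
        state = new_state
    return hit


# Compare word sizes 11 and 28 (the BLASTn and MegaBLAST defaults in BLAST+)
# for a 100-column alignment at 80% identity.
for w in (11, 28):
    print(w, hit_probability(100, 0.8, w))
```

Running the comparison shows how sharply the chance of seeding an alignment drops as the word size grows at moderate identity, which is the effect the paper quantifies for the default settings.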

6.
SUMMARY: BLAST is a widely used alignment tool for detecting matches between a query sequence and entries in nucleotide sequence databases. Matches (high-scoring pairs, HSPs) are assigned a score based on alignment length and quality and, by default, are reported with the top-scoring matches listed first. For certain types of searches, however, this method of reporting is not optimal. This is particularly true when searching a genome sequence with a query that was derived from the same genome, or a closely related one. If the genome is complex and the assembly is far from complete, correct matches are often relegated to low positions in the results, where they may be easily overlooked. To rectify this problem, we developed TruMatch, a program that parses standard BLAST outputs and identifies HSPs that involve query segments with unique matches to the assembly. Candidates for bona fide matches between a query sequence and a genome assembly are listed at the top of the TruMatch output. AVAILABILITY: TruMatch is written in Perl and is freely available to non-commercial users via web download at the URL: http://genome.kbrin.uky.edu/fungi_tel/TruMatch/

7.
MOTIVATION: Studies of efficient and sensitive sequence comparison methods are driven by a need to find homologous regions of weak similarity between large genomes. RESULTS: We describe an improved method for finding similar regions between two sets of DNA sequences. The new method generalizes existing methods by locating word matches between sequences under two or more word models and extending word matches into high-scoring segment pairs (HSPs). The method is implemented as a computer program named DDS2. Experimental results show that DDS2 can find more HSPs by using several word models than by using one word model. AVAILABILITY: The DDS2 program is freely available for academic use in binary code form at http://bioinformatics.iastate.edu/aat/align/align.html and in source code form from the corresponding author.
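The abstract does not define DDS2's word models, so the sketch below assumes a common formulation: each word model is a pattern string in which "1" marks a position that must match and "0" a don't-care position (a contiguous word is all 1s). Word hits are found by hashing the matched-position characters of every window, and hits from several models are combined by set union; extension of hits into HSPs is omitted. Function names are hypothetical.

```python
from collections import defaultdict


def seed_hits(a: str, b: str, pattern: str) -> set[tuple[int, int]]:
    """All (i, j) where a[i:] and b[j:] agree at every '1' position of pattern."""
    care = [k for k, ch in enumerate(pattern) if ch == "1"]
    span = len(pattern)
    index = defaultdict(list)  # key of matched-position characters -> start positions in b
    for j in range(len(b) - span + 1):
        index[tuple(b[j + k] for k in care)].append(j)
    hits = set()
    for i in range(len(a) - span + 1):
        for j in index.get(tuple(a[i + k] for k in care), ()):
            hits.add((i, j))
    return hits


def multi_model_hits(a: str, b: str, patterns: list[str]) -> set[tuple[int, int]]:
    """Union of the word hits found under each word model."""
    hits = set()
    for p in patterns:
        hits |= seed_hits(a, b, p)
    return hits
```

With a = "ACGTAC" and b = "ACCTAC", the contiguous model "111" finds only (3, 3), while adding the model "101" also recovers (1, 1), a small instance of several word models finding more candidate regions than one.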

8.
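Biotechnologies based on nucleic acid hybridization (such as PCR) are widely used in pathogen detection, clinical diagnosis, and many other fields. Their reliability rests on the high stability and specificity with which oligonucleotides bind their targets, and accurately predicting the secondary structure that an oligonucleotide forms with its target is key to analyzing that stability and specificity. The thermodynamics-based nearest-neighbor model is the most reliable computational method for predicting oligonucleotide secondary structure, but its accuracy depends strongly on accurate thermodynamic parameters. Because oligonucleotide secondary structures are complex, thermodynamic data are needed not only for perfect matches but also for special structures such as mismatches, internal loops, bulge loops, dangling ends, CNG repeats, and G·U wobble pairs. This review surveys the thermodynamic databases and related computational methods used effectively for oligonucleotide secondary-structure prediction in recent years, and discusses the limitations of current databases and directions for future development.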
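For the perfect-match case, the nearest-neighbor model sums a free-energy term for each stacked pair of adjacent base pairs, plus initiation terms for the duplex ends and a symmetry term for self-complementary sequences. The sketch below assumes the ΔG°37 values (kcal/mol) we believe correspond to the SantaLucia (1998) unified Watson-Crick set; they should be checked against the published table before any real use. Mismatches, loops, dangling ends and salt corrections, the harder cases this review is about, are deliberately left out.

```python
# Assumed SantaLucia (1998) unified DNA/DNA nearest-neighbor dG37 values (kcal/mol),
# keyed by the top-strand 5'->3' dinucleotide; verify against the original table.
NN_DG37 = {"AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
           "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84}
INIT_GC, INIT_AT, SYMMETRY = 0.98, 1.03, 0.43  # per terminal pair; self-complementary penalty
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def duplex_dg37(seq: str) -> float:
    """dG37 of seq paired with its perfect complement (no mismatches or loops)."""
    seq = seq.upper()
    dg = 0.0
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        # Each stack is listed once; e.g. TT on one strand is AA read on the other.
        dg += NN_DG37[step] if step in NN_DG37 else NN_DG37[revcomp(step)]
    for end in (seq[0], seq[-1]):
        dg += INIT_GC if end in "GC" else INIT_AT
    if seq == revcomp(seq):
        dg += SYMMETRY
    return dg
```

For example, `duplex_dg37("AAAA")` is 3 × (-1.00) + 2 × 1.03 = -0.94 kcal/mol under these assumed values. Extending the model to mismatches or loops means adding further lookup tables of exactly the kind this review surveys.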

9.
Examination of ball-in-play periods (i.e., match activity cycles) is a method used to provide insight into the physical demands of team sport competition. However, to date, no study has investigated the ball-in-play time of rugby league matches. This study investigated the activity cycles (i.e., ball-in-play periods) of elite National Rugby League (NRL) and National Youth Competition (NYC) matches. Video recordings of 393 NRL matches and 388 NYC matches were coded for activity and recovery cycles. Time when the ball was continuously in play was considered activity, whereas any stoppages during the match (e.g., for scrums, penalties, line drop-outs, tries, and video referee decisions) were considered recovery. The total time the ball was in play for NRL and NYC matches was approximately 55 minutes and 50 minutes, respectively. In comparison with NYC matches, NRL matches had longer average activity cycles (81.2 ± 16.1 vs. 72.0 ± 14.7 seconds, effect size [ES] = 0.60). The average longest activity cycle was also higher (ES = 0.48) in NRL (318.3 ± 65.4 seconds) than in NYC (288.9 ± 57.5 seconds) matches. The longest activity cycle from any match was 667 and 701 seconds for NRL and NYC matches, respectively. The NRL matches had a smaller proportion of short duration (<45 seconds) activity cycles and a greater proportion of longer duration (>91-600 seconds) activity cycles. In conclusion, meaningful differences in activity cycles were observed between NRL and NYC matches, with NRL competition demonstrating longer ball-in-play periods, a smaller proportion of short duration activity cycles, and a larger proportion of longer duration activity cycles. These findings suggest that the ability to perform prolonged high-intensity exercise, coupled with the capacity to recover during brief stoppages in play, is a critical requirement of professional rugby league match play.
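The abstract reports effect sizes without naming the formula. Assuming Cohen's d with an unweighted pooled standard deviation (our assumption; with 393 versus 388 matches an n-weighted pool would be almost identical), both reported values can be reproduced from the means and standard deviations given:

```python
from math import sqrt


def cohens_d(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Cohen's d using the root-mean-square of the two standard deviations."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


print(round(cohens_d(81.2, 16.1, 72.0, 14.7), 2))    # average cycle: prints 0.6
print(round(cohens_d(318.3, 65.4, 288.9, 57.5), 2))  # longest cycle: prints 0.48
```

That both values round to the reported 0.60 and 0.48 suggests, but does not prove, that this is the formula the authors used.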

10.
In this paper the spacer skeleton concept is used to produce molecular graphs of putative ligands for binding sites. The skeletons are transformed into molecular templates within the constraints of the accessible surface of the ligand-binding site. A distance-matrix method is used to compare ligand points with vertices of the spacer skeleton through a permutation of all possible correspondences. A tolerance parameter is used to screen for poor matches. As a result, a small number of matched vertices and ligand points are produced. These are fitted into the site by a constrained optimization routine using an analytical function. Ligand points fall within the site and are optimally positioned adjacent to the corresponding site points; other vertices of the spacer skeleton lying beneath the accessible surface of the site are clipped off. A molecular template is thereby formed with its vertices linked to the ligand points. The final step is to verify that the bonding integrity of the skeleton remains. The computational methods outlined in this paper have been tested at two binding sites: the pteridine binding site in dihydrofolate reductase and the amidinophenylpyruvate site of trypsin. Molecular graphs for both sites were generated automatically; they showed strong similarity to those of the natural ligands.
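The distance-matrix comparison with a tolerance screen can be sketched as follows, assuming the screen accepts a correspondence only when every inter-point distance among the ligand points agrees with the distance between the assigned skeleton vertices to within the tolerance (the paper's exact acceptance criterion is not stated in the abstract). Enumerating all injective assignments grows factorially, so this is only practical for a handful of ligand points. Function names are hypothetical.

```python
import math
from itertools import permutations


def distance_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]


def matched_correspondences(ligand_pts, skeleton_vertices, tol):
    """Yield tuples idx, mapping ligand point i -> skeleton vertex idx[i], whose
    pairwise distances agree with the ligand's to within tol."""
    dl = distance_matrix(ligand_pts)
    ds = distance_matrix(skeleton_vertices)
    n = len(ligand_pts)
    for idx in permutations(range(len(skeleton_vertices)), n):
        if all(abs(dl[i][j] - ds[idx[i]][idx[j]]) <= tol
               for i in range(n) for j in range(i + 1, n)):
            yield idx
```

Each surviving assignment would then be passed to the constrained fitting step; the tolerance controls how many candidate correspondences reach that more expensive stage.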

11.
An efficient method for matching nucleic acid sequences.
A method of computing the fraction of matches between two nucleic acid sequences at all possible alignments is described. It makes use of the Fast Fourier Transform. It should be particularly efficient for very long sequences, achieving its result in a number of operations proportional to n ln n, where n is the length of the longer of the two sequences. Though the objective achieved is of limited interest, this method will complement algorithms for efficiently finding the longest matching parts of two sequences, and is faster than existing algorithms for finding matches allowing deletions and insertions. A variety of economies can be achieved by this Fast Fourier Transform technique in matching multiple sequences, looking for complementarity rather than identity, and matching the same sequences both in forward and reversed orientations.
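A standard way to realize this idea (which may differ in detail from the paper's implementation) is to encode each sequence as one 0/1 indicator vector per base; the cross-correlation of the two indicator vectors for a base counts the positions where both sequences have that base at a given offset, and summing over the four bases gives the number of matches at every offset. Each cross-correlation is computed with NumPy's real FFT in O(n log n).

```python
import numpy as np


def match_counts(a: str, b: str, alphabet: str = "ACGT") -> np.ndarray:
    """counts[k] = number of i with a[i] == b[i + s], where s = k - (len(a) - 1).

    Offsets s run from -(len(a) - 1) to len(b) - 1.
    """
    n = len(a) + len(b) - 1
    size = 1 << (n - 1).bit_length()  # power of two >= n, so no circular wrap-around
    total = np.zeros(n)
    for base in alphabet:
        x = np.array([c == base for c in a], dtype=float)
        y = np.array([c == base for c in b], dtype=float)
        # Convolving y with reversed x is a cross-correlation of x and y.
        conv = np.fft.irfft(np.fft.rfft(x[::-1], size) * np.fft.rfft(y, size), size)[:n]
        total += conv
    return np.rint(total).astype(int)  # round away floating-point noise
```

Dividing each count by the overlap length at that offset gives the fraction of matches. Searching for complementarity instead of identity only requires pairing each base's indicator in `a` with its complement's indicator in `b`.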

12.
This paper illustrates a method of searching for matched accessible surfaces on two dissimilar molecules without specifying a particular face to be matched. Both molecules are allowed to rotate and their residuals are minimized to obtain matches. To identify matched orientations, 6-dimensional cluster analysis is used.

13.
The statistical estimates of BLAST and PSI-BLAST are of extreme importance to determine the biological relevance of sequence matches. While being very effective in evaluating most matches, these estimates usually overestimate the significance of matches in the presence of low complexity segments. In this paper, we present a model, based on divergence measures and statistics of the alignment structure, that corrects BLAST e-values for low complexity sequences without filtering or excluding them and generates scores that are more effective in distinguishing true similarities from chance similarities. We evaluate our method and compare it to other known methods using the Gene Ontology (GO) knowledge resource as a benchmark. Various performance measures, including ROC analysis, indicate that the new model improves upon the state of the art. The program is available at biozon.org/ftp/ and www.cs.technion.ac.il/~itaish/lowcomp/.

14.
15.
An alignment-free analysis based on the K-tuple distribution of DNA sequences
Shen Juan, Wu Wenwu, Xie Xiaoli, Guo Mancai, Yuan Zhifa. Hereditas (Beijing), 2010, 32(6): 606-612
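Building on the genomic K-tuple distribution, this article presents an alignment-free method for estimating how much biological sequences differ. The method can be used to measure the difference in K-tuple distribution between a real DNA sequence and randomly shuffled versions of it. When the method was used to build a phylogenetic tree of the complete mitochondrial genomes of 26 placental mammals, the classification agreed increasingly well with the generally accepted biological consensus as K increased. The results show that phylogenetic trees built with this method are more reasonable than those built with other alignment-free methods.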
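The abstract does not give the distance measure, so the following sketch assumes the simplest choice: normalized K-tuple (k-mer) frequency vectors over ACGT compared by Euclidean distance. It shows both uses mentioned above: the average distance between a sequence and random shuffles of itself (shuffling keeps base composition but destroys higher-order structure), and a pairwise distance between two sequences that could feed a distance-based tree method. Function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product
from math import sqrt


def kmer_freqs(seq: str, k: int, alphabet: str = "ACGT") -> list[float]:
    """Relative frequencies of all 4**k k-mers, in lexicographic order (ACGT-only input assumed)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    return [counts["".join(w)] / total for w in product(alphabet, repeat=k)]


def kmer_distance(s1: str, s2: str, k: int) -> float:
    """Euclidean distance between the k-mer frequency vectors of s1 and s2."""
    f1, f2 = kmer_freqs(s1, k), kmer_freqs(s2, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(f1, f2)))


def shuffle_deviation(seq: str, k: int, n_shuffles: int = 100, seed: int = 0) -> float:
    """Mean k-mer distance between seq and random permutations of its bases."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_shuffles):
        chars = list(seq)
        rng.shuffle(chars)
        dists.append(kmer_distance(seq, "".join(chars), k))
    return sum(dists) / len(dists)
```

Since the vector length is 4**k, larger K quickly makes the vectors sparse for short sequences; whole mitochondrial genomes (roughly 16 kb) can support moderate values of K.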

16.
DNA and protein sequence comparisons are performed by a number of computational algorithms. Most of these algorithms search for the alignment of two sequences that optimizes some alignment score. It is an important problem to assess the statistical significance of a given score. In this paper we use newly developed methods for Poisson approximation to derive estimates of the statistical significance of k-word matches on a diagonal of a sequence comparison. We require at least q of the k letters of the words to match, where 0 < q ≤ k. The distribution of the number of matches on a diagonal is approximated as well as the distribution of the order statistics of the sizes of clumps of matches on the diagonal. These methods provide an easily computed approximation of the distribution of the longest exact matching word between sequences. The methods are validated using comparisons of vertebrate and E. coli protein sequences. In addition, we compare two HLA class II transplantation antigens by this method and contrast the results with a dynamic programming approach. Several open problems are outlined in the last section. This work was supported by grants DMS 90-05833 from NSF and GM 36230 from NIH.
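A deliberately naive version of the calculation, assuming aligned letters match independently with probability p (for example p = 0.25 for uniform DNA): the chance that a given k-word window has at least q matching letters is a binomial tail, and treating the windows on a diagonal as independent gives a Poisson mean λ and P(no such word) ≈ exp(-λ). Overlapping windows are strongly dependent, so matches arrive in clumps; handling that clumping properly is what the Poisson-approximation methods in this paper address, and this sketch does not.

```python
from math import comb, exp


def word_match_prob(k: int, q: int, p: float) -> float:
    """P(at least q of k aligned letter pairs match), each matching with probability p."""
    return sum(comb(k, j) * p ** j * (1 - p) ** (k - j) for j in range(q, k + 1))


def prob_no_word_match(diag_len: int, k: int, q: int, p: float) -> float:
    """Naive Poisson estimate (clumping ignored) that a diagonal has no q-of-k word match."""
    lam = (diag_len - k + 1) * word_match_prob(k, q, p)  # expected number of matching windows
    return exp(-lam)
```

Because clumping is ignored, λ counts matching windows rather than distinct clumps, so this estimate of P(no match) is too small whenever matches tend to occur in runs.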

17.
Word matches are widely used to compare genomic sequences. Complete genome alignment methods often rely on matches as anchors for building their alignments, and various alignment-free approaches that characterize similarities between large sequences are based on word matches. Among the matches retrieved from the comparison of two genomic sequences, some may be spurious matches (SMs), i.e. matches obtained by chance rather than through homologous relationships. The number of SMs depends on the minimal match length (ℓ) that has to be set in the algorithm used to retrieve them. Indeed, if ℓ is too small, many matches are recovered but most of them are SMs. Conversely, if ℓ is too large, fewer matches are retrieved but many smaller significant matches are certainly ignored. To date, the choice of ℓ mostly depends on empirical threshold values rather than robust statistical methods. To overcome this problem, we propose a statistical approach based on a mixture model of geometric distributions to characterize the distribution of the length of matches obtained from the comparison of two genomic sequences.
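A minimal sketch of fitting such a mixture, assuming two geometric components on match lengths x ≥ 1 (a short "spurious" component and a long "homologous" one) fitted by expectation-maximization; the number of components, the support and the rule for choosing ℓ in the paper may all differ. As one possible rule (our assumption), ℓ can be taken as the smallest length at which the long component becomes the more probable source of a match.

```python
import numpy as np


def fit_geometric_mixture(x, n_iter=500, init=(0.5, 0.8, 0.1)):
    """EM for a two-component geometric mixture on match lengths x >= 1.

    Component pmf: p * (1 - p) ** (x - 1). Returns (w, p1, p2), where w is the
    weight of component 1; starting p1 larger makes component 1 the short one.
    """
    x = np.asarray(x, dtype=float)
    w, p1, p2 = init
    for _ in range(n_iter):
        # E-step in log space, to avoid underflow for long matches.
        log1 = np.log(w) + np.log(p1) + (x - 1) * np.log1p(-p1)
        log2 = np.log(1 - w) + np.log(p2) + (x - 1) * np.log1p(-p2)
        r = np.exp(log1 - np.logaddexp(log1, log2))  # responsibility of component 1
        # M-step: the weighted MLE of a geometric parameter is 1 / weighted mean.
        w = r.mean()
        p1 = r.sum() / (r * x).sum()
        p2 = (1 - r).sum() / ((1 - r) * x).sum()
    return w, p1, p2


def length_threshold(w, p1, p2, max_len=10_000):
    """Smallest length at which the long component (p2 < p1) is more probable."""
    for length in range(1, max_len + 1):
        if (1 - w) * p2 * (1 - p2) ** (length - 1) > w * p1 * (1 - p1) ** (length - 1):
            return length
    return None
```

For example, with w = 0.5, p1 = 0.5 and p2 = 0.05 the rule gives ℓ = 5: shorter matches are more likely to come from the short-length component.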

18.
In the current study we present a Gompertzian model for cell growth as a function of cell phenotype using six human tumour cell lines (A-549, NCI-H596, NCI-H520, HT-29, SW-620 and U-251). Monolayer cells in exponential growth at various densities were quantified over a week by sulforhodamine B staining assay to produce cell-growth curves. A Gompertz equation was fitted to the experimental data to obtain, for each cell line, three empirical growth parameters (initial cell density, cell-growth rate and carrying capacity, i.e. the maximal cell density). A cell-shape parameter named the deformation coefficient D (a morphological relationship between spreading and confluent cells) was established and compared by regression analysis with the relative growth rate parameter K described by the Gompertz equation. We found that coefficient D is directly proportional to the growth parameter K. The fitted curve significantly matches the empirical data (P < 0.05), with a correlation coefficient of 0.9152. Therefore, a transformed Gompertzian growth function was obtained according to D. The degree of correlation between the Gompertzian growth parameter and the coefficient D allows a new interpretation of the growth parameter K on the basis of morphological measurements of a set of tumour cell types, supporting the idea that cell-growth kinetics can be modulated by the phenotypic organization of attached cells.
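The abstract does not state which parameterization of the Gompertz equation was used. The sketch assumes a common three-parameter form with initial density n0, growth-rate parameter k and carrying capacity n_max, and shows that, when n_max is known (e.g. from the plateau of the curve), k can be estimated by ordinary least squares after a double-log linearization. Function names are hypothetical.

```python
import math


def gompertz(t: float, n0: float, k: float, n_max: float) -> float:
    """Gompertz curve N(t) = n_max * exp(ln(n0 / n_max) * exp(-k t)).

    N(0) = n0, and N(t) approaches n_max (the carrying capacity) as t grows.
    """
    return n_max * math.exp(math.log(n0 / n_max) * math.exp(-k * t))


def estimate_k(times, densities, n_max):
    """Estimate k from ln(ln(n_max / N)) = ln(ln(n_max / n0)) - k t.

    Requires a known n_max and 0 < N < n_max for every observation.
    """
    ys = [math.log(math.log(n_max / n)) for n in densities]
    mean_t = sum(times) / len(times)
    mean_y = sum(ys) / len(ys)
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
             / sum((t - mean_t) ** 2 for t in times))
    return -slope
```

With noisy data, a direct nonlinear fit of all three parameters is usually preferable, because the double-log transform amplifies errors for densities close to n_max.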

19.
The identification of proteins from spectra derived from a tandem mass spectrometry experiment involves several challenges: matching each observed spectrum to a peptide sequence, ranking the resulting collection of peptide-spectrum matches, assigning statistical confidence estimates to the matches, and identifying the proteins. The present work addresses algorithms to rank peptide-spectrum matches. Many of these algorithms, such as PeptideProphet, IDPicker, or Q-ranker, follow a similar methodology that includes representing peptide-spectrum matches as feature vectors and using optimization techniques to rank them. We propose a richer and more flexible feature set representation that is based on the parametrization of the SEQUEST XCorr score and that can be used by all of these algorithms. This extended feature set allows a more effective ranking of the peptide-spectrum matches based on the target-decoy strategy, in comparison to a baseline feature set devoid of these XCorr-based features. Ranking using the extended feature set gives 10-40% improvement in the number of distinct peptide identifications relative to a range of q-value thresholds. While this work is inspired by the model of the theoretical spectrum and the similarity measure between spectra used specifically by SEQUEST, the method itself can be applied to the output of any database search. Further, our approach can be trivially extended beyond XCorr to any linear operator that can serve as similarity score between experimental spectra and peptide sequences.
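The q-value thresholds mentioned above come from the target-decoy strategy. A minimal sketch, assuming the simple estimator FDR = (#decoys / #targets) above each score cutoff (other variants add 1 to the decoy count) and ignoring score ties: the q-value of a PSM is the smallest estimated FDR at any cutoff that still accepts it.

```python
def target_decoy_qvalues(psms):
    """psms: iterable of (score, is_decoy). Returns (score, is_decoy, q_value)
    tuples sorted by descending score. Score ties are not handled specially."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this score cutoff
    # q-value: running minimum of the FDR, taken from the lowest score upwards.
    qvals = fdrs[:]
    for i in range(len(qvals) - 2, -1, -1):
        qvals[i] = min(qvals[i], qvals[i + 1])
    return [(score, d, q) for (score, d), q in zip(ranked, qvals)]
```

The number of identifications at a threshold is then, for example, `sum(1 for _, d, q in result if not d and q <= 0.01)`; a better ranking feature set raises this count at a fixed q-value.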

20.
The voltage and time dependence of ion channels can be regulated, notably by phosphorylation, interaction with phospholipids, and binding to auxiliary subunits. Many parameter variation studies have set conductance densities free while leaving kinetic channel properties fixed as the experimental constraints on the latter are usually better than on the former. Because individual cells can tightly regulate their ion channel properties, we suggest that kinetic parameters may be profitably set free during model optimization in order to both improve matches to data and refine kinetic parameters. To this end, we analyzed the parameter optimization of reduced models of three electrophysiologically characterized and morphologically reconstructed globus pallidus neurons. We performed two automated searches with different types of free parameters. First, conductance density parameters were set free. Even the best resulting models exhibited unavoidable problems which were due to limitations in our channel kinetics. We next set channel kinetics free for the optimized density matches and obtained significantly improved model performance. Some kinetic parameters consistently shifted to similar new values in multiple runs across three models, suggesting the possibility for tailored improvements to channel models. These results suggest that optimized channel kinetics can improve model matches to experimental voltage traces, particularly for channels characterized under different experimental conditions than recorded data to be matched by a model. The resulting shifts in channel kinetics from the original template provide valuable guidance for future experimental efforts to determine the detailed kinetics of channel isoforms and possible modulated states in particular types of neurons.

