Similar articles
1.
An effective algorithm for identifying repetitive sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
李冬冬  王正志  倪青山 《生物信息学》2005,3(4):163-166,174
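The analysis of repetitive sequences is an important topic in genome research, and it rests on finding the repeats in a genome sequence quickly and effectively. This paper presents a projection-assembly algorithm that uses random projection to obtain a set of candidate fragments and then assembles those fragments to discover repeats in the genome. The computational complexity of the algorithm is analyzed, semi-simulated test data are constructed, and the test results demonstrate the algorithm's effectiveness.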

2.

Background  

Developing a fast and accurate scoring function for virtual screening remains a hot issue in computer-aided drug research. Different scoring functions focus on different aspects of ligand binding, and no single scoring function can satisfy the peculiarities of every target system. Therefore, the idea of a consensus scoring strategy was put forward. Integrating several scoring functions, consensus scoring re-assesses the docked conformations using a primary scoring function. However, it is not really robust or efficient from the perspective of optimization. Furthermore, to date, the majority of available methods are still based on single-objective optimization design.

3.
Traditional pharmacotherapies for substance use disorders have focused on mono-substance abuse. However, recent epidemiological studies have found that polysubstance use disorders (PUD) are becoming more prevalent, and the abuse of adulterated drugs has led to increasing unintentional overdose deaths. Unfortunately, there are no approved pharmacological agents for PUD. Hence, a therapeutic model of interest to address this growing epidemic is immunopharmacotherapy, in which individuals are inoculated with conjugate vaccines formulated with haptens that mimic the drug of abuse. These conjugate vaccines have demonstrated significant therapeutic potential against mono-substance abuse; thus, recent studies have applied this model to address PUD. This review presents immunopharmacotherapeutic advancements against polysubstance abuse and discusses the developments that conjugate vaccines need in order to treat this unaddressed epidemic effectively.

4.
5.
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4).
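The signal definition in this abstract (a pattern of length l within d mutations of a motif) can be illustrated with a small sketch. This is not the cWINNOWER algorithm: it assumes the motif is already known and that mutations are substitutions only, and the function names `hamming` and `find_signals` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence: str, motif: str, d: int) -> list[int]:
    """Start positions of windows within d substitutions of the motif.

    Assumption: the motif is given. The real problem is harder because
    the motif is unknown and must be inferred from the signals.
    """
    l = len(motif)
    return [
        start
        for start in range(len(sequence) - l + 1)
        if hamming(sequence[start:start + l], motif) <= d
    ]
```

For example, `find_signals("ACGTACGAAC", "ACGT", 1)` scans seven windows of length 4 and keeps those that differ from the motif in at most one position.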

6.
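Pattern discovery is an important research direction in bioinformatics, but most current algorithms cannot guarantee finding the optimal pattern. This paper derives a criterion for the similarity relationship among three sequence fragments, uses it as a pruning rule, and proposes and implements a depth-first exhaustive search algorithm, the criterion search algorithm (CRISA). Theoretical analysis shows that, for the vast majority of pattern discovery problems, CRISA has polynomial time complexity and linear space complexity. Tests on simulated and real biological sequence data also show that CRISA can quickly and completely identify all patterns in the sequences, achieves a better overall evaluation than other algorithms, and can be applied to practical pattern discovery problems.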

7.
The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm employs a random projection algorithm to obtain a candidate fragment set, then uses an exhaustive search over each pair of fragments in the candidate set to find potential linkages, and finally assembles the linked fragments. The complexity of our projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited by the hardware. We tested our algorithm on both simulated data and real biological data, and the results show that our projection-assemble algorithm is efficient. By means of this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%.
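The random projection step mentioned in this abstract can be sketched roughly as follows. This is a generic illustration of random projection over fixed-length windows, not the authors' implementation; the window length `l`, the number of sampled positions `k`, the seed and the function name `projection_buckets` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def projection_buckets(sequence: str, l: int, k: int, seed: int = 0):
    """Group length-l windows by the characters at k randomly chosen positions.

    Windows that agree at the sampled positions share a bucket, so
    near-identical fragments are likely to collide even when they differ
    at positions that were not sampled.
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for start in range(len(sequence) - l + 1):
        window = sequence[start:start + l]
        key = "".join(window[p] for p in positions)
        buckets[key].append(start)
    # Only buckets holding two or more windows are repeat candidates.
    candidates = {key: starts for key, starts in buckets.items() if len(starts) > 1}
    return positions, candidates
```

In practice the projection would be repeated with several independent position sets, since one projection can miss a repeat whose mismatches happen to fall on the sampled positions.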

8.
A computer program is described which finds sets of diagnostic characters for the recognition of species. Unlike previous algorithms, it finds all the possible sets requested and will also run with reasonable demands on computer time and storage. The program will search for sets with a specified size range and with a given minimum number of diagnostic characters to distinguish a taxon from all the others.

9.
A stepwise algorithm for finding minimum evolution trees   Cited by: 7 (self-citations: 6, citations by others: 1)
A stepwise algorithm for reconstructing minimum evolution (ME) trees from evolutionary distance data is proposed. In each step, a taxon that potentially has a neighbor (another taxon connected to it with a single interior node) is first chosen, and then its true neighbor is searched for iteratively. For m taxa, at most (m-1)!/2 trees are examined and the tree with the minimum sum of branch lengths (S) is chosen as the final tree. This algorithm provides simple strategies for restricting the tree space searched and allows us to implement efficient ways of dynamically computing the ordinary least squares estimates of S for the topologies examined. Using computer simulation, we found that the efficiency of the ME method in recovering the correct tree is similar to that of the neighbor-joining method (Saitou and Nei 1987). A more exhaustive search is unlikely to improve the efficiency of the ME method in finding the correct tree because the correct tree is almost always included in the tree space searched with this stepwise algorithm. The new algorithm finds trees for which S values may not be significantly different from that of the ME tree if the correct tree contains very small interior branches or if the pairwise distance estimates have large sampling errors. These topologies form a set of plausible alternatives to the ME tree and can be compared with each other using statistical tests based on the minimum evolution principle. The new algorithm makes it possible to use the ME method for large data sets.
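To put the (m-1)!/2 bound from this abstract in perspective, the sketch below compares it with the total number of unrooted bifurcating topologies for m taxa, which is the standard count (2m-5)!!. The function names are hypothetical, and the comparison is an illustration added here rather than a result from the paper.

```python
from math import factorial


def stepwise_bound(m: int) -> int:
    """Upper bound (m-1)!/2 on the trees examined, as stated in the abstract."""
    if m < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(m - 1) // 2


def unrooted_topologies(m: int) -> int:
    """Number of unrooted bifurcating trees for m taxa: (2m-5)!!."""
    if m < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for odd in range(3, 2 * m - 4, 2):  # multiplies 3 * 5 * ... * (2m-5)
        count *= odd
    return count


if __name__ == "__main__":
    for m in (5, 10, 20):
        print(m, stepwise_bound(m), unrooted_topologies(m))
```

The two counts coincide for m = 3 and m = 4 and then diverge quickly, which is why restricting the search space matters for large data sets.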

10.
MOTIVATION: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. RESULTS: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
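A minimal sketch of the idea in this abstract, searching by simulated annealing for a sequence that minimises the summed distance to all inputs, is given below. It makes strong simplifying assumptions that are not taken from the paper: all inputs have equal length, distance is Hamming distance rather than an alignment-based distance, moves are single-character substitutions, and the cooling schedule, parameter values and function names are illustrative only.

```python
import math
import random


def total_distance(candidate, sequences) -> int:
    """Sum of Hamming distances from the candidate to every input sequence."""
    return sum(sum(a != b for a, b in zip(candidate, s)) for s in sequences)


def anneal_consensus(sequences, alphabet="ACGT", steps=5000,
                     t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over single-character substitutions.

    Assumption: equal-length inputs and Hamming distance, so no
    insertions or deletions are modelled.
    """
    rng = random.Random(seed)
    current = list(sequences[0])
    cost = total_distance(current, sequences)
    best, best_cost = current[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(alphabet)
        delta = total_distance(current, sequences) - cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost += delta
        else:
            current[pos] = old
        if cost < best_cost:
            best, best_cost = current[:], cost
    return "".join(best), best_cost
```

With Hamming distance the optimum is simply the column-wise majority, so annealing is unnecessary here; the sketch only shows the shape of the search that becomes useful once the distance involves insertions and deletions.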

11.
An epidemic caused by COVID-19 in China turned into a pandemic within a short time, affecting countries worldwide. Researchers and companies around the world are working on all possible strategies to develop curative or preventive measures, including vaccine development, drug repurposing, plasma therapy, and drug discovery based on artificial intelligence. Therapeutic approaches based on computational biology and machine-learning algorithms receive special consideration, on the view that they could provide fast and accurate outcomes in the present scenario. As an effort towards developing possible therapeutics for COVID-19, we have used machine-learning algorithms to generate alignment kernels from diverse COVID-19 viral sequences reported from India, China, Italy and the USA. Using these diverse sequences, we identified the conserved motifs and subsequently designed a peptide library against them. Of these, 4 peptides showed strong binding affinity for the main protease of SARS-CoV-2 (Mpro) and also maintained their stability and specificity under physiological conditions, as observed through MD simulations. Our data suggest that these evolutionary peptides against COVID-19, if found effective, may provide cross-protection against diverse COVID-19 variants.

12.

Background  

Discovering approximately repeated patterns, or motifs, in biological sequences is an important and widely-studied problem in computational molecular biology. Most frequently, motif finding applications arise when identifying shared regulatory signals within DNA sequences or shared functional and structural elements within protein sequences. Due to the diversity of contexts in which motif finding is applied, several variations of the problem are commonly studied.

13.
A rapid heuristic algorithm for finding minimum evolution trees   Cited by: 2 (self-citations: 0, citations by others: 2)
The minimum sum of branch lengths (S), or the minimum evolution (ME) principle, has been shown to be a good optimization criterion in phylogenetic inference. Unfortunately, the number of topologies to be analyzed is computationally prohibitive when a large number of taxa are involved. Therefore, simplified, heuristic methods, such as the neighbor-joining (NJ) method, are usually employed instead. The NJ method analyzes only a small number of trees (compared with the size of the entire search space); so, the tree obtained may not be the ME tree (for which the S value is minimum over the entire search space). Different compromises between very restrictive and exhaustive search spaces have been proposed recently. In particular, the "stepwise algorithm" (SA) utilizes what is known in computer science as the "beam search," whereas the NJ method employs a "greedy search." SA is virtually guaranteed to find the ME trees while being much faster than exhaustive search algorithms. In this study we propose an even faster method for finding the ME tree. The new algorithm adjusts its search exhaustiveness (from greedy to complete) according to the statistical reliability of the tree node being reconstructed. It is also virtually guaranteed to find the ME tree. The performances and computational efficiencies of ME, SA, NJ, and our new method were compared in extensive simulation studies. The new algorithm was found to perform practically as well as the SA (and, therefore, ME) methods and slightly better than the NJ method. For searching for the globally optimal ME tree, the new algorithm is significantly faster than existing ones, thus making it relatively practical for obtaining all trees with an S value equal to or smaller than that of the NJ tree, even when a large number of taxa is involved.

14.
Based on a numerical analysis showing that covariance has higher statistical precision than cumulant and variance, a new SOFI algorithm that calculates the n-th order covariance for each pixel is presented with an almost -fold resolution improvement, which can be enhanced to 2n via deconvolution. An optimized deconvolution is also proposed, which calculates the (n + 1)-th order SD associated with each n-th order covariance pixel and introduces the results into the deconvolution as a damping factor to suppress noise generation. Moreover, a re-deconvolution of the covariance image with the covariance-equivalent point spread function is used to further increase the final resolution by more than 2-fold. Simulated and experimental results show that this algorithm can significantly increase the temporal–spatial resolution of SOFI while preserving the sample's structure. A resolution of 58 nm is achieved for 20 experimental images, and the corresponding acquisition time is 0.8 seconds.

15.
We study the problem of approximate non-tandem repeat extraction. Given a long subject string S of length N over a finite alphabet Sigma and a threshold D, we would like to find all short substrings of S of length P that repeat with at most D differences, i.e., insertions, deletions, and mismatches. We give a careful theoretical characterization of the set of seeds (i.e., some maximal exact repeats) required by the algorithm, and prove a sublinear bound on their expected numbers. Using this result, we present a sub-quadratic algorithm for finding all short (i.e., of length O(log N)) approximate repeats. The running time of our algorithm is O(D N^(3 pow(epsilon) - 1) log N), where epsilon = D/P and pow(epsilon) is an increasing, concave function that is 0 when epsilon = 0 and about 0.9 for DNA and protein sequences.
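The notion of "at most D differences, i.e., insertions, deletions, and mismatches" in this abstract corresponds to the standard edit (Levenshtein) distance. The sketch below shows only that pairwise check, using the textbook dynamic program; it is not the seed-based sub-quadratic algorithm the abstract describes, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and mismatches turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or mismatch
            ))
        prev = cur
    return prev[-1]


def is_approximate_repeat(x: str, y: str, max_differences: int) -> bool:
    """True if the two substrings are within the threshold D of each other."""
    return edit_distance(x, y) <= max_differences
```

Comparing every pair of substrings this way costs quadratic time in N before the per-pair work is even counted, which is the cost that a seed-based approach avoids.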

16.

Background  

Complex networks are studied across many fields of science and are particularly important for understanding biological processes. Motifs in networks are small connected sub-graphs that occur at significantly higher frequencies than in random networks. They have recently attracted much attention as a useful concept for uncovering the structural design principles of complex networks. Existing algorithms for finding network motifs are extremely costly in CPU time and memory consumption and, in practice, are restricted in the size of motifs they can handle.

17.
The problem of identifying significantly differentially expressed genes in replicated microarray experiments is widely recognized as important and has been tackled by several researchers. Patterns from Gene Expression (PaGE) and q-values are two of the well-known approaches developed to handle this problem. This paper proposes a powerful approach to it. We first propose a method for estimating the prior probabilities used in the first version of the PaGE algorithm. This way, the problem definition of PaGE stays intact and we just estimate the needed prior probabilities. Our estimation method is similar to Storey's estimator without being its direct extension. Then, we modify the problem formulation to find significantly differentially expressed genes and present an efficient method for finding them. This formulation increases the power by directly incorporating Storey's estimator. We report preliminary results on the BRCA data set to demonstrate the applicability and effectiveness of our approach.
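For readers unfamiliar with the estimator named in this abstract, the sketch below shows the form commonly associated with Storey's q-value method: the proportion of true null hypotheses is estimated as the number of p-values above a threshold lambda, divided by m(1 - lambda). It illustrates that general estimator only, not the PaGE-specific method the paper proposes; the default `lam=0.5`, the cap at 1 and the function name are choices made for this example.

```python
def storey_pi0(p_values, lam: float = 0.5) -> float:
    """Estimate the fraction of true nulls from the p-values above lam.

    Under the null, p-values are uniform, so a fraction (1 - lam) of the
    null p-values is expected to exceed lam.
    """
    if not 0 <= lam < 1:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    estimate = sum(p > lam for p in p_values) / (m * (1 - lam))
    return min(1.0, estimate)  # a proportion cannot exceed 1
```

For example, with eight p-values of which two exceed 0.5, the estimate is 2 / (8 * 0.5) = 0.5.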

18.
Genome editing holds remarkable promise to transform human medicine as new therapies that can directly address the genetic causes of disease. However, concerns remain about possible undesired biological consequences of genome editors, particularly the introduction of unintended ‘off-target’ mutations. Here, we discuss both important considerations for therapeutic genome editing and our understanding of the functional impact of undesired off-target mutations. An important challenge for the future will be the development of new approaches for predicting and defining the probable function of unintended genome-editing mutations, which will inspire confidence in the next generation of promising genome-editing therapies.

19.
20.
MOTIVATION: Sequences for new proteins are being determined at a rapid rate, as a result of the Human Genome Project, and related genome research. The ability to predict the three-dimensional structure of proteins from sequence alone would be useful in discovering and understanding their function. Threading, or fold recognition, aims to predict the tertiary structure of a protein by aligning its amino acid sequence with a large number of structures, and finding the best fit. This approach depends on obtaining good performance from both the scoring function, which simulates the free energy for given trial alignments, and the threading algorithm, which searches for the lowest-score alignment. It appears that current scoring functions and threading algorithms need improvement. RESULTS: This paper presents a new threading algorithm. Numerical tests demonstrate that it is more powerful than two popular approximate algorithms, and much faster than exact methods.
