Similar Articles
1.
A new method for homology search of DNA sequences is proposed. The method can find extended but weak homologies containing point mutations and deletions. Its running time for comparing sequences is at least two orders of magnitude shorter than that of dynamic programming algorithms, which makes it possible to search for homologies throughout a nucleotide sequence bank on a personal computer.

2.
Method enabling fast partial sequencing of cDNA clones (total citations: 1; self-citations: 0; citations by others: 1)
Pyrosequencing is a nonelectrophoretic, single-tube DNA sequencing method that exploits the cooperative action of four enzymes to monitor DNA synthesis. To investigate the feasibility of this recently developed technique for tag sequencing, 64 colonies from a selected human cDNA library were sequenced by both pyrosequencing and Sanger sequencing. To determine the length needed to identify a unique DNA sequence, 100 human sequence tags were retrieved from the database and subsequences of different lengths from each were randomly analyzed. A homology search based on 20 and 30 nucleotides produced 97% and 98% unique hits, respectively, and a homology search based on 100 nucleotides identified all searched genes. Pyrosequencing was then used to generate 30-nucleotide sequence reads. A similar search using BLAST revealed 16 different genes. Forty-six percent of the sequences shared homology with one gene at different positions. Two of the 64 clones had unique sequences. The search results from pyrosequencing agreed 100% with those from conventional DNA sequencing. The possibility of using a fully automated pyrosequencer for future high-throughput tag sequencing is discussed.
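The tag-length question above (how many nucleotides are needed before a tag matches only one database entry) can be sketched as a simple uniqueness count. This is a minimal illustration, not the authors' procedure: it assumes exact substring matching stands in for a homology search, and takes each tag as a sequence prefix rather than a randomly chosen window. The function name `unique_prefix_fraction` is hypothetical.

```python
def unique_prefix_fraction(sequences, k):
    """Fraction of sequences whose first k nucleotides occur in no other sequence.

    Assumption: exact substring search is used as a stand-in for homology search.
    """
    unique = 0
    for i, seq in enumerate(sequences):
        tag = seq[:k]
        # Count other database entries that contain this tag anywhere.
        hits = sum(1 for j, other in enumerate(sequences) if j != i and tag in other)
        if hits == 0:
            unique += 1
    return unique / len(sequences)


if __name__ == "__main__":
    # Toy database; real tags would come from a sequence database.
    db = ["ACGTACGTTTGA", "ACGTCCCAGGTA", "TTGACCGTAAAC"]
    for k in (2, 4, 8):
        print(k, unique_prefix_fraction(db, k))
```

Short tags collide across entries, while longer tags become unique, which mirrors the trend reported for 20, 30 and 100 nucleotides.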

3.
Prediction of protein subcellular localization (total citations: 6; self-citations: 0; citations by others: 6)
Yu CS  Chen YC  Lu CH  Hwang JK 《Proteins》2006,64(3):643-651
Because a protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences is useful for inferring protein functions. Recent years have seen surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) carried out an extensive analysis of the relation between sequence similarity and identity in subcellular localization, and found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used to assess prediction accuracy contain highly homologous sequences; some comprise sequences with up to 80-90% sequence identity. Using these benchmark data will surely lead to overestimating the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vector derived from sequences; the second-level SVM classifier functions as a jury machine that generates the probability distribution of decisions over possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches on two benchmark data sets, one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization.
Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower identity. A data set with high levels of homology will undoubtedly lead to a biased assessment of the performance of predictive approaches, especially those relying on homology search or sequence annotations. Our two-level SVM classification system does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for annotating the subcellular localization of sequences.
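The homology-search baseline discussed above transfers a localization label from a similar, already-annotated sequence, and the 30% identity figure acts as a practical cutoff. The sketch below is a hypothetical illustration of such label transfer, not the authors' code: it assumes the pairwise alignments have already been computed (gaps written as `-`), and it defines percent identity over gap-free columns, which is only one of several common conventions. `percent_identity` and `transfer_localization` are illustrative names.

```python
def percent_identity(aln_a, aln_b):
    """Identity over aligned columns where neither sequence has a gap ('-').

    Assumption: other definitions (e.g. dividing by alignment length) also exist.
    """
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def transfer_localization(hits, min_identity=30.0):
    """hits: (aligned_query, aligned_subject, localization) tuples.

    Returns the localization of the most identical hit at or above the cutoff,
    or None when no hit is similar enough to trust homology-based transfer.
    """
    scored = [(percent_identity(q, s), loc) for q, s, loc in hits]
    scored = [(pid, loc) for pid, loc in scored if pid >= min_identity]
    return max(scored, key=lambda t: t[0])[1] if scored else None
```

Returning `None` below the cutoff is where a hybrid system would fall back to a composition-based classifier such as the two-level SVM.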

4.
Biological sequences are often analyzed by detecting homologous regions between them. Homology search is confounded by simple repeats, which give rise to strong similarities that are not homologies. Standard repeat-masking methods fail to eliminate this problem, and they are especially ill-suited to AT-rich DNA such as malaria and slime-mould genomes. We present a new repeat-masking method, tantan, which is motivated by the mechanisms that create simple repeats. This method thoroughly eliminates spurious homology predictions for DNA–DNA, protein–protein and DNA–protein comparisons. Moreover, it enables accurate homology search for non-coding DNA with extreme A + T composition.

5.
As a result of remarkable progress in DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search of amino acid sequences with tools such as BLAST has become a basic means of assigning functions to genes and proteins when genomic sequences are decoded. Although homology search is clearly a powerful and irreplaceable method, the functions of only 50% or fewer of the genes can be predicted when a novel genome is decoded, so a prediction method independent of homology search is urgently needed. By analyzing oligonucleotide compositions in genomic sequences, we previously developed a modified Self-Organizing Map, 'BLSOM', that clusters genomic fragments according to phylotype with no prior knowledge of phylotype. Using BLSOM on di-, tri- and tetrapeptide compositions, we developed a system that separates (self-organizes) proteins by function. When analyzing oligopeptide frequencies in proteins previously classified into COGs (clusters of orthologous groups of proteins), BLSOMs faithfully reproduced the COG classifications. This indicates that proteins whose functions are unknown because they lack significant sequence similarity to proteins of known function can still be related to function-known proteins through similarity in oligopeptide composition. BLSOM was applied to predict the functions of vast quantities of proteins derived from mixed genomes in environmental samples.
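The oligopeptide compositions described above are fixed-length frequency vectors, one entry per possible di-, tri- or tetrapeptide. The sketch below shows one way to build such input vectors; it covers only the feature extraction, not BLSOM training, and the choice to skip windows containing nonstandard residues is an assumption. `oligopeptide_composition` is a hypothetical name.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def oligopeptide_composition(protein, n=2):
    """Relative frequencies of all 20**n n-peptides, in a fixed lexicographic order.

    n=2, 3, 4 gives 400, 8,000 and 160,000 dimensions, respectively.
    """
    windows = [protein[i:i + n] for i in range(len(protein) - n + 1)]
    # Assumption: windows containing nonstandard letters (e.g. X, U) are skipped.
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values()) or 1  # avoid division by zero for short inputs
    return [counts["".join(p)] / total for p in product(AMINO_ACIDS, repeat=n)]
```

Because every protein maps to a vector of the same length regardless of sequence length, proteins with no detectable sequence similarity can still be compared, which is the property the abstract relies on.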

6.
Frith MC 《PloS one》2011,6(12):e28819
Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with "gentle" masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0,S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to "harsh" masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
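The gentle-masking rule min(0, S) can be written directly as a scoring function. In this sketch, masked letters are marked by lowercase (a common convention in repeat-masker output, assumed here), and the +1/-1 match/mismatch scheme is a hypothetical stand-in for a real scoring matrix. `gentle_score` and `ungapped_score` are illustrative names.

```python
def base_score(a, b, match=1, mismatch=-1):
    """Hypothetical unmasked DNA score S (case-insensitive)."""
    return match if a.upper() == b.upper() else mismatch


def gentle_score(a, b):
    """Gentle masking: if either letter is masked (lowercase), score min(0, S)."""
    s = base_score(a, b)
    if a.islower() or b.islower():
        return min(0, s)  # masked matches add nothing; mismatches still penalize
    return s


def ungapped_score(top, bottom):
    """Score an ungapped alignment column by column with gentle masking."""
    return sum(gentle_score(x, y) for x, y in zip(top, bottom))
```

The key design point is asymmetry: a masked region cannot create spurious positive score, yet its mismatches still count against an alignment, so a true homology spanning a masked tract is not broken apart.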

7.
In this paper, we improve homology search performance by combining predicted protein secondary structures with protein sequences. Previous research suggested that a straightforward combination with predicted secondary structures did not improve homology search performance, mostly because of errors in the structure prediction. We solved this problem by taking into account the confidence scores output by the prediction programs.

8.
We develop a procedure called RiPE (Retrieval-induced Phylogeny Environment) that automatically performs an evolutionary analysis of a protein (sub)family, (i) by retrieving the relevant sequences via a homology search, (ii) by using the search report to construct the alignment using only homologous subsequences (taking into account their neighborhood with a low chance of homology), (iii) by realigning, and (iv) by generating phylogenetic trees based on the alignment. In a first implementation of our scheme, we start with the available proteome data of model organisms, perform a PSI-BLAST search, use MView to convert hits into a multiple alignment, and perform realignment and tree building. As a test case, we have investigated the human ABC transporters of the subfamily G, starting with the five known human ABCG transporters. Our method retrieved homologous sequences not previously analyzed, generating a tree that is more plausible and better supported than previously published trees. The RiPE 0.1 prototype is available at the RiPE website, http://ifg-izkf.uni-muenster.de/fuellen/RiPE/ripe.html.

9.
Predicting membrane segments in the sequences of membrane proteins is a well-known and important problem, and the accuracy of methods that solve it without homology search in an additional databank can still be improved. Test data in this area are scarce because few membrane protein structures have been determined experimentally. In this work, we create a test set of structural alignments of membrane proteins in which the positions of the membrane segments reflect the agreement of the known 3D structures of the proteins in each alignment. We propose a method for predicting the positions of membrane segments in a multiple alignment based on the forward-backward algorithm from HMM theory. The method not only predicts the positions of membrane segments but also produces a probabilistic membrane profile, which can be used by multiple alignment methods that take secondary structure information into account. The method is implemented in a computer program available on the World Wide Web at http://bioinf.fbb.msu.ru/fwdbck/. The proposed method gives better results than MEMSAT, which is nearly the only tool for predicting membrane segments in multiple alignments without additional homology search.
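The forward-backward algorithm mentioned above yields, for each position, the posterior probability of being in a membrane state, which is the kind of "probability membrane profile" described. The sketch below applies it to a single sequence with a toy two-state HMM; the cited method works on multiple alignments with its own model, which is not reproduced here. The hydrophobic alphabet, emission probabilities `p_hydro`, and self-transition `p_stay` are all hypothetical parameters.

```python
HYDROPHOBIC = set("AILMFVWC")  # assumed two-letter abstraction: hydrophobic vs other


def posterior_membrane(seq, p_stay=0.9, start=(0.5, 0.5), p_hydro=(0.3, 0.8)):
    """Toy two-state HMM (0 = outside membrane, 1 = membrane).

    Returns P(state_t = 1 | seq) for every position t via forward-backward.
    """
    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]

    def emit(state, aa):
        p = p_hydro[state]
        return p if aa in HYDROPHOBIC else 1 - p

    n = len(seq)
    # Forward pass, rescaled at each position to avoid numerical underflow.
    fwd, scales = [], []
    cur = [start[s] * emit(s, seq[0]) for s in (0, 1)]
    for t in range(n):
        if t > 0:
            cur = [emit(s, seq[t]) * sum(fwd[-1][r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        c = sum(cur)
        cur = [x / c for x in cur]
        fwd.append(cur)
        scales.append(c)
    # Backward pass, reusing the forward scaling constants.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [sum(trans[s][r] * emit(r, seq[t + 1]) * bwd[t + 1][r] for r in (0, 1))
                  / scales[t + 1] for s in (0, 1)]
    # Posterior: normalize forward * backward over the two states.
    post = []
    for t in range(n):
        out, mem = fwd[t][0] * bwd[t][0], fwd[t][1] * bwd[t][1]
        post.append(mem / (out + mem))
    return post
```

For a sequence such as `"DKDKLLLLLLLLDKDK"`, the posterior is high across the central hydrophobic run and low at the charged ends; unlike a Viterbi path, it gives a graded profile rather than a single hard segmentation.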

10.
MOTIVATION: It is widely recognized that homology search and ortholog clustering are very useful for analyzing biological sequences. However, the recent growth in sequence database size makes homolog detection difficult, and rapid and accurate methods are required. RESULTS: We present a novel method for fast and accurate homology detection, assuming that the Smith-Waterman (SW) scores between all similar sequence pairs in a target database have been computed and stored. In this method, an SW alignment is computed only if an upper bound, derived from our novel inequality, is higher than the given threshold. In contrast to other methods such as FASTA and BLAST, this method is guaranteed to find all sequences whose scores against the query are higher than the specified threshold. Results of computational experiments suggest that the method is dozens of times faster than SSEARCH if genome sequence data of closely related species are available.
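For reference, the exhaustive baseline that the method accelerates is a full Smith-Waterman scan of the database, which trivially has the "find every sequence above the threshold" guarantee. The sketch below computes local alignment scores with a linear gap penalty; the scoring values are hypothetical, and the paper's upper-bound inequality is not reproduced, since it is not given here. `search` is an illustrative helper name.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment can restart.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, database, threshold):
    """Exhaustive scan: returns every database entry scoring >= threshold."""
    return [s for s in database if smith_waterman(query, s) >= threshold]
```

A bound-based filter would wrap the call in `search`, skipping `smith_waterman` whenever a cheap upper bound on the score already falls below `threshold`; because the bound never underestimates, no qualifying hit is lost.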

11.
Large-scale genome projects generate an unprecedented number of protein sequences, most of which are experimentally uncharacterized. Predicting the 3D structures of these sequences provides important clues to their functions. We constructed the Genomes TO Protein structures and functions (GTOP) database, containing protein fold predictions for a huge number of sequences. Predictions are mainly carried out with the homology search program PSI-BLAST, currently the most popular high-sensitivity profile search method. GTOP also includes the results of other analyses, e.g. homology and motif search, and detection of transmembrane helices and repetitive sequences. We have completed analyzing the sequences of 41 organisms, with more than 120 000 proteins in total. GTOP uses a graphical viewer to present the analytical results for each ORF on one page in a 'color-bar' format. The assigned 3D structures are presented by the Chime plug-in or RasMol. The binding sites of ligands are also included, providing functional information. The GTOP server is available at http://spock.genes.nig.ac.jp/~genome/gtop.html.

12.
Homologous recombination plays pivotal roles in DNA repair and in the generation of genetic diversity. To locate homologous target sequences at which strand exchange can occur within the timescale a cell's biology demands, a single-stranded DNA-recombinase complex must search among a large number of sequences in a genome by forming synapses with chromosomal segments of DNA. A key element in the search is the time it takes to compare the two DNA sequences, i.e. the synapse lifetime. Here, we visualize for the first time individual fluorescently tagged synapses formed by RecA, a prokaryotic recombinase, and measure their lifetime as a function of synapse length and of sequence differences between the participating DNAs. Surprisingly, lifetimes can be ∼10 s long when the DNAs are fully heterologous, and much longer for partial homology, consistent with ensemble FRET measurements. Synapse lifetime increases rapidly as the length of a region of full homology at either the 3′- or 5′-end of the invading single-stranded DNA increases above 30 bases. A few mismatches can dramatically reduce the lifetime of synapses formed with nearly homologous DNAs. These results suggest the need for facilitated homology search mechanisms to locate homology successfully within the timescales observed in vivo.

13.
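Tang W  Yan M 《Acta Microbiologica Sinica》2008,48(4):473-479
[Objective] Trichoderma reesei is an important industrial cellulase-producing fungus, and studying the characteristics of its secretome is of practical significance. [Methods] Bioinformatic methods were used to analyze the amino acid sequences encoded by the 9,997 open reading frames (ORFs) in the T. reesei genome. This yielded 294 putative secreted protein sequences, which were classified by function; motif searching was also used to find sequences carrying key motifs among those of unknown function, in order to tentatively assign their potential functions. The signal peptide sequences of the secreted proteins were also analyzed. [Results] The T. reesei secretome contains 188 hydrolases, including 114 glycoside hydrolases, 42 proteases and 11 lipid hydrolases. The glycoside hydrolases include 22 previously reported cellulases and 15 chitinases, as well as 30 protein sequences with potential cellulase function. Signal peptide analysis showed low homology among the signal peptides, although the regions near the signal peptidase cleavage site are relatively conserved. [Conclusion] This prediction and analysis opens new avenues for research on T. reesei and lays a theoretical foundation for future studies.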

14.
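Primers designed from the conserved structures of disease resistance genes were used to amplify a 703 bp band, RGA1, from TcLr24, a near-isogenic line carrying leaf rust resistance. By comparison with GenBank, several sequences highly homologous to RGA1 were selected, and primers were designed in the conserved region they share. Rapid amplification of cDNA ends (RACE) was then used to amplify full-length cDNAs of resistance gene homologs. Three full-length cDNAs were obtained; BLASTp comparison showed that all contain an NBS conserved domain and multiple LRR domains, consistent with the corresponding functional regions of many known plant resistance genes. Real-time quantitative PCR analysis of FRGA-1, FRGA-2 and FRGA-3 showed that all three genes are constitutively expressed in wheat leaves. This study obtained three full-length cDNAs of resistance gene homologs from the wheat line TcLr24, laying a foundation for research on wheat disease resistance genes.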

15.
Recognition of homologies may give hints about the structure and function of proteins; therefore, we are developing strategies to aid sequence comparisons. Detecting homology of mosaic proteins is especially difficult since the modules constituting these proteins are usually distantly related and their homology is not readily recognized by conventional computer programs. In the present work we show that the rules of the evolution of mosaic proteins can guide the identification of modules of mosaic proteins and can delineate the group of sequences in which the presence of homologous sequences may be expected. By this approach we can concentrate the search for homology to a limited group of sequences; thus ensuring a more intense and more fruitful search. The power of this approach is illustrated by the fact that it could detect homologies not identified by earlier methods of sequence comparison. In this paper we show that thrombomodulin contains a domain homologous with animal lectins, that complement components C9, C8 alpha and C8 beta have modules homologous with one of the repeat units of thrombospondin and that the somatomedin B module of vitronectin is homologous with the internal repeats of plasma cell membrane glycoprotein PC-1.

16.
Expressed sequence tags (ESTs) are partial cDNA sequences read from both ends of randomly selected expressed gene fragments and are used to discover new genes. DNA libraries from four different developmental stages of Schistosoma mansoni used in this study generated 141 ESTs, representing about 2.5% of the S. mansoni sequences in dbEST. Sequencing was done by the dideoxy chain termination method. The sequences were submitted to GenBank for homology searching in nonredundant databases using the Basic Local Alignment Search Tool for DNA (BLASTN) and for protein (BLASTX) alignment at the National Center for Biotechnology Information (NCBI). Of the submitted ESTs, 29 were derived from the λgt11 sporocyst library, 70 from the λZAP adult worm library, 31 from the λZAP cercarial library, and 11 from the λZAP female B worm library. The homology search revealed that eight (5.6%) ESTs shared homology with previously identified S. mansoni genes in dbEST, 15 (10.6%) were homologous to known genes in other organisms, 116 (81.7%) showed no significant sequence homology in the databases, and the remaining sequences (2.1%) showed low homology to rRNA or mitochondrial DNA sequences. Thus, of the 141 ESTs studied, 116 are derived from novel, uncharacterized S. mansoni genes. These 116 ESTs are important for identifying coding regions in the sequences, for mapping the schistosome genome, and for identifying genes of immunological and pharmacological significance.

18.
Searching public protein databases is a fundamental step in target selection for structural and functional genomics, and in inferring protein structure, function and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pairs, and the choice of substitution matrix strongly affects homology detection performance. We earlier proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Herein we further evaluate MIQS in combination with LAST, a fast heuristic database search tool with a tunable sensitivity parameter m, where a larger m denotes higher sensitivity. Results show that MIQS substantially improves the homology detection and alignment quality of LAST across diverse values of m. Against a protein database of approximately 15 million sequences, LAST with m = 10⁵ achieves better homology detection performance than BLASTP and completes the search 20 times faster. Compared with CS-BLAST and SSEARCH, the most sensitive methods in use today, LAST with MIQS and m = 10⁶ shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. These results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search.

19.
Determining the taxonomic lineage of DNA sequences is an important step in metagenomic analysis. Short DNA fragments from next-generation sequencing projects and microbes that lack close relatives in reference sequenced genome databases pose significant problems to taxonomic attribution methods. Our new classification algorithm, RITA (Rapid Identification of Taxonomic Assignments), uses the agreement between composition and homology to accurately classify sequences as short as 50 nt in length by assigning them to different classification groups with varying degrees of confidence. RITA is much faster than the hybrid PhymmBL approach when comparable homology search algorithms are used, and achieves slightly better accuracy than PhymmBL on an artificial metagenome. RITA can also incorporate prior knowledge about taxonomic distributions to increase the accuracy of assignments in data sets with varying degrees of taxonomic novelty, and classified sequences with higher precision than the current best rank-flexible classifier. The accuracy on short reads can be increased by exploiting paired-end information, if available, which we demonstrate on a recently published bovine rumen data set. Finally, we develop a variant of RITA that incorporates accelerated homology search techniques, and generate predictions on a set of human gut metagenomes that were previously assigned to different 'enterotypes'. RITA is freely available in Web server and standalone versions.

20.
During the initial phase of RecA-mediated recombination, known as the search for homology, a single-stranded DNA coated by RecA protein and a homologous double-stranded DNA have to align and pair perfectly. We designed a model for the homology search between short molecules and performed Monte Carlo Metropolis computer simulations of the process. The central features of our model are (1) the assumption that longitudinal thermal fluctuations of duplex DNA are instrumental in the binding, and (2) the explicit consideration of the nucleotide sequence. According to our results, recognition undergoes a slow initial nucleation step over a few base pairs, followed by a quick extension of the pairing to adjacent bases. The formation of the three-stranded complex tends to be curbed by heterologies but also by another possible obstacle: the presence of partially homologous stretches, such as mono- or polynucleotide repeats. Indeed, repeated sequences are observed to trap the molecules in unproductive configurations. We investigate the dependence of this phenomenon on various energy parameters. This mechanism of homology trapping could have strong biological relevance in light of the genomic instability known experimentally to be triggered by repeated sequences.
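The Monte Carlo Metropolis simulations mentioned above rest on the standard Metropolis acceptance rule: a proposed move that lowers the energy is always accepted, and one that raises it by ΔE is accepted with probability exp(-ΔE/kT). The sketch below shows only this generic rule; the paper's energy model for DNA pairing and thermal fluctuations is not reproduced, so `delta_e` and `kT` are placeholders for whatever energy function a model defines.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: accept with probability min(1, exp(-delta_e / kT)).

    delta_e: energy change of the proposed move (placeholder units)
    kT: thermal energy in the same units
    rng: any object with a random() method returning a float in [0, 1)
    """
    if delta_e <= 0:
        return True  # downhill moves are always accepted
    return rng.random() < math.exp(-delta_e / kT)
```

Occasional acceptance of uphill moves is what lets a simulated molecule escape partially paired states; when a repeat-induced trap is deep relative to kT, such escapes become rare, which is consistent with the trapping behavior described above.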


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP filing: 京ICP备09084417号