Similar literature
 20 similar articles found (search time: 364 ms)
1.
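Using Phred/Phrap/Consed, cross.match, RepeatMasker, Blast and other software together with in-house programs on the Linux operating system, an EST sequence analysis system for forest trees was built. The system covers conversion of sequencing chromatograms to nucleotide sequences, vector sequence removal, repeat sequence identification, EST classification and assembly, functional annotation and functional classification of ESTs, and discovery of SSRs and SNPs. Scripts written in Perl with the bioperl module automate the analysis, so that large batches of forest tree EST data can be analyzed quickly, providing useful information for functional genomics research on forest trees.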

2.
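Taking bovine liver microsomal cytochrome b5 (CYB5-BOVIN) as the starting point, bioinformatics methods were used to obtain a series of member proteins of the cytochrome b5 family. The protein sequences were then subjected to multiple alignment and evolutionary analysis, providing guidance for the molecular design and construction of cytochrome b5 proteins.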

3.
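As more and more genomes are sequenced, large amounts of sequence information become available, and how to use this information to predict the function of unknown genes is an important problem. Blast is the basic tool for predicting the function of new genes, but the raw Blast search results alone do not provide the related gene ontology (GO) annotation. At present, to obtain GO annotation for a new gene, users must first run a Blast search and then take the Blast results to the GO website to look up the related GO annotation. This wastes a great deal of time, especially when the Blast output is large. For this reason, the GoBlast software was developed in perl with the bioperl module; it is based on the GO classification system and integrates the Blast result information. With GoBlast, a single analysis run on a new gene returns both the Blast search results and the GO annotation, which effectively improves the reliability of gene function annotation and speeds up functional genomics research. GoBlast uses a B/S (Browser/Server) architecture: a client only needs a web browser to use the GoBlast system over the Internet at http://bioq.org/goblast.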

4.
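Construction and application of a PC/Linux-based nucleic acid sequence analysis system   Total citations: 13, self-citations: 2, citations by others: 11
Based on a PC and the Linux operating system, a large-scale automated nucleic acid sequence analysis system was built using the Phred/Phrap/Consed software and Blast. The system automatically performs conversion of sequencing chromatograms to nucleotide sequences, vector sequence removal, automatic sequence assembly, repeat sequence identification, and sequence similarity analysis, and can speed up the analysis and use of large-scale sequencing data.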

5.
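The amino acid sequence of horse heart cytochrome c (Horse Cytc) was first used as the query in a bioinformatic similarity search, which yielded a series of cytochrome c (Cytc) amino acid sequences. The Cytc proteins were then subjected to multiple alignment, evolutionary analysis, and comparative analysis of three-dimensional structure. The results show that amino acid residues at certain positions in Cytc are highly conserved; that Cytc proteins from closely related species are closely related, whereas Cytc proteins from different parts of the same species are more distantly related; and that Cytc proteins from different species have very similar three-dimensional structures even when they are distantly related. These results provide guidance for Cytc-based protein molecular design and construction.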

6.
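This paper presents a Foxbase program for counting genetic codon usage frequencies. The program was applied to a statistical analysis of codon usage in several important genes whose complete sequences have been reported in China in recent years, with good results.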
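The kind of codon usage count described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Foxbase and is not the authors' program; the function name `codon_usage` and the example sequence are hypothetical, and the sketch assumes the input is a coding sequence already in frame.

```python
from collections import Counter


def codon_usage(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    # Read the sequence in steps of three, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical example: five codons, GCT occurring twice.
usage = codon_usage("ATGGCTGCTAAATAA")
```

Here `usage` maps each observed codon to its share of all codons in the sequence, so `GCT` accounts for two of the five codons.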

7.
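The Cx44 gene (connexin 44 protein gene) sequences of sheep and cattle were used in a Blast search of the NCBI databases, yielding a highly similar human DNA sequence (Human Genome Bank: AL138688). Analysis of AL138688 with GENSCAN suggested that it contains a gene, Cx44, whose coding region consists of a single exon. The open reading frame of human Cx44 is 1320 bp and is predicted to encode 435 amino acids. Its promoter was analyzed with PROMOTORSCAN. Human and sheep Cx44 share 84.75% identity over 1320 bp, and the encoded proteins share 83% identity. Map View was used to map human Cx44 to chromosome 13.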

8.
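Cloning and bioinformatic analysis of the adult retina hypothetical protein gene ARHP   Total citations: 4, self-citations: 0, citations by others: 4
The expressed sequence tag (EST) BG222624, derived from human nasopharyngeal tissue, was selected from the UniGene database and analyzed with the NCBI Blast server, which showed that it is an unknown sequence representing a novel gene. Blast searches of the GenBank nr and EST databases were used to build an EST contig, and analysis with the NCBI ORFfinder server showed that the contig contains a complete reading frame. Primers were designed flanking the start and stop codons of the cDNA reading frame, PCR amplification was performed with a human fetal brain cDNA library as template, and sequencing determined the full-length cDNA sequence. The cDNA is 1672 bp long; the reading frame lies between positions 304 and 1557 and encodes a protein of 417 amino acids with a molecular mass of 46.58 ku and a theoretical pI of 4.21. Sequence similarity analysis of the protein on the NCBI Blast server showed that it is homologous to an unknown protein from adult mouse retina (BAB32214). After consultation with the international human genome nomenclature committee, it was named adult retina hypothetical protein (ARHP), with GenBank accession number AY174896. Bioinformatic analysis suggests that the protein may be a nuclear protein involved in transcriptional regulation. The ARHP gene maps to chromosome 5q35, spans 35163 bp, and contains 4 exons and 3 introns. There are 2 CpG islands in the 5′ untranslated region of the gene.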
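The relation between the reading frame coordinates and the protein length reported above can be checked with a short calculation. This is a sketch under one stated assumption: the reported span (positions 304 to 1557, 1-based and inclusive) includes the stop codon. The helper name `orf_protein_length` is hypothetical.

```python
def orf_protein_length(start, end):
    """Amino acids encoded by an ORF at 1-based inclusive positions start..end.

    Assumes the span includes the terminal stop codon, which encodes no residue.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3 - 1  # subtract the stop codon


# Reading frame reported for the ARHP cDNA: 1254 nt, 418 codons, 417 residues.
print(orf_protein_length(304, 1557))
```

Under this assumption the span of 1254 nucleotides gives 418 codons, that is, 417 amino acids plus the stop codon, which agrees with the protein length stated in the abstract.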

9.
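In silico cloning of the human TECTB gene   Total citations: 13, self-citations: 0, citations by others: 13
The coding sequences of the mouse and chicken β-tectorin gene (Tectb) were compared against the NCBI databases with Blastn, yielding a highly similar human gDNA sequence (GenBank: AL157786). Analysis of AL157786 with GENSCAN, MZEF and Blast 2 sequence suggested that it contains a gene, TECTB, whose coding region consists of 10 exons. The open reading frame of human TECTB is 990 bp and is predicted to encode 329 amino acids. Human TECTB and mouse Tectb share 88.1% identity over 900 bp and 94.2% identity over 329 amino acids. Electronic-PCR mapped human TECTB to 10q25.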

10.
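Yang Ziheng. Yi Chuan (Hereditas), 1990, 12(6): 15-18
This paper describes a set of computer programs written by the author for analyzing DNA sequence data. The programs are written in BASIC and were debugged and run on IBM microcomputers. They comprise several parts, including sequence entry, nucleotide frequency statistics, translation, and restriction enzyme site search.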
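Two of the operations listed in this abstract, nucleotide frequency statistics and restriction site search, can be sketched as follows. The sketch is in Python, not BASIC, and is not the author's code; the function names and the example sequence are hypothetical. GAATTC is the EcoRI recognition sequence.

```python
from collections import Counter


def nucleotide_frequencies(seq):
    """Relative frequency of A, C, G and T in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return {base: 0.0 for base in "ACGT"}
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def find_sites(seq, site):
    """1-based start positions of every (possibly overlapping) occurrence of site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(site, i + 1)
    return positions


dna = "AAGAATTCGGGAATTCTT"          # hypothetical example sequence
freqs = nucleotide_frequencies(dna)
ecori_sites = find_sites(dna, "GAATTC")
```

For this example sequence, `ecori_sites` holds the start positions of the two GAATTC occurrences and `freqs` holds the share of each base among the 18 nucleotides.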

11.
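Application of the sequence alignment program Blat in transcriptome data analysis   Total citations: 3, self-citations: 1, citations by others: 2
With the rapid development of functional genomics, researchers have begun to study systematically the transcription of whole genomes and the dynamic mechanisms by which all genes exert their functions. Reaching this goal requires extracting, from massive transcriptome data, the information that reveals gene function and the regulation of expression. Meeting the demand for large-scale alignment with a high-performance sequence alignment program is the bottleneck step. The performance of currently popular sequence alignment programs was compared, and the use of Blat was analyzed in detail for different transcriptome data analysis tasks. The results show that Blat can resolve the sequence alignment bottleneck in transcriptome data analysis and can be applied widely to data analysis tasks in functional genomics.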

12.
A workbench for multiple alignment construction and analysis   Total citations: 126, self-citations: 0, citations by others: 126
Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. We have developed for this purpose an interactive program, MACAW (Multiple Alignment Construction and Analysis Workbench), that allows the user to construct multiple alignments by locating, analyzing, editing, and combining "blocks" of aligned sequence segments. MACAW incorporates several novel features. (1) Regions of local similarity are located by a new search algorithm that avoids many of the limitations of previous techniques. (2) The statistical significance of blocks of similarity is evaluated using a recently developed mathematical theory. (3) Candidate blocks may be evaluated for potential inclusion in a multiple alignment using a variety of visualization tools. (4) A user interface permits each block to be edited by moving its boundaries or by eliminating particular segments, and blocks may be linked to form a composite multiple alignment. No completely automatic program is likely to deal effectively with all the complexities of the multiple alignment problem; by combining a powerful similarity search algorithm with flexible editing, analysis and display tools, MACAW allows the alignment strategy to be tailored to the problem at hand.

13.
RNA secondary structure prediction is one of the classic problems of bioinformatics. The most efficient approaches to solving this problem are based on comparative analysis. As a rule, multiple RNA sequence alignment and subsequent determination of a common secondary structure are used. A new algorithm was developed to obviate the need for preliminary multiple sequence alignment. The algorithm is based on a multilevel MEME-like iterative search for a generalized profile. The search for common blocks in RNA sequences is carried out at the first level. Then the algorithm refines the chains consisting of these blocks. Finally, the search for sets of common helices, matched with alignment blocks, is carried out. The algorithm was tested with a tRNA set containing additional junk sequences and with RFN riboswitches. The algorithm is available at http://bioinf.fbb.msu.ru/RNAAlign.

14.
DbClustal addresses the important problem of the automatic multiple alignment of the top scoring full-length sequences detected by a database homology search. By combining the advantages of both local and global alignment algorithms into a single system, DbClustal is able to provide accurate global alignments of highly divergent, complex sequence sets. Local alignment information is incorporated into a ClustalW global alignment in the form of a list of anchor points between pairs of sequences. The method is demonstrated using anchors supplied by the Blast post-processing program, Ballast. The rapidity and reliability of DbClustal have been demonstrated using the recently annotated Pyrococcus abyssi proteome where the number of alignments with totally misaligned sequences was reduced from 20% to <2%. A web site has been implemented proposing BlastP database searches with automatic alignment of the top hits by DbClustal.

15.
The question of multiple sequence alignment quality has received much attention from developers of alignment methods. Less forthcoming, however, are practical measures for addressing alignment quality issues in real life settings. Here, we present a simple methodology to help identify and quantify the uncertainties in multiple sequence alignments and their effects on subsequent analyses. The proposed methodology is based upon the a priori expectation that sequence alignment results should be independent of the orientation of the input sequences. Thus, for totally unambiguous cases, reversing residue order prior to alignment should yield an exact reversed alignment of that obtained by using the unreversed sequences. Such "ideal" alignments, however, are the exception in real life settings, and the two alignments, which we term the heads and tails alignments, are usually different to a greater or lesser degree. The degree of agreement or discrepancy between these two alignments may be used to assess the reliability of the sequence alignment. Furthermore, any alignment dependent sequence analysis protocol can be carried out separately for each of the two alignments, and the two sets of results may be compared with each other, providing us with valuable information regarding the robustness of the whole analytical process. The heads-or-tails (HoT) methodology can be easily implemented for any choice of alignment method and for any subsequent analytical protocol. We demonstrate the utility of HoT for phylogenetic reconstruction for the case of 130 sequences belonging to the chemoreceptor superfamily in Drosophila melanogaster, and by analysis of the BAliBASE alignment database. Surprisingly, Neighbor-Joining methods of phylogenetic reconstruction turned out to be less affected by alignment errors than maximum likelihood and Bayesian methods.
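The comparison between heads and tails alignments described above can be sketched as follows. This is an illustration only: the abstract does not specify the agreement measure, so the fraction of shared columns used here is an assumption, and the function names and the two toy alignments are hypothetical. The alignments themselves would come from whatever external aligner is being assessed.

```python
def reverse_alignment(alignment):
    """Reverse every row, turning a tails alignment back into heads orientation."""
    return [row[::-1] for row in alignment]


def columns(alignment):
    """Encode each column as a tuple of per-sequence residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    encoded_columns = []
    for col in zip(*alignment):
        encoded = []
        for k, ch in enumerate(col):
            if ch == "-":
                encoded.append(None)
            else:
                encoded.append(counters[k])
                counters[k] += 1
        encoded_columns.append(tuple(encoded))
    return encoded_columns


def column_agreement(heads, tails):
    """Fraction of heads columns that also occur in the re-reversed tails alignment."""
    tails_columns = set(columns(reverse_alignment(tails)))
    heads_columns = columns(heads)
    return sum(c in tails_columns for c in heads_columns) / len(heads_columns)


# Toy input: alignments of ACGT and ACCGT (heads) and of their reversals (tails).
heads = ["AC-GT", "ACCGT"]
tails = ["TGC-A", "TGCCA"]
agreement = column_agreement(heads, tails)
```

In this toy case the two alignments place the gap against different C residues, so only three of the five heads columns are reproduced by the tails alignment. An agreement of 1.0 would correspond to the "ideal" case in which reversing the input has no effect.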

16.
We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data that the method outperforms Blast searches as a measure of confidence and can help eliminate 80% of all false assignment based on best Blast hit. However, the most important advance of the method is that it provides statistically meaningful measures of confidence. We apply the method to a re-analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are comprised of a combination of both Neanderthal and modern human DNA.

17.
MOTIVATION: Recently, the concept of the constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, run in O(γn²) memory, where γ is the number of the constraints and n is the maximum of the lengths of sequences. As a result, such a high memory requirement limits the overall programs to align short sequences only. RESULTS: We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. This new algorithm consumes only O(αn) space, where α is the sum of the lengths of constraints and usually α < n in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints. AVAILABILITY: http://genome.life.nctu.edu.tw/MUSICME.

18.

Background

Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although this problem can be solved efficiently when only two sequences are considered, exact inference of the optimal alignment quickly becomes computationally intractable in the multiple sequence alignment case. To cope with the high computational cost, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. These methods, however, cannot change the alignment of an already aligned group of sequences in view of new data, which compromises the quality of the resulting alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve the quality of the alignments produced by popular aligners. We call ReformAlign a meta-aligner because it requires an initial alignment, for which a variety of alignment programs can be used. The main idea behind ReformAlign is straightforward: first, an existing alignment is used to construct a standard profile that summarizes the initial alignment, and then all sequences are individually re-aligned against this profile. From each sequence-profile comparison, the alignment of the sequence against the profile is recorded, and the final alignment is inferred indirectly by merging all the individual sub-alignments into a unified set. Using ReformAlign may often result in alignments that are significantly more accurate than the starting alignments.

Results

We evaluated the effect of ReformAlign on the alignments generated by ten leading alignment methods using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner approach may often lead to alignments that are more accurate to a statistically significant degree. Furthermore, we show that ReformAlign yields a more substantial improvement when the starting alignment is of relatively inferior quality or when the input sequences are harder to align.

Conclusions

The proposed profile-based meta-alignment approach seems to be a promising and computationally efficient method that can be combined with practically all popular alignment methods and may lead to significant improvements in the generated alignments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-265) contains supplementary material, which is available to authorized users.
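The first step described in the Background of this entry, summarizing an existing alignment as a profile, can be sketched as below. This is a minimal illustration, not ReformAlign's implementation: the abstract does not define its "standard profile", so per-column relative frequencies with gaps counted as a symbol are an assumption, and the function name and toy alignment are hypothetical. The re-alignment of each sequence against the profile and the merging of sub-alignments are not shown.

```python
from collections import Counter


def build_profile(alignment):
    """Per-column relative frequencies of symbols (gaps included) in an alignment."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    profile = []
    for col in zip(*alignment):
        counts = Counter(col)
        profile.append({sym: n / len(col) for sym, n in counts.items()})
    return profile


# Toy initial alignment of three sequences.
profile = build_profile(["AC-GT", "ACCGT", "AC-GA"])
```

Each entry of `profile` describes one alignment column; fully conserved columns contain a single symbol with frequency 1.0, while the third column records that two of the three sequences carry a gap.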

19.
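The biological sequence alignment problem plays an important role in bioinformatics research. Multiple sequence alignment is an NP-complete problem, and an exact solution cannot be computed because of time and space limits. This paper briefly introduces the basic idea of the multiple sequence alignment algorithm proposed by Feng and Doolittle and improves the algorithm to achieve better alignment accuracy. Experimental results show that the new algorithm is effective against the local optimum problem encountered in general progressive multiple sequence alignment methods.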

20.
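Multiple sequence alignment is a basic and important sequence analysis method in bioinformatics. This paper proposes a new multiple sequence alignment algorithm that combines progressive alignment with an iterative strategy, uses a weighting function to adjust for the biased distribution of sequences, and builds a guide tree with the neighbor-joining method to determine the order of progressive alignment. Tests on 142 sets of protein sequence alignments from BAliBASE confirm the effectiveness of the algorithm. Comparison with the Multalin algorithm shows that the algorithm effectively improves alignment accuracy for highly divergent sequences.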
