Similar documents
 Found 20 similar documents (search time: 423 ms)
1.
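Construction and application of a PC/Linux-based nucleic acid sequence analysis system   Total citations: 13 (self-citations: 2, citations by others: 11)
Using the Phred/Phrap/Consed and Blast software packages on a PC running Linux, we built a system for large-scale automated analysis of nucleic acid sequences. The system automatically converts sequencing chromatograms into nucleic acid sequences, removes vector sequences, assembles sequences, identifies repetitive sequences, and performs sequence similarity analysis, which can speed up the analysis and use of large-scale sequencing data.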

2.
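This paper reports a series of programs for processing nucleic acid data on an Apple II microcomputer. With these programs one can store nucleic acid data, modify the structure of specified nucleic acid data, search for restriction endonuclease recognition sites, translate nucleic acid sequences into protein sequences, compare the homology of related nucleic acid and protein sequences, tabulate codon usage frequencies, and make a preliminary exploration of gene promoter structure.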

3.
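This paper analyses the isentropic equation, derives an entropy-limit equation for nucleic acid sequences, proposes for the first time an entropy principle governing the selection of nucleic acid sequences during biological evolution, and plots isentropic diagrams for analysing entropy changes in nucleic acid sequences.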

4.
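Using the primary databases maintained by NCBI as the data source, a secondary database of plant hormone-related nucleic acids and proteins was built. The secondary database is organized into three parts: genes, proteins, and literature. Software was written to collect data from these sources and store it in XML as an intermediate format; the data are then parsed and submitted to the secondary database, into which several bioinformatics tools are integrated. The system provides initial support for data retrieval, statistical analysis, Web-based local BLAST homology searches, automatic sequence assembly, and analysis of protein structure and functional sites. The database offers a highly targeted plant hormone data source and supporting bioinformatics tools for research on the molecular mechanisms of plant hormone action.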

5.
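Using the primary databases maintained by NCBI as the data source, a secondary database of plant hormone-related nucleic acids and proteins was built. The secondary database is organized into three parts: genes, proteins, and literature. Software was written to collect data from these sources and store it in XML as an intermediate format; the data are then parsed and submitted to the secondary database, into which several bioinformatics tools are integrated. The system provides initial support for data retrieval, statistical analysis, Web-based local BLAST homology searches, automatic sequence assembly, and analysis of protein structure and functional sites. The database offers a highly targeted plant hormone data source and supporting bioinformatics tools for research on the molecular mechanisms of plant hormone action.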

6.
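Determining which species an organism belongs to, and its detailed taxonomy, from a scientific name, a taxonomy ID, or an arbitrary nucleic acid or protein sequence is one of the most basic and important steps in bioinformatics analysis. However, both the analysis and the retrieval of results are done manually, which is time-consuming, laborious, and error-prone. This study addresses how to obtain species information from the NCBI website automatically or in batches. By analysing NCBI online BLAST results and the structure of the underlying page source, automated scripts were written in Perl to retrieve, in batches, the taxonomic information for query or alignment results. The Perl scripts solve the problem of automatically or batch-retrieving taxonomic information after online NCBI alignment; they apply to scientific names, taxonomy IDs, and arbitrary nucleic acid or protein sequences from bacteria, fungi, animals, plants, and other organisms, and can serve as a reference for others analysing biological data.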

7.
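This paper reports an application program for the TRS-80 (Model I) microcomputer that analyses, processes, and manages nucleic acid sequence data. The program system consists of a set of mutually independent, multifunctional program files linked through a main program file, and is stored as files on floppy disk. The whole management system was written in Disk BASIC II. Under the New DOS operating system of the TRS-80, nucleic acid sequence data are managed through file operations and file access. The system provides 12 functions and can be easily extended. It largely meets users' needs for analysing and processing the primary structure of nucleic acid sequences. In addition, most of the program files in the system can also be used to analyse and process amino acid sequences.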

8.
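Beauveria bassiana is a broad-spectrum insecticidal fungus. To explore the ability of its transcription factor BbMSN2 to recognize promoter core sequences, this study heterologously expressed and purified the BbMSN2 protein, synthesized three nucleic acid probes containing different numbers of the core sequence (AGGGG/CCCCT) and six probes with point mutations in the core sequence, bound the BbMSN2 protein to the probes in vitro, and examined the migration of the probes and bound protein by electrophoretic mobility shift assay. When the target protein bound probes containing the core sequence, the probes showed a mobility shift, and the number of core sequences had no significant synergistic effect on the shift. When the target protein was combined with the point-mutated probes, however, the mobility shift was markedly weaker. These results indicate that the transcription factor BbMSN2 can bind to and interact with nucleic acid probes containing the core sequence, and that it is highly specific for its recognition sequence. This study lays an experimental foundation for further investigation of the transcriptional regulatory mechanism of BbMSN2.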

9.
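In this paper, a 668 bp DNA fragment amplified by PCR was subcloned, and its nucleotide sequence was determined by the Sanger dideoxy chain-termination method on a 370A automated nucleic acid sequencer from ABI (USA). Sequence analysis showed that the open reading frame of the carp growth hormone gene contains 630 bp and is inferred to encode a 22-amino-acid signal peptide and a 188-amino-acid mature polypeptide. Both the restriction map and the sequence analysis confirm that we obtained the full-length carp growth hormone gene.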

10.
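The maximum information principle is applied to the analysis of conserved sites in nucleic acid sequences. Using this principle, an expression for the binding energy of specific nucleic acid-protein binding was derived, and the range of sites on the nucleic acid sequence that interact with the protein was estimated. To test whether the theory reflects actual nucleic acid-protein binding reasonably well, it was applied to the recognition of intron splice sites in genes. The recognition achieved high sensitivity and specificity, indicating that using the maximum information principle to derive the binding energy expression and to estimate the range of participating sites is reasonably successful. These results help in understanding nucleic acid-protein interactions and also support computational recognition of the various nucleic acid sequences that interact with proteins.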

11.
12.
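Classification of protein homo-oligomers based on amino acid composition distribution   Total citations: 7 (self-citations: 0, citations by others: 7)
Using a new feature extraction method, amino acid composition distribution, with support vector machines as member classifiers and a one-versus-one multi-class strategy, four classes of homo-oligomers were classified from protein primary sequences. Under 10-fold cross-validation (10-CV), the overall classification accuracy and accuracy index based on amino acid composition distribution reached 86.22% and 67.12%, respectively. These values are 5.74 and 10.03 percentage points higher than those of the traditional feature extraction method based on amino acid composition, and 3.12 and 5.63 percentage points higher than those of the dipeptide composition method, showing that amino acid composition distribution is a very effective feature extraction method for classifying protein homo-oligomers. Combining amino acid composition distribution with protein sequence length raised the overall classification accuracy and accuracy index to 86.35% and 67.23%, respectively, suggesting that sequence length carries some spatial structure information.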

13.
A computer program has been devised to automate rationalization of peptide fragmentation patterns. The program systematically generates all possible linear amino acid sequences which might be attributable to a peptide with a known amino acid composition. The generated sequences are then searched to find those that most closely match the spectrum of an unknown sequence. Received on March 10, 1986; accepted on March 24, 1986
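The generate-then-match idea in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the program the abstract describes: the function names, the three-residue mass table, the use of bare prefix masses (terminal groups and charge are ignored), and the 0.5 Da tolerance are all hypothetical choices made for the example.

```python
# Illustrative monoisotopic residue masses (Da) for three residues only.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203}


def candidate_sequences(composition):
    """Yield every distinct linear sequence with the given residue counts."""
    counts = dict(composition)
    length = sum(counts.values())

    def extend(prefix):
        if len(prefix) == length:
            yield prefix
            return
        for residue in sorted(counts):
            if counts[residue] > 0:
                counts[residue] -= 1          # use one copy of this residue
                yield from extend(prefix + residue)
                counts[residue] += 1          # put it back for other branches

    yield from extend("")


def prefix_masses(seq):
    """Cumulative residue masses of every proper prefix (assumed fragment model)."""
    total, masses = 0.0, []
    for residue in seq[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def score(seq, peaks, tol=0.5):
    """Count predicted prefix masses that lie within tol of any observed peak."""
    return sum(any(abs(m - p) <= tol for p in peaks) for m in prefix_masses(seq))


def best_matches(composition, peaks, tol=0.5):
    """Return the candidate sequences with the highest score."""
    scored = [(score(s, peaks, tol), s) for s in candidate_sequences(composition)]
    top = max(sc for sc, _ in scored)
    return [s for sc, s in scored if sc == top]
```

The number of candidates grows factorially with peptide length, so exhaustive enumeration of this kind is only practical for short peptides.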

14.
The repeated amino-acid sequences in Citrobacter freundii beta-lactamase may be indispensable for its function, because such repetitions cannot be simply attributed to chance. In order to fully explore the functional units in Citrobacter freundii beta-lactamase, it may be necessary to analyse all the amino acid pairs, triplets, etc. along Citrobacter freundii beta-lactamase from one terminus to the other, to count their frequencies and calculate their probabilities. The amino-acid sequence of Citrobacter freundii beta-lactamase was counted according to two-, three- and four-amino-acid sequences. The counted frequency and probability were compared with the predicted frequency and probability. The amino acid sequences which appear in Citrobacter freundii beta-lactamase and can be predicted from its amino acid composition according to a purely random mechanism should not be deliberately evolved and conserved. By contrast, the amino acid sequences which appear in Citrobacter freundii beta-lactamase but cannot be predicted from its amino acid composition according to a purely random mechanism should be deliberately evolved and conserved. Accordingly, 99 (26.053%) and 33 (8.684%) of 380 two-amino-acid sequences can be predicted by the frequency and probability, respectively, according to a purely random mechanism. Some kinds of amino acid sequences which are absent from Citrobacter freundii beta-lactamase and can be predicted from its amino acid composition according to a purely random mechanism should not be deliberately excluded from Citrobacter freundii beta-lactamase. By contrast, some kinds of amino acid sequences which are absent from Citrobacter freundii beta-lactamase and cannot be predicted from its amino acid composition according to a purely random mechanism should be deliberately excluded from Citrobacter freundii beta-lactamase. Accordingly, 89 (48.370%) and 41 (22.283%) of 184 kinds of absent two-amino-acid sequences can be predicted by the frequency and probability, respectively, according to a purely random mechanism, and 7236 (99.848%) of 7247 kinds of absent three-amino-acid sequences can be predicted by the frequency according to a purely random mechanism. The amino acids whose probabilities of following certain preceding amino acids can be predicted from the Citrobacter freundii beta-lactamase amino acid composition according to a purely random mechanism should not be deliberately evolved and conserved; accordingly, 2 (0.526%) of 380 counted first-order Markov transition probabilities for the second amino acid in two-amino-acid sequences match the predicted conditional probabilities.
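A rough Python sketch of the counting step for two-amino-acid sequences is given here. The abstract does not spell out how the predicted frequency is computed, so the model below is an assumption made for illustration: each position is treated as an independent draw from the overall amino acid composition. The function name `pair_statistics` is hypothetical.

```python
from collections import Counter


def pair_statistics(seq):
    """Map each residue pair to (observed count, expected count).

    Expected counts assume (for illustration only) that the two positions of
    a pair are independent draws from the overall composition of seq.
    """
    n = len(seq)
    freq = {residue: count / n for residue, count in Counter(seq).items()}
    # Overlapping pairs read along the chain from one terminus to the other.
    observed = Counter(seq[i:i + 2] for i in range(n - 1))
    return {
        a + b: (observed[a + b], (n - 1) * freq[a] * freq[b])
        for a in freq
        for b in freq
    }
```

Pairs with an observed count of zero correspond to the absent sequences discussed in the abstract; comparing each observed count with its expected count separates the pairs a random mechanism would account for from those it would not.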

15.
The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or “words”. We first confirmed that the English language very likely follows Zipf's law, a special case of a power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and “compressed” English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. The distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., “key words”) and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
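A rank-frequency analysis of the kind mentioned in this abstract can be sketched as follows. The abstract does not define how SCSs are delimited, so this sketch assumes fixed-length overlapping k-mers as a stand-in; the function names and the default k=3 are hypothetical, and the slope is a plain least-squares fit on the log-log plot rather than the authors' procedure.

```python
import math
from collections import Counter


def rank_frequency(sequences, k=3):
    """Return (rank, count) pairs for all overlapping k-mers, most frequent first."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


def loglog_slope(rank_freq):
    """Least-squares slope of log(count) against log(rank).

    A Zipf-like distribution gives a roughly straight line, with the slope
    equal to minus the power-law exponent. Needs at least two ranks.
    """
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(count) for _, count in rank_freq]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Restricting the fit to a chosen range of ranks would be one way to exclude the tails the abstract mentions.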

16.
A computational method is presented for locating peptides on known protein sequences using their molecular weights (estimated by SDS polyacrylamide gel electrophoresis) and their composition (obtained by amino acid analysis from eluted gel bands). The technique is easy and rapid, and appears particularly valuable if recoveries of peptides are low, or if they are heterogeneous due to secondary proteolytic degradation. An analysis of limited proteolysis of maltoporin (Schenkman, S. et al. [1984] J. Biol. Chem. 259, 7570-7576) is used to illustrate applicability and reliability of the method, which can also be applied to confirm gene-derived sequences.
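One way such a search could work is sketched below in Python. It is an illustration under assumptions, not the published method: every contiguous fragment of the protein whose mass lies within a tolerance of the gel estimate is scored by how far its residue counts are from the measured composition. The function names, the 10% mass tolerance, the use of integer residue counts (rather than mole percent), and the rounded average residue masses are choices made for this example.

```python
from collections import Counter

# Rounded average residue masses in Da (illustrative values).
AVG_RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water per free peptide


def composition_distance(fragment, target):
    """Sum of absolute differences between fragment and target residue counts."""
    counts = Counter(fragment)
    return sum(abs(counts[r] - target.get(r, 0)) for r in set(counts) | set(target))


def locate_peptide(protein, mass, composition, mass_tol=0.10, top=3):
    """Return up to `top` (distance, start, end) hits, best composition match first.

    Fragments are protein[start:end]; only those whose mass is within
    mass_tol (as a fraction) of the estimated mass are considered.
    """
    low, high = mass * (1 - mass_tol), mass * (1 + mass_tol)
    hits = []
    for start in range(len(protein)):
        running = WATER
        for end in range(start + 1, len(protein) + 1):
            running += AVG_RESIDUE_MASS[protein[end - 1]]
            if running > high:
                break  # longer fragments from this start are only heavier
            if running >= low:
                fragment = protein[start:end]
                hits.append((composition_distance(fragment, composition), start, end))
    return sorted(hits)[:top]
```

A wide mass tolerance reflects the limited precision of gel-based molecular weight estimates; the composition score then does most of the discrimination among fragments of similar size.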

17.
We have isolated essentially full-length cDNA clones for human ferritin H and L chains from a human liver cDNA library. This allows the first comparison of H and L nucleotide and amino acid sequences from the same species as well as ferritin L cDNA sequences from different species. We conclude that human H and L ferritins are related proteins which diverged about the time of evolution of birds and mammals. We also deduce the secondary structure of the H and L subunits and compare this with the known structure of horse spleen ferritin. We find that residues involved in subunit interaction in shell assembly are highly conserved in H and L sequences. However, we find several interesting differences in H subunits at the amino acid residues involved in iron transport and deposition. These substitutions could account for known differences in the uptake, storage, and release of iron from isoferritins of different subunit composition.

18.
Thirteen peptic peptides have been isolated from the insoluble (at pH 5.0) fraction of the tryptic hydrolysate of the main chromatographic component of otter myoglobin, and their amino acid composition and N-terminal amino acid sequences have been determined. The isolated peptides contain in total 40 amino acid residues. The results obtained, along with those on tryptic peptides and the comparison with homologous portions of myoglobins of known primary structure, allowed the complete amino acid sequence of otter myoglobin to be reconstructed.

19.
1. Two chymotrypsin variants with collagenolytic activities were purified from the hepatopancreas of Penaeus vannamei using radioactive protein as the substrate. 2. These proteases are very similar in amino acid composition, molecular weight, inhibitor studies and specificity against small synthetic substrates. 3. N-terminal amino acid sequences of both variants are identical and are very close to those of other known crustacean serine proteases.

20.
Having obtained the amino acid composition of a protein, chemists and molecular biologists may wish to identify the protein from these data alone. In general such data will have errors associated with them and the length of the protein may be known only approximately or not at all. In this paper a method is described which enables searching of protein sequence databases for sequences or fragments of sequences which have a composition similar to the one being sought. Such searches are generally quite discriminating as shown by the examples provided. This method has been implemented as part of the computer program Scrutineer and is being freely distributed. It is simple to use.
