Similar literature
1.
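Open reading frames (ORFs) of a genome are the basis of gene identification and genome analysis. Several software packages implement algorithms for generating them, but their results and criteria are not consistent. This paper defines the po-MORF and gives an algorithm for generating it, proves that the po-MORF set determined by a genome exists and is unique, and shows that the algorithm yields all po-MORF sequences. We also compare all CDSs with the po-MORF sequences in several prokaryotic genomes and discuss related issues in gene identification.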
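To make the idea of an ORF generation algorithm concrete, here is a minimal Python sketch that scans the three forward reading frames for ATG-to-stop stretches. It is not the po-MORF algorithm, whose definition the abstract does not reproduce. The function name `find_orfs`, the choice of ATG as the only start codon, the forward-strand restriction and the `min_codons` threshold are all illustrative assumptions.

```python
# Hypothetical illustration: a plain forward-strand ORF scan, NOT the po-MORF algorithm.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATAGGG"))  # [(2, 11, 2)]
```

A full tool would also scan the reverse complement and would need a rule for nested or overlapping candidates. Differences in such rules are one reason packages can disagree on their ORF sets.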

2.
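This article presents worked examples of the sequence analysis programs in EMBOSS, the European Molecular Biology Open Software Suite. Section 1 briefly introduces the EMBOSS package and its basic usage. Section 2 covers common sequence processing programs for format conversion, sequence extraction, sequence transformation and sequence display. Section 3 covers sequence alignment programs, including pairwise alignment, multiple alignment and dot-plot programs. Section 4 covers common nucleic acid sequence analysis programs, which can be used for nucleotide composition statistics, open reading frame analysis, C...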

3.
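GoPipe: batch Gene Ontology annotation and statistical analysis of sequences (cited 7 times: 0 self-citations, 7 by others)
With the arrival of the post-genomic era, batch sequencing, especially EST sequencing, has become routine work in ordinary laboratories. These new sequences often require batch Gene Ontology (GO) annotation followed by statistical analysis. At present, however, no software other than GoBlet is suited to batch GO annotation of unknown sequences, and GoBlet still has shortcomings: it limits upload size and relies on BLAST alone as its prediction tool. We developed a software package, GoPipe, which annotates sequences by integrating BLAST and InterProScan results and provides tools for further statistical comparison. The main program accepts any number of BLAST and InterProScan result files and performs, in order, text parsing, data integration, redundancy removal, statistical analysis and display. Statistical tools are also provided to compare the GO distributions of different inputs in order to uncover biological meaning. In addition, in intersection mode the program takes the intersection of the InterProScan and BLAST results; on the test data set its precision reached 99.1%, far exceeding the precision of InterProScan's own GO predictions, while sensitivity dropped only slightly. Its high precision, speed and flexibility make it an ideal tool for batch Gene Ontology annotation of unknown sequences. The package is freely available at http://gopipe.fishgenome.org/ or from the authors on request.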
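The intersection mode described above can be pictured with a short Python sketch. This is not GoPipe's code: the dictionaries mapping sequence IDs to sets of GO terms, the function names and the toy data are assumptions made for illustration, and parsing of real BLAST or InterProScan output is left out.

```python
# Hypothetical illustration of an "intersection mode"; not GoPipe's implementation.
def intersect_annotations(blast_go, interpro_go):
    """Keep, per sequence, only the GO terms predicted by both tools."""
    merged = {}
    for seq_id in blast_go.keys() & interpro_go.keys():
        common = blast_go[seq_id] & interpro_go[seq_id]
        if common:
            merged[seq_id] = common
    return merged


def precision_and_sensitivity(predicted, truth):
    """Term-level precision and sensitivity against a reference annotation."""
    true_pos = sum(len(predicted.get(s, set()) & terms) for s, terms in truth.items())
    n_pred = sum(len(terms) for terms in predicted.values())
    n_true = sum(len(terms) for terms in truth.values())
    precision = true_pos / n_pred if n_pred else 0.0
    sensitivity = true_pos / n_true if n_true else 0.0
    return precision, sensitivity


# Toy input: made-up sequence IDs and predictions.
blast = {"seq1": {"GO:0003677", "GO:0005634"}, "seq2": {"GO:0016020"}}
interpro = {"seq1": {"GO:0005634", "GO:0006355"}, "seq3": {"GO:0005524"}}
reference = {"seq1": {"GO:0005634", "GO:0006355"}}

merged = intersect_annotations(blast, interpro)
print(merged)  # {'seq1': {'GO:0005634'}}
print(precision_and_sensitivity(merged, reference))  # (1.0, 0.5)
```

The toy result shows the trade-off the abstract reports: keeping only terms both tools agree on raises precision, at the cost of dropping some correct terms that only one tool found.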

4.
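GENALEX is a cross-platform population genetic analysis package that runs within Microsoft Excel. It can analyse codominant, haploid and binary data. GENALEX also provides a range of frequency-based analyses, for example F-statistic tests, tests of identity between Nei's genetic distance and geographic distance, and tests of biased distribution. Distance-based calculations such as AMOVA, correlation-based PCA, Mantel tests, spatial autocorrelation analysis of population genetic variation and TWOGENER analysis can also be performed in GENALEX. The package further provides more than 20 different charts and tables to summarise data and assist in testing. In addition, sequence information and genotype data can be conveniently converted to the formats of related software. Originally designed as a teaching aid, GENALEX now also offers features that researchers can use.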

5.
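LNT: a data analysis system for agricultural research experiments (cited 2 times: 0 self-citations, 2 by others)
The basic methods of statistics for agricultural research experiments are analysed, the overall control flow and main functions of the statistical package LNT are outlined, and the main features of the package are described. Agricultural statistics is divided into five major categories (analysis of variance, correlation and path analysis, regression analysis, genetic analysis and data management) comprising 24 subcategories. LNT was written as an integrated package that implements 121 statistical methods and runs on a variety of machine types.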

6.
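The Chinese Society of Biotechnology holds a training workshop on formatting protein and nucleic acid data and on sequence software packages. In recent years the close integration of bioengineering research and biotechnology production with advanced computer technology has driven the rapid industrialisation of bioengineering. To keep pace with the rest of the world, the Chinese Society of Biotechnology held a workshop in Beijing from 25 to 28 March 1996 on protein and nucleic acid...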

7.
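Sequence polymorphism of the mitochondrial DNA D-loop region in the modern Lop population (cited 1 time: 1 self-citation, 0 by others)
Objective: To study the genetic polymorphism of the mitochondrial DNA D-loop region in the modern Lop (Luobu) population of Yuli County, Xinjiang, and to make a preliminary assessment of its relationships to other populations. Methods: Sequences from 23 individuals were obtained by PCR amplification and direct sequencing and analysed with software packages including ClustalX, MEGA 3.1 and Arlequin. Results: Among the 23 individuals, 47 variable sites were detected and 22 distinct haplotypes were defined; the random match probability P was 0.05482 and the haplotype diversity h was 0.99604. Conclusion: The modern Lop people are closely related to the peoples of Central Asia, and especially to the Uyghurs of Xinjiang.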
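The reported diversity value can be checked with a few lines of Python, assuming the authors used Nei's unbiased estimator h = n/(n-1) * (1 - sum of p_i^2); the abstract does not name the estimator. With 23 individuals and 22 haplotypes, exactly one haplotype must be carried by two individuals and the rest by one each. The function name below is an illustrative choice.

```python
from fractions import Fraction


def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity from per-haplotype individual counts.

    Assumption: this is the estimator behind the reported h value.
    """
    n = sum(counts)
    sum_sq = sum(Fraction(c, n) ** 2 for c in counts)
    return Fraction(n, n - 1) * (1 - sum_sq)


# 22 haplotypes among 23 individuals: one haplotype seen twice, 21 seen once.
counts = [2] + [1] * 21
h = haplotype_diversity(counts)
print(h)                   # 252/253
print(round(float(h), 5))  # 0.99605
```

Under that assumption the estimator gives 252/253, about 0.99605, which agrees with the reported 0.99604 up to rounding.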

8.
以从树肝脏mRNA逆转录得到的Ⅰ链cDNA为模板 ,运用SMARTRACEPCR技术 ,扩增得到树载脂蛋白E(apoE)cDNA序列 ,并推导出apoE蛋白质的氨基酸序列 .利用分子生物学软件包PCGENE对氨基酸序列和二级结构进行分析和比较 .结果表明 ,树apoEcDNA序列 (作为新基因已被GenBank接收 ,登录号为AF 30 3830 )由 1138bp构成 ,其中 5′非翻译区 6 4bp ,3′非翻译区 135bp ,939bp组成一个完整开放阅读框架 ,与人apoEcDNA的同源性为 86 % .编码 313个氨基酸组成的apoE前体 ,包含 18个氨基酸构成的信号肽和 2 95个氨基酸组成的成熟蛋白 .与人apoE氨基酸序列的同源性为 78% .树apoE与人及其它种属动物apoE在氨基酸组成上相近 ,但比人apoE少4个氨基酸 ,比动脉粥样硬化易感动物家兔apoE多 2个氨基酸 .经Garnier法预测 ,树apoE蛋白二级结构与人apoE相似 ,螺旋构象 (helical) 6 9 9% ,伸展构象 (extended) 16 6 % ,转角构象 (turn)6 0 % ,无规则卷曲 (coil) 7 6 % .  相似文献   

9.
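Although second-generation genome sequencing is increasingly popular, Sanger sequencing remains the gold standard for SNP identification and analysis. Sanger sequencing results have traditionally been analysed with software such as SeqMan. Most such programs, however, rely on manual work to identify and record SNP sites in the sequencing results, which is inefficient and error-prone. Moreover, when many individuals are sequenced, these programs cannot manage and export population-level data, which is inconvenient for researchers. Phred/Phrap/Consed/Polyphred is a software package developed at the University of Washington for Unix-like platforms, with powerful functions for managing large-scale sequencing data and for automatically identifying, tagging and exporting SNPs. Because it is relatively complex to install and use, however, it is rarely used in China. This study describes the functions, workflow and characteristics of the package, and installs it on Ubuntu 12.04 inside a VMware virtual machine so that geneticists can download and use it easily.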

10.
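In recent years DNA sequence analysis techniques have advanced greatly; it is fair to say that the rapid progress of molecular biology is inseparable from the continual improvement of these techniques. Although Watson and Crick determined the double-helix structure of DNA as early as 1953, no one was able to sequence DNA until the end of the 1960s. It is worth noting that RNA sequencing was already possible in the mid-1960s. The first to be completely determined...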

11.
Investigating the metabolic functional capability of a human gut microbiome enables the quantification of microbiome changes, which can cause a phenotypic change of host physiology and disease. One possible way to estimate the functional capability of a microbial community is through inferring metagenomic content from 16S rRNA gene sequences. Genome-scale models (GEMs) can be used as a scaffold for functional estimation analysis at a systematic level; however, to date there is no integrative toolbox based on GEMs for uncovering metabolic functions. Here, we developed the MetGEMs (metagenome-scale models) toolbox, an open-source application for inferring metabolic functions from 16S rRNA gene sequences to facilitate the study of the human gut microbiome by the wider scientific community. The developed toolbox was validated using shotgun metagenomic data and shown to be superior in predicting functional composition in human clinical samples compared to existing state-of-the-art tools. Therefore, the MetGEMs toolbox was subsequently applied for annotating putative enzyme functions and metabolic routes related to human disease, using atopic dermatitis as a case study.

12.
The CBCAnalyzer (CBC = compensatory base change) is a custom-written software toolbox consisting of three parts: CTTransform, CBCDetect and CBCTree. CTTransform reads several ct-file formats and generates a so-called "bracket-dot-bracket" format that is typically used as input for other tools such as RNAforester, RNAmovie or MARNA. The latter creates a multiple alignment based on primary sequences and secondary structures, which can then be used as input for CBCDetect. CBCDetect counts CBCs between all pairs of the aligned sequences. This is important in detecting species that are discriminated by their sexual incompatibility. The count (distance) matrix obtained by CBCDetect is used as input for CBCTree, which reconstructs a phylogram using the BIONJ algorithm. In this note we describe the features of the toolbox as well as application examples. The toolbox provides a graphical user interface. It is written in C++ and freely available at: http://cbcanalyzer.bioapps.biozentrum.uni-wuerzburg.de.
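The counting step can be sketched in Python as follows. This is not CBCDetect's code. It assumes that all aligned sequences share one consensus structure in bracket-dot-bracket notation, that G-U wobble pairs count as valid pairs, and that a CBC means both bases of a paired position differ while pairing is kept in both sequences. The function names are illustrative.

```python
# Hypothetical illustration of CBC counting; not the CBCDetect implementation.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(structure):
    """Return (i, j) index pairs for matching brackets in dot-bracket notation."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def count_cbcs(seq_a, seq_b, structure):
    """Count paired positions where both bases changed but pairing is retained."""
    count = 0
    for i, j in paired_positions(structure):
        pair_a = (seq_a[i], seq_a[j])
        pair_b = (seq_b[i], seq_b[j])
        if (pair_a in VALID_PAIRS and pair_b in VALID_PAIRS
                and pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]):
            count += 1
    return count


def cbc_matrix(seqs, structure):
    """All-against-all CBC counts, usable as a distance matrix."""
    return [[count_cbcs(a, b, structure) for b in seqs] for a in seqs]


structure = "((..))"
seqs = ["GCAAGC", "GUAAAC"]  # the inner pair changes from C-G to U-A
print(cbc_matrix(seqs, structure))  # [[0, 1], [1, 0]]
```

A matrix of this shape is what a distance-based tree method such as BIONJ would take as input.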

13.
SUMMARY: DNAFSMiner (DNA Functional Sites Miner) is a web-based software toolbox to recognize functional sites in nucleic acid sequences. Currently the toolbox provides two programs: TIS Miner and Poly(A) Signal Miner. TIS Miner can be used to predict translation initiation sites in vertebrate DNA/mRNA/cDNA sequences, and Poly(A) Signal Miner can be used to predict polyadenylation [poly(A)] signals in human DNA sequences. The prediction results are better than those of methods from the literature on two benchmark applications. This good performance is mainly attributable to our unique learning method. DNAFSMiner is available free of charge for academic and non-profit organizations. AVAILABILITY: http://research.i2r.a-star.edu.sg/DNAFSMiner/ CONTACT: huiqing@i2r.a-star.edu.sg.

14.
15.
spads 1.0 (for ‘Spatial and Population Analysis of DNA Sequences’) is a population genetic toolbox for characterizing genetic variability within and among populations from DNA sequences. In view of the drastic increase in genetic information available through sequencing methods, spads was specifically designed to deal with multilocus data sets of DNA sequences. It computes several summary statistics from populations or groups of populations, performs input file conversions for other population genetic programs and implements locus‐by‐locus and multilocus versions of two clustering algorithms to study the genetic structure of populations. The toolbox also includes two Matlab and R functions, Gdispal and Gdivpal, to display differentiation and diversity patterns across landscapes. These functions aim to generate interpolating surfaces based on multilocus distance and diversity indices. In the case of multiple loci, such surfaces can represent a useful alternative to the multiple pie-chart maps traditionally used in phylogeography to represent the spatial distribution of genetic diversity. These coloured surfaces can also be used to compare different data sets or different diversity and/or distance measures estimated on the same data set.

16.
In modern biology, one of the most important research problems is to understand how protein sequences fold into their native 3D structures. To investigate this problem at a high level, one wishes to analyze the protein landscapes, i.e., the structures of the space of all protein sequences and their native 3D structures. Perhaps the most basic computational problem at this level is to take a target 3D structure as input and design a fittest protein sequence with respect to one or more fitness functions of the target 3D structure. We develop a toolbox of combinatorial techniques for protein landscape analysis in the Grand Canonical model of Sun, Brem, Chan, and Dill. The toolbox is based on linear programming, network flow, and a linear-size representation of all minimum cuts of a network. It not only substantially expands the network flow technique for protein sequence design in Kleinberg's seminal work but also is applicable to a considerably broader collection of computational problems than those considered by Kleinberg. We have used this toolbox to obtain a number of efficient algorithms and hardness results. We have further used the algorithms to analyze 3D structures drawn from the Protein Data Bank and have discovered some novel relationships between such native 3D structures and the Grand Canonical model.

17.
18.
SUMMARY: We introduce a novel Matlab toolbox for microarray data analysis. This toolbox uses normalization based upon a normally distributed background and differential gene expression based on five statistical measures. The objects in this toolbox are open source and can be implemented to suit your application. AVAILABILITY: MDAT v1.0 is a Matlab toolbox and requires Matlab to run. MDAT is freely available at http://microarray.omrf.org/publications/2004/knowlton/MDAT.zip.

19.
SUMMARY: The seq++ package offers a reference set of programs and an extensible library to biologists and developers working on sequence statistics. Its generality arises from the ability to handle sequences described with any alphabet (nucleotides, amino acids, codons and others). seq++ enables sequence modelling with various types of Markov models, including variable length Markov models and the newly developed parsimonious Markov models, all of them potentially phased. Simulation modules are supplied for Monte Carlo methods. Hence, this toolbox allows the study of any biological process which can be described by a series of states taken from a finite set.
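As a small illustration of sequence modelling with a Markov model over an arbitrary alphabet, the Python sketch below estimates fixed-order transition probabilities by counting. It does not use the seq++ API and does not cover the variable length, parsimonious or phased models mentioned in the summary. The function name `fit_markov` and the maximum-likelihood estimate without pseudocounts are assumptions.

```python
from collections import Counter, defaultdict


def fit_markov(sequences, order=1):
    """Maximum-likelihood transition probabilities of a fixed-order Markov chain.

    Each sequence may be a string or a list of symbols (for example codons),
    so the alphabet is whatever symbols occur in the data.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for k in range(order, len(seq)):
            context = tuple(seq[k - order:k])
            counts[context][seq[k]] += 1
    model = {}
    for context, following in counts.items():
        total = sum(following.values())
        model[context] = {sym: c / total for sym, c in following.items()}
    return model


model = fit_markov(["ACAC", "AG"])
print(model[("C",)])  # {'A': 1.0}

# The same function works on a codon alphabet.
codon_model = fit_markov([["ATG", "AAA", "ATG", "AAA"]])
print(codon_model[("ATG",)])  # {'AAA': 1.0}
```

A practical implementation would add pseudocounts so that transitions unseen in the training data do not receive zero probability.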

20.
BACKGROUND: With the ever-increasing number of sequenced RNAs and the establishment of new RNA databases, such as the Comparative RNA Web Site and Rfam, there is a growing need for accurately and automatically predicting RNA structures from multiple alignments. Since RNA secondary structure is often conserved in evolution, the well-known, but underused, mutual information measure for identifying covarying sites in an alignment can be useful for identifying structural elements. This article presents MIfold, a MATLAB toolbox that employs mutual information, or a related covariation measure, to display and predict conserved RNA secondary structure (including pseudoknots) from an alignment. RESULTS: We show that MIfold can be used to predict simple pseudoknots, and that the performance can be adjusted to make it either more sensitive or more selective. We also demonstrate that the overall performance of MIfold improves with the number of aligned sequences for certain types of RNA sequences. In addition, we show that, for these sequences, MIfold is more sensitive but less selective than the related RNAalifold structure prediction program and is comparable with the COVE structure prediction package. CONCLUSION: MIfold provides a useful supplementary tool to programs such as RNA Structure Logo, RNAalifold and COVE, and should be useful for automatically generating structural predictions for databases such as Rfam.
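The mutual information measure for two alignment columns can be written out in a few lines of Python. This is a sketch of the standard formula, MI = sum over (a, b) of p(a, b) * log2(p(a, b) / (p(a) * p(b))), not MIfold's MATLAB code. The helper names and the toy alignment are assumptions, and gap characters would simply be treated as ordinary symbols here.

```python
from collections import Counter
from math import log2


def column(alignment, k):
    """Extract column k from a list of equal-length aligned sequences."""
    return [seq[k] for seq in alignment]


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in freq_ij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 2 covary perfectly, column 1 is invariant.
alignment = ["GAC", "CAG", "AAU", "UAA"]
print(mutual_information(column(alignment, 0), column(alignment, 2)))  # 2.0
print(mutual_information(column(alignment, 0), column(alignment, 1)))  # 0.0
```

Column pairs with high mutual information are candidates for base pairs. An invariant column carries no covariation signal even if it is paired, which is one reason the measure benefits from many, reasonably diverse aligned sequences.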
