Similar literature
1.
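[Background] As sequencing costs have fallen, more and more scientists are using high-throughput sequencing to study phage genome sequences. By analyzing these genomic data, some researchers have developed methods for determining the terminal sequences of dsDNA phages, but these methods rely on Linux command-line tools, and no software is available for the Windows operating system. [Objective] To develop PhageGT, a free Windows program that can locate the genome terminal sequences of dsDNA phages in the very large sequence files produced by high-throughput sequencing. [Methods] A dialog-based Microsoft Foundation Classes (MFC) application was developed in Visual Studio 2019. The software is written in C++; it reads each read in the sequence file line by line and applies purpose-built algorithms for counting and calculation. [Results] PhageGT extracts the frequency and rank of the different sequences in a high-throughput sequencing file, and uses the ratio of the highest frequency among the extracted sequences to the average sequence frequency (the R value) to judge whether the phage genome has terminal sequences. [Conclusion] PhageGT is convenient and simple to use. PhageGT and all test data used in this paper are freely available from https://zenodo.org/record/4674231#.YHADb-gzZxc.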
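The R value in this abstract (highest frequency divided by average frequency) can be illustrated with a minimal Python sketch. This is not PhageGT's C++ implementation: the choice of the first `k` bases of each read as the counted sequence, the FASTQ input layout, and the names `read_sequences` and `r_value` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_sequences(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (assumed layout)."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:
            yield line.strip().upper()


def r_value(reads: Iterable[str], k: int = 20) -> Tuple[str, int, float]:
    """Return (most frequent read start, its count, R).

    R is the highest frequency divided by the average frequency of all
    distinct k-base read starts. Counting read starts of length k is an
    assumption of this sketch, not a detail taken from the abstract.
    """
    counts = Counter(read[:k] for read in reads if len(read) >= k)
    if not counts:
        raise ValueError("no read is at least k bases long")
    top_seq, top_count = counts.most_common(1)[0]
    average = sum(counts.values()) / len(counts)
    return top_seq, top_count, top_count / average
```

A read start that is heavily over-represented relative to the average gives a large R, which would be consistent with a fixed genome terminus; any decision threshold on R would have to come from the paper itself.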

2.
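GoPipe: Gene Ontology annotation and statistical analysis of batch sequences   Cited by: 7 (self-citations: 0, by others: 7)
With the arrival of the post-genomic era, batch sequencing, especially EST sequencing, has become routine work in ordinary laboratories. These new sequences often need batch Gene Ontology (GO) annotation followed by statistical analysis. At present, however, no software other than GoBlet is suited to batch GO annotation of unknown sequences, and GoBlet still has many shortcomings because it limits upload size and uses only BLAST as its prediction tool. We developed a software package, GoPipe, that annotates sequences by integrating BLAST and InterProScan results and provides tools for further statistical comparison. The main program accepts any number of BLAST and InterProScan result files and performs text parsing, data integration, redundancy removal, statistical analysis, and display in turn. Statistical tools are also provided to compare the GO distributions of different inputs in order to mine biological meaning. In addition, in intersection mode the program takes the intersection of the InterProScan and BLAST results; on the test dataset its precision reached 99.1%, far higher than the precision of InterProScan's own GO predictions, while sensitivity dropped only slightly. Its high precision, speed, and flexibility make it an ideal tool for batch Gene Ontology annotation of unknown sequences. The package is freely available at http://gopipe.fishgenome.org/ or from the authors.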
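The intersection mode described in this abstract can be sketched in Python as a per-sequence set intersection of GO terms. The data layout (a mapping from sequence ID to a set of GO term IDs), the function name `combine`, and the `"union"` alternative are assumptions for illustration; GoPipe's actual file parsing and modes are not reproduced here.

```python
from typing import Dict, Set

# Hypothetical layout: sequence ID -> set of GO term IDs from one tool.
Annotations = Dict[str, Set[str]]


def combine(blast: Annotations, interpro: Annotations,
            mode: str = "intersection") -> Annotations:
    """Merge GO annotations from two sources, sequence by sequence.

    mode="intersection" keeps only terms reported by both sources;
    mode="union" (an assumed alternative) keeps terms from either.
    """
    if mode not in ("intersection", "union"):
        raise ValueError("mode must be 'intersection' or 'union'")
    merged: Annotations = {}
    for seq_id in set(blast) | set(interpro):
        a = blast.get(seq_id, set())
        b = interpro.get(seq_id, set())
        terms = a & b if mode == "intersection" else a | b
        if terms:  # drop sequences left without any term
            merged[seq_id] = terms
    return merged
```

Requiring agreement between the two sources trades some sensitivity for precision, which matches the direction of the effect the abstract reports.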

3.
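Application of the sequence alignment program Blat in transcriptome data analysis   Cited by: 3 (self-citations: 1, by others: 2)
With the rapid development of functional genomics, researchers have begun to study genome-wide transcription and the dynamic mechanisms by which all genes function in a systematic way. Achieving this requires extracting, from massive transcriptome datasets, the key information that reveals gene function and the regulation of expression. Using a high-performance sequence alignment program to meet large-scale alignment needs is the bottleneck step. By comprehensively comparing the performance of currently popular sequence alignment programs, and by analyzing in detail how Blat applies to different transcriptome analysis tasks, we found that Blat resolves the sequence alignment bottleneck in transcriptome data analysis and can be widely applied to data analysis tasks in functional genomics.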

4.
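Methods for sequence analysis and identification of plant LTR retrotransposons   Cited by: 2 (self-citations: 0, by others: 2)
Hou Xiaogai, Zhang Xi, Guo Dalong. 《遗传》 (Hereditas), 2012(11): 1507-1516
LTR retrotransposons (long terminal repeat retrotransposons) are an important class of transposable elements in eukaryotes. They are widely distributed and highly heterogeneous, play an important role in the evolution of eukaryotic genomes, and are now widely used in plant gene function analysis and genetic diversity studies. Identifying LTR retrotransposon sequences is a prerequisite for these applications, so research on methods for identifying and analyzing them has both theoretical significance and practical value. By principle, bioinformatics software for analyzing LTR retrotransposon sequences falls roughly into two categories: sequence alignment, and identification of conserved regions in related sequences. Alignment software such as BLAST and DNAstar performs sequence similarity searches: it judges whether an unknown sequence is a retrotransposon from its similarity to known retrotransposon sequences. Such software, however, cannot directly provide information on specific features such as the LTRs, and cannot identify a full-length retrotransposon sequence. Identification software can be divided by principle into four kinds: de novo, comparative genomic, homology search, and structure-based methods. De novo tools such as LTR-Finder can predict and annotate complete LTR retrotransposon sequences fairly accurately, while homology-search tools such as RepeatMasker find candidate LTR retrotransposons by similarity comparison against sequences in a database. This article compares and analyzes the different prediction methods for LTR retrotransposons and, on that basis, summarizes a workflow for analyzing LTR retrotransposon sequences, intended as a reference for such analyses.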

5.
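With the sharp increase in nucleic acid and protein sequence data, and molecular biologists' need for the latest sequence data, magnetic tape, magnetic disks, and even optical discs can no longer meet the demands of storing large volumes of data and updating databases quickly. Maintaining long-term subscriptions to one or more nucleic acid and protein databases, and buying the new sequence analysis software that keeps appearing, is also a major expense. In recent years, with the construction of the global information superhighway, more and more molecular biology databases and software have been connected to international computer networks, and any networked computer can use these software and information resources. Users can not only retrieve the latest …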

6.
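《生命科学研究》 (Life Science Research), 2014(5): 458-464
The rapid development of high-throughput sequencing has brought new opportunities and challenges to bioinformatics. Second-generation sequencing produces reads that are numerous and short, so earlier sequence analysis methods no longer apply. In recent years the number of sequence analysis algorithms and programs for high-throughput sequencing has grown steadily and now exceeds one hundred, which makes choosing suitable software difficult. This article summarizes the sequencing types, sequence types, and analysis algorithms of second-generation sequencing; analyzes and compares in detail the sequence types and lengths handled by commonly used analysis software, along with their algorithms, input/output formats, features, and functions; and offers recommendations. It also analyzes current problems in sequencing technology and sequence analysis and predicts future directions.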

7.
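Clustal W: protein and nucleic acid sequence analysis software   Cited by: 3 (self-citations: 1, by others: 2)
Protein and nucleic acid sequence analysis plays an important role in modern biology and bioinformatics, and new algorithms and software appear constantly. This article introduces Clustal W, a completely free multiple sequence alignment program that runs on a PC. It can align multiple protein or nucleic acid sequences, analyze the similarity relationships among sequences, and draw evolutionary trees. Its flexible input and output formats, convenient parameter setting and selection, detailed online help, and good portability have led to its wide use in protein and nucleic acid sequence analysis.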

8.
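Using Phred/Phrap/Consed, cross_match, RepeatMasker, Blast, and programs developed in-house, we built a forest tree EST sequence analysis system on the Linux operating system. The system converts sequencing chromatograms into nucleic acid sequences, removes vector sequences, identifies repeat sequences, classifies and assembles EST sequences, performs functional annotation and functional classification of EST sequences, and mines SSRs and SNPs. Scripts written in Perl with the bioperl module automate the analysis, so large batches of forest tree EST data can be analyzed quickly, providing useful information for functional genomics research in forest trees.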

9.
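In "Introduction to Biotechnology Software, Part 1" (《生物技术软件介绍之一》), published in 《生命科学》, 1992, issue 2, the author introduced nine biotechnology databases and programs in six categories, including a cytology database, nucleic acid and protein sequence databases and sequence analysis software, and a PCR primer program. Over the past year or so we have acquired and developed a further batch of biotechnology software: some we designed ourselves, some was purchased from abroad, and some was obtained from researchers in China and abroad through …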

10.
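Sequence comparison of the SARS coronavirus E and M genes and prediction of B-cell epitopes   Cited by: 2 (self-citations: 0, by others: 2)
To compare the differences among the E and M gene sequences and amino acid sequences of SARS coronavirus isolates, and to analyze the possible B-cell epitopes of the E and M proteins, the E and M genes were extracted from the complete SARS-CoV genome sequences with Editseq from the Lasergene package and translated into amino acid sequences. The similarities and differences among them were analyzed with Clustal X, and the amino acid sequences were then analyzed with Protean to predict B-cell epitopes of the E and M proteins. The results show that the E and M gene sequences of SARS-CoV are highly conserved, with very little variation; 2 and 7 segments were predicted as possible B-cell epitopes of the E and M proteins, respectively.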

11.
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses; hence, those algorithms do not offer appropriate solutions for the rapid but precise analyses needed for DNA barcoding, and they are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (the taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions, which are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding.
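The query-versus-reference comparison described here can be sketched in Python. The sketch aligns the query to each reference separately with a basic Needleman-Wunsch alignment and reports the uncorrected p-distance over columns without gaps. The scoring values, the use of p-distance rather than a model-corrected distance, and all function names are assumptions for illustration, not TaxI's implementation.

```python
from typing import Dict, List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(query: str, reference: str) -> float:
    """Fraction of mismatching sites among aligned columns without gaps."""
    aln_q, aln_r = global_align(query.upper(), reference.upper())
    pairs = [(x, y) for x, y in zip(aln_q, aln_r) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def rank_references(query: str, references: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (name, divergence) pairs, closest reference first."""
    distances = [(name, p_distance(query, seq)) for name, seq in references.items()]
    return sorted(distances, key=lambda item: item[1])
```

Because every reference is aligned to the query on its own, an indel in one reference does not disturb the comparison with any other, which is the property the abstract highlights for indel-rich rRNA sequences.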

12.
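Construction and application of a PC/Linux-based nucleic acid sequence analysis system   Cited by: 13 (self-citations: 2, by others: 11)
Using the Phred/Phrap/Consed and Blast software on a PC running the Linux operating system, we built a system for large-scale automated analysis of nucleic acid sequences. The system automatically converts sequencing chromatograms into nucleic acid sequences, removes vector sequences, assembles sequences, identifies repeat sequences, and performs sequence similarity analysis, which can speed up the analysis and use of large-scale sequencing data.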

13.
Data visualization methods are necessary during the exploration and analysis activities of an increasingly data-intensive scientific process. There are few existing visualization methods for raw nucleotide sequences of a whole genome or chromosome. Software for data visualization should allow the researchers to create accessible data visualization interfaces that can be exported and shared with others on the web. Herein, novel software developed for generating DNA data visualization interfaces is described. The software converts DNA data sets into images that are further processed as multi-scale images to be accessed through a web-based interface that supports zooming, panning and sequence fragment selection. Nucleotide composition frequencies and GC skew of a selected sequence segment can be obtained through the interface. The software was used to generate DNA data visualization of human and bacterial chromosomes. Examples of visually detectable features such as short and long direct repeats, long terminal repeats, mobile genetic elements, heterochromatic segments in microbial and human chromosomes, are presented. The software and its source code are available for download and further development. The visualization interfaces generated with the software allow for the immediate identification and observation of several types of sequence patterns in genomes of various sizes and origins. The visualization interfaces generated with the software are readily accessible through a web browser. This software is a useful research and teaching tool for genetics and structural genomics.
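The two per-segment statistics mentioned in this abstract can be sketched in a few lines of Python. GC skew is taken here in its usual form, (G - C) / (G + C); the function names and the decision to ignore non-ACGT symbols are assumptions of this sketch and say nothing about how the described software computes them.

```python
from collections import Counter
from typing import Dict


def composition(segment: str) -> Dict[str, float]:
    """Relative frequency of A, C, G and T, ignoring any other symbol."""
    counts = Counter(base for base in segment.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("segment contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def gc_skew(segment: str) -> float:
    """GC skew (G - C) / (G + C); defined as 0.0 when the segment has no G or C."""
    counts = Counter(segment.upper())
    g, c = counts["G"], counts["C"]
    return (g - c) / (g + c) if g + c else 0.0
```

For the segment `GGGCAATT`, for example, G occurs three times and C once, so the skew is (3 - 1) / (3 + 1) = 0.5.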

14.
A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with sequence motifs. Critical to nearly any motif-based sequence analysis pipeline is the ability to scan a sequence database for occurrences of a given motif described by a position-specific frequency matrix. RESULTS: We describe Find Individual Motif Occurrences (FIMO), a software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices. The program computes a log-likelihood ratio score for each position in a given sequence database, uses established dynamic programming methods to convert this score to a P-value and then applies false discovery rate analysis to estimate a q-value for each position in the given sequence. FIMO provides output in a variety of formats, including HTML, XML and several Santa Cruz Genome Browser formats. The program is efficient, allowing for the scanning of DNA sequences at a rate of 3.5 Mb/s on a single CPU. Availability and Implementation: FIMO is part of the MEME Suite software toolkit. A web server and source code are available at http://meme.sdsc.edu.
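The scoring step described here (a log-likelihood ratio score at each sequence position) can be sketched in Python. The sketch converts a position-specific frequency matrix to log-odds against a background model and sums the column scores over each window. The pseudocount, the base-2 logarithm, and the function names are assumptions; FIMO's P-value and q-value computations are not reproduced, and the input is assumed to contain only symbols present in the background model.

```python
import math
from typing import Dict, List, Tuple


def log_odds_matrix(freqs: List[Dict[str, float]],
                    background: Dict[str, float],
                    pseudocount: float = 0.01) -> List[Dict[str, float]]:
    """Turn per-column letter frequencies into log2 odds against a background."""
    matrix = []
    for column in freqs:
        total = sum(column.get(base, 0.0) + pseudocount for base in background)
        matrix.append({
            base: math.log2(((column.get(base, 0.0) + pseudocount) / total)
                            / background[base])
            for base in background
        })
    return matrix


def scan(sequence: str, matrix: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Return (start, score) for every window as wide as the motif.

    Assumes every symbol in the sequence appears in the matrix columns.
    """
    width = len(matrix)
    seq = sequence.upper()
    return [
        (start, sum(matrix[i][seq[start + i]] for i in range(width)))
        for start in range(len(seq) - width + 1)
    ]
```

A positive score means the window is more likely under the motif model than under the background; ranking or thresholding those scores is where a P-value conversion like the one in the abstract would come in.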

15.
16.
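This article describes a computer program system for nucleic acid sequence analysis implemented on a microcomputer (IBM PC). The system consists of three levels and 18 functional modules; menus and interactive dialogs let users learn and use it quickly. The implementation uses data-structure techniques such as tree structures, last-in-first-out stacks, and sparse matrices; applies statistical methods such as the Bayes method; and also applies a series of graph-theory methods such as Kruskal's algorithm and Floyd's algorithm. The release of this software system is of some benefit to molecular biology research.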

17.
DNA chips have proven to be effective tools in detecting gene expression levels. Compared with DNA chips using complementary DNA as probes, oligonucleotide microarrays using oligonucleotides as probes have attracted great attention because of their well-known advantages. The design of gene-specific probes for each target is essential to the development of oligonucleotide microarrays. We have previously reported the development of a probe design software termed Mprobe 1.0. Here, we present a new version of this software, termed Mprobe 2.0. Several new features are included in Mprobe 2.0. Firstly, a Paradox-based sequence database management system has been developed and integrated into the software, which consequently allows interoperability with sequences in GenBank, EMBL, and FASTA formats. Secondly, in contrast to setting a fixed threshold for the secondary structure of probes in Mprobe 1.0 and other related software, Mprobe 2.0 employs a different method. After parameters such as GC type, probe melting temperature and GC content have been evaluated, candidate probes are sorted by free energy from high to low, followed by specificity analysis. Thirdly, Mprobe 2.0 provides users with substantial parameter options in the visual mode. Mprobe 2.0 possesses an easier interface for users to manage sequences annotated in different formats and design the optimal probes for oligonucleotide microarrays and other applications. AVAILABILITY: The program is free for non-commercial users and can be downloaded from the web page http://www.biosun.org.cn/mprobe/ CONTACT: Wuju Li (wujuli@yahoo.com or liwj@nic.bmi.ac.cn).

18.
We have developed a software package called Osprey for the calculation of optimal oligonucleotides for DNA sequencing and the creation of microarrays based on either PCR-products or directly spotted oligomers. It incorporates a novel use of position-specific scoring matrices, for the sensitive and specific identification of secondary binding sites anywhere in the target sequence. Using accelerated hardware is faster and more efficient than the traditional pairwise alignments used in most oligo-design software. Osprey consists of a module for target site selection based on user input, novel utilities for dealing with problematic sequences such as repeats, and a common code base for the identification of optimal oligonucleotides from the target list. Overall, these improvements provide a program that, without major increases in run time, reflects current DNA thermodynamics models, improves specificity and reduces the user's data preprocessing and parameterization requirements. Using a TimeLogic™ hardware accelerator, we report up to 50-fold reduction in search time versus a linear search strategy. Target sites may be derived from computer analysis of DNA sequence assemblies in the case of sequencing efforts, or genome or EST analysis in the case of microarray development in both prokaryotes and eukaryotes.

19.
The increasing number of whole genomic sequences of microorganisms has increased the complexity of genome-wide annotation and gene sequence comparison among multiple microorganisms. To address this problem, we have developed nWayComp software that compares DNA and protein sequences of phylogenetically related microorganisms. This package integrates a series of bioinformatics tools such as BLAST, ClustalW, ALIGN, PHYLIP and PRIMER3 for sequence comparison. It searches for homologous sequences among multiple organisms and identifies genes that are unique to a particular organism. The homologous gene sets are then ranked in descending order of sequence similarity. For each set of homologous sequences, a table of sequence identity among homologous genes, along with sequence variations such as SNPs and indels, is produced, and a phylogenetic tree is constructed. In addition, a common set of primers that can amplify all the homologous sequences is generated. The nWayComp package provides users with a quick and convenient tool to compare genomic sequences among multiple organisms at the whole-genome level.

20.
A single nucleotide polymorphism (SNP) is a difference in DNA sequence between individuals and provides abundant information about genetic variation. Large-scale discovery of high-frequency SNPs is being undertaken using various methods. However, the publicly available SNP data sometimes need to be verified. If only a particular gene locus is of interest, locus-specific polymerase chain reaction amplification may be useful. A problem with this method is that the secondary peak has to be measured. We have analyzed trace data from conventional sequencing equipment and found an applicable rule to discern SNPs from noise. The rule is applied to multiply aligned sequences with traces, and the peak heights of the traces are compared between samples. We have developed software that integrates this function to automatically identify SNPs. The software works accurately for high-quality sequences and can also detect SNPs in low-quality sequences. Further, it can determine allele frequency, display this information as a bar graph and assign corresponding nucleotide combinations. It is also designed so that a person can easily verify and edit sequences on the screen. It is very useful for identifying de novo SNPs in a DNA fragment of interest.
