Similar Articles
 20 similar articles found (search time: 171 ms)
1.
A Hybrid Two-Dimensional ECG Data Compression Method Based on the Wavelet Transform   Cited by: 5 (self-citations: 0, others: 5)
This paper proposes a new hybrid two-dimensional electrocardiogram (ECG) data compression method based on the wavelet transform. Exploiting two kinds of correlation in ECG data, the method first converts the one-dimensional ECG signal into a two-dimensional signal sequence. A wavelet transform is then applied to the 2-D sequence, and the transformed coefficients are compressed with an improved coding scheme: the set partitioning in hierarchical trees (SPIHT) algorithm and the vector quantization (VQ) algorithm are first modified according to the characteristics of the individual coefficient subbands and the similarity between subbands, and the wavelet coefficients are then encoded with the combined modified SPIHT/VQ algorithm. The proposed algorithm was compared, on arrhythmia records from the MIT/BIH database, with representative wavelet-based compression algorithms and with other 2-D ECG compression algorithms. The results show that it is applicable to ECG signals with a variety of waveform characteristics and achieves a high compression ratio while preserving reconstruction quality.
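The pipeline this abstract describes (reshape the 1-D signal into 2-D, transform, then code the subbands) can be sketched with a toy Haar transform. This is a minimal NumPy illustration, not the authors' modified SPIHT/VQ coder; the beat length of 64 and the synthetic sine stand-in for an ECG trace are arbitrary assumptions.

```python
import numpy as np

def to_2d(signal, beat_len):
    """Cut a 1-D ECG stream into fixed-length 'beats' and stack them
    into a 2-D array, so inter-beat (vertical) correlation appears
    alongside intra-beat (horizontal) correlation."""
    n_beats = len(signal) // beat_len
    return np.reshape(signal[:n_beats * beat_len], (n_beats, beat_len))

def haar_2d(img):
    """One level of a separable 2-D Haar transform (rows, then columns).
    Real ECG codecs use longer biorthogonal wavelets; Haar keeps it short."""
    def haar_1d(x):
        a = (x[..., ::2] + x[..., 1::2]) / np.sqrt(2)  # approximation
        d = (x[..., ::2] - x[..., 1::2]) / np.sqrt(2)  # detail
        return np.concatenate([a, d], axis=-1)
    return haar_1d(haar_1d(img).T).T

sig = np.sin(np.linspace(0, 20 * np.pi, 1024))  # stand-in for an ECG trace
tiles = to_2d(sig, 64)       # 16 "beats" x 64 samples
coeffs = haar_2d(tiles)
# The transform is orthonormal, so energy is preserved but concentrated
# into few coefficients -- the property SPIHT/VQ subband coding exploits.
```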

2.
With the rapid progress of genome sequencing projects and the arrival of the post-genome era, bioinformatics data are growing exponentially. While the life-science community benefits from shared resources, the ever-increasing volume and complexity of the data have also made those resources heterogeneous and distributed. At present, biological computing tools and database resources follow different standards and are often incompatible, so integrating data and sharing software across these heterogeneous resources is the key to using them effectively. To address this problem, this paper proposes a new data-integration architecture that combines web services with parallel computing to access, extract, transform, and integrate data from remote resources. Experiments show that the system has substantial computational potential for handling heterogeneous, massive data.

3.
Baseline biodiversity information is the foundation of biodiversity assessment, yet existing biodiversity data are scattered, held by different agencies, and poorly integrated. Moreover, because spatial information on species distributions is lacking, these data can seldom support the biodiversity component of environmental impact assessment, which is therefore usually limited to simple qualitative analysis. To advance biodiversity assessment and protect biodiversity resources effectively, this study took Sichuan Province (with a focus on Ganzi Prefecture) as a case and integrated the available biodiversity data resources, including biodiversity data for Sichuan's nature reserves, aquatic and terrestrial biodiversity data for Ganzi Prefecture, data on environmentally sensitive areas in Sichuan, and field-survey data, to build a provincial biodiversity base database for Sichuan. Through database and web development, an information system was then built on top of it, providing query, retrieval, and display functions for both the spatial and the attribute biodiversity databases.

4.
Web servers on the Internet are the biologist's primary network resource [1]. Given the current capabilities of mathematics and computer science, the enormous number of sequences produced by the various genome sequencing projects remains one of the main challenges in biological computing. How can the most valuable information about a protein sequence be extracted from this sea of biological data? The first requirement is access to the up-to-date sequence and structure databases (e.g., EMBL, GenBank, Swiss-Prot, Protein Data Bank) developed by biocomputing centers such as NCBI, EBI, EMBL, SIB, and INFOBIOGEN. Network Protein Sequence Analysis (NPS…

5.
With the rapid growth of proteomics and the accumulating knowledge of the functional mechanisms of biological macromolecules, massive amounts of protein-protein interaction data have emerged. In response, researchers have developed more than 300 protein interaction databases for storing, presenting, and reusing these data; they are a valuable resource for systems biology, molecular biology, and clinical drug research. This review divides the databases into three categories: (1) comprehensive protein interaction databases; (2) species-specific protein interaction databases; and (3) biological pathway databases. It focuses on the most widely used databases, including BioGRID, STRING, IntAct, MINT, DIP, IMEx, HPRD, Reactome, and KEGG.

7.
Functional Link Neural Network Software and Its Application in Biodiversity Research   Cited by: 14 (self-citations: 1, others: 13)
To meet the needs of farmland biodiversity analysis, a functional link artificial neural network (FLANN) computing package was developed. The software, consisting of seven Java classes and one HTML file, is an online computing tool that runs on a variety of operating systems and web browsers, works on various PCs and workstations, and can read several types of database files. Two sets of sampling survey data on rice-field insect biodiversity, Zmar18 and Zapr15, were summarized and classified with the biodiversity tool LUMP and unsupervised minimum-variance (Ward's) clustering into 21 and 20 functional groups, respectively, each containing 60 samples. The FLANN software was then applied to pattern classification of the insect biodiversity data. The results show that the network's pattern classification and prediction agree well with the field observations. An online FLANN computing tool can help standardize biodiversity data collection and analysis, facilitate data and information sharing, and provide a building block for intelligent biodiversity analysis systems.

8.
In bioinformatics, discovering and identifying new genes is a pivotal step: it builds on earlier work such as genome sequencing and underpins future "post-genome era" research. In silico gene cloning discovers and identifies new genes by computational means, and the SiClone software implements it. This paper proposes a parallel processing scheme for the databases SiClone operates on and describes PSiClone, a parallel optimized version built on the MPI (Message Passing Interface) platform. The performance of PSiClone is demonstrated on an existing EST database; since the number of EST sequences in the test database is only a small fraction of the huge NCBI (National Center for Biotechnology Information) dbEST database, the parallel version promises to be even more useful for comparison and computation over large databases.

9.
The rapid development of biological sequencing technology has produced massive amounts of biological data. As the core and source of biological analysis, research, and applications, biological data resources urgently need standardized management to guarantee their correctness, usability, and security. This paper reviews progress on biological data standardization in China and abroad. At present, overall planning is lacking: biological data semantics suffer from widespread incompatibilities, data formats vary widely, and there are no unified standards for collecting, processing, storing, and sharing biological data. Standardization is still in its infancy worldwide, although experts in many countries are actively working on it. The paper concludes by discussing standards development in terms of biological data terminology; resource collection, processing, and exchange; storage; database construction; and data ethics, in the hope of providing a reference for the formulation of biological data standards.

10.
张源笙, 夏琳, 桑健, 李漫, 刘琳, 李萌伟, 牛广艺, 曹佳宝, 滕徐菲, 周晴, 章张. 《遗传》, 2018, 40(11): 1039-1043
Multi-omics data on life and health are a vital foundation for life-science research and the development of biomedical technology. China, however, has lacked a platform for managing and sharing biological data, which not only fails to meet the growing needs of biomedical and related research but also severely constrains the integration, sharing, and translational use of the country's biological big data. Accordingly, the Beijing Institute of Genomics, Chinese Academy of Sciences, established the BIG Data Center (BIGD) in early 2016 to build a biological big-data management platform and a multi-omics data resource system around national population health and strategically important biological resources. This paper describes BIGD's life and health big-data resource system, which comprises an archive for raw omics data, a genome database, a genome variation database, a gene expression database, a DNA methylation database, a bioinformatics tool library, and a life-science wiki knowledge base. It provides services for biological big-data submission, integration, and sharing, laying a foundation for life-science data management in China and for building a national bioinformation center.

11.
In the last decade, the cost of genomic sequencing has decreased so much that researchers all over the world accumulate huge amounts of data for present and future use. These genomic data need to be stored efficiently, because storage costs are not decreasing as fast as sequencing costs. To cope, the most popular general-purpose compression tool, gzip, is usually used. However, general-purpose tools were not designed for this kind of data and often fall short when the aim is to reduce the data size as much as possible. Several compression algorithms are available, even for genomic data, but very few have been designed to deal with Whole Genome Alignments, which contain alignments between the entire genomes of several species. In this paper, we present a lossless compression tool, MAFCO, specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain of 34% to 57%, depending on the data set. Compared to a recent dedicated method, which is not compatible with some data sets, the compression gain of MAFCO is about 9%. Both source code and binaries for several operating systems are freely available for non-commercial use at: http://bioinformatics.ua.pt/software/mafco.

12.
Data compression is concerned with how information is organized in data; efficient storage means removing redundancy from the data stored in the DNA molecule. Data compression algorithms remove this redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress", for DNA sequences, based on a novel scheme of assigning binary bits to short segments of DNA bases that compresses both repetitive and non-repetitive DNA sequences. The proposed algorithm achieves the best compression ratio for DNA sequences of larger genomes, and significantly better compression results show that "DNABIT Compress" outperforms the remaining compression algorithms. While achieving the best compression ratios for DNA sequences (genomes), the new algorithm also significantly improves on the running time of all previous DNA compression programs. Assigning binary bits (unique bit codes) to fragments of a DNA sequence (exact repeats and reverse repeats) is a concept introduced in this algorithm for the first time in DNA compression. The proposed algorithm achieves a compression ratio as low as 1.58 bits/base, where the best existing methods could not achieve a ratio below 1.72 bits/base.
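The baseline any DNA bit-coder must beat is plain 2-bit coding, four bases per byte. The sketch below shows only that baseline; the variable-length unique bit codes for exact and reverse repeats that take DNABIT Compress below 2 bits/base are not reproduced here.

```python
# 2-bit baseline: A,C,G,T -> 00,01,10,11, packed four bases per byte.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def pack(seq):
    """Pack an ACGT string at 2 bits/base; returns (bytes, base count)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        group = seq[i:i + 4]
        byte = 0
        for ch in group:
            byte = (byte << 2) | CODE[ch]
        byte <<= 2 * (4 - len(group))   # left-align a final partial group
        out.append(byte)
    return bytes(out), len(seq)

def unpack(packed, n):
    """Recover the original string: lossless by construction."""
    seq = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            seq.append(BASE[(byte >> shift) & 3])
    return "".join(seq[:n])

dna = "ACGTACGTTTGACA"
blob, n = pack(dna)
ratio = 8 * len(blob) / n   # bits per base; slightly above 2 from padding
```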

13.
14.
Recent advances in DNA sequencing technologies have enabled the current generation of life science researchers to probe deeper into the genomic blueprint. The amount of data generated by these technologies has been increasing exponentially over the last decade, and the storage, archival and dissemination of such huge data sets require efficient solutions from both the hardware and the software perspective. The present paper describes BIND, an algorithm specialized for compressing nucleotide sequence data. By adopting a unique 'block-length' encoding for representing binary data as a key step, BIND achieves significant compression gains compared to the widely used general-purpose compression algorithms (gzip, bzip2 and lzma). Moreover, in contrast to implementations of existing specialized genomic compression approaches, the implementation of BIND can handle non-ATGC and lowercase characters, making it a lossless compression approach suitable for practical use. More importantly, validation results of BIND with real-world data sets indicate reasonable compression and decompression speeds achieved with minimal processor/memory usage. BIND is available for download at http://metagenomics.atc.tcs.com/compression/BIND. No license is required for academic or non-profit use.
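The abstract does not fully specify BIND's 'block-length' encoding, so the sketch below shows only the general idea of describing a binary stream by the lengths of its alternating runs, under the assumption that this captures the spirit, not the details, of the actual scheme.

```python
def run_lengths(bits):
    """Encode a 0/1 string as (first bit, list of run lengths).
    Assumption: a simplified stand-in for BIND's block-length coding."""
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return bits[0], runs

def from_runs(first, runs):
    """Invert run_lengths: the representation is lossless."""
    bit, out = first, []
    for r in runs:
        out.append(bit * r)
        bit = "1" if bit == "0" else "0"
    return "".join(out)

bits = "0001111100110000"
first, runs = run_lengths(bits)
```

Long runs collapse to single small integers, which is where the compression gain over storing the raw bits comes from.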

15.
With the advent of DNA sequencing technologies, more and more reference genome sequences are available for many organisms. Analyzing sequence variation and understanding its biological importance are becoming a major research aim. However, how to store and process the huge amount of eukaryotic genome data, such as those of the human, mouse and rice, has become a challenge to biologists. Currently available bioinformatics tools used to compress genome sequence data have some limitations, such as the requirement of the reference single nucleotide polymorphisms (SNPs) map and information on deletions and insertions. Here, we present a novel compression tool for storing and analyzing Genome ReSequencing data, named GRS. GRS is able to process the genome sequence data without the use of the reference SNPs and other sequence variation information and automatically rebuild the individual genome sequence data using the reference genome sequence. When its performance was tested on the first Korean personal genome sequence data set, GRS was able to achieve ~159-fold compression, reducing the size of the data from 2986.8 to 18.8 MB. While being tested against the sequencing data from rice and Arabidopsis thaliana, GRS compressed the 361.0 MB rice genome data to 4.4 MB, and the A. thaliana genome data from 115.1 MB to 6.5 KB. This de novo compression tool is available at http://gmdd.shgmo.org/Computational-Biology/GRS.
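The core idea of reference-based resequencing compression, storing only where the individual genome differs from the reference and rebuilding it from those differences, can be sketched as follows. This toy handles only substitutions on equal-length sequences; GRS itself also deals with insertions and deletions, and the sequences here are invented.

```python
def diff_against_reference(ref, sample):
    """Record (position, base) pairs where the sample differs from the
    reference (substitutions only; real tools also handle indels)."""
    return [(i, b) for i, (a, b) in enumerate(zip(ref, sample)) if a != b]

def rebuild(ref, variants):
    """Reconstruct the individual sequence from the reference plus the
    stored differences -- lossless by construction."""
    seq = list(ref)
    for i, b in variants:
        seq[i] = b
    return "".join(seq)

ref = "ACGT" * 250                       # 1000-base toy reference
sample = list(ref)
sample[10], sample[501] = "T", "A"       # two hypothetical SNPs
sample = "".join(sample)

variants = diff_against_reference(ref, sample)
# Storing two (position, base) pairs replaces storing 1000 bases,
# which is the source of GRS-style ~100-fold gains on similar genomes.
```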

16.
Sakib MN, Tang J, Zheng WJ, Huang CT. PLoS ONE, 2011, 6(12): e28251
Research in bioinformatics primarily involves the collection and analysis of large volumes of genomic data, which naturally demands efficient storage and transfer. In recent years, some research has been done on efficient compression algorithms to reduce the size of various kinds of sequencing data. One way to improve the transmission time of large files is to apply maximum lossless compression to them. In this paper, we present SAMZIP, a specialized encoding scheme for sequence alignment data in SAM (Sequence Alignment/Map) format, which improves on the compression ratios of existing compression tools. To achieve this, we exploit prior knowledge of the file format and its specifications. Our experimental results show that our encoding scheme improves the compression ratio, thereby reducing overall transmission time significantly.

17.

Background

Comparison of various kinds of biological data is one of the main problems in bioinformatics and systems biology. Data compression methods have been applied to comparison of large sequence data and protein structure data. Since it is still difficult to compare global structures of large biological networks, it is reasonable to try to apply data compression methods to comparison of biological networks. In existing compression methods, the uniqueness of compression results is not guaranteed because there is some ambiguity in selection of overlapping edges.

Results

This paper proposes novel efficient methods, CompressEdge and CompressVertices, for comparing large biological networks. In the proposed methods, an original network structure is compressed by iteratively contracting identical edges and sets of connected edges. Then, the similarity of two networks is measured by a compression ratio of the concatenated networks. The proposed methods are applied to comparison of metabolic networks of several organisms, H. sapiens, M. musculus, A. thaliana, D. melanogaster, C. elegans, E. coli, S. cerevisiae, and B. subtilis, and are compared with an existing method. These results suggest that our methods can efficiently measure the similarities between metabolic networks.
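The similarity-by-compression idea in the Results above can be imitated with a general-purpose compressor: if concatenating two edge lists compresses almost as well as one of them alone, the networks share structure. Here zlib stands in for the paper's edge-contraction compressor, and the edge lists are invented toy networks, not the metabolic data.

```python
import zlib

def csize(text):
    """Compressed size in bytes, used as a complexity estimate."""
    return len(zlib.compress(text.encode()))

def ncd(a, b):
    """Normalized compression distance: near 0 when the two edge lists
    share structure, near 1 when they are unrelated."""
    ca, cb, cab = csize(a), csize(b), csize(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

# Toy node-labeled edge lists (hypothetical networks)
net1 = "A-B B-C C-D D-E E-F A-C B-D " * 20
net2 = net1                              # identical network
net3 = "X-Y Y-Z Z-W W-Q Q-R X-Z " * 20   # disjoint labels, no shared edges
```

Concatenating `net1` with itself adds almost nothing to the compressed size, so the distance is near zero; concatenating it with the unrelated `net3` roughly doubles it.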

Conclusions

Our proposed algorithms, which compress node-labeled networks, are useful for measuring the similarity of large biological networks.

18.

Background

The exponential growth of next generation sequencing (NGS) data has posed big challenges to data storage, management and archive. Data compression is one of the effective solutions, where reference-based compression strategies can typically achieve superior compression ratios compared to the ones not relying on any reference.

Results

This paper presents a lossless, light-weight, reference-based compression algorithm named LW-FQZip for compressing FASTQ data. The three components of any given input, i.e., the metadata, the short reads and the quality score strings, are first parsed into three data streams, in which redundancy is identified and eliminated independently. In particular, well-designed incremental and run-length-limited encoding schemes are used to compress the metadata and quality score streams, respectively. To handle the short reads, LW-FQZip uses a novel light-weight mapping model to map them quickly against external reference sequence(s) and produce concise alignment results for storage. The three processed data streams are then packed together with a general-purpose compression algorithm such as LZMA. LW-FQZip was evaluated on eight real-world NGS data sets and achieved compression ratios in the range of 0.111-0.201, which is comparable or superior to other state-of-the-art lossless NGS data compression algorithms.
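The first step described above, parsing each FASTQ record into separate metadata, read and quality streams so that each can be coded with a scheme suited to its own redundancy, can be sketched with a generic splitter (this is not LW-FQZip's parser, and the two-record input is invented).

```python
def split_fastq(text):
    """Split 4-line FASTQ records into three streams: headers, base
    calls, and quality strings (the '+' separator line is dropped)."""
    meta, reads, quals = [], [], []
    lines = text.strip().split("\n")
    for i in range(0, len(lines), 4):
        meta.append(lines[i])        # '@...' header -> metadata stream
        reads.append(lines[i + 1])   # base calls   -> read stream
        quals.append(lines[i + 3])   # qualities    -> quality stream
    return meta, reads, quals

fastq = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nII#I\n"
meta, reads, quals = split_fastq(fastq)
# Each stream is now internally homogeneous, so an incremental coder
# (metadata) or run-length-limited coder (qualities) can exploit it.
```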

Conclusions

LW-FQZip is a program that enables efficient lossless FASTQ data compression, contributing to the state-of-the-art applications for NGS data storage and transmission. LW-FQZip is freely available online at: http://csse.szu.edu.cn/staff/zhuzx/LWFQZip.

19.
A new lossless compression method using context modeling for ultrasound radio-frequency (RF) data is presented. In the proposed method, the combination of context modeling and entropy coding is used to effectively lower the data transfer rates of modern software-based medical ultrasound imaging systems. In phantom and in vivo data experiments, the proposed lossless compression method provides an average compression ratio of 0.45, versus 0.52 and 0.55 for the Burg and JPEG-LS methods, respectively. This result indicates that the proposed compression method is capable of transferring 64-channel 40-MHz ultrasound RF data over a 16-lane PCI-Express 2.0 bus for real-time software beamforming.
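Why context modeling helps an entropy coder can be seen on a toy signal: predicting each sample from its predecessor leaves residuals with far lower empirical entropy than the raw samples, so the entropy coder spends fewer bits per symbol. The signal below is a synthetic sine, not ultrasound RF data.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Empirical entropy in bits/sample -- a lower bound on what an
    ideal entropy coder would spend per symbol."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

# Smooth synthetic signal: using the previous sample as context
# (a first-order predictor) leaves small, low-entropy residuals.
sig = [round(100 * math.sin(i / 10)) for i in range(2000)]
residual = [b - a for a, b in zip(sig, sig[1:])]
# entropy_bits(residual) is much smaller than entropy_bits(sig):
# the prediction step has concentrated the symbol distribution.
```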

20.
Bzw2 (basic leucine zipper and W2 domains 2) is a candidate gene for human heart development, composed of a bZIP domain and a W2 domain. To study its role in cardiac development, the full-length human Bzw2 gene was cloned with the help of bioinformatic data, the resulting fragment was inserted into the prokaryotic expression vector pGEX-4T-1, and the BL21 strain was used to express the recombinant …
