Related Articles
1.
Objectives: Because of the large volume of medical imaging data, transmission becomes complicated in telemedicine applications. To adapt the data bit streams to limited bandwidths, reducing the data size by compressing the images is essential. Despite improvements in the field of compression, the transmission itself can also introduce errors. It is therefore important to develop a strategy that reduces the data volume without introducing excessive distortion and that resists the errors introduced by channel noise during transmission. In this paper, we propose an ROI-based coding strategy with unequal bit-stream protection to meet this dual constraint. Material and methods: The proposed ROI-based compression strategy with unequal bit-stream protection has three parts: extraction of the regions of interest, ROI-based coding, and unequal protection of the ROI bit stream. First, the regions of interest (ROI) are extracted by hierarchical segmentation using a marker-based watershed technique combined with level-set active contours. The resulting regions are selectively encoded by a 3D coder based on a shape-adaptive discrete wavelet transform (3D-BISK), where the compression ratio of each region depends on its relevance to diagnosis. The regions of interest are then protected with a Reed-Solomon error-correcting code whose code rate varies with the relevance of the region, following an unequal error protection (UEP) strategy. Results: The performance of the proposed compression scheme is evaluated in several ways. First, tests are performed to study the impact of errors on the different bit streams and the effect of varying the compression rates. Second, Reed-Solomon codes of different code rates are tested at different compression rates over a binary symmetric channel (BSC). Finally, the performance of this coding strategy is compared with that of SPIHT 3D for transmission over a BSC. Conclusion: The obtained results show that the proposed method is efficient in reducing transmission time. The proposed scheme therefore reduces the data volume without introducing excessive distortion while resisting channel-noise errors in telemedicine settings.
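As an illustration of the unequal error protection (UEP) idea described above, here is a minimal Python sketch that gives the ROI bit stream more redundancy than the background stream and passes both through a simulated binary symmetric channel. A simple repetition code stands in for the Reed-Solomon codes used in the paper, and all rates, lengths and helper names are illustrative assumptions rather than the article's actual parameters.

```python
import random

def repeat_encode(bits, r):
    """Repetition code: repeat every bit r times (stand-in for Reed-Solomon)."""
    return [b for b in bits for _ in range(r)]

def repeat_decode(coded, r):
    """Majority vote over each group of r received bits."""
    return [int(sum(coded[i:i + r]) > r // 2) for i in range(0, len(coded), r)]

def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(0)
roi_bits = [rng.randint(0, 1) for _ in range(200)]  # diagnostically relevant stream
bg_bits = [rng.randint(0, 1) for _ in range(200)]   # less relevant background stream

# Unequal protection: strong redundancy for the ROI, none for the background.
rx_roi = bsc(repeat_encode(roi_bits, 5), 0.05, rng)
rx_bg = bsc(repeat_encode(bg_bits, 1), 0.05, rng)

roi_err = sum(a != b for a, b in zip(repeat_decode(rx_roi, 5), roi_bits))
bg_err = sum(a != b for a, b in zip(repeat_decode(rx_bg, 1), bg_bits))
print(f"residual bit errors - ROI: {roi_err}, background: {bg_err}")
```

With these settings the heavily protected ROI stream typically comes back error-free while the unprotected background stream retains residual bit errors, which is the trade-off UEP is designed to exploit.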

2.
Concurrent coding is an encoding scheme with ‘holographic’-type properties that are shown here to be robust against a significant amount of noise and signal loss. This single encoding scheme is able to correct for random errors and burst errors simultaneously, but does not rely on cyclic codes. A simple and practical scheme has been tested that displays perfect decoding when the signal-to-noise ratio is of the order of -18 dB. The same scheme also displays perfect reconstruction when a contiguous block of 40% of the transmission is missing. In addition, this scheme is 50% more efficient in terms of transmitted power requirements than equivalent cyclic codes. A simple model is presented that describes the decoding process, can determine the expected computational load, and describes the critical levels of noise and missing data at which false messages begin to be generated.

3.
A software-based, efficient, and reliable ECG data compression and transmission scheme is proposed here. The algorithm has been applied to ECG data from all 12 leads taken from the PTB diagnostic ECG database (PTB-DB). First, R-peaks are detected by a differentiation-and-squaring technique and the QRS regions are located. To achieve strictly lossless compression in the QRS regions and tolerable lossy compression in the rest of the signal, two different compression algorithms are used. The whole compression scheme is designed so that the compressed file contains only ASCII characters. These characters are transmitted using Internet-based Short Message Service (SMS), and at the receiving end the original ECG signal is reconstructed by reversing the compression logic. It is observed that the proposed algorithm can reduce the file size significantly (compression ratio: 22.47) while preserving the ECG signal morphology.
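For orientation, the following is a minimal numpy sketch of R-peak detection by differentiation and squaring followed by smoothing and thresholding, in the spirit of the step described above; the sampling rate, window length, threshold, and toy signal are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def detect_r_peaks(ecg, fs=250, window_ms=150, threshold_ratio=0.5):
    """Differentiate, square, moving-average, then threshold to locate R-peaks."""
    diff = np.diff(ecg)                     # emphasise the steep QRS slopes
    squared = diff ** 2                     # all deflections positive, large slopes amplified
    win = max(1, int(fs * window_ms / 1000))
    energy = np.convolve(squared, np.ones(win) / win, mode="same")
    above = energy > threshold_ratio * energy.max()
    # Keep one peak per contiguous above-threshold region.
    peaks, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            peaks.append(start + int(np.argmax(ecg[start:i])))
            start = None
    if start is not None:
        peaks.append(start + int(np.argmax(ecg[start:])))
    return np.array(peaks)

# Toy signal: a slow noisy baseline with sharp spikes standing in for QRS complexes.
fs = 250
t = np.arange(0, 10, 1 / fs)
ecg = 0.1 * np.sin(2 * np.pi * 1.0 * t) + 0.01 * np.random.randn(t.size)
ecg[::fs] += 1.0                             # one artificial "R-peak" per second
print(detect_r_peaks(ecg, fs))
```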

4.
SUMMARY: The amount of genomic sequence data being generated and made available through public databases continues to increase at an ever-expanding rate. Downloading, copying, sharing and manipulating these large datasets are becoming difficult and time-consuming for researchers. We need to consider using advanced compression techniques as part of a standard data format for genomic data. The inherent structure of genome data allows for more efficient lossless compression than can be obtained through the use of generic compression programs. We apply a series of techniques to James Watson's genome that in combination reduce it to a mere 4 MB, small enough to be sent as an email attachment.
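One common ingredient of genome-specific compression is to store an individual genome as differences against a reference rather than as a raw sequence. The sketch below illustrates only that diff idea on toy data (the paper combines several techniques; nothing here reproduces its actual pipeline), using zlib from the Python standard library.

```python
import zlib

reference = "ACGTACGTACGTACGTACGT" * 50   # toy reference sequence
sample = list(reference)
sample[10] = "T"                           # introduce a few single-base differences
sample[523] = "A"
sample = "".join(sample)

# Encode only the positions where the sample differs from the reference.
diffs = [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]
encoded = ";".join(f"{pos}:{base}" for pos, base in diffs).encode()

print("raw sample, compressed:", len(zlib.compress(sample.encode())), "bytes")
print("diff encoding, compressed:", len(zlib.compress(encoded)), "bytes")

# Decoding: start from the reference and re-apply the recorded substitutions.
decoded = list(reference)
for pos, base in diffs:
    decoded[pos] = base
assert "".join(decoded) == sample
```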

5.
A pedigree is a diagram of family relationships, and it is often used to determine the mode of inheritance (dominant, recessive, etc.) of genetic diseases. With rapidly growing knowledge of genetics and the accumulation of genealogical information, pedigree data are becoming increasingly important. In large pedigree graphs, path-based methods for efficiently computing genealogical measurements, such as inbreeding and kinship coefficients of individuals, depend on efficient identification and processing of paths. In this paper, we propose a new compact path encoding scheme for large pedigrees, accompanied by an efficient algorithm for identifying paths. We demonstrate the use of the proposed method by applying it to inbreeding coefficient computation. We present a time and space complexity analysis and, through experiments on pedigree graphs with real and synthetic data, show the efficiency of our method for evaluating inbreeding coefficients compared with previous methods. Both theoretical and experimental results demonstrate that our method is more scalable and efficient than previous methods in terms of time and space requirements.
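For background, the inbreeding coefficient of an individual equals the kinship coefficient of its parents. A standard recursive formulation (a textbook method, not the paper's path-encoding scheme) can be sketched as follows, assuming each individual's numeric ID is larger than its parents' IDs and that founders are unrelated and non-inbred.

```python
from functools import lru_cache

# pedigree: id -> (father_id or None, mother_id or None); IDs increase down the generations.
pedigree = {
    1: (None, None), 2: (None, None), 3: (None, None),
    4: (1, 2), 5: (1, 3),
    6: (4, 5),            # child of two half-siblings
}

@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient phi(a, b); founders are assumed unrelated and non-inbred."""
    if a is None or b is None:
        return 0.0
    if a == b:
        f, m = pedigree[a]
        return 0.5 * (1.0 + kinship(f, m))
    if a < b:             # recurse on the younger individual (the larger ID)
        a, b = b, a
    f, m = pedigree[a]
    return 0.5 * (kinship(f, b) + kinship(m, b))

def inbreeding(x):
    f, m = pedigree[x]
    return kinship(f, m)

print(inbreeding(6))   # half-sib mating -> 0.125
```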

6.
The ever-growing flood of biological data urgently calls for more research on data compression techniques, in order to relieve storage pressure on servers and to improve the efficiency of network transmission and data analysis. Although many compression tools have been developed, there has so far been little detailed, comprehensive comparison of which software and methods should be used for massive bioinformatics data. Taking typical nucleotide and protein sequence databases from GenBank and the typical bioinformatics software packages Blast and EMBOSS as examples, this study compares different compression tools. The results show that the classic compress utility has very high overall efficiency: besides an acceptable compression ratio, its compression time is much shorter than that of the other tools, even considerably shorter than the parallelized bzip2 (pbzip2) and gzip (pigz), so it can be considered first. 7-Zip achieves the highest compression ratio but is very time-consuming, making it suitable for long-term data archiving, whereas files compressed with bzip2, rar and gzip have somewhat lower compression ratios than 7-Zip but compress relatively quickly. For practical use, the classic compress utility and the parallelized pbzip2 and pigz are recommended as good trade-offs between compression ratio and compression time.
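As a small illustration of this kind of ratio-versus-time comparison, the sketch below benchmarks the gzip, bzip2 and LZMA codecs available in the Python standard library on a toy FASTA-like byte string; the payload and the set of codecs are illustrative and do not reproduce the study's actual benchmark (which used command-line tools such as compress, 7-Zip, pbzip2 and pigz).

```python
import bz2, gzip, lzma, time

# Toy FASTA-like payload standing in for a real sequence database dump.
data = (b">seq\n" + b"ACGTACGGTTACGATCGATCGGCTA" * 4000) * 10

codecs = {
    "gzip": gzip.compress,
    "bzip2": bz2.compress,
    "lzma": lzma.compress,
}

for name, compress in codecs.items():
    t0 = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - t0
    ratio = len(data) / len(out)
    print(f"{name:5s}  ratio {ratio:6.1f}x  time {elapsed * 1000:7.1f} ms")
```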

7.
Next-generation sequencing (NGS) technologies permit the rapid production of vast amounts of data at low cost. Economical data storage and transmission hence become an increasingly important challenge for NGS experiments. In this paper, we introduce a new non-reference-based read sequence compression tool called SRComp. It works by first employing a fast string-sorting algorithm called burstsort to sort read sequences in lexicographical order, and then Elias omega-based integer coding to encode the sorted read sequences. SRComp has been benchmarked on four large NGS datasets, where experimental results show that it can run 5–35 times faster than current state-of-the-art read sequence compression tools such as BEETL and SCALCE, while retaining comparable compression efficiency for large collections of short read sequences. SRComp is thus particularly valuable in applications where compression time is a major concern.
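The two ingredients of SRComp can be illustrated separately. The sketch below sorts a few reads (Python's built-in sort standing in for burstsort), maps each read to a positive integer, and encodes the integers with Elias omega coding; the read-to-integer step is a simplification assumed for illustration, not the tool's actual pipeline.

```python
def elias_omega_encode(n):
    """Elias omega code of a positive integer as a '0'/'1' string."""
    code = "0"
    while n > 1:
        b = format(n, "b")
        code = b + code
        n = len(b) - 1
    return code

def elias_omega_decode(bits):
    """Decode a concatenation of Elias omega codewords back to integers."""
    values, i = [], 0
    while i < len(bits):
        n = 1
        while bits[i] == "1":
            group = bits[i:i + n + 1]
            i += n + 1
            n = int(group, 2)
        i += 1                      # consume the terminating '0'
        values.append(n)
    return values

# Reads sorted lexicographically (built-in sort stands in for burstsort),
# then each 2-bit-packed read is treated as a positive integer to be coded.
reads = sorted(["GATTACA", "ACGTACG", "ACGAAAA", "TTTTCGA"])
code2bit = {"A": "00", "C": "01", "G": "10", "T": "11"}
ints = [int("1" + "".join(code2bit[b] for b in r), 2) for r in reads]  # leading 1 keeps length
stream = "".join(elias_omega_encode(n) for n in ints)
assert elias_omega_decode(stream) == ints
print(len(stream), "bits for", len(reads), "reads")
```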

8.
A major challenge of current high-throughput sequencing experiments is not only the generation of the sequencing data itself but also their processing, storage and transmission. The enormous size of these data motivates the development of compression algorithms usable for implementing the various storage policies applied to the produced intermediate and final result files. In this article, we present NGC, a tool for the compression of mapped short-read data stored in the widespread SAM format. NGC enables lossless and lossy compression and introduces two novel ideas: first, a way to reduce the number of required code words by exploiting common features of reads mapped to the same genomic positions; second, a highly configurable way to quantize per-base quality values that takes their influence on downstream analyses into account. NGC, evaluated on several real-world data sets, saves 33–66% of disk space with lossless compression and up to 98% with lossy compression. By applying two popular variant and genotype prediction tools to the decompressed data, we show that the lossy compression modes preserve >99% of all called variants while outperforming comparable methods in some configurations.
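To make the quality-value quantization idea concrete, here is a minimal sketch that bins per-base Phred quality scores to a small set of representative values; the bin boundaries are illustrative (loosely modelled on common 8-bin schemes) and are not NGC's actual configuration.

```python
# Illustrative bin edges and representative values for Phred-scaled qualities.
BINS = [(0, 1, 0), (2, 9, 6), (10, 19, 15), (20, 24, 22),
        (25, 29, 27), (30, 34, 33), (35, 39, 37), (40, 93, 40)]

def quantize_quality(q):
    """Map a Phred quality score to its bin's representative value."""
    for lo, hi, rep in BINS:
        if lo <= q <= hi:
            return rep
    raise ValueError(f"quality {q} out of range")

def quantize_string(qual, offset=33):
    """Quantize a FASTQ quality string (ASCII offset 33 by default)."""
    return "".join(chr(quantize_quality(ord(c) - offset) + offset) for c in qual)

print(quantize_string("IIIIHHH###,,,FFFF"))
```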

9.
C.K. Jha, M.H. Kolekar. IRBM, 2021, 42(1): 65-72
Objective: In health-care systems, compression is an essential tool for addressing storage and transmission problems. This paper reports a new electrocardiogram (ECG) data compression scheme that employs sifting-function-based empirical mode decomposition (EMD) and the discrete wavelet transform. Method: EMD based on the sifting function is used to obtain the first intrinsic mode function (IMF). After EMD, the first IMF and four significant sifting functions are combined; this combination is free of many irrelevant components of the signal. The discrete wavelet transform (DWT) with mother wavelet ‘bior4.4’ is applied to the combination. The transform coefficients obtained after the DWT are passed through dead-zone quantization, which discards small coefficients lying around zero. Integer conversion of the coefficients and run-length encoding are then used to produce the compressed ECG data. Results: The compression performance of the proposed scheme is evaluated using 48 ECG records of the MIT-BIH arrhythmia database. In the comparison of compression results, the proposed method performs better than many recent ECG compressors. A mean opinion score test is also conducted to evaluate the true quality of the reconstructed ECG signals. Conclusion: The proposed scheme offers better compression performance while preserving the key features of the signal.
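The dead-zone quantization step mentioned above can be sketched in a few lines of numpy: coefficients whose magnitude falls inside the dead zone are set to zero, and the rest are mapped to signed integer indices. The step size and dead-zone width below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def deadzone_quantize(coeffs, step=0.5, deadzone=1.0):
    """Zero out coefficients inside the dead zone; quantize the rest to signed integer indices."""
    coeffs = np.asarray(coeffs, dtype=float)
    mag = np.abs(coeffs)
    idx = np.where(mag >= deadzone, np.floor((mag - deadzone) / step) + 1, 0)
    return (np.sign(coeffs) * idx).astype(np.int64)

def deadzone_dequantize(q, step=0.5, deadzone=1.0):
    """Map quantization indices back to approximate coefficient values."""
    q = np.asarray(q)
    mag = np.where(q != 0, deadzone + (np.abs(q) - 0.5) * step, 0.0)
    return np.sign(q) * mag

coeffs = np.array([0.05, -0.2, 1.3, -2.7, 0.0, 4.1])
q = deadzone_quantize(coeffs)
print(q)                       # small values -> 0, large values -> signed indices
print(deadzone_dequantize(q))  # coarse reconstruction of the large coefficients
```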

10.
As part of the EUROCarbDB project (www.eurocarbdb.org) we have carefully analyzed the encoding capabilities of all existing carbohydrate sequence formats and the content of publicly available structure databases. We found that none of the existing structural encoding schemata can cope with the full complexity expected for experimentally derived structural carbohydrate sequence data across all taxonomic sources. This gap motivated us to define an encoding scheme for complex carbohydrates, named GlycoCT, to overcome the current limitations. The new format is based on a connection-table approach, instead of a linear encoding scheme, to describe carbohydrate sequences, with a controlled vocabulary to name monosaccharides, adopting IUPAC rules to generate a consistent, machine-readable nomenclature. The format uses a block concept to describe frequently occurring special features of carbohydrate sequences such as repeating units. It exists in two variants, a condensed form and a more verbose XML syntax. Sorting rules ensure the uniqueness of the condensed form, making it suitable as a direct primary key for database applications that rely on unique identifiers. GlycoCT encompasses the capabilities of the heterogeneous landscape of digital encoding schemata in glycomics and is thus a step toward a unified and broadly accepted sequence format in glycobioinformatics.

11.
IRBM, 2022, 43(5): 325-332
Objective: In cardiac patient care, compression of long-term ECG data is essential to minimize data storage requirements and transmission cost. This paper therefore presents a novel electrocardiogram data compression technique that uses modified run-length encoding of wavelet coefficients. Method: First, the wavelet transform is applied to the ECG data; it decomposes the signal and packs most of the energy into a small number of transform coefficients. The wavelet transform coefficients are quantized using dead-zone quantization, which discards small-valued coefficients lying in the dead-zone interval while keeping the others at the formulated quantized output intervals. Among the quantized coefficients, an average value is assigned to those for which the energy packing efficiency is below 99.99%. The resulting coefficients are encoded using modified run-length coding, which offers a higher compression ratio than conventional run-length coding without any loss of information. Results: The compression performance of the proposed technique is evaluated using ECG records taken from the MIT-BIH arrhythmia database. Over 48 ECG records, the average compression ratio, percent root-mean-square difference, normalized percent mean-square difference, and signal-to-noise ratio are 17.18, 3.92, 6.36, and 28.27 dB, respectively. Conclusion: The compression results obtained by the proposed technique are better than those of recently introduced techniques. The proposed technique can be used for compression of ECG records from Holter monitoring.
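For reference, a plain run-length encoder over quantized coefficients (the conventional baseline that the paper's modified scheme improves on) can be sketched as follows; it exploits the long zero runs produced by dead-zone quantization.

```python
def run_length_encode(values):
    """Encode a sequence as (value, run_length) pairs."""
    if not values:
        return []
    runs = []
    current, count = values[0], 1
    for v in values[1:]:
        if v == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = v, 1
    runs.append((current, count))
    return runs

def run_length_decode(runs):
    return [v for v, count in runs for _ in range(count)]

quantized = [0, 0, 0, 0, 3, 3, 0, 0, -2, 0, 0, 0, 0, 0, 1]
runs = run_length_encode(quantized)
print(runs)                                  # long zero runs collapse to single pairs
assert run_length_decode(runs) == quantized  # lossless round trip
```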

12.
In recent years, the intrinsic low-rank structure of many datasets has been extensively exploited to reduce dimensionality, remove noise, and complete missing entries. As a well-known technique for dimensionality reduction and data compression, Generalized Low Rank Approximations of Matrices (GLRAM) offers advantages over the SVD in computation time and compression ratio. However, GLRAM is very sensitive to sparse, large noise or outliers, and a robust version had not previously been explored. To address this problem, this paper proposes a robust method for GLRAM, named Robust GLRAM (RGLRAM). We first formulate RGLRAM as an l1-norm optimization problem which minimizes the l1-norm of the approximation errors. Second, we apply the technique of Augmented Lagrange Multipliers (ALM) to solve this l1-norm minimization problem and derive a corresponding iterative scheme. The weak convergence of the proposed algorithm is then discussed under mild conditions. Next, we investigate a special case of RGLRAM and extend it to a general tensor case. Finally, extensive experiments on synthetic data show that RGLRAM can exactly recover both the low-rank and the sparse components where previous state-of-the-art algorithms may fail. We also discuss three issues concerning RGLRAM: sensitivity to initialization, generalization ability, and the relationship between running time and the size/number of matrices. Moreover, experimental results on face images with large corruptions show that RGLRAM achieves better denoising and compression performance than the other methods.
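ALM-type solvers for such l1 problems typically rely on the elementwise soft-thresholding (shrinkage) operator, which solves the proximal subproblem min over S of tau*||S||_1 + 1/2*||S - X||_F^2. A minimal numpy sketch of that generic building block (not the full RGLRAM algorithm) is:

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

X = np.array([[3.0, -0.2], [-1.5, 0.05]])
print(soft_threshold(X, 0.5))
# Entries with magnitude <= 0.5 become exactly 0, larger ones shrink toward 0,
# which is what makes the estimated error/outlier component sparse.
```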

13.
Data compression is concerned with how information is organized in data; efficient storage means removing redundancy from the data stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress", for DNA sequences, based on a novel scheme that assigns binary bits to small segments of DNA bases in order to compress both repetitive and non-repetitive DNA sequences. The proposed algorithm achieves the best compression ratio for DNA sequences on larger genomes. Significantly better compression results show that "DNABIT Compress" outperforms the other compression algorithms. While achieving the best compression ratios for DNA sequences (genomes), our new DNABIT Compress algorithm also significantly improves on the running time of previous DNA compression programs. Assigning binary bits (a unique bit code) to fragments of a DNA sequence (exact repeats and reverse repeats) is a concept introduced in this algorithm for the first time in DNA compression. The proposed algorithm achieves a compression ratio as low as 1.58 bits/base, whereas the best existing methods do not achieve a ratio below 1.72 bits/base.
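For intuition about the bits-per-base figures, here is a minimal sketch of the simplest fixed binary coding for DNA: packing A, C, G and T into 2 bits each (4 bases per byte). This 2 bits/base baseline is what repeat-aware schemes such as the one above try to beat; the mapping below is illustrative and is not DNABIT Compress's actual bit-code assignment.

```python
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASE = {v: k for k, v in CODE.items()}

def pack(seq):
    """Pack a DNA string into bytes, 4 bases per byte (length returned separately)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out), len(seq)

def unpack(packed, n):
    bases = []
    for byte in packed:
        for j in range(4):
            bases.append(BASE[(byte >> (2 * j)) & 0b11])
    return "".join(bases[:n])

seq = "ACGTTGCAACGT"
packed, n = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")   # 12 bases -> 3 bytes (2 bits/base)
assert unpack(packed, n) == seq
```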

14.
The accurate identification of the route of transmission taken by an infectious agent through a host population is critical to understanding its epidemiology and informing measures for its control. However, reconstruction of transmission routes during an epidemic is often an underdetermined problem: data about the location and timings of infections can be incomplete, inaccurate, and compatible with a large number of different transmission scenarios. For fast-evolving pathogens like RNA viruses, inference can be strengthened by using genetic data, nowadays easily and affordably generated. However, significant statistical challenges remain to be overcome in the full integration of these different data types if transmission trees are to be reliably estimated. We present here a framework leading to a Bayesian inference scheme that combines genetic and epidemiological data and is able to reconstruct the most likely transmission patterns and infection dates. After testing our approach with simulated data, we apply the method to two UK epidemics of Foot-and-Mouth Disease Virus (FMDV): the 2007 outbreak, and a subset of the large 2001 epidemic. In the first case, we are able to confirm the role of a specific premises as the link between the two phases of the epidemic, while transmissions more densely clustered in space and time remain harder to resolve. When we consider data collected from the 2001 epidemic during a time of national emergency, our inference scheme robustly infers transmission chains and uncovers the presence of undetected premises, thus providing a useful tool for epidemiological studies in real time. The generation of genetic data is becoming routine in epidemiological investigations, but the development of analytical tools maximizing the value of these data remains a priority. Our method, while applied here in the context of FMDV, is general and with slight modification can be used in any situation where both spatiotemporal and genetic data are available.

15.
We describe the evolution of macromolecules as an information transmission process and apply tools from Shannon information theory to it. This allows us to isolate three independent, competing selective pressures that we term compression, transmission, and neutrality selection. The first two affect genome length: the pressure to conserve resources by compressing the code, and the pressure to acquire additional information that improves the channel, increasing the rate of information transmission into each offspring. Noisy transmission channels (replication with mutations) give rise to a third pressure that acts on the actual encoding of information; it maximizes the fraction of mutations that are neutral with respect to the phenotype. This neutrality selection has important implications for the evolution of evolvability. We demonstrate each selective pressure in experiments with digital organisms.

16.
Transmission of long-duration EEG signals without loss of information is essential for telemedicine-based applications. In this work, a lossless compression scheme for EEG signals based on neural network predictors and the concept of correlation dimension (CD) is proposed. EEG signals, which can be regarded as irregular time series produced by chaotic processes, can be characterized by the non-linear dynamic parameter CD, a measure of the correlation among the EEG samples. The EEG samples are first divided into segments of 1 s duration and the value of CD is calculated for each segment. Blocks of EEG samples are then constructed such that each block contains segments with similar CD values. Arranging the EEG samples in this fashion improves the accuracy of the predictor, since it operates on highly correlated samples. As a result, the magnitude of the prediction error decreases, leading to fewer bits for transmission. Experiments are conducted using EEG signals recorded under different physiological conditions. Different neural network predictors as well as classical predictors are considered. Experimental results show that the proposed CD-based preprocessing scheme significantly improves the compression performance of the predictors.
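The correlation dimension used for grouping segments is commonly estimated with the Grassberger-Procaccia correlation sum; the sketch below shows that generic estimate (delay embedding, pairwise distances, log-log slope) rather than the paper's exact procedure, and the embedding dimension, delay and radii are illustrative assumptions.

```python
import numpy as np

def correlation_dimension(x, m=5, tau=2, radii=None):
    """Grassberger-Procaccia estimate: slope of log C(r) versus log r."""
    # Delay embedding: vectors (x[i], x[i+tau], ..., x[i+(m-1)*tau]).
    n = len(x) - (m - 1) * tau
    emb = np.column_stack([x[i * tau:i * tau + n] for i in range(m)])
    # Pairwise distances between embedded points (upper triangle only).
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    dists = d[np.triu_indices(n, k=1)]
    if radii is None:
        radii = np.logspace(np.log10(dists.min() + 1e-12), np.log10(dists.max()), 12)[2:-2]
    c = np.array([np.mean(dists < r) for r in radii])   # correlation sum C(r)
    slope, _ = np.polyfit(np.log(radii), np.log(c + 1e-12), 1)
    return slope

rng = np.random.default_rng(0)
segment = np.sin(0.3 * np.arange(500)) + 0.05 * rng.standard_normal(500)
print(round(correlation_dimension(segment), 2))
```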

17.
18.
Pavoni E, Monteriù G, Cianfriglia M, Minenkova O. Gene, 2007, 391(1-2): 120-129
We report the development of a novel phagemid vector, pKM19, for display of recombinant antibodies in single-chain format (scFv) on the surface of filamentous phage. This new vector improves the efficacy of selection and reduces the biological bias against antibodies that can be harmful to host bacteria. It is useful for the generation of large new antibody libraries and for the subsequent maturation of antibody fragments. In comparison with commonly used plasmids, this vector is designed to have relatively low expression levels of cloned scFv antibodies owing to the amber codon positioned in the sequence encoding the PhoA leader peptide. Moreover, fusion of antibodies to only the carboxy-terminal part of the gene III protein improves display of scFv on the bacteriophage surface in this system. Despite the lower antibody expression, a functional test performed with a new scFv library derived from human peripheral blood lymphocytes demonstrates that specific antibodies can be easily isolated from the library, even after the second selection round. The use of the pKM19 vector for maturation of an anti-CEA antibody significantly improves the final results. In our previous work, an analogous selection using a phagemid vector with antibody expression under the control of a lacP promoter led to the isolation of anti-CEA phage antibodies with improved affinities that were not producible in soluble form. Probably owing to the toxicity of that particular anti-CEA antibody for E. coli, 70% of maturated clones contained suppressed stop codons acquired during the various selection/amplification rounds. The pKM19 plasmid facilitates an efficient maturation process, resulting in the selection of antibodies with improved affinity and without any stop codons.

19.
Nosek BA, Sriram N, Umansky E. PLoS ONE, 2012, 7(5): e36771
In two large web-based studies, across five distinct criteria, presenting survey items one at a time was psychometrically either the same as or better than presenting survey items all at once on a single web page to volunteer participants. In the one-at-a-time format, participants were no more likely to drop out of the study (Criterion 1) and were much more likely to provide answers for the survey items (Criterion 2). Rehabilitating participants who otherwise would not have provided survey responses with the one-at-a-time format did not damage the internal consistency of the measures (Criterion 3), nor did it negatively affect criterion validity (Criterion 4). Finally, the one-at-a-time format was more efficient, with participants completing it more quickly than the all-at-once format (Criterion 5). In short, the one-at-a-time format results in less missing data with a shorter presentation time, and ultimately more power to detect relations among variables.

20.
Battye F. Cytometry, 2001, 43(2): 143-149
BACKGROUND: The obvious benefits of centralized data storage notwithstanding, the size of modern flow cytometry data files discourages their transmission over commonly used telephone modem connections. The proposed solution is to install at the central location a web servlet that can extract compact data arrays, of a form dependent on the requested display type, from the stored files and transmit them to a remote client computer program for display. METHODS: A client program and a web servlet, both written in the Java programming language, were designed to communicate over standard network connections. The client program creates familiar numerical and graphical display types and allows the creation of gates from combinations of user-defined regions. Data compression techniques further reduce transmission times for data arrays that are already much smaller than the data file itself. RESULTS: For typical data files, network transmission times were reduced more than 700-fold for the extraction of one-dimensional (1-D) histograms, between 18- and 120-fold for 2-D histograms, and 6-fold for color-coded dot plots. Numerous display formats are possible without further access to the data file. CONCLUSIONS: This scheme enables telephone modem access to centrally stored data without restricting the flexibility of display formats or preventing comparisons with locally stored files.
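The core server-side idea (send a compact, display-specific array instead of the raw events) can be sketched as follows: a 1-D histogram is computed from the raw channel values and compressed before transmission. numpy and zlib stand in for the Java servlet described in the paper, and the bin count and data sizes are illustrative.

```python
import numpy as np
import zlib

# Raw events: one fluorescence channel for 1,000,000 cells (stand-in for an FCS file column).
rng = np.random.default_rng(1)
events = rng.lognormal(mean=5.0, sigma=1.0, size=1_000_000).astype(np.float32)

# Server side: reduce to a 256-bin histogram, then compress the bin counts.
counts, _edges = np.histogram(events, bins=256)
payload = zlib.compress(counts.astype(np.uint32).tobytes())

print("raw events:", events.nbytes, "bytes")
print("histogram payload:", len(payload), "bytes")
print("reduction factor: ~%.0fx" % (events.nbytes / len(payload)))
```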
