Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
In this paper, we study various lossless compression techniques for electroencephalograph (EEG) signals. We discuss a computationally simple pre-processing technique in which the EEG signal is arranged in the form of a 2-D matrix before compression. We discuss a two-stage coder to compress the EEG matrix, with a lossy coding layer (SPIHT) and a residual coding layer (arithmetic coding). This coder is optimally tuned to utilize the source memory and the i.i.d. nature of the residual. We also investigate and compare EEG compression with other schemes such as the JPEG2000 image compression standard, the predictive-coding-based Shorten, and simple entropy coding. The compression algorithms are tested with the University of Bonn database and the Physiobank Motor/Mental Imagery database. 2-D based compression schemes yielded higher lossless compression than the standard vector-based compression, predictive and entropy coding schemes. The use of the pre-processing technique resulted in a 6% improvement, and the two-stage coder yielded a further improvement of 3% in compression performance.
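
The 2-D arrangement and the two-layer idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' coder: plain uniform quantization stands in for SPIHT, the arithmetic-coding step is omitted, and the names `to_matrix`, `two_stage_encode`, `two_stage_decode`, `width` and `step` are hypothetical. It only shows that a lossy layer plus an exact residual reconstructs integer samples without loss.

```python
def to_matrix(signal, width):
    """Arrange a 1-D list of integer samples into rows of `width` samples."""
    padding = -len(signal) % width          # zeros needed to fill the last row
    padded = list(signal) + [0] * padding
    return [padded[i:i + width] for i in range(0, len(padded), width)]


def two_stage_encode(matrix, step):
    """Split each sample into a coarse (lossy) level and an exact residual.

    The coarse layer is uniform quantization, used here only as a stand-in
    for SPIHT; the residual is what an entropy coder would then receive.
    """
    lossy = [[round(x / step) for x in row] for row in matrix]
    residual = [[x - q * step for x, q in zip(row, q_row)]
                for row, q_row in zip(matrix, lossy)]
    return lossy, residual


def two_stage_decode(lossy, residual, step):
    """Recombine both layers; the result equals the original matrix exactly."""
    return [[q * step + r for q, r in zip(q_row, r_row)]
            for q_row, r_row in zip(lossy, residual)]


samples = [3, -7, 12, 5, 0, 9, -4]
matrix = to_matrix(samples, width=3)
lossy, residual = two_stage_encode(matrix, step=4)
assert two_stage_decode(lossy, residual, step=4) == matrix
```

The residual values stay small compared with the samples, which is the property a residual entropy coder would exploit.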

2.
Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.

3.
Battye F. Cytometry, 2001, 43(2): 143-149
BACKGROUND: The obvious benefits of centralized data storage notwithstanding, the size of modern flow cytometry data files discourages their transmission over commonly used telephone modem connections. The proposed solution is to install at the central location a web servlet that can extract compact data arrays, of a form dependent on the requested display type, from the stored files and transmit them to a remote client computer program for display. METHODS: A client program and a web servlet, both written in the Java programming language, were designed to communicate over standard network connections. The client program creates familiar numerical and graphical display types and allows the creation of gates from combinations of user-defined regions. Data compression techniques further reduce transmission times for data arrays that are already much smaller than the data file itself. RESULTS: For typical data files, network transmission times were reduced more than 700-fold for extraction of one-dimensional (1-D) histograms, between 18- and 120-fold for 2-D histograms, and 6-fold for color-coded dot plots. Numerous display formats are possible without further access to the data file. CONCLUSIONS: This scheme enables telephone modem access to centrally stored data without restricting flexibility of display format or preventing comparisons with locally stored files.
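
The reason a 1-D histogram is so much smaller than the data file can be seen in a small sketch. This is not the servlet described in the abstract (which is written in Java); the function `histogram_1d` and its parameters are hypothetical, and it assumes non-negative channel values below `max_value`. However many events the file holds, only `n_bins` counts need to be transmitted.

```python
def histogram_1d(values, n_bins, max_value):
    """Reduce a list of per-event channel values to `n_bins` counts."""
    counts = [0] * n_bins
    for v in values:
        # Map the value to a bin; clamp so that the top value stays in range.
        index = min(int(v * n_bins / max_value), n_bins - 1)
        counts[index] += 1
    return counts


events = [0, 1, 2, 3, 1023]                 # hypothetical 10-bit channel values
compact = histogram_1d(events, n_bins=4, max_value=1024)
assert sum(compact) == len(events)          # every event lands in one bin
```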

4.
Most existing Mass Spectra (MS) analysis programs are automatic and provide limited opportunity for editing during the interpretation. Furthermore, they rely entirely on publicly available databases for interpretation. VEMS (Virtual Expert Mass Spectrometrist) is a program for interactive analysis of peptide MS/MS spectra imported in text file format. Peaks are annotated, the monoisotopic peaks retained, and the b- and y-ion series identified in an interactive manner. The called peptide sequence is searched against a local protein database for sequence identity and peptide mass. The report compares the calculated and the experimental mass spectrum of the called peptide. The program package includes four accessory programs. VEMStrans creates protein databases in FASTA format from EST or cDNA sequence files. VEMSdata creates a virtual peptide database from FASTA files. VEMSdist displays the distribution of masses up to 5000 Da. VEMSmaldi searches singly charged peptide masses against the local database.

5.
6.
We have created databases and software applications for the analysis of DNA mutations at the human p53 gene, the human hprt gene and both the rodent transgenic lacI and lacZ loci. The databases themselves are stand-alone dBASE files and the software for analysis of the databases runs on IBM-compatible computers. Each database has a separate software analysis program. The software created for these databases permits the filtering, ordering, report generation and display of information in the database. In addition, a significant number of routines have been developed for the analysis of single base substitutions. One method of obtaining the databases and software is via the World Wide Web (WWW). Open the home page http://sunsite.unc.edu/dnam/mainpage.html with a Web browser. Alternatively, the databases and programs are available via public FTP from: anonymous@sunsite.unc.edu. There is no password required to enter the system. The databases and software are found beneath the subdirectory: pub/academic/biology/dna-mutations. Two other programs are available at the site: a program for comparison of mutational spectra and a program for entry of mutational data into a relational database.

7.
SPIRE is a Python program written to modernize the user interaction with SPIDER, the image processing system for electron microscopical reconstruction projects. SPIRE provides a graphical user interface (GUI) to SPIDER for executing batch files of SPIDER commands. It also lets users quickly view the status of a project by showing the last batch files that were run, as well as the data files that were generated. SPIRE handles the flexibility of the SPIDER programming environment through configuration files: XML-tagged documents that describe the batch files, directory trees, and presentation of the GUI for a given type of reconstruction project. It also provides the capability to connect to a laboratory database, for downloading parameters required by batch files at the start of a project, and uploading reconstruction results at the end of a project.

8.
We have created databases and software applications for the analysis of DNA mutations at the human p53 gene, the human hprt gene and both the rodent transgenic lacI and lacZ loci. The databases themselves are stand-alone dBASE files and the software for analysis of the databases runs on IBM-compatible computers with Microsoft Windows. Each database has a separate software analysis program. The software created for these databases permits the filtering, ordering, report generation and display of information in the database. In addition, a significant number of routines have been developed for the analysis of single base substitutions. One method of obtaining the databases and software is via the World Wide Web. Open the home page http://sunsite.unc.edu/dnam/mainpage.html with a Web browser. Alternatively, the databases and programs are available via public FTP from: anonymous@sunsite.unc.edu. There is no password required to enter the system. The databases and software are found beneath the subdirectory: pub/academic/biology/dna-mutations. Two other programs are available at the site: a program for comparison of mutational spectra and a program for entry of mutational data into a relational database.

9.
In this paper, a new wavelet-threshold-based ECG signal compression technique using a uniform scalar zero zone quantizer (USZZQ) and Huffman coding on a differencing significance map (DSM) is proposed. Wavelet coefficients are selected based on the energy packing efficiency of each sub-band. Significant wavelet coefficients are quantized with the uniform scalar zero zone quantizer. A significance map is created to store the indices of the significant coefficients. This map is encoded efficiently, with fewer bits, by applying Huffman coding to the differences between indices in the significance map. ECG records from the MIT-BIH arrhythmia database are selected as test data. For record 117, the proposed technique achieves a compression ratio of 18.7:1 with a lower percentage root mean square difference (PRD) than other threshold-based methods. The proposed technique is also tested on MIT-BIH arrhythmia record 119, where a compression ratio of 21.81:1 is achieved with a PRD value of 3.716%, much lower than the reported PRD values of 5.0% and 5.5% for set partitioning in hierarchical trees (SPIHT) and the analysis by synthesis ECG compressor (ASEC), respectively. The noise-eliminating capability of the proposed technique is also demonstrated in this work. The proposed technique achieves the required compression ratio with less reconstruction error for a GSM-based cellular telemedicine system.
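
Two of the quantities named in this abstract can be sketched briefly. The code below is an illustration, not the paper's implementation: the function names and the `threshold` parameter are hypothetical, the wavelet transform, quantizer and Huffman coder are left out, and the PRD formula is the commonly used variant without mean removal, which is assumed rather than confirmed by the abstract.

```python
import math


def differencing_significance_map(coeffs, threshold):
    """Return the indices of significant coefficients and their differences.

    The first difference is the first index itself, so the index list can be
    rebuilt by a running sum. Small differences are cheaper to entropy-code
    than the raw indices.
    """
    indices = [i for i, c in enumerate(coeffs) if abs(c) >= threshold]
    diffs = [b - a for a, b in zip([0] + indices, indices)]
    return indices, diffs


def prd(original, reconstructed):
    """Percentage root mean square difference (assumed variant, no mean removal)."""
    error = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    energy = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(error / energy)


coeffs = [0.1, 5.0, -0.2, 0.0, -3.0, 0.05, 2.5]
indices, diffs = differencing_significance_map(coeffs, threshold=1.0)
assert indices == [1, 4, 6] and diffs == [1, 3, 2]
```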

10.
MOTIVATION: Next to cDNA sequences, Expressed Sequence Tags (ESTs) are the most direct way to locate the genes of the genome in silico and determine their structure. Currently ESTs make up more than 60% of all database entries. The goal of this work is the development of a new program called DNA Intelligent Analysis for ESTs (DIANA-EST), based on a combination of Artificial Neural Networks (ANN) and statistics, for the characterization of the coding regions within ESTs and the reconstruction of the encoded protein. RESULTS: 89.7% of the nucleotides from an independent test set with 127 ESTs were predicted correctly as to whether they are coding or non-coding. AVAILABILITY: The program is available upon request from the author. CONTACT: Present address: Department of Genetics, University of Pennsylvania, School of Medicine, 475 Clinical Research Building, 415 Curie Boulevard, Philadelphia, PA 19104-6145, USA. artemis@pcbi.upenn.edu.

11.
BLMT     
Statistical analysis of amino acid and nucleotide sequences, especially sequence alignment, is one of the most commonly performed tasks in modern molecular biology. However, for many tasks in bioinformatics, the requirement for the features in an alignment to be consecutive is restrictive and "n-grams" (aka k-tuples) have been used as features instead. N-grams are usually short nucleotide or amino acid sequences of length n, but the unit for a gram may be chosen arbitrarily. The n-gram concept is borrowed from language technologies where n-grams of words form the fundamental units in statistical language models. Despite the demonstrated utility of n-gram statistics for the biology domain, there is currently no publicly accessible generic tool for the efficient calculation of such statistics. Most sequence analysis tools will disregard matches because of the lack of statistical significance in finding short sequences. This article presents the integrated Biological Language Modeling Toolkit (BLMT) that allows efficient calculation of n-gram statistics for arbitrary sequence datasets. AVAILABILITY: BLMT can be downloaded from http://www.cs.cmu.edu/~blmt/source and installed for standalone use on any Unix platform or Unix shell emulation such as Cygwin on the Windows platform. Specific tools and usage details are described in a "readme" file. The n-gram computations carried out by the BLMT are part of a broader set of tools borrowed from language technologies and modified for statistical analysis of biological sequences; these are available at http://flan.blm.cs.cmu.edu/.
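
For readers unfamiliar with the term, counting n-grams over a sequence takes only a few lines. This is a naive sliding-window sketch, not the BLMT implementation, and the function name `ngram_counts` is hypothetical; it simply counts every overlapping substring of length n.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count all overlapping substrings of length `n` in `sequence`."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


counts = ngram_counts("ACGACG", 3)
assert counts["ACG"] == 2          # windows: ACG, CGA, GAC, ACG
```

A sequence shorter than n yields an empty counter, because the window range is empty.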

12.
Design of a system for capturing and displaying micrographs in genetics experiments
A system for capturing and displaying micrographs from genetics experiments was designed and developed with Microsoft® Visual Basic 6.0. The system covers image capture and editing, text entry and editing, experiment guidance and image display, image retrieval and database management, and system maintenance and help; each part is implemented as a window. Images can be captured in real time from a video image-grabber card or imported from a scanner, digital camera, the clipboard, or existing files. After image compression, the images are stored in the database together with their text descriptions and experiment instructions, so that images can be entered, edited, queried, and displayed conveniently, quickly, and flexibly. The system serves as a useful aid both for teachers teaching genetics experiments and for students studying them on their own.

13.
Experimental constraints associated with NMR structures are available from the Protein Data Bank (PDB) in the form of 'Magnetic Resonance' (MR) files. These files contain multiple types of data concatenated without boundary markers and are difficult to use for further research. Reported here are the results of a project initiated to annotate, archive, and disseminate these data to the research community from a searchable resource in a uniform format. The MR files from a set of 1410 NMR structures were analyzed and their original constituent data blocks annotated as to data type using a semi-automated protocol. A new software program called Wattos was then used to parse and archive the data in a relational database. From the total number of MR file blocks annotated as constraints, it proved possible to parse 84% (3337/3975). The constraint lists that were parsed correspond to three data types (2511 distance, 788 dihedral angle, and 38 residual dipolar coupling lists) from the three most popular software packages used in NMR structure determination: XPLOR/CNS (2520 lists), DISCOVER (412 lists), and DYANA/DIANA (405 lists). These constraints were then mapped to a developmental version of the BioMagResBank (BMRB) data model. A total of 31 data types originating from 16 programs have been classified, with the NOE distance constraint being the most commonly observed. The results serve as a model for the development of standards for NMR constraint deposition in computer-readable form. The constraints are updated regularly and are available from the BMRB web site (http://www.bmrb.wisc.edu).

14.
Clinical GeneOrganizer (CGO) is a novel Windows-based archiving, organization and data mining software package for the integration of gene expression profiling in clinical medicine. The program implements various user-friendly tools and extracts data for further statistical analysis. This software was written for Affymetrix GeneChip *.txt files, but can also be used for any other microarray-derived data. The MS-SQL server version acts as a data mart and links microarray data with clinical parameters of any other existing database and therefore represents a valuable tool for combining gene expression analysis and clinical disease characteristics.

15.

Background  

Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil.

16.
SUMMARY: Large volumes of microarray data are generated and deposited in public databases. Most of this data is in the form of tab-delimited text files or Excel spreadsheets. Combining data from several of these files to reanalyze these data sets is time-consuming. Microarray Data Assembler is specifically designed to simplify this task. The program can list files and data sources, convert selected text files into Excel files and assemble data across multiple Excel worksheets and workbooks. This program thus makes data assembly easy, saves time and helps avoid manual error. AVAILABILITY: The program is freely available for non-profit use, via email request from the author, after signing a Material Transfer Agreement with Johns Hopkins University.

17.
The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/ . The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. The use of the database is facilitated by keyword-based search analysis and the availability of codon usage tables for selected genes from each species. These new tools will provide users with the ability to further analyze variations in codon usage among different genomes.
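
What "codon usage" means computationally can be shown with a short sketch. It is not the database's own tabulation code; the names `codon_usage` and `per_thousand` are hypothetical, and reporting frequencies per thousand codons is assumed here as a common convention rather than taken from the abstract.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in a coding sequence read in frame from its first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3        # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def per_thousand(counts):
    """Convert raw codon counts to frequencies per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


counts = codon_usage("ATGGCTGCTTAA")        # ATG GCT GCT TAA
assert counts["GCT"] == 2
assert per_thousand(counts)["GCT"] == 500.0
```

Summing the counters of all CDSs from one organism gives the per-organism totals that the abstract describes.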

18.
We have created databases and software applications for the analysis of DNA mutations in the human p53 gene, the human hprt gene and the rodent transgenic lacZ locus. The databases themselves are stand-alone dBase files and the software for analysis of the databases runs on IBM-compatible computers. The software created for these databases permits filtering, ordering, report generation and display of information in the database. In addition, a significant number of routines have been developed for the analysis of single base substitutions. One method of obtaining the databases and software is via the World Wide Web (WWW). Open home page http://sunsite.unc.edu/dnam/mainpage.html with a WWW browser. Alternatively, the databases and programs are available via public ftp from anonymous@sunsite.unc.edu. There is no password required to enter the system. The databases and software are found in subdirectory pub/academic/biology/dna-mutations. Two other programs are available at the WWW site, a program for comparison of mutational spectra and a program for entry of mutational data into a relational database.

19.
Journal of Proteomics, 2010, 73(2): 357-360
We developed a software program (titled Precursor Ion Calibration software for LTQ or, in short, PICsL) that increases the reliability of precursor ion assignments from LC-MS analysis using ultra zoom scanning of LTQ linear ion trap MS and automatically corrects the assignments. Although existing software calculates the theoretical isotopic distribution according to m/z with a computational algorithm, our method simply searches for ions close to the theoretical mass value using both MS/MS raw data and Mascot search result files, followed by a second database search that identifies the proteins using the regenerated peak list files. Our software program mimics the manual inspection of the spectral data of precursor ions and is expected to be applicable not only to low-resolution MS, such as LTQ, but also to a wide variety of MS instruments.
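
The step of searching for ions close to a theoretical mass value can be sketched as a tolerance search. This is only an illustration of that one idea, not PICsL itself: the function `closest_ion`, the use of a parts-per-million tolerance, and the example m/z values are all assumptions.

```python
def closest_ion(observed_mzs, theoretical_mz, tolerance_ppm):
    """Return (m/z, error in ppm) of the nearest ion within tolerance, or None."""
    best = None
    for mz in observed_mzs:
        error_ppm = (mz - theoretical_mz) / theoretical_mz * 1e6
        within = abs(error_ppm) <= tolerance_ppm
        if within and (best is None or abs(error_ppm) < abs(best[1])):
            best = (mz, error_ppm)
    return best


# Hypothetical peak list: only 500.26 lies within 50 ppm of 500.25.
match = closest_ion([500.10, 500.26, 501.00], theoretical_mz=500.25, tolerance_ppm=50)
assert match is not None and match[0] == 500.26
```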

20.
Gene-finding program evaluation (GFPE) is a set of Java classes for evaluating gene-finding programs. A command-line interface is also provided. Inputs to the program include the sequence data (in FASTA format), annotations of "actual" sequence features, and annotations of "predicted" sequence features. Annotation files are in the General Feature Format promoted by the Sanger Centre. GFPE calculates a number of metrics of accuracy of predictions at three levels: the coding level, the exon level, and the protein level.
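
A coding-level comparison of "actual" and "predicted" annotations might look like the sketch below. The abstract does not list GFPE's metrics, so the two shown here (sensitivity as the fraction of actual coding positions that were predicted, and specificity as the fraction of predicted positions that are actually coding, following common gene-finding usage) are assumptions, as are the function names. GFPE itself is written in Java; Python is used only for illustration. Intervals are 1-based and inclusive, as in General Feature Format.

```python
def positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    return {p for start, end in intervals for p in range(start, end + 1)}


def coding_level_accuracy(actual, predicted):
    """Return (sensitivity, specificity) over coding nucleotide positions."""
    actual_pos, predicted_pos = positions(actual), positions(predicted)
    true_positives = len(actual_pos & predicted_pos)
    sensitivity = true_positives / len(actual_pos) if actual_pos else 0.0
    specificity = true_positives / len(predicted_pos) if predicted_pos else 0.0
    return sensitivity, specificity


# Actual exon covers 1..10, predicted exon covers 6..15: five positions agree.
assert coding_level_accuracy([(1, 10)], [(6, 15)]) == (0.5, 0.5)
```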
