Similar Documents
20 similar documents found (search time: 31 ms)
1.

Background  

Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment, or sequencers. Each file contains information that can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing.
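The quality processing mentioned above rests on Phred-style quality scores, which encode the probability p that a base call is wrong as Q = -10·log10(p). A minimal sketch of the conversion plus a toy trimming rule (the threshold and longest-run strategy are illustrative, not preAssemble's actual algorithm):

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to the base-calling error probability."""
    return 10 ** (-q / 10)

def trim_low_quality(quals, min_q=20):
    """Return (start, end) of the longest run of bases with quality >= min_q.
    A toy stand-in for the windowed trimming real tools perform."""
    best = (0, 0)
    start = None
    for i, q in enumerate(quals + [-1]):  # sentinel value ends the final run
        if q >= min_q and start is None:
            start = i
        elif q < min_q and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best
```

Q20 corresponds to a 1% error rate and Q30 to 0.1%, which is why Q20/Q30 fractions are common quality summaries.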

2.
The mzQuantML standard from the HUPO Proteomics Standards Initiative has recently been released, capturing quantitative data about peptides and proteins, following analysis of MS data. We present a Java application programming interface (API) for mzQuantML called jmzQuantML. The API provides robust bridges between Java classes and elements in mzQuantML files and allows random access to any part of the file. The API provides read and write capabilities, and is designed to be embedded in other software packages, enabling mzQuantML support to be added to proteomics software tools ( http://code.google.com/p/jmzquantml/ ). The mzQuantML standard is designed around a multilevel validation system to ensure that files are structurally and semantically correct for different proteomics quantitative techniques. In this article, we also describe a Java software tool ( http://code.google.com/p/mzquantml-validator/ ) for validating mzQuantML files, which is a formal part of the data standard.

3.
The NEXUS Class Library (NCL) is a collection of C++ classes designed to simplify interpreting data files written in the NEXUS format used by many computer programs for phylogenetic analyses. The NEXUS format allows different programs to share the same data files, even though none of the programs can interpret all of the data stored therein. Because users are not required to reformat the data file for each program, use of the NEXUS format prevents cut-and-paste errors as well as the proliferation of copies of the original data file. The purpose of making the NCL available is to encourage the use of the NEXUS format by making it relatively easy for programmers to add the ability to interpret NEXUS files in newly developed software. AVAILABILITY: The NCL is freely available under the GNU General Public License from http://hydrodictyon.eeb.uconn.edu/ncl/ Supplementary information: Documentation for the NCL (general information and source code documentation) is available in HTML format at http://hydrodictyon.eeb.uconn.edu/ncl/
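A NEXUS file announces itself with a `#NEXUS` header and groups its data into `BEGIN <name>; ... END;` blocks, which is what lets each program read the blocks it understands and skip the rest. A simplified Python scanner for block names (the NCL itself is C++ and handles nesting, quoting, and tokenisation that this sketch ignores):

```python
import re

def nexus_blocks(text: str):
    """Return the names of top-level blocks in a NEXUS file.
    Strips [...] comments, then matches BEGIN <name>; markers."""
    if not text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("not a NEXUS file")
    text = re.sub(r"\[[^\]]*\]", " ", text)  # drop bracketed comments
    return [m.group(1).lower()
            for m in re.finditer(r"\bBEGIN\s+(\w+)\s*;", text, re.I)]

example = """#NEXUS
[demo file]
BEGIN TAXA; DIMENSIONS NTAX=2; TAXLABELS a b; END;
BEGIN CHARACTERS; DIMENSIONS NCHAR=4; MATRIX a ACGT b ACGA; END;
"""
```

A program that only understands TAXA blocks can list the blocks, pick the ones it knows, and leave the file untouched for other tools.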

4.
This study presents the correlation between energy deposition and clustered DNA damage, based on a Monte Carlo simulation of the spectrum of direct DNA damage induced by low-energy electrons, including dissociative electron attachment. Clustered DNA damage is classified as simple or complex in terms of the combination of single-strand breaks (SSBs) or double-strand breaks (DSBs) and adjacent base damage (BD). The results show that the energy depositions associated with about 90% of total clustered DNA damage are below 150 eV. Simple clustered DNA damage, constituted by the combination of SSBs and adjacent BD, is dominant, accounting for 90% of all clustered DNA damage, and the spectra of the associated energy depositions are similar for different primary energies. One type of simple clustered DNA damage is the combination of an SSB and 1–5 BD, denoted SSB + BD. The average contribution of SSB + BD to total simple clustered DNA damage reaches up to about 84% for the considered primary energies. Among all forms of SSB + BD, the SSB + BD including only one base damage is dominant (above 80%). In addition, for the considered primary energies, there is no obvious difference between the average energy depositions for a fixed complexity of SSB + BD, determined by the number of base damages, but average energy depositions increase with the complexity of SSB + BD. In the complex clustered DNA damage constituted by the combination of DSBs and surrounding BD, a relatively simple type is a DSB combined with adjacent BD, marked as DSB + BD, and it makes a substantial contribution (on average up to about 82%). The spectrum of DSB + BD is given mainly by the DSB in combination with different numbers of base damages, from 1 to 5. For the considered primary energies, the DSB combined with only one base damage contributes about 83% of total DSB + BD, and the average energy deposition is about 106 eV.
However, the energy deposition increases with the complexity of clustered DNA damage, and therefore clustered DNA damage of high complexity still needs to be considered in studies of radiation biological effects, in spite of its small contribution to all clustered DNA damage.
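The classification described above can be illustrated with a toy classifier: given a cluster's lesions as (position, strand, kind) tuples, call it complex (DSB + BD) when two opposite-strand breaks fall within a few base pairs, and simple (SSB + BD) otherwise. The encoding and the 10 bp threshold are assumptions for illustration, not parameters of the Monte Carlo simulation:

```python
def classify_cluster(lesions, dsb_gap=10):
    """Classify one damage cluster as 'SSB+BD' (simple) or 'DSB+BD' (complex).
    lesions: list of (position, strand, kind) with kind in {'SB', 'BD'} and
    strand in {0, 1}. A strand break paired with an opposite-strand break
    within dsb_gap base pairs counts as a DSB. Returns (label, number of BD)."""
    breaks = [(p, s) for p, s, k in lesions if k == "SB"]
    has_dsb = any(abs(p1 - p2) <= dsb_gap and s1 != s2
                  for i, (p1, s1) in enumerate(breaks)
                  for p2, s2 in breaks[i + 1:])
    n_bd = sum(1 for _, _, k in lesions if k == "BD")
    label = "DSB+BD" if has_dsb else "SSB+BD"
    return label, n_bd
```

The BD count returned here is the "complexity" the abstract refers to: an SSB + BD cluster with one BD is the dominant simple type.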

5.
6.
《动物分类学报》2017,(1):34-45
Geometric morphometrics (GM) is an important method of shape analysis and is increasingly used in a wide range of scientific disciplines. Presently, a single-character comparison system of geometric morphometric data is used in almost all empirical studies, and this approach is sufficient for many scientific problems. However, the estimation of overall similarity among taxa or objects based on multiple characters is crucial in a variety of contexts (e.g. (semi-)automated identification, phenetic relationships, tracing of character evolution, phylogenetic reconstruction). Here we propose a new web-based tool for merging several geometric morphometrics data files from multiple characters into a single data file. Using this approach, information from multiple characters can be compared in combination and an overall similarity estimate can be obtained in a convenient and geometrically rigorous manner. To illustrate our method, we provide an example analysis of 25 dung beetle species with seven Procrustes-superimposed landmark data files representing the morphological variation of body features: the epipharynx, right mandible, pronotum, elytra, hindwing, and the metendosternite in dorsal and lateral view. All seven files were merged into a single one containing information on 649 landmark locations. The possible applications of such merged data files in different fields of science are discussed.
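The merging step can be sketched as concatenating each specimen's landmark coordinates across the per-character files, keeping only specimens present in every file. The dict-based input layout below is an assumption; the actual tool reads Procrustes-superimposed landmark files such as TPS:

```python
def merge_landmark_files(files):
    """Merge several landmark data sets into one flat row per specimen.
    files: list of dicts mapping specimen id -> list of (x, y) landmarks.
    Only specimens present in every file are kept, and coordinates are
    concatenated in file order."""
    shared = set(files[0])
    for f in files[1:]:
        shared &= set(f)
    merged = {}
    for sid in sorted(shared):
        row = []
        for f in files:
            for x, y in f[sid]:
                row.extend((x, y))
        merged[sid] = row
    return merged
```

The merged rows can then feed any multivariate similarity measure, which is the point of combining characters in the first place.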

7.
SUMMARY: Large volumes of microarray data are generated and deposited in public databases. Most of these data are in the form of tab-delimited text files or Excel spreadsheets. Combining data from several of these files to reanalyze the data sets is time-consuming. Microarray Data Assembler is specifically designed to simplify this task. The program can list files and data sources, convert selected text files into Excel files, and assemble data across multiple Excel worksheets and workbooks. The program thus makes data assembly easy, saves time, and helps avoid manual error. AVAILABILITY: The program is freely available for non-profit use, via email request from the author, after signing a Material Transfer Agreement with Johns Hopkins University.
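The assembly task reduces to an outer join of tab-delimited tables on a shared identifier column. A hedged Python sketch (the `name.column` header convention and the `ID` key are illustrative choices, not the program's actual behaviour):

```python
import csv
import io

def assemble_tables(named_texts, key="ID"):
    """Join several tab-delimited tables on a shared key column.
    named_texts: list of (name, tsv_text). Value columns are prefixed with
    the table name so their origin stays visible; missing keys yield
    empty strings. Returns a list of rows, header first."""
    tables = []
    for name, text in named_texts:
        rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
        tables.append((name, {r[key]: r for r in rows}))
    all_keys = sorted({k for _, t in tables for k in t})
    header = [key] + [f"{name}.{col}"
                      for name, t in tables
                      for col in next(iter(t.values())) if col != key]
    out = [header]
    for k in all_keys:
        row = [k]
        for name, t in tables:
            r = t.get(k, {})
            row += [r.get(col, "")
                    for col in next(iter(t.values())) if col != key]
        out.append(row)
    return out
```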

8.
We describe PerlMAT, a Perl microarray toolkit providing easy-to-use object-oriented methods for the simplified manipulation, management and analysis of microarray data. The toolkit provides objects for the encapsulation of microarray spots and reporters, several common microarray data file formats, and GAL files. In addition, an analysis object provides methods for data processing, and an image object enables the visualisation of microarray data. This important addition to the Perl developer's library will facilitate more widespread use of Perl for microarray application development within the bioinformatics community. The coherent interface and well-documented code enable rapid analysis by even inexperienced Perl developers. AVAILABILITY: Software is available at http://sourceforge.net/projects/perlmat

9.
A model for the secondary structure of mouse beta Maj globin messenger RNA is presented based on enzymatic digestion data, comparative sequence analysis and computer analysis. Using 5'-32P-end-labeled beta globin mRNA as a substrate, single-stranded regions were determined with S1 and T1 nucleases and double-stranded regions with V1 ribonuclease from cobra venom. The structure data obtained for ca. 75% of the molecule were introduced into a computer algorithm which predicts secondary structures of minimum free energy consistent with the enzymatic data. Two prominent base-paired regions independently derived by phylogenetic analysis were also present in the computer-generated structure, lending support to the model. An interesting feature of the model is the presence of long-range base pairing interactions which permit the beta globin mRNA to fold back on itself, thereby bringing the 5'- and 3'-noncoding regions within close proximity. This feature is consistent with data from other laboratories suggesting an interaction of the 5'- and 3'-domains in the mammalian globin mRNAs.

10.
The Internet is enabling greater access to spectral imaging publications, spectral graphs, and data than was available a generation ago. The spectral imaging systems discussed in this issue of Cytometry work because reagent and hardware spectra are reproducible, reusable, and provide input to spectral unmixing and spectral components recognition algorithms. These spectra need to be readily available in order to determine what to purchase, how to use it, and what the output means. We refer to several commercially sponsored and academic spectral web sites and discuss our spectral graphing and data sites. Sites include fluorescent dye graph servers from Invitrogen/Molecular Probes, BD Biosciences, Zeiss/Bio-Rad Cell Sciences, and filter set servers from Chroma Technology and Omega Optical. Several of these sites include data download capabilities. Recently, two microscope manufacturers have published on their web sites transmission curves for select objective lenses, crucial data for anyone doing multiphoton excitation microscopy. Notable among the academic sites, PhotoChemCAD 2.0 has over 200 dyes and a downloadable database/graphing program, and the USC-A Chemistry UV-vis Database displays absorption spectra of many dyes and indicators used in clinical histology and pathology. Our Fluorescent Spectra graphing/calculator site presents dyes, filters, and illumination data from many of these and additional sources. PubSpectra is our free download site, which uses Microsoft Excel files as a standardized human- and machine-readable format with over 2,000 biomedical spectra. The principle that data are not subject to copyright provides a framework in which all scientific data should be made freely accessible.

11.

Background

Some applications, especially clinical applications requiring highly accurate sequencing data, have to contend with unavoidable sequencing errors. Several tools have been proposed to profile sequencing quality, but few of them can quantify or correct sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Unlike most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and it furthermore provides a novel function to correct wrong bases in the overlapping regions. Another new feature is the detection and visualisation of sequencing bubbles, which are commonly found on flowcell lanes and can cause sequencing errors. Besides the usual per-cycle quality and base-content plotting, AfterQC also provides features like polyX (a long sub-sequence of the same base X) filtering, automatic trimming, and k-mer-based strand bias profiling.

Results

For each single FastQ file or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates the sequencer's bubble effects, trims reads at the front and tail, detects sequencing errors and corrects some of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support: it accepts a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder, in which case all included FastQ files are processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent.
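The overlapping analysis at the heart of AfterQC can be illustrated in miniature: for a read pair whose insert is shorter than twice the read length, a suffix of read 1 equals a prefix of the reverse complement of read 2, and mismatches inside that overlap reveal sequencing errors. The exact-match search and the `min_len` parameter below are simplifying assumptions; AfterQC's real search tolerates mismatches:

```python
def revcomp(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_overlap(r1: str, r2: str, min_len=8) -> int:
    """Return the length of the longest suffix of r1 that equals a prefix of
    the reverse complement of r2, i.e. the read-pair overlap when the insert
    is shorter than r1 + r2. Exact matching only; returns 0 if no overlap of
    at least min_len bases is found."""
    rc2 = revcomp(r2)
    for L in range(min(len(r1), len(rc2)), min_len - 1, -1):
        if r1[-L:] == rc2[:L]:
            return L
    return 0
```

Once the overlap is located, a mismatch between the two reads inside it is a candidate sequencing error, and the base with the higher quality score can be taken as the corrected call.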

Conclusion

Much more than just another quality control (QC) tool, AfterQC performs quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate sequencing errors in pair-end sequencing data to provide much cleaner output, and consequently helps to reduce false-positive variants, especially for low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all of the options automatically and requires no arguments in most cases.

12.
A proposal for a flow cytometric data file standard
R F Murphy  T M Chused 《Cytometry》1984,5(5):553-555
The increasing complexity of multiparameter data collection and analysis in flow cytometry, together with the development of relatively inexpensive arc-lamp-based flow cytometers (which makes it more likely that a laboratory or institution will have more than one type of instrument), creates a need for shareable analysis programs and for the transport of flow cytometric data files within an installation or from one institution to another. To address this need, we propose a standard file format to be used for all flow cytometric data. The general principles of this proposal are: (1) the data file will contain a minimum of three segments, TEXT, DATA, and ANALYSIS; (2) the TEXT and ANALYSIS segments consist of KEYWORDS, which are the names of data fields, and their values; (3) all TEXT is encoded in ASCII; (4) KEYWORDS and their values may be of any length; (5) certain KEYWORDS will be standard, i.e., having specified formats to be recognized by all programs. The structure of the DATA segment will be uniquely defined by the values of KEYWORDS in the TEXT area. It may be in any bit resolution, facilitating compatibility between machines with different word lengths and/or allowing bit compression of the data. The structured nature of the TEXT area should facilitate management of flow cytometric data using existing database management systems. The proposed file format has been implemented on VAX, PDP-11, and HP9920 based flow cytometry data acquisition systems.
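The TEXT-segment layout proposed here survives in the FCS standard: the segment's first character is the delimiter, and keyword/value pairs alternate between delimiters. A sketch of a reader for that layout (it ignores the standard's escaped-delimiter rule, so it is a sketch of the structure, not a full FCS parser):

```python
def parse_fcs_text(segment: str) -> dict:
    """Parse an FCS-style TEXT segment into a keyword -> value dict.
    The first character is the delimiter; keywords and values alternate
    between delimiters."""
    delim = segment[0]
    fields = segment[1:].split(delim)
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field after the trailing delimiter
    if len(fields) % 2:
        raise ValueError("unpaired keyword in TEXT segment")
    return dict(zip(fields[::2], fields[1::2]))
```

Because the TEXT area is plain delimited ASCII, exactly this kind of trivial parser is enough to feed the keywords into a database management system, which is the interoperability argument the proposal makes.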

13.
MOTIVATION: The large and growing body of experimental data on biomolecular binding is of enormous value in developing a deeper understanding of molecular biology, in developing new therapeutics, and in various molecular design applications. However, most of these data are found only in the published literature and are therefore difficult to access and use. No existing public database has focused on measured binding affinities and has provided query capabilities that include chemical structure and sequence homology searches. METHODS & RESULTS: We have created Binding DataBase (BindingDB), a public, web-accessible database of measured binding affinities. BindingDB is based upon a relational data specification for describing binding measurements via Isothermal Titration Calorimetry (ITC) and enzyme inhibition. A corresponding XML Document Type Definition (DTD) is used to create and parse intermediate files during the on-line deposition process and will also be used for data interchange, including collection of data from other sources. The on-line query interface, which is constructed with Java Servlet technology, supports standard SQL queries as well as searches for molecules by chemical structure and sequence homology. The on-line deposition interface uses Java Server Pages and JavaBean objects to generate dynamic HTML and to store intermediate results. The resulting data resource provides a range of functionality with brisk response-times, and lends itself well to continued development and enhancement.
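The relational core of such a database can be sketched as a single measurements table plus an affinity-threshold query. The table and column names below are invented for illustration; BindingDB's actual schema is far richer and models ITC and enzyme-inhibition measurements separately:

```python
import sqlite3

# Minimal relational sketch of binding-measurement storage (hypothetical
# schema, not BindingDB's).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE measurement (
    id      INTEGER PRIMARY KEY,
    ligand  TEXT,
    target  TEXT,
    method  TEXT,   -- e.g. 'ITC' or 'enzyme inhibition'
    kd_nM   REAL)""")
con.executemany(
    "INSERT INTO measurement (ligand, target, method, kd_nM) VALUES (?,?,?,?)",
    [("compound-1", "trypsin", "ITC", 12.5),
     ("compound-2", "trypsin", "enzyme inhibition", 340.0)])

# Typical affinity query: tight binders of a given target.
rows = con.execute(
    "SELECT ligand FROM measurement WHERE target=? AND kd_nM<?",
    ("trypsin", 100)).fetchall()
```

Structure and sequence-homology searches sit on top of such tables as specialised indexes; plain SQL handles the affinity side.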

14.
Battye F 《Cytometry》2001,43(2):143-149
BACKGROUND: The obvious benefits of centralized data storage notwithstanding, the size of modern flow cytometry data files discourages their transmission over commonly used telephone modem connections. The proposed solution is to install at the central location a web servlet that can extract compact data arrays, of a form dependent on the requested display type, from the stored files and transmit them to a remote client computer program for display. METHODS: A client program and a web servlet, both written in the Java programming language, were designed to communicate over standard network connections. The client program creates familiar numerical and graphical display types and allows the creation of gates from combinations of user-defined regions. Data compression techniques further reduce transmission times for data arrays that are already much smaller than the data file itself. RESULTS: For typical data files, network transmission times were reduced more than 700-fold for extraction of one-dimensional (1-D) histograms, between 18- and 120-fold for 2-D histograms, and 6-fold for color-coded dot plots. Numerous display formats are possible without further access to the data file. CONCLUSIONS: This scheme enables telephone modem access to centrally stored data without restricting flexibility of display format or preventing comparisons with locally stored files.
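The size reduction comes from sending a fixed-size summary array instead of the raw event list: a 1-D histogram of a few hundred bins replaces arbitrarily many events. A sketch of that server-side reduction (bin count and value range are arbitrary choices here):

```python
def histogram_1d(values, n_bins=256, lo=0, hi=1024):
    """Reduce per-event measurements to a fixed-size 1-D histogram,
    the kind of compact array a server can send instead of raw events.
    Values outside [lo, hi) are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts
```

A file of a million events shrinks to 256 integers regardless of event count, which is where the reported several-hundred-fold reduction for 1-D histograms comes from.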

15.
Failing to open computer files that describe image data is not the most frustrating experience that the user of a computer can suffer, but it is high on the list of possible aggravations. To ameliorate this, the structure of uncompressed image data files is described here. The various ways in which information that describes a picture can be recorded are related, and a primary distinction is drawn between raster- or bitmap-based and vector- or object-based image data files. Bitmap-based image data files are the more useful of the two formats for recording complicated images such as digital light micrographs, whereas object-based files are better for recording illustrations and cartoons. Computer software for opening a very large variety of different formats of digital image data is recommended, and if these fail, ways are described for opening bitmap-based digital image data files whose format is unknown.


17.
Mass spectrometry-based proteomics is increasingly being used in biomedical research. These experiments typically generate a large volume of highly complex data, and the volume and complexity are only increasing with time. There exist many software pipelines for analyzing these data (each typically with its own file formats), and as technology improves, these file formats change and new formats are developed. Files produced from these myriad software programs may accumulate on hard disks or tape drives over time, with older files being rendered progressively more obsolete and unusable with each successive technical advancement and data format change. Although initiatives exist to standardize the file formats used in proteomics, they do not address the core failings of a file-based data management system: (1) files are typically poorly annotated experimentally, (2) files are "organically" distributed across laboratory file systems in an ad hoc manner, (3) files formats become obsolete, and (4) searching the data and comparing and contrasting results across separate experiments is very inefficient (if possible at all). Here we present a relational database architecture and accompanying web application dubbed Mass Spectrometry Data Platform that is designed to address the failings of the file-based mass spectrometry data management approach. The database is designed such that the output of disparate software pipelines may be imported into a core set of unified tables, with these core tables being extended to support data generated by specific pipelines. Because the data are unified, they may be queried, viewed, and compared across multiple experiments using a common web interface. Mass Spectrometry Data Platform is open source and freely available at http://code.google.com/p/msdapl/.

18.
S Demers  J Kim  P Legendre  L Legendre 《Cytometry》1992,13(3):291-298
Flow cytometry has recently been introduced in aquatic ecology. Its unique feature is to measure several optical characteristics simultaneously on a large number of cells. Until now, these data have generally been analyzed in simple ways, e.g., frequency histograms and bivariate scatter diagrams, so that the multivariate potential of the data has not been fully exploited. This paper presents a way of answering ecologically meaningful questions using the multivariate characteristics of the data. In order to do so, the multivariate data are reduced to a small number of classes by clustering, which reduces the data to a categorical variable. Multivariate pairwise comparisons can then be performed among samples using these new data vectors. The test case presented in the paper forms a time series of observations from which the new method enables us to study the temporal evolution of cell types.
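The reduction step can be sketched as nearest-centroid assignment followed by per-sample class-frequency vectors, which are then suitable for pairwise comparison between samples. The centroid-based clustering below is a simple stand-in; the paper's actual clustering method may differ:

```python
def assign_classes(events, centroids):
    """Reduce multivariate events to a categorical variable by assigning
    each event to its nearest centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: dist2(e, centroids[k]))
            for e in events]

def class_frequencies(labels, n_classes):
    """Turn class labels into a relative-frequency vector, the per-sample
    summary used for pairwise comparison among samples."""
    counts = [0] * n_classes
    for c in labels:
        counts[c] += 1
    total = len(labels)
    return [c / total for c in counts]
```

Two samples can then be compared through their frequency vectors (e.g. with a contingency-table test), instead of eyeballing bivariate scatter diagrams.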

19.
It ought to be easy to exchange digital micrographs and other computer data files with a colleague, even on another continent. In practice, this often is not the case. The advantages and disadvantages of various methods that are available for exchanging data files between computers are discussed. When possible, data should be transferred through computer networking. When data are to be exchanged locally between computers with similar operating systems, the use of a local area network is recommended. For computers in commercial or academic environments that have dissimilar operating systems or are more widely spaced, the use of FTP is recommended. Failing this, posting the data on a website and transferring it by hypertext transfer protocol is suggested. If peer-to-peer exchange between computers in domestic environments is needed, the use of messenger services such as Microsoft Messenger or Yahoo Messenger is the method of choice. When it is not possible to transfer the data files over the internet, single-use writable CD-ROMs are the best media for transferring data. If for some reason this is not possible, DVD-R/RW, DVD+R/RW, 100 MB ZIP disks and USB flash media are potentially useful media for exchanging data files.

20.
We present a Java application programming interface (API), jmzIdentML, for the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) mzIdentML standard for peptide and protein identification data. The API combines the power of Java Architecture for XML Binding (JAXB) and an XPath-based random-access indexer to allow a fast and efficient mapping of extensible markup language (XML) elements to Java objects. The internal references in the mzIdentML files are resolved in an on-demand manner, where the whole file is accessed as a random-access swap file, and only the relevant piece of XML is selected for mapping to its corresponding Java object. The API is highly efficient in its memory usage and can handle files of arbitrary sizes. The API follows the official release of the mzIdentML (version 1.1) specifications and is available in the public domain under a permissive licence at http://www.code.google.com/p/jmzidentml/.
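The random-access idea can be shown in miniature: scan the file once to record the byte offset of each element of interest, then later seek straight to a record instead of parsing the whole document. This regex scan is a toy stand-in for jmzIdentML's XPath-based indexer and ignores CDATA sections and comments:

```python
import re

def index_elements(xml_bytes: bytes, tag: bytes):
    """Record the byte offset of every <tag ...> start tag in an XML
    document, so individual records can later be read with seek() instead
    of parsing the whole file."""
    pattern = re.compile(rb"<" + re.escape(tag) + rb"[\s>]")
    return [m.start() for m in pattern.finditer(xml_bytes)]
```

With such an index in hand, resolving an internal reference becomes a seek to a known offset plus a parse of one small XML fragment, which is why memory use stays flat regardless of file size.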

