Similar Articles
20 similar articles found (search time: 15 ms)
1.
2.
One of the major bottlenecks in the proteomics field today resides in the computational interpretation of the massive data generated by the latest generation of high-throughput MS instruments. MS/MS datasets are constantly increasing in size and complexity, and it becomes challenging to comprehensively process such huge datasets and then deduce the most relevant biological information. The Mass Spectrometry Data Analysis (MSDA, https://msda.unistra.fr) online software suite provides a series of modules for in-depth MS/MS data analysis. It includes a custom database generation toolbox, modules for filtering and extracting high-quality spectra, for running high-performance database and de novo searches, and for extracting modified peptide spectra and functional annotations. Additionally, MSDA enables running the most computationally intensive steps, namely database and de novo searches, on a computer grid, providing a net time gain of up to 99% for data processing.
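The grid-parallelization step described above amounts to partitioning a spectrum set into independent jobs. A minimal sketch of such a split (the function name and job layout are illustrative, not MSDA's actual API):

```python
# Hypothetical sketch: distribute an MS/MS spectrum list across independent
# grid jobs in a round-robin fashion so job sizes stay balanced.
# Names here are illustrative, not MSDA's API.
def chunk_spectra(spectra, n_jobs):
    """Split spectra into n_jobs lists of near-equal size."""
    chunks = [[] for _ in range(n_jobs)]
    for i, spectrum in enumerate(spectra):
        chunks[i % n_jobs].append(spectrum)
    return chunks

# Toy example: 10 spectra (stand-ins) over 3 grid jobs
jobs = chunk_spectra(list(range(10)), 3)
```

Each chunk would then be submitted as one database or de novo search job, and the results merged afterwards.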

3.
Interactions between chromatin segments play a large role in genome function, and developments in genomic interaction detection methods have revealed interacting topological domains within the genome. Among these methods, Hi-C plays a key role. Here, we present the Genome Interaction Tools and Resources (GITAR), a software suite to perform comprehensive Hi-C data analysis, including data preprocessing, normalization, and visualization, as well as analysis of topologically associated domains (TADs). GITAR is composed of two main modules: (1) HiCtool, a Python library to process and visualize Hi-C data, including TAD analysis; and (2) a processed-data library, a large collection of human and mouse datasets processed using HiCtool. HiCtool leads the user step by step through a pipeline that goes from raw Hi-C data to the computation, visualization, and optimized storage of intra-chromosomal contact matrices and TAD coordinates. A large collection of standardized processed data allows users to compare different datasets in a consistent way while saving the time needed to obtain data for visualization or additional analyses. More importantly, GITAR enables users without any programming or bioinformatics expertise to work with Hi-C data. GITAR is publicly available at http://genomegitar.org as open-source software.
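The core object of the pipeline, the intra-chromosomal contact matrix, is conceptually simple: read-pair positions are binned at a fixed resolution and counted. A minimal sketch of that binning step (illustrative only, not HiCtool's actual implementation):

```python
import numpy as np

# Illustrative sketch: bin intra-chromosomal read-pair positions into a
# symmetric contact matrix at a fixed resolution. Not HiCtool's actual API.
def contact_matrix(pairs, chrom_length, resolution):
    n_bins = chrom_length // resolution + 1
    m = np.zeros((n_bins, n_bins), dtype=int)
    for pos1, pos2 in pairs:
        i, j = pos1 // resolution, pos2 // resolution
        m[i, j] += 1
        if i != j:
            m[j, i] += 1  # keep the matrix symmetric
    return m

# Toy data: three read pairs on a 1 kb "chromosome" at 500 bp resolution
pairs = [(100, 950), (120, 980), (400, 420)]
m = contact_matrix(pairs, chrom_length=1000, resolution=500)
```

Real Hi-C processing adds mapping, filtering, and normalization on top of this counting step.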

4.
Creating visually pleasing graphs in data visualization programs such as Matlab is surprisingly challenging. One common problem is that the positions and sizes of non-data elements such as textual annotations must typically be specified in either data coordinates or in absolute paper coordinates, whereas it would be more natural to specify them using a combination of these coordinate systems. I propose a framework in which it is easy to express, e.g., “this label should appear 2 mm to the right of the data point at (3, 2)” or “this arrow should point to the datum at (2, 1) and be 5 mm long.” I describe an algorithm for the correct layout of graphs of arbitrary complexity with automatic axis scaling within this framework. An implementation is provided in the form of a complete 2D plotting package that can be used to produce publication-quality graphs from within Matlab or Octave.
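The mixed-coordinate idea can be illustrated with the abstract's own example, "2 mm to the right of the data point at (3, 2)": map the data point into paper coordinates, then apply the offset in millimetres. The axis limits and paper geometry below are invented for illustration:

```python
# Sketch of mixed coordinates: a label placed 2 mm right of data point (3, 2).
# Axis limits, axes origin, and axes size are illustrative assumptions.
def data_to_paper_mm(x, y, xlim, ylim, axes_origin_mm, axes_size_mm):
    """Map a data point into absolute paper coordinates (mm)."""
    fx = (x - xlim[0]) / (xlim[1] - xlim[0])  # fraction along the x axis
    fy = (y - ylim[0]) / (ylim[1] - ylim[0])  # fraction along the y axis
    return (axes_origin_mm[0] + fx * axes_size_mm[0],
            axes_origin_mm[1] + fy * axes_size_mm[1])

px, py = data_to_paper_mm(3, 2, xlim=(0, 10), ylim=(0, 4),
                          axes_origin_mm=(20, 15), axes_size_mm=(100, 60))
label_pos = (px + 2.0, py)  # 2 mm offset applied in paper space, not data space
```

The point of the framework is that this conversion is resolved automatically during layout, so the offset stays 2 mm on paper regardless of axis scaling.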

5.
CellDepot, which contains over 270 datasets from 8 species and many tissues, is an integrated web application that empowers scientists to explore single-cell RNA-seq (scRNA-seq) datasets and compare datasets across studies through a user-friendly interface with advanced visualization and analytical capabilities. First, it provides an efficient data management system: users can upload single-cell datasets and query the database by multiple attributes such as species and cell type. In addition, a graphical multi-logic, multi-condition query builder and a convenient filtering tool, backed by a MySQL database, allow users to quickly find datasets of interest and compare gene expression across them. Moreover, by embedding the cellxgene VIP tool, CellDepot enables fast, interactive, and scalable exploration of individual datasets to gain refined insights such as cell composition, gene expression profiles, and differentially expressed genes among cell types, leveraging more than 20 frequently applied plotting functions and high-level analysis methods in single-cell research. In summary, the web portal, available at http://celldepot.bxgenomics.com, promotes large-scale single-cell data sharing, facilitates meta-analysis and visualization, and encourages scientists to contribute to the single-cell community in a tractable and collaborative way. Finally, CellDepot is released as open-source software under the MIT license to motivate crowd contribution, broad adoption, and local deployment for private datasets.
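A multi-condition query builder of the kind described ultimately assembles a parameterized SQL statement from the selected attributes. A minimal sketch (table and column names are invented for illustration, not CellDepot's schema):

```python
# Hypothetical sketch of a multi-condition dataset query such as a query
# builder might issue against a MySQL backend. Table/column names are
# invented for illustration; values are passed as parameters, never inlined.
def build_query(filters):
    """Build a parameterized SELECT from attribute filters (AND-combined)."""
    clauses, params = [], []
    for column, value in filters.items():
        clauses.append(f"{column} = %s")
        params.append(value)
    sql = "SELECT dataset_id FROM datasets"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = build_query({"species": "mouse", "tissue": "brain"})
```

Parameterization keeps user-supplied attribute values out of the SQL text, which matters for any web-facing query interface.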

6.
ProSAT (for Protein Structure Annotation Tool) is a tool to facilitate interactive visualization of non-structure-based functional annotations in protein 3D structures. It performs automated mapping of the functional annotations onto the protein structure and allows functional sites to be readily identified upon visualization. The current version of ProSAT can be applied to large datasets of protein structures for fast visual identification of active and other functional sites derived from the SwissProt and Prosite databases.

7.
Microarrays are tools to study the expression profile of an entire genome. Technology, statistical tools, and biological knowledge in general have evolved over the past ten years, and it is now possible to improve the analysis of previous datasets. We have developed a web interface called PHOENIX that automates the analysis of microarray data from preprocessing to the evaluation of significance, through manual or automated parameterization. At each analytical step, several methods are available for (re)analysis of the data. PHOENIX evaluates a consensus score from several methods and thus determines the performance level of the best methods (even when the best-performing method is not known). With an estimate of the true gene list, PHOENIX can evaluate the performance of methods or compare the results with other experiments. Each method used for differential expression analysis and performance evaluation has been implemented in the PEGASE back-end package, along with additional tools to further improve PHOENIX. Future developments will involve the addition of steps (CDF selection, gene-set analysis, meta-analysis), methods (PLIER, ANOVA, Limma), benchmarks (spike-in and simulated datasets), and illustration of the results (automatically generated reports).
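One simple way to form a consensus across several differential-expression methods is to aggregate the per-method gene rankings, e.g. by mean rank. The sketch below shows that generic idea; it is not PHOENIX's actual scoring scheme:

```python
# Illustrative consensus ranking across analysis methods (generic mean-rank
# aggregation, not PHOENIX's actual consensus score).
def consensus_rank(rankings):
    """rankings: list of gene lists, best first. Returns genes by mean rank.
    Genes absent from a list are assigned that list's length as a penalty rank."""
    genes = set().union(*rankings)
    mean_rank = {
        g: sum(r.index(g) if g in r else len(r) for r in rankings) / len(rankings)
        for g in genes
    }
    return sorted(genes, key=lambda g: mean_rank[g])

# Toy rankings from three hypothetical methods
methods = [["geneA", "geneB", "geneC"],
           ["geneB", "geneA", "geneC"],
           ["geneA", "geneC", "geneB"]]
consensus = consensus_rank(methods)
```

A gene consistently near the top across methods rises in the consensus even when no single method is known to be the best performer.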

8.
The study of polygenic disorders such as cardiovascular and metabolic diseases requires access to vast amounts of experimental and in silico data. Where animal models of disease are being used, visualization of syntenic genome regions is one of the most important tools supporting data analysis. We define what is required to visualize synteny in terms of the data being displayed, the screen layout, and user interaction. We then describe a prototype visualization tool, SyntenyVista, which provides integrated access to quantitative trait loci, microarray, and gene datasets. We believe that SyntenyVista is a significant step towards an improved representation of comparative genomics data.

9.
Extracting biomedical information from large metabolomic datasets by multivariate data analysis is of considerable complexity. Common challenges include, among others, screening for differentially produced metabolites, estimation of fold changes, and sample classification. Prior to these analysis steps, it is important to minimize contributions from unwanted biases and experimental variance; this is the goal of data preprocessing. In this work, different data normalization methods were compared systematically using two different datasets generated by nuclear magnetic resonance (NMR) spectroscopy. Two types of normalization methods were used: one aims to remove unwanted sample-to-sample variation, while the other adjusts the variance of the different metabolites by variable scaling and variance stabilization. The impact of all tested methods on sample classification was evaluated on urinary NMR fingerprints obtained from healthy volunteers and patients suffering from autosomal dominant polycystic kidney disease (ADPKD). Performance in screening for differentially produced metabolites was investigated on a dataset following a Latin-square design, where varied amounts of 8 different metabolites were spiked into a human urine matrix while keeping the total spike-in amount constant. In addition, specific tests were conducted to systematically investigate the influence of the different preprocessing methods on the structure of the analyzed data. In conclusion, preprocessing methods originally developed for DNA microarray analysis, in particular Quantile and Cubic-Spline Normalization, performed best in reducing bias, accurately detecting fold changes, and classifying samples.
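Quantile normalization, one of the best-performing methods in this comparison, forces all samples onto a common intensity distribution: each sample's values are replaced by the mean quantiles computed across samples, matched by rank. A minimal sketch of the generic algorithm (not the authors' exact implementation; ties are broken by order here):

```python
import numpy as np

# Minimal quantile normalization sketch: samples are columns, features rows.
# Generic algorithm as used in microarray analysis, not the study's own code.
def quantile_normalize(x):
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # per-column ranks
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)   # shared reference distribution
    return mean_quantiles[ranks]                       # substitute by rank

# Toy matrix: 4 metabolites (rows) x 3 samples (columns)
x = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
xn = quantile_normalize(x)
```

After normalization every column has the identical set of values (the mean quantiles); only the rank order within each sample is preserved.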

10.
The correct display of data is often key to interpreting the results of experimental procedures. Multivariate datasets suffer from a representation problem, since dimensionality above 3 is beyond the capability of plotting programs. Moreover, non-numerical variables such as protein annotations are usually fundamental for a full comprehension of biological data. Here we present a novel interactive XY plotter designed to give full control over large datasets containing mixed-type variables, provided with intuitive data management, a powerful labelling system, and other features aimed at facilitating data interpretation and subsetting. AVAILABILITY: The XYLab program, test dataset, and manual are available at www4.unifi.it/scibio/bioinfo/XYLab.html.

11.
The software package DNAVis offers a fast, interactive and real-time visualization of DNA sequences and their comparative genome annotations. DNAVis implements advanced methods of information visualization such as linked views, perspective walls and semantic zooming, in addition to the display of heterologous data in dot plot-like matrix views.

12.
Microarray technology has become an integral part of biomedical research, and increasing amounts of data become available through public repositories. However, re-use of these datasets is severely hindered by unstructured, missing, or incorrect biological sample information, as well as by the wide variety of preprocessing methods in use. The inSilicoDb R/Bioconductor package is a command-line front-end to the InSilico DB, a web-based database currently containing 86 104 expert-curated human Affymetrix expression profiles compiled from 1937 GEO repository series. The package builds on the Bioconductor project's focus on reproducibility by enabling a clear workflow in which not only analysis but also the retrieval of verified data is supported.

13.
Measures of nonlinearity and complexity, and in particular the study of Lyapunov exponents, have been increasingly used to characterize the dynamical properties of a wide range of biological nonlinear systems, including cardiovascular control. In this work, we present a novel methodology able to estimate the Lyapunov spectrum of a series of stochastic events in an instantaneous fashion. The paradigm relies on a novel point-process high-order nonlinear model of the event-series dynamics. Long-term information is taken into account by expanding the linear, quadratic, and cubic Wiener-Volterra kernels with orthonormal Laguerre basis functions. Applications to synthetic data, such as the Hénon map and the Rössler attractor, as well as to two experimental heartbeat interval datasets (healthy subjects undergoing postural changes and patients with severe heart failure), focus on estimation and tracking of the Instantaneous Dominant Lyapunov Exponent (IDLE). The cardiovascular assessment demonstrates that our method is able to track the nonlinear autonomic control dynamics effectively and instantaneously, allowing for estimates of complexity variability.
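The Hénon map mentioned as a synthetic benchmark has a well-studied largest Lyapunov exponent (approximately 0.42 at the standard parameters). The classical way to estimate it, iterating the map's Jacobian on a tangent vector with renormalization, can be sketched as follows; this is the textbook benchmark computation, not the authors' point-process method:

```python
import math

# Largest Lyapunov exponent of the Hénon map x' = 1 - a*x^2 + y, y' = b*x,
# estimated by evolving a tangent vector with the Jacobian [[-2*a*x, 1], [b, 0]]
# and averaging the log of its growth. Classical benchmark, not the paper's method.
def henon_lyapunov(a=1.4, b=0.3, n=20000, burn=1000):
    x, y = 0.1, 0.1
    vx, vy = 1.0, 0.0
    s = 0.0
    for i in range(n + burn):
        # Apply the Jacobian at the current point to the tangent vector,
        # then advance the map itself.
        vx, vy = -2.0 * a * x * vx + vy, b * vx
        x, y = 1.0 - a * x * x + y, b * x
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm  # renormalize to avoid overflow
        if i >= burn:                  # discard the transient
            s += math.log(norm)
    return s / n

lam = henon_lyapunov()
```

A positive value confirms the chaotic dynamics that make the map a useful test case for instantaneous exponent tracking.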

14.
The first step of many population genetic studies is the simple visualization of allele frequencies on a landscape. This basic data exploration can be challenging without proprietary software, and the manual plotting of data is cumbersome and infeasible at large sample sizes. I present an open-source, web-based program that plots any kind of frequency or count data as pie charts in Google Maps (Google Inc., Mountain View, CA). Pie polygons are then exportable to Google Earth (Google Inc.), a free Geographic Information Systems platform. Importing genetic data into Google Earth gives phylogeographers access to a wealth of spatial information layers integral to forming hypotheses and understanding patterns in the data.
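The core geometry behind such "pie polygons" is approximating each pie slice as a closed ring of latitude/longitude vertices around the sampling site, suitable for export to KML/Google Earth. A sketch under a flat-earth small-radius approximation (the degree conversion and parameters are illustrative, not the program's actual code):

```python
import math

# Illustrative sketch: build a lat/lon polygon ring for one pie-chart wedge
# centered on a sampling site. Uses a small-radius flat-earth approximation
# (~111 km per degree of latitude); not the published program's actual code.
def pie_wedge(lat, lon, radius_km, frac_start, frac_end, steps=16):
    """Closed ring for the slice covering [frac_start, frac_end) of the pie."""
    deg_lat = radius_km / 111.0
    deg_lon = deg_lat / math.cos(math.radians(lat))
    points = [(lon, lat)]                      # wedge apex at the site
    for k in range(steps + 1):                 # arc from start to end fraction
        t = frac_start + (frac_end - frac_start) * k / steps
        ang = 2.0 * math.pi * t                # fraction 0 = due north
        points.append((lon + deg_lon * math.sin(ang),
                       lat + deg_lat * math.cos(ang)))
    points.append((lon, lat))                  # close the ring
    return points

# A slice for an allele at frequency 0.25, starting at the top of the pie
ring = pie_wedge(lat=45.0, lon=-120.0, radius_km=5.0,
                 frac_start=0.0, frac_end=0.25)
```

Each ring can then be written as a KML `<Polygon>` so the whole pie chart overlays correctly in Google Earth.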

15.
Methods to present three-dimensional (3D) and time series of 3D datasets (4D) are demonstrated using the recent advances in confocal microscopy and computer visualization. The process of cell sorting during tip formation in the slime mould Dictyostelium discoideum is examined as an example by in vivo confocal microscopy of spectrally different green fluorescent protein (GFP) variants as reporters of cell-type specific gene expression. Also, cell sorting of the co-aggregating slime mould species D. discoideum and D. mucoroides is observed using a GFP variant and a spectrally distinguishable fluorescent vital stain. The confocal data are handled as 3D and 4D datasets, their processing and the advantages of different methods of visualization are discussed step by step. Selected sequences of the experiments can be viewed on the Internet, giving a much better impression of the complex cellular movements during Dictyostelium morphogenesis than printed photographs. Received: 17 February 1998 / Accepted: 14 June 1998

16.
The development of mobile-health technology has the potential to revolutionize personalized medicine. Biomedical sensors (e.g., wearables) can assist with determining treatment plans for individuals, provide quantitative information to healthcare providers, and give objective measurements of health, leading to the goal of precise phenotypic correlates for genotypes. Even though treatments and interventions are becoming more specific and datasets more abundant, measuring the causal impact of health interventions requires careful considerations of complex covariate structures, as well as knowledge of the temporal and spatial properties of the data. Thus, interpreting biomedical sensor data needs to make use of specialized statistical models. Here, we show how the Bayesian structural time series framework, widely used in economics, can be applied to these data. This framework corrects for covariates to provide accurate assessments of the significance of interventions. Furthermore, it allows for a time-dependent confidence interval of impact, which is useful for considering individualized assessments of intervention efficacy. We provide a customized biomedical adaptor tool, MhealthCI, around a specific implementation of the Bayesian structural time series framework that uniformly processes, prepares, and registers diverse biomedical data. We apply the software implementation of MhealthCI to a structured set of examples in biomedicine to showcase the ability of the framework to evaluate interventions with varying levels of data richness and covariate complexity and also compare the performance to other models. Specifically, we show how the framework is able to evaluate an exercise intervention’s effect on stabilizing blood glucose in a diabetes dataset. We also provide a future-anticipating illustration from a behavioral dataset showcasing how the framework integrates complex spatial covariates. 
Overall, we show the robustness of the Bayesian structural time series framework when applied to biomedical sensor data, highlighting its increasing value for current and future datasets.
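The causal-impact idea underlying the framework can be illustrated in heavily simplified form: fit the pre-intervention behavior, project it forward as a counterfactual, and measure the post-intervention gap. The sketch below uses a plain linear trend rather than the full Bayesian structural time series model, and the data are synthetic:

```python
import numpy as np

# Heavily simplified stand-in for the causal-impact idea (NOT the full
# Bayesian structural time series model): fit the pre-intervention trend,
# project it as a counterfactual, and measure the post-intervention gap.
def simple_impact(y, t0):
    """y: observed series; t0: index at which the intervention starts."""
    t_pre = np.arange(t0)
    slope, intercept = np.polyfit(t_pre, y[:t0], 1)  # pre-intervention trend
    t_post = np.arange(t0, len(y))
    counterfactual = intercept + slope * t_post      # what we'd expect without it
    effect = y[t0:] - counterfactual
    return effect, effect.sum()                      # pointwise and cumulative impact

# Synthetic sensor series: flat baseline, then a +5 level shift at t = 10
y = np.concatenate([np.full(10, 100.0), np.full(10, 105.0)])
effect, cumulative = simple_impact(y, t0=10)
```

The real framework improves on this by modeling covariates and state components and by reporting a time-dependent credible interval on the effect rather than a point estimate.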

17.


18.
We introduce MATtrack, an open-source MATLAB-based computational platform developed to process multi-TIFF files produced by a photo-conversion time-lapse protocol for live-cell fluorescence microscopy. MATtrack automatically performs the series of steps required for image processing, including extraction and import of numerical values from multi-TIFF files, red/green image classification using gating parameters, noise filtering, background extraction, contrast stretching, and temporal smoothing. MATtrack also integrates a series of algorithms for quantitative image analysis, enabling the construction of mean and standard deviation images, clustering and classification of subcellular regions, and injection-point approximation. In addition, MATtrack features a simple user interface that enables monitoring of fluorescent signal intensity in multiple Regions of Interest over time. The latter encapsulates a region-growing method to automatically delineate the contours of Regions of Interest selected by the user, and performs background and regional average fluorescence tracking and automatic plotting. Finally, MATtrack provides convenient visualization and exploration tools, including a migration map, which gives an overview of protein intracellular trajectories and accumulation areas. In conclusion, MATtrack is an open-source MATLAB-based software package tailored to facilitate the analysis and visualization of large data files derived from real-time live-cell fluorescence microscopy using photoconvertible proteins. It is flexible, user-friendly, compatible with Windows, Mac, and Linux, and with a wide range of data acquisition software. MATtrack is freely available for download at eleceng.dit.ie/courtney/MATtrack.zip.
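Region growing, the method used above to delineate a Region of Interest from a user-selected seed, expands from the seed to connected pixels whose intensity stays within a tolerance. A toy sketch of the generic algorithm (not MATtrack's actual implementation, which is in MATLAB):

```python
from collections import deque

# Toy region-growing sketch: grow from a seed pixel to 4-connected neighbors
# whose intensity is within `tol` of the seed intensity. Generic algorithm,
# not MATtrack's actual (MATLAB) implementation.
def region_grow(image, seed, tol):
    rows, cols = len(image), len(image[0])
    sr, sc = seed
    target = image[sr][sc]
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - target) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# Toy intensity image: a dim ROI (top-left) against bright background
image = [[10, 11, 50],
         [12, 10, 51],
         [49, 50, 52]]
roi = region_grow(image, seed=(0, 0), tol=3)
```

The contour of the grown region then serves as the ROI boundary for fluorescence tracking over time.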

19.
20.
Background: Metagenomic sequencing is a complex sampling procedure from unknown mixtures of many genomes. Metagenome data with known genome compositions are essential both for benchmarking bioinformatics software and for investigating the influence of various factors on the data. Compared to data from real microbiome samples or from defined microbial mock communities, simulated data from proper computational models are better suited for this purpose, as they provide more flexibility for controlling multiple factors. Methods: We developed a non-uniform metagenomic sequencing simulation system (nuMetaSim) that is capable of mimicking various factors in real metagenomic sequencing to reflect multiple properties of real data with customizable parameter settings. Results: We generated 9 comprehensive metagenomic datasets of different composition complexity from 203 bacterial genomes and 2 archaeal genomes related to the human intestinal system. Conclusion: The data can serve as benchmarks for comparing the performance of different methods in different situations, and the software package allows users to generate simulated data that better reflect the specific properties of their scenarios.
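The basic sampling step in this kind of simulator draws read counts per genome from weights combining relative abundance and genome length. A minimal sketch of that idea (parameters, seed, and weighting are illustrative assumptions, not nuMetaSim's actual model):

```python
import numpy as np

# Sketch of the core sampling idea in a metagenomic read simulator:
# expected read counts scale with abundance x genome length, and the actual
# counts are drawn multinomially. Parameters and seed are illustrative,
# not nuMetaSim's actual model.
rng = np.random.default_rng(0)

genome_lengths = np.array([5_000_000, 2_000_000, 3_000_000])
abundances = np.array([0.6, 0.3, 0.1])       # relative cell abundance

weights = abundances * genome_lengths        # longer/abundant genomes yield more reads
probs = weights / weights.sum()
n_reads = 100_000
reads_per_genome = rng.multinomial(n_reads, probs)
```

Non-uniformity along each genome (coverage bias, GC effects, and so on) would then be layered on top of this genome-level draw when choosing individual read positions.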

