Similar documents
20 similar documents retrieved.
1.
Introduction: The availability of big data sets (‘OMICS’) has greatly impacted fundamental and translational science. High-throughput analysis of HLA class I and II associated peptidomes by mass spectrometry (MS) has generated large datasets, with the last decade witnessing tremendous growth in the breadth and number of studies.

Areas covered: We first analyzed naturally processed peptide (NP) data captured within the IEDB to survey and characterize the current state of NP data. We next asked to what extent NP data overlap with existing T cell epitope and MHC binding data.

Expert commentary: The current collection of NP data represents a large and diverse set of class I/II peptides, mostly derived from self-antigens. These data overlap only marginally with existing immunogenicity and binding data, so it is difficult to ascertain the correspondence between the different assay methodologies. This highlights a need for unbiased studies in model antigen systems benchmarking how well MHC binding and NP data predict immunogenicity. Going forward, efforts to generate an integrated process for capturing all NP data, curating associated metadata and accessing NP data from an immunological viewpoint will be important for developing novel methods for identifying optimal target antigens and for class I and II epitope prediction.


2.
Introduction: Multi-omic approaches promise a broader view of cellular processes and a deeper understanding of biological systems. With greatly improved high-throughput methods, the amounts of data generated have become huge, and handling them has become challenging.

Areas covered: New bioinformatic tools and pipelines for the integration of data from different omics disciplines continue to emerge and will help scientists interpret data reliably in the context of biological processes. Comprehensive data integration strategies will fundamentally improve systems biology and systems medicine. To present recent developments in integrative omics, the Göttingen Proteomics Forum (GPF) organized its 6th symposium on 23 November 2017, as part of its series of regular GPF symposia. More than 140 scientists attended the event, which highlighted the challenges and opportunities, but also the caveats, of integrating data from different omics disciplines.

Expert commentary: The continuous exponential growth of omics data requires comparable progress in the software that handles it. Integrative omics tools offer a way to meet this challenge, but thorough investigations and coordinated efforts are required to advance the field.


3.
Introduction: Despite the unquestionable advantages of Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging (MALDI-MSI) in visualizing the spatial distribution and the relative abundance of biomolecules directly on tissue, the resulting data are complex and high-dimensional. Analysis and interpretation of this large amount of information are therefore mathematically, statistically and computationally challenging.

Areas covered: This article reviews some of the challenges in data analysis, with particular emphasis on machine learning techniques employed in clinical applications, and can serve as an entry point for those who want to study the computational aspects. Several characteristics of data processing are described, highlighting their advantages and disadvantages. Different approaches to data analysis focused on clinical applications are also provided. A practical tutorial based on the Orange Canvas and Weka software is included to help readers become familiar with the data processing.

Expert commentary: MALDI-MSI has recently gained considerable attention and has been successfully employed for research and diagnostic purposes. Data dimensionality constitutes an important issue, and statistical methods for information-preserving data reduction represent one of the most challenging aspects. The most common data reduction methods collect independent observations into a single table; however, incorporating relational information can improve the discriminatory capability of the data.
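The data-reduction step described above can be illustrated with a minimal sketch that is not taken from the reviewed article: an MSI data cube is flattened into the pixel-by-m/z table most reduction methods assume, then reduced with PCA. The array shapes and variable names are hypothetical.

```python
# Minimal sketch: dimensionality reduction of a MALDI-MSI dataset.
# Assumptions (not from the reviewed article): the data cube is already
# loaded as a NumPy array of shape (rows, cols, n_mz_bins).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
cube = rng.random((50, 60, 500))          # hypothetical 50x60 image, 500 m/z bins

# Collect "independent observations into a single table": one row per pixel.
table = cube.reshape(-1, cube.shape[-1])  # shape (3000, 500)

# Information-preserving reduction: keep components explaining 95% of variance.
pca = PCA(n_components=0.95)
scores = pca.fit_transform(table)
print(scores.shape, pca.explained_variance_ratio_.sum())
```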


4.
5.
Information Quality (IQ) is a critical factor for the success of many activities in the information age, including the development of data warehouses and implementation of data mining. The issue of IQ risk is recognized during the process of data mining; however, there is no formal methodological approach to dealing with such issues.

Consequently, it is essential to measure IQ risk in a data warehouse to ensure success in implementing data mining. This article presents a methodology to determine three IQ risk characteristics: accuracy, comprehensiveness, and non-membership. The methodology provides a set of quantitative models to examine how the quality risks of source information affect the quality of information outputs produced using the relational algebra operations Restriction, Projection, and Cubic product. It can be used to determine how quality risks associated with diverse data sources affect the derived data. The study also develops a data cube model and associated algebra to support IQ risk operations.
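As a rough illustration of the kind of propagation such quantitative models address, the sketch below pushes a simple per-relation accuracy-risk score through three relational operations under an independence assumption. The formulas and function names are hypothetical and are not the models proposed in the article.

```python
# Hypothetical illustration of propagating an accuracy-risk score through
# relational-algebra operations; the formulas assume independent errors and
# are NOT the quantitative models proposed in the article.

def restrict(acc_risk: float) -> float:
    """Restriction (selection) keeps whole tuples, so per-tuple accuracy risk is unchanged."""
    return acc_risk

def project(acc_risk: float, kept_fraction_of_attrs: float) -> float:
    """Projection drops attributes; assume risk scales with the fraction of attributes kept."""
    return acc_risk * kept_fraction_of_attrs

def cubic_product(acc_risk_r: float, acc_risk_s: float) -> float:
    """Product pairs tuples from both relations; a pair is accurate only if both tuples are."""
    return 1.0 - (1.0 - acc_risk_r) * (1.0 - acc_risk_s)

if __name__ == "__main__":
    r, s = 0.05, 0.10                     # hypothetical source accuracy risks
    print(restrict(r))                    # 0.05
    print(project(r, 0.5))                # 0.025
    print(round(cubic_product(r, s), 4))  # 0.145
```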


6.
The submission of multiple sequence alignment data to EMBL has grown 30-fold in the past 10 years, creating an archiving challenge. The EBI has developed a new public database of multiple sequence alignments called EMBL-Align. It has a dedicated web-based submission tool, Webin-Align. Together they represent a comprehensive data management solution for alignment data. Webin-Align accepts all the common alignment formats and can display data in CLUSTALW format as well as a new standard EMBL-Align flat file format. The alignments are stored in the EMBL-Align database and can be queried from the EBI SRS (Sequence Retrieval System) server. AVAILABILITY: Webin-Align: http://www.ebi.ac.uk/embl/Submission/align_top.html, EMBL-Align: ftp://ftp.ebi.ac.uk/pub/databases/embl/align, http://srs.ebi.ac.uk/

7.
Structural systems identification of genetic regulatory networks
MOTIVATION: Reverse engineering of genetic regulatory networks from experimental data is the first step toward the modeling of genetic networks. Linear state-space models, also known as linear dynamical models, have been applied to model genetic networks from gene expression time series data, but existing works have not taken available structural information into account. Without structural constraints, estimated models may contradict biological knowledge and estimation methods may overfit. RESULTS: In this report, we extended expectation-maximization (EM) algorithms to incorporate prior network structure and to estimate genetic regulatory networks that can track and predict gene expression profiles. We applied our method to synthetic data and to SOS data and showed that our method significantly outperforms the regular EM without structural constraints. AVAILABILITY: The Matlab code is available upon request and the SOS data can be downloaded from http://www.weizmann.ac.il/mcb/UriAlon/Papers/SOSData/, courtesy of Uri Alon. Zak's data is available from his website, http://www.che.udel.edu/systems/people/zak.
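To make the idea of structural constraints concrete, here is a minimal sketch (not the authors' structural EM algorithm) of fitting a linear dynamical model x_{t+1} = A x_t + noise while forcing entries of the transition matrix A that correspond to absent edges in a prior network to zero. The network, data and estimation by masked least squares are all simulated and hypothetical.

```python
# Minimal sketch of structure-constrained estimation for a linear gene-network
# model x_{t+1} = A x_t + noise.  This is NOT the authors' structural EM; it
# only illustrates how a prior adjacency mask constrains the transition matrix.
import numpy as np

rng = np.random.default_rng(1)
n_genes, T = 5, 200
mask = rng.random((n_genes, n_genes)) < 0.4       # hypothetical prior network (True = allowed edge)
A_true = mask * rng.normal(0, 0.3, (n_genes, n_genes))

spec = np.max(np.abs(np.linalg.eigvals(A_true)))
if spec > 0.9:
    A_true *= 0.9 / spec                          # keep the simulated system stable

x = np.zeros((T, n_genes))
for t in range(T - 1):                            # simulate an expression time series
    x[t + 1] = A_true @ x[t] + rng.normal(0, 0.1, n_genes)

# Estimate each row of A by least squares, using only the regulators the
# prior structure allows; disallowed entries stay exactly zero.
A_hat = np.zeros_like(A_true)
X_past, X_next = x[:-1], x[1:]
for i in range(n_genes):
    parents = np.flatnonzero(mask[i])
    if parents.size:
        coef, *_ = np.linalg.lstsq(X_past[:, parents], X_next[:, i], rcond=None)
        A_hat[i, parents] = coef

print(np.round(A_hat - A_true, 2))                # estimation error respects the prior structure
```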

8.
TableView is a generalized scientific visualization program for exploration of various biological data, including EST, SAGE, microarray and annotation data. Written in Java, TableView is portable, is easily used together with other software including DBMSs, and is versatile enough to be applied to any tabular data. AVAILABILITY: TableView is freely available at: http://ccgb.umn.edu/software/java/apps/TableView/.

9.
Introduction: The study of microbial communities based on the combined analysis of genomic and proteomic data – called metaproteogenomics – has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications on human health and the environment.

Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications.

Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
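For point (i), one common (though not the only) strategy is target-decoy false discovery rate estimation. The sketch below, with made-up scores, shows only the basic decoy counting; it is not the implementation of any tool discussed in the article.

```python
# Minimal target-decoy FDR sketch for peptide-spectrum matches (PSMs).
# Scores and 'is_decoy' flags are made up; real pipelines search a
# concatenated target+decoy database and record which hits are decoys.
from dataclasses import dataclass

@dataclass
class PSM:
    score: float
    is_decoy: bool

def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) at or above the score threshold."""
    targets = sum(1 for p in psms if not p.is_decoy and p.score >= threshold)
    decoys = sum(1 for p in psms if p.is_decoy and p.score >= threshold)
    return decoys / targets if targets else 0.0

psms = [PSM(9.1, False), PSM(8.4, False), PSM(7.9, True),
        PSM(7.5, False), PSM(6.2, True), PSM(5.8, False)]
print(round(fdr_at_threshold(psms, 6.0), 2))   # 2 decoys / 3 targets ≈ 0.67
```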


10.
SUMMARY: OTUbase is an R package designed to facilitate the analysis of operational taxonomic unit (OTU) data and sequence classification (taxonomic) data. Currently there are programs that will cluster sequence data into OTUs and/or classify sequence data into known taxonomies. However, there is a need for software that can take the summarized output of these programs and organize it into easily accessed and manipulated formats. OTUbase provides this structure and organization within R, to allow researchers to easily manipulate the data with the rich library of R packages currently available for additional analysis. AVAILABILITY: OTUbase is an R package available through Bioconductor. It can be found at http://www.bioconductor.org/packages/release/bioc/html/OTUbase.html.

11.
ToxoDB: accessing the Toxoplasma gondii genome
ToxoDB (http://ToxoDB.org) provides a genome resource for the protozoan parasite Toxoplasma gondii. Several sequencing projects devoted to T. gondii have been completed or are in progress: an EST project (http://genome.wustl.edu/est/index.php?toxoplasma=1), a BAC clone end-sequencing project (http://www.sanger.ac.uk/Projects/T_gondii/) and an 8X random shotgun genomic sequencing project (http://www.tigr.org/tdb/e2k1/tga1/). ToxoDB was designed to provide a central point of access for all available T. gondii data, and a variety of data mining tools useful for the analysis of unfinished, un-annotated draft sequence during the early phases of the genome project. In later stages, as more and different types of data become available (microarray, proteomic, SNP, QTL, etc.), the database will provide an integrated data analysis platform facilitating user-defined queries across the different data types.

12.
Mediante is a MIAME-compliant microarray data manager that links together annotations and experimental data. Developed as a J2EE three-tier application, Mediante integrates a management system for production of long oligonucleotide microarrays, an experimental data repository suitable for home-made or commercial microarrays, and a user interface dedicated to the management of microarray projects. Several tools allow quality control of hybridizations and submission of validated data to public repositories. AVAILABILITY: http://www.microarray.fr. SUPPLEMENTARY INFORMATION: http://www.microarray.fr/SP/lebrigand2007/

13.
ProServer: a simple, extensible Perl DAS server
SUMMARY: The increasing size and complexity of biological databases have led to a growing trend to federate rather than duplicate them. In order to share data between federated databases, protocols for the exchange mechanism must be developed. One such data exchange protocol that is widely used is the Distributed Annotation System (DAS). For example, DAS has enabled small experimental groups to integrate their data into the Ensembl genome browser. We have developed ProServer, a simple, lightweight, Perl-based DAS server that does not depend on a separate HTTP server. The ProServer package is easily extensible, allowing data to be served from almost any underlying data model. Recent additions to the DAS protocol have enabled both structure and alignment (sequence and structural) data to be exchanged. ProServer allows both of these data types to be served. AVAILABILITY: ProServer can be downloaded from http://www.sanger.ac.uk/proserver/ or CPAN http://search.cpan.org/~rpettett/. Details on the system requirements and installation of ProServer can be found at http://www.sanger.ac.uk/proserver/.

14.
SUMMARY: Visual programming offers an intuitive means of combining known analysis and visualization methods into powerful applications. The system presented here enables users who are not programmers to manage microarray and genomic data flow and to customize their analyses by combining common data analysis tools to fit their needs. AVAILABILITY: http://www.ailab.si/supp/bi-visprog SUPPLEMENTARY INFORMATION: http://www.ailab.si/supp/bi-visprog.

15.
The R package mosclust (model order selection for clustering problems) implements algorithms based on the concept of stability for discovering significant structures in bio-molecular data. The software library provides stability indices obtained through different data perturbation methods (resampling, random projections, noise injection), as well as statistical tests to assess the significance of multi-level structures singled out from the data. Availability: http://homes.dsi.unimi.it/~valenti/SW/mosclust/download/mosclust_1.0.tar.gz. Supplementary information: http://homes.dsi.unimi.it/~valenti/SW/mosclust.
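Since mosclust itself is an R package, the Python sketch below only illustrates the general resampling-based stability idea it builds on, using hypothetical data and scikit-learn clustering; it does not reproduce the package's own indices or statistical tests.

```python
# Stability-based model order selection sketch: cluster two random subsamples,
# then measure agreement of the two labelings on the points they share.
# Illustrates the general idea behind perturbation-based stability indices,
# not the specific indices implemented in mosclust.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(c, 0.3, (60, 2)) for c in ((0, 0), (3, 0), (0, 3))])

def stability(k, n_pairs=20, frac=0.8):
    scores = []
    for _ in range(n_pairs):
        a = rng.choice(len(data), int(frac * len(data)), replace=False)
        b = rng.choice(len(data), int(frac * len(data)), replace=False)
        shared = np.intersect1d(a, b)
        la = KMeans(k, n_init=10, random_state=0).fit_predict(data[a])
        lb = KMeans(k, n_init=10, random_state=0).fit_predict(data[b])
        # map shared points back to their positions within each subsample
        pos_a = {idx: i for i, idx in enumerate(a)}
        pos_b = {idx: i for i, idx in enumerate(b)}
        scores.append(adjusted_rand_score([la[pos_a[s]] for s in shared],
                                          [lb[pos_b[s]] for s in shared]))
    return float(np.mean(scores))

for k in (2, 3, 4, 5):
    print(k, round(stability(k), 2))   # the true k = 3 should score highest
```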

16.
During the past decade, molecular techniques have provided a wealth of data that have facilitated the resolution of several controversial questions in polyploid evolution. Herein we have focused on several of these issues: (1) the frequency of recurrent formation of polyploid species; (2) the genetic consequences of multiple polyploidizations within a species; (3) the prevalence and genetic attributes of autopolyploids; and (4) the genetic changes that occur in polyploid genomes following their formation.

Molecular data provide a more dynamic picture of polyploid evolution than has been traditionally espoused. Numerous studies have demonstrated multiple origins of both allopolyploids and autopolyploids. In several polyploid species studied in detail, multiple origins were found to be frequent on a local geographic scale, as well as during a short span of time. Molecular data strongly suggest that recurrent formation of polyploid species is the rule, rather than the exception. In addition, molecular data indicate that recurrent formation of polyploids has important genetic consequences, introducing considerable genetic variation from diploid progenitors into polyploid derivatives.

Molecular data also suggest a much more important role for natural autopolyploids than has been historically envisioned. In contrast to the longstanding view of autopolyploidy as being rare, molecular data continue to reveal steadily increasing numbers of well-documented autopolyploids having tetrasomic or higher-level polysomic inheritance. Although autopolyploidy undoubtedly occurs much less frequently than allopolyploidy in natural populations, it nonetheless has been a significant evolutionary mechanism. Molecular data also provide compelling genetic evidence that contradicts the traditional view of autopolyploidy as being maladaptive. Electrophoretic studies have revealed three important attributes of autopolyploids compared to their diploid progenitors: (1) enzyme multiplicity, (2) increased heterozygosity, and (3) increased allelic diversity. Genetic variability is, in fact, typically substantially higher in autopolyploids than in their diploid progenitors. These genetic attributes of autopolyploids are due to polysomic inheritance and provide strong genetic arguments for the potential success of autopolyploids in nature.

In addition to providing numerous important insights into the formation of polyploids and the immediate genetic consequences of polyploidy, molecular data also have been used to study the subsequent evolution of polyploid genomes. Common hypotheses on the subsequent evolution of polyploid genomes include (1) gene silencing, eventually leading to extensively diploidized polyploid genomes; (2) gene diversification, resulting in regulatory or functional divergence of duplicate genes; and (3) genome diversification, resulting in chromosomal repatterning. Compelling, but limited, genetic evidence for all of these factors has been obtained in molecular analyses of polyploid species. The occurrence of these processes in polyploid genomes indicates that polyploid genomes are plastic and susceptible to evolutionary change.

In summary, molecular data continue to demonstrate that polyploidization and the subsequent evolution of polyploid genomes are very dynamic processes.


17.
18.
MOTIVATION: BioPAX is a standard language for representing and exchanging models of biological processes at the molecular and cellular levels. It is widely used by different pathway databases and genomics data analysis software. Currently, the primary source of BioPAX data is direct exports from the curated pathway databases. It is still uncommon for wet-lab biologists to share and exchange pathway knowledge using BioPAX. Instead, pathways are usually represented as informal diagrams in the literature. In order to encourage formal representation of pathways, we describe a software package that allows users to create pathway diagrams using CellDesigner, a user-friendly graphical pathway-editing tool and save the pathway data in BioPAX Level 3 format. AVAILABILITY: The plug-in is freely available and can be downloaded at ftp://ftp.pantherdb.org/CellDesigner/plugins/BioPAX/ CONTACT: huaiyumi@usc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
20.
The genetic effective population size, Ne, can be estimated from the average gametic disequilibrium (r²) between pairs of loci, but such estimates require evaluation of assumptions and currently have few methods to estimate confidence intervals. speed-ne is a suite of MATLAB functions to estimate Ne from r², with a graphical user interface and a rich set of outputs that aid in understanding data patterns and comparing multiple estimators. speed-ne includes functions to either generate or input simulated genotype data to facilitate comparative studies of estimators under various population genetic scenarios. speed-ne was validated with data simulated under both time-forward and time-backward coalescent models of genetic drift. Three classes of estimators were compared with simulated data to examine several general questions: what are the impacts of microsatellite null alleles on r², how should missing data be treated, and does disequilibrium contributed by reduced recombination among some loci in a sample impact Ne estimates. Estimators differed greatly in precision in the scenarios examined, and a widely employed estimator exhibited the largest variances among replicate data sets. speed-ne implements several jackknife approaches to estimate confidence intervals, and simulated data showed that jackknifing over loci and jackknifing over individuals provided ~95% confidence interval coverage for some estimators and should be useful for empirical studies. speed-ne provides an open-source, extensible tool for estimation of Ne from empirical genotype data and for conducting simulations of both microsatellite and single nucleotide polymorphism (SNP) data types to develop expectations and to compare estimators.
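As background to the estimators being compared, the classical approximation for unlinked loci under random mating (often attributed to Hill 1981, with later bias corrections by Waples) is E[r²] ≈ 1/(3Ne) + 1/S for a sample of S individuals, which inverts to N̂e ≈ 1/(3(r̂² − 1/S)). The sketch below applies that textbook approximation to made-up r² values; it is not speed-ne's implementation, which provides multiple estimators, bias corrections and jackknife confidence intervals.

```python
# Classical LD-based Ne approximation for unlinked loci and random mating:
#   E[r^2] ~ 1/(3*Ne) + 1/S   =>   Ne_hat ~ 1/(3*(mean_r2 - 1/S))
# Textbook sketch with made-up r^2 values, not speed-ne's estimator.
import numpy as np

def ld_ne(r2_values, sample_size):
    """Point estimate of Ne from mean pairwise r^2, correcting for sampling (1/S)."""
    adjusted = np.mean(r2_values) - 1.0 / sample_size
    return np.inf if adjusted <= 0 else 1.0 / (3.0 * adjusted)

r2 = np.array([0.012, 0.018, 0.009, 0.015, 0.011])   # hypothetical pairwise r^2
print(round(ld_ne(r2, sample_size=200), 1))          # ~41.7 under these assumptions
```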
