
1.

Relating destabilizing regions to known functional sites in proteins

Benoît H Dessailly Marc F Lensink Shoshana J Wodak 《BMC bioinformatics》2007,8(1):141

Background

Most methods for predicting functional sites in protein 3D structures rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well-annotated set of functional sites to use as a benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, a novel benchmark consisting of a diverse set of hand-curated protein functional sites is derived.

2.

A comprehensive re-analysis of the Golden Spike data: Towards a benchmark for differential expression methods

Richard D Pearson 《BMC bioinformatics》2008,9(1):164

Background

The Golden Spike data set has been used to validate a number of methods for summarizing Affymetrix data sets, sometimes with seemingly contradictory results. Much less use has been made of this data set to evaluate differential expression methods. It has been suggested that this data set should not be used for method comparison due to a number of inherent flaws.

3.

A benchmark for statistical microarray data analysis that preserves actual biological and technical variance

Benoît De Hertogh Bertrand De Meulder Fabrice Berger Michael Pierre Eric Bareke Anthoula Gaigneaux Eric Depiereux 《BMC bioinformatics》2010,11(1):17

Background

Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a fresh method using biologically-relevant data to evaluate the performance of statistical methods.

4.

SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms

Tim Van den Bulcke Koenraad Van Leemput Bart Naudts Piet van Remortel Hongwu Ma Alain Verschoren Bart De Moor Kathleen Marchal 《BMC bioinformatics》2006,7(1):43

Background

The development of algorithms to infer the structure of gene regulatory networks based on expression data is an important subject in bioinformatics research. Validation of these algorithms requires benchmark data sets for which the underlying network is known. Since experimental data sets of the appropriate size and design are usually not available, there is a clear need to generate well-characterized synthetic data sets that allow thorough testing of learning algorithms in a fast and reproducible manner.

5.

Complex phylogenetic distribution of a non-canonical genetic code in green algae

Ellen Cocquyt Gillian H Gile Frederik Leliaert Heroen Verbruggen Patrick J Keeling Olivier De Clerck 《BMC evolutionary biology》2010,10(1):327

Background

A non-canonical nuclear genetic code, in which TAG and TAA have been reassigned from stop codons to glutamine, has evolved independently in several eukaryotic lineages, including the ulvophycean green algal orders Dasycladales and Cladophorales. To study the phylogenetic distribution of the standard and non-canonical genetic codes, we generated sequence data of a representative set of ulvophycean green algae and used a robust green algal phylogeny to evaluate different evolutionary scenarios that may account for the origin of the non-canonical code.

6.

Differences in evolutionary pressure acting within highly conserved ortholog groups

Teresa M Przytycka Raja Jothi L Aravind David J Lipman 《BMC evolutionary biology》2008,8(1):208

Background

In highly conserved, widely distributed ortholog groups, the main evolutionary force is assumed to be purifying selection that enforces sequence conservation, with most divergence occurring by accumulation of neutral substitutions. Using a set of ortholog groups from prokaryotes, with a single representative in each studied organism, we asked whether this evolutionary pressure acts similarly on different subgroups of orthologs defined as major lineages (e.g. Proteobacteria or Firmicutes).

7.

Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions

Yohan Kim John Sidney Søren Buus Alessandro Sette Morten Nielsen Bjoern Peters 《BMC bioinformatics》2014,15(1)

Background

It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB) which served as a blind set.

Results

We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates.

Conclusion

It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. We here identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and thus are less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark data sets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-241) contains supplementary material, which is available to authorized users.
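The two evaluation strategies contrasted in this abstract, cross-validation and a separately generated blind set, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy threshold "model", the function names, and the data are hypothetical and are not the predictors or datasets used in the study.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Yield (train_idx, test_idx) pairs that split range(n_items) into k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def fit_threshold(train):
    """Toy stand-in for a predictor: choose the cut-off on x that best fits train."""
    candidates = sorted(x for x, _ in train)
    best = max(candidates, key=lambda t: sum((x >= t) == y for x, y in train))
    return lambda x: x >= best


def accuracy(predict, data):
    """Fraction of (x, label) pairs for which predict(x) equals the label."""
    return sum(predict(x) == y for x, y in data) / len(data)


def cross_validated_accuracy(data, k=5, seed=0):
    """Average test accuracy over k train/test splits of the same dataset."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k, seed):
        model = fit_threshold([data[i] for i in train_idx])
        scores.append(accuracy(model, [data[i] for i in test_idx]))
    return sum(scores) / len(scores)


def blind_accuracy(data, blind):
    """Train on all of data, then score on a separately generated blind set."""
    return accuracy(fit_threshold(data), blind)


if __name__ == "__main__":
    # Hypothetical data: (measurement, is_binder) pairs.
    benchmark = [(i, i >= 5) for i in range(10)]
    blind_set = [(0.5, False), (7.2, True)]
    print("cross-validated:", cross_validated_accuracy(benchmark))
    print("blind set:", blind_accuracy(benchmark, blind_set))
```

Comparing the two printed numbers on real data is the kind of check the abstract describes. With small or low-diversity datasets, the abstract reports that the two estimates can differ substantially.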

8.

Evolution of metabolic network organization (total citations: 2; self-citations: 0; citations by others: 2)

Aurélien Mazurie Danail Bonchev Benno Schwikowski Gregory A Buck 《BMC systems biology》2010,4(1):59


9.

Clustering metagenomic sequences with interpolated Markov models

David R Kelley Steven L Salzberg 《BMC bioinformatics》2010,11(1):544

Background

Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects.

10.

Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

Niclas C Tan Wayne G Fisher Kevin P Rosenblatt Harold R Garner 《BMC bioinformatics》2009,10(1):144

Background

Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states.

11.

Fernando Garcia Francisco J Lopez Carlos Cano Armando Blanco 《BMC bioinformatics》2009,10(1):224


12.

Discriminating between rival biochemical network models: three approaches to optimal experiment design

Bence Mélykúti Antonis Papachristodoulou Hana El-Samad 《BMC systems biology》2010,4(1):38

Background

The success of molecular systems biology hinges on the ability to use computational models to design predictive experiments, and ultimately unravel underlying biological mechanisms. A problem commonly encountered in the computational modelling of biological networks is that alternative, structurally different models of similar complexity fit a set of experimental data equally well. In this case, more than one molecular mechanism can explain available data. In order to rule out the incorrect mechanisms, one needs to invalidate incorrect models. At this point, new experiments maximizing the difference between the measured values of alternative models should be proposed and conducted. Such experiments should be optimally designed to produce data that are most likely to invalidate incorrect model structures.

13.

Analysis of superfamily specific profile-profile recognition accuracy

James A Casbon Mansoor AS Saqi 《BMC bioinformatics》2004,5(1):200

Background

Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily.

14.

Gene set analysis exploiting the topology of a pathway

Maria Sofia Massa Monica Chiogna Chiara Romualdi 《BMC systems biology》2010,4(1):121

Background

Recently, a great effort in microarray data analysis has been directed towards the study of so-called gene sets. A gene set is defined by genes that are, in some way, functionally related. For example, genes appearing in a known biological pathway naturally define a gene set. Gene sets are usually identified from a priori biological knowledge. Many bioinformatics resources now store this kind of knowledge (see, for example, the Kyoto Encyclopedia of Genes and Genomes, among others). Although pathway maps carry important information about the structure of correlation among genes that should not be neglected, the currently available multivariate methods for gene set analysis do not fully exploit it.

15.

Antibody-protein interactions: benchmark datasets and prediction tools evaluation

Julia V Ponomarenko Philip E Bourne 《BMC structural biology》2007,7(1):64

Background

The ability to predict antibody binding sites (also known as antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Computational methods that use these experimental data exist for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods.

16.

LCA of land-based freight transportation: facilitating practical application and including accidents in LCIA

Nikolaus Fries Stefanie Hellweg 《The International Journal of Life Cycle Assessment》2014,19(3):546-557

Purpose

A major task concerning the greening of freight transportation is to influence the process of choosing an appropriate transport solution for a shipment. This paper presents the results of a detailed environmental benchmark study of freight transport chains recorded during a shipper survey administered in Switzerland in 2008.

Materials and methods

For the environmental evaluation, life cycle assessment was applied and enhanced with a new method for integrating damage to human health caused by traffic accidents based on the disability adjusted life year concept.

Results and discussion

The results show that in land-based transport, road generally has a lower environmental performance compared to intermodal and rail-only transport. Exceptions exist, e.g. for long pre- and post-haulage distances in intermodal transport or for very low train-load factors. The most relevant environmental interventions to pay attention to are, according to the methods applied, emissions of CO₂, NO_x and particulates as well as accident damages.

Conclusions

Rail transport is often, but not always, environmentally preferable to truck transport. Accident damages to human health should be included in each benchmark study. For practical application, a simplified benchmark methodology is proposed requiring a reduced level of detail for the input data.

17.

PAGE: Parametric Analysis of Gene Set Enrichment

Seon-Young Kim David J Volsky 《BMC bioinformatics》2005,6(1):144

Background

Gene set enrichment analysis (GSEA) is a microarray data analysis method that uses predefined gene sets and ranks of genes to identify significant biological changes in microarray data sets. GSEA is especially useful when gene expression changes in a given microarray data set are minimal or moderate.

18.

How to decide? Different methods of calculating gene expression from short oligonucleotide array data will give different results

Frank F Millenaar John Okyere Sean T May Martijn van Zanten Laurentius ACJ Voesenek Anton JM Peeters 《BMC bioinformatics》2006,7(1):137-16


19.

iRefIndex: A consolidated protein interaction database with provenance

Sabry Razick George Magklaras Ian M Donaldson 《BMC bioinformatics》2008,9(1):405

Background

Interaction data for a given protein may be spread across multiple databases. We set out to create a unifying index that would facilitate searching for these data and that would group together redundant interaction data while recording the methods used to perform this grouping.

20.

GONOME: measuring correlations between GO terms and genomic positions

Stefan M Stanley Timothy L Bailey John S Mattick 《BMC bioinformatics》2006,7(1):94
