1.

Integrating bioinformatics approaches for a comprehensive interpretation of metabolomics datasets

《Current opinion in biotechnology》2018

2.

PLPD: reliable protein localization prediction from imbalanced and overlapped datasets

Lee K Kim DW Na D Lee KH Lee D 《Nucleic acids research》2006,34(17):4655-4666

Subcellular localization is one of the key functional characteristics of proteins. An automatic and efficient prediction method for protein subcellular localization is highly desirable owing to the need for large-scale genome analysis. From a machine learning point of view, a dataset of protein localization has several characteristics: the dataset has too many classes (there are more than 10 localizations in a cell), it is a multi-label dataset (a protein may occur in several different subcellular locations), and it is too imbalanced (the number of proteins in each localization is remarkably different). Even though much previous work has been done on the prediction of protein subcellular localization, none of it effectively tackles these characteristics at the same time. Thus, a new computational method for protein localization is needed for more reliable outcomes. To address the issue, we present a protein localization predictor based on D-SVDD (PLPD), which can find the likelihood of a specific localization of a protein more easily and more correctly. Moreover, we introduce three measurements for the more precise evaluation of a protein localization predictor. Results on various datasets derived from the experiments of Huh et al. (2003) show that the proposed PLPD method represents a different approach that might play a complementary role to existing methods, such as the Nearest Neighbor method and the discriminate covariant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted the localizations of 138 proteins whose subcellular localizations could not be clearly observed in the experiments of Huh et al. (2003).

3.

A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics

Longlong Liao Kenli Li Keqin Li Canqun Yang Qi Tian 《BMC systems biology》2018,12(6):111

Background

While there are a large number of bioinformatics datasets for clustering, many of them are incomplete, i.e., some data samples are missing attribute values needed by clustering algorithms. A variety of clustering algorithms have been proposed in the past years, but they are usually limited to clustering complete datasets. Besides, conventional clustering algorithms cannot obtain a trade-off between accuracy and efficiency of the clustering process since many essential parameters are determined by the human user’s experience.

Results

The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets called MKDCI. The MKDCI algorithm consists of recovering missing attribute values of input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel based on multiple basis kernels, detecting cluster centroids with the Isolation Forests method, assigning clusters with arbitrary shape and visualizing the results.

Conclusions

Extensive experiments on several well-known clustering datasets in bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, the proposed MKDCI algorithm tends to automatically produce clusters of better quality on the incomplete dataset in bioinformatics.

4.

Analysis of RNAseq datasets from a comparative infectious disease zebrafish model using GeneTiles bioinformatics

Wouter J. Veneman Jan de Sonneville Kees-Jan van der Kolk Anita Ordas Zaid Al-Ars Annemarie H. Meijer Herman P. Spaink 《Immunogenetics》2015,67(3):135-147

5.

PathEx: a novel multi factors based datasets selector web tool

Eric Bareke Michael Pierre Anthoula Gaigneaux Bertrand De Meulder Sophie Depiereux Naji Habra Eric Depiereux 《BMC bioinformatics》2010,11(1):528

Background

Microarray experiments have become very popular in life science research. However, if such experiments are only considered independently, the possibilities for analysis and interpretation of many life science phenomena are reduced. The accumulation of publicly available data provides biomedical researchers with a valuable opportunity to either discover new phenomena or improve the interpretation and validation of other phenomena that are partially understood or well known. This can only be achieved by intelligently exploiting this rich mine of information.

6.

Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement

Francisco R Pinto João A Carriço Mário Ramirez Jonas S Almeida 《BMC bioinformatics》2007,8(1):44

Background

Biological information is commonly used to cluster or classify entities of interest such as genes, conditions, species or samples. However, different sources of data can be used to classify the same set of entities and methods allowing the comparison of the performance of two data sources or the determination of how well a given classification agrees with another are frequently needed, especially in the absence of a universally accepted "gold standard" classification.
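
As a concrete illustration of measuring how well one classification agrees with another, the sketch below computes the standard Adjusted Rand index from two label lists. It is not the Ranked Adjusted Rand variant proposed in the paper, which also uses distance information; the function name `adjusted_rand_index` and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard Adjusted Rand index between two partitions of the same entities.

    Illustrative helper (hypothetical name); not the ranked variant.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same entities")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Contingency table: entities sharing cluster i in A and cluster j in B.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of entity pairs that are co-clustered in both, in A, and in B.
    pairs_both = sum(comb(c, 2) for c in cell_counts.values())
    pairs_a = sum(comb(c, 2) for c in a_counts.values())
    pairs_b = sum(comb(c, 2) for c in b_counts.values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (pairs_both - expected) / (maximum - expected)


# Toy usage: the same grouping under different label names agrees perfectly.
same = adjusted_rand_index([0, 0, 1, 1], ["x", "x", "y", "y"])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is 1 for identical partitions, close to 0 for agreement at chance level, and negative when the partitions agree less than chance would predict.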

7.

A novel deep learning method for maize disease identification based on small sample-size and complex background datasets

《Ecological Informatics》2023

Maize diseases are a major source of yield loss, but due to the lack of human experience and limitations of traditional image-recognition technology, obtaining satisfactory large-scale identification results for maize diseases is difficult. Fortunately, the advancement of deep learning-based technology makes it possible to automatically identify diseases. However, it still faces issues caused by small sample sizes and complex field backgrounds, which affect the accuracy of disease identification. To address these issues, a deep learning-based method was proposed for maize disease identification in this paper. DenseNet121 was used as the main extraction network and a multi-dilated-CBAM-DenseNet (MDCDenseNet) model was built by combining the multi-dilated module and the convolutional block attention module (CBAM) attention mechanism. Five models, MDCDenseNet, DenseNet121, ResNet50, MobileNetV2, and NASNetMobile, were compared and tested using three kinds of maize leaf images from the PlantVillage dataset and images field-collected at Northeast Agricultural University in China. Furthermore, an auxiliary classifier generative adversarial network (ACGAN) and transfer learning were used to expand the dataset and pre-train for optimal identification results. When tested on field-collected datasets with a complex background, the MDCDenseNet model outperformed the other models with an accuracy of 98.84%. Therefore, it can provide a viable reference for the identification of maize leaf diseases collected from farmland with a small sample size and complex background.

8.

Semi-supervised learning for peptide identification from shotgun proteomics datasets

Käll L Canterbury JD Weston J Noble WS MacCoss MJ 《Nature methods》2007,4(11):923-925

Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
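
The role of decoy identifications can be illustrated with a simple target-decoy false discovery rate estimate. The sketch below is not Percolator's semi-supervised learning procedure; it only shows, under the assumption that each peptide-spectrum match carries a score and a decoy flag, how decoy hits can be used to decide how many target matches to accept. The function name and the toy scores are hypothetical.

```python
def accepted_targets(psms, max_fdr=0.01):
    """Count target matches accepted at an estimated FDR threshold.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    The FDR of a score-ranked prefix is estimated as decoys / targets.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the largest prefix whose estimated FDR stays within the limit.
        if targets and decoys / targets <= max_fdr:
            best = targets
    return best


# Toy usage with made-up scores: one decoy ranks fourth out of five matches.
toy_psms = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
strict = accepted_targets(toy_psms, max_fdr=0.01)
lenient = accepted_targets(toy_psms, max_fdr=0.30)
```

A scoring function that separates correct from decoy matches more cleanly pushes decoys further down the ranking, so more targets are accepted at the same estimated FDR.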

9.

GTI: a novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets

Mpindi JP Sara H Haapa-Paananen S Kilpinen S Pisto T Bucher E Ojala K Iljin K Vainio P Björkman M Gupta S Kohonen P Nees M Kallioniemi O 《PloS one》2011,6(2):e17259

Background

Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type (‘outlier genes’), a hallmark of potential oncogenes.

Methodology

A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed equally well to existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target.

Conclusions/Significance

Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. The algorithm is implemented in an R package (Text S1).
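
The GTI itself is not specified in this abstract, so no attempt is made to reproduce it. As a point of reference, one of the baseline statistics named in the Methodology, COPA, can be sketched as follows: median-centre a gene's expression values, scale them by the median absolute deviation, and report a high percentile of the result. The function name, the nearest-rank percentile rule, the unscaled MAD and the toy values are assumptions of this example.

```python
from math import ceil
from statistics import median


def copa_score(values, percentile=90):
    """COPA-style outlier score for one gene across samples (illustrative)."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)  # unscaled MAD (assumption)
    if mad == 0:
        raise ValueError("MAD is zero; the score is undefined for this gene")
    transformed = sorted((v - centre) / mad for v in values)
    # Nearest-rank percentile (an assumption; other interpolation rules exist).
    rank = max(1, ceil(percentile * len(transformed) / 100))
    return transformed[rank - 1]


# Toy usage: one sample with very high expression dominates the 95th percentile.
outlier_gene = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flat_gene = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
outlier_score = copa_score(outlier_gene, percentile=95)
flat_score = copa_score(flat_gene, percentile=95)
```

Because the median and MAD are robust to a few extreme samples, a gene over-expressed in only a subset of tumours keeps a small scale and therefore receives a large score.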

10.

PIR: a new resource for bioinformatics (total citations: 3; self-citations: 0; citations by others: 3)

McGarvey PB Huang H Barker WC Orcutt BC Garavelli JS Srinivasarao GY Yeh LS Xiao C Wu CH 《Bioinformatics (Oxford, England)》2000,16(3):290-291

SUMMARY: The Protein Information Resource (PIR) has greatly expanded its Web site and developed a set of interactive search and analysis tools to facilitate the analysis, annotation, and functional identification of proteins. New search engines have been implemented to combine sequence similarity search results with database annotation information. The new PIR search systems have proved very useful in providing enriched functional annotation of protein sequences, determining protein superfamily-domain relationships, and detecting annotation errors in genomic database archives. AVAILABILITY: http://pir.georgetown.edu/. CONTACT: mcgarvey@nbrf.georgetown.edu

11.

An integrated approach based on the correction of imbalanced small datasets and the application of machine learning algorithms to predict total phosphorus concentration in rivers

《Ecological Informatics》2023

Increased concentrations of Total Phosphorus (TP) in freshwater systems lead to eutrophication and can contribute to a wide range of environmental effects. In the modern era, water quality models have increasingly been used globally for the development of management scenarios with the aim of reducing the eutrophication risk. However, the accuracy of these models is limited by the quality of the boundary conditions forcing data, namely TP concentration datasets. In this study, a novel methodology is proposed to improve machine learning prediction accuracy in the modeling of river TP concentration forced with small input training datasets. These models can then be used to increase the quality and consistency of the TP concentration datasets required to force water quality models. This new methodology relies on the generation of 100 new training datasets from the raw training datasets of input predictors through the implementation of an over/undersampling technique. The modeling approach used in this study was supported by the application of ten machine learning algorithms to estimate the TP concentration values in 22 rivers located in Portugal. The modeling approach also included an input feature importance evaluation, as well as model hyperparameter optimization. In general terms, the Extreme Gradient Boosting (XGBoost) and Support Vector Regressor (SVR) models performed best overall, with the ensemble results recorded for both models working to increase the mean Nash-Sutcliffe efficiency (NSE) across all the areas being studied by 96% (0.01 ± 0.22 to 0.31 ± 0.32) and reduce the mean percentage bias (PBIAS) by 43% (18.47 ± 17.31 to 10.60 ± 17.40). The results of this study suggest that the solution proposed has the potential to significantly improve the modeling of TP concentration in rivers with machine learning methods, as well as providing increased scope for its application to larger training datasets and the prediction of other types of dependent variables. 
Hopefully, the results of this study will further add to the body of information available in this area of research and aid the development of the water management process.
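
The two skill scores quoted in this abstract, the Nash-Sutcliffe efficiency (NSE) and the percentage bias (PBIAS), have simple closed forms. The sketch below shows one common formulation; the sign convention for PBIAS (positive when the model underestimates) is an assumption, since the abstract does not state which convention was used, and the function names and sample values are made up.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / variance


def percent_bias(observed, simulated):
    """PBIAS = 100 * sum(obs - sim) / sum(obs); sign convention assumed."""
    return 100 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


# Toy usage with made-up concentrations.
obs = [1.0, 2.0, 3.0]
nse_mean_model = nash_sutcliffe(obs, [2.0, 2.0, 2.0])  # predicts the mean
pbias_shifted = percent_bias(obs, [2.0, 3.0, 4.0])     # overestimates by 1
```

An NSE of 1 means a perfect fit and 0 means the model is no better than predicting the observed mean, which puts the reported change from 0.01 to 0.31 in context.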

12.

Geoseq: a tool for dissecting deep-sequencing datasets

James Gurtowski Anthony Cancio Hardik Shah Chaya Levovitz Ajish George Robert Homann Ravi Sachidanandam 《BMC bioinformatics》2010,11(1):506

Background

Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest.

13.

LV-GAN: A deep learning approach for limited-view optoacoustic imaging based on hybrid datasets

Tong Lu Tingting Chen Feng Gao Biao Sun Vasilis Ntziachristos Jiao Li 《Journal of biophotonics》2021,14(2):e202000325

Optoacoustic imaging (OAI) methods are rapidly evolving for resolving optical contrast in medical imaging applications. In practice, measurement strategies are commonly implemented under limited-view conditions due to oversized image objectives or system design limitations. Data acquired by limited-view detection may impart artifacts and distortions in reconstructed optoacoustic (OA) images. We propose a hybrid data-driven deep learning approach based on a generative adversarial network (GAN), termed LV-GAN, to efficiently recover high quality images from limited-view OA images. Trained on both simulation and experiment data, LV-GAN is found capable of achieving high recovery accuracy even under limited detection angles of less than 60°. The feasibility of LV-GAN for artifact removal in biological applications was validated by ex vivo experiments based on two different OAI systems, suggesting high potential for ubiquitous use of LV-GAN to optimize image quality or system design for different scanners and application scenarios.

14.

In vivo modulation of brain cholesterol level and learning performance by a novel plant lipid: indications for interactions between hippocampal-cortical cholesterol and learning (total citations: 1; self-citations: 0; citations by others: 1)

A R Kessler B Kessler S Yehuda 《Life sciences》1986,38(13):1185-1192

In this account we report in vivo effects of a plant lipid preparation (MMPL) on brain cholesterol and the activity and learning performance of aging male rats. Three-month-old rats were fed for 3 months with a diet that was enriched with 3% MMPL. Another group of 18-month-old rats was fed for 6 months with a 3% MMPL-enriched diet. This food regime markedly lowered the cholesterol level in the hippocampal and cortical regions and increased their lipid membrane fluidity. The animals of both age groups also responded to MMPL with a higher activity, and their learning performances improved notably compared to normal diet-fed animals. This improvement continued at least 4 months after terminating the supply of MMPL. Significant inverse correlations were obtained between the length of the training period required to attain proper criteria and cholesterol levels of the hippocampal and cortical brain fractions.

15.

APDB: a novel measure for benchmarking sequence alignment methods without reference alignments

O'Sullivan O Zehnder M Higgins D Bucher P Grosdidier A Notredame C 《Bioinformatics (Oxford, England)》2003,19(Z1):i215-i221

MOTIVATION: We describe APDB, a novel measure for evaluating the quality of a protein sequence alignment, given two or more PDB structures. This evaluation does not require a reference alignment or a structure superposition. APDB is designed to efficiently and objectively benchmark multiple sequence alignment methods. RESULTS: Using existing collections of reference multiple sequence alignments and existing alignment methods, we show that APDB gives results that are consistent with those obtained using conventional evaluations. We also show that APDB is suitable for evaluating sequence alignments that are structurally equivalent. We conclude that APDB provides an alternative to more conventional methods used for benchmarking sequence alignment packages.

16.

NMRb: a web-site repository for raw NMR datasets

Pons JL Malliavin TE Tramesel D Delsuc MA 《Bioinformatics (Oxford, England)》2004,20(18):3707-3709

SUMMARY: The development of NMR in structural proteomics requires the availability of automatic structure determination methods. Many researchers are commonly confronted with the lack of raw datasets during the validation step of such methods. In order to increase test possibilities, the NMRb web-site offers a database of NMR raw datasets, ordered by spectral characteristics. AVAILABILITY: NMRb is available from: http://nmrb.cbs.cnrs.fr. SUPPLEMENTARY INFORMATION: General organization of NMRb figure, relational model organization, and XML structure files are available from http://nmrb.cbs.cnrs.fr/nmrb-doc.html.

17.

Hormone-mediated gene regulation and bioinformatics: learning one from the other

Sousa JC Costa MJ Palha JA 《PloS one》2007,2(5):e481

The ability to manage the constantly growing clinically relevant information in genetics available on the internet is becoming crucial in medical practice. Therefore, training students in teaching environments that develop bioinformatics skills is a particular challenge to medical schools. We present here an instructional approach that potentiates learning of hormone/vitamin mechanisms of action in gene regulation with the acquisition and practice of bioinformatics skills. The activity is integrated within the study of the Endocrine System module. Given a nucleotide sequence of a hormone or vitamin-response element, students use internet databases and tools to find the gene to which it belongs. Subsequently, students search how the corresponding hormone/vitamin influences the expression of that particular gene and how a dysfunctional interaction might cause disease. This activity was presented for four consecutive years to cohorts of 50-60 students/year enrolled in the 2nd year of the medical degree. 90% of the students developed a better understanding of the usefulness of bioinformatics and 98% intend to use web-based resources in the future. Since hormones and vitamins regulate genes of all body organ systems, this activity successfully integrates the whole body physiology of the medical curriculum.

18.

Performance measures in evaluating machine learning based bioinformatics predictors for classifications

Yasen Jiao Pufeng Du 《Quantitative Biology》2016,4(4):320

Background: Many existing bioinformatics predictors are based on machine learning technology. When applying these predictors in practical studies, their predictive performances should be well understood. Different performance measures and different evaluation methods are applied in various studies. Even for the same performance measure, different terms, nomenclatures or notations may appear in different contexts. Results: We carried out a review of the most commonly used performance measures and evaluation methods for bioinformatics predictors. Conclusions: It is important in bioinformatics to correctly understand and interpret performance, as it is the key to rigorously comparing the performances of different predictors and choosing the right predictor.
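
To make the naming issue concrete, the sketch below derives several widely used measures from the four cells of a binary confusion matrix. The abstract does not list which measures the review covers, so this selection, the function name `binary_metrics` and the example counts are assumptions; sensitivity, for instance, is the same quantity that other communities call recall or true positive rate.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Common performance measures for a binary classifier (illustrative)."""

    def ratio(numerator, denominator):
        # Undefined ratios (empty class or no positive calls) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        # Matthews correlation coefficient; 0.0 by convention when undefined.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Toy usage with made-up counts: 50 positives and 50 negatives.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

Reporting several of these together matters on imbalanced data, where accuracy alone can look high even when one class is rarely predicted correctly.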

19.

Post-translational modifications: a challenge for proteomics and bioinformatics

Appel RD Bairoch A 《Proteomics》2004,4(6):1525-1526

20.

Biowep: a workflow enactment portal for bioinformatics applications

Romano P Bartocci E Bertolini G De Paoli F Marra D Mauri G Merelli E Milanesi L 《BMC bioinformatics》2007,8(Z1):S19
