Similar Documents
20 similar documents found (search time: 0 ms)
1.
Stiglic G, Kocbek S, Pernek I, Kokol P. PLoS ONE 2012, 7(3): e33812

Purpose

Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible.

Methods

This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models with a so-called one-button data mining approach, in which no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used while tuning the model, which is constrained exclusively by the dimensions of the resulting decision tree.

Results

The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for the less complex, visually tuned decision trees. Compared with classical machine learning benchmark datasets, we observe higher accuracy gains on bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method than for manual tuning of the decision tree.

Conclusions

The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one achieves not only good comprehensibility but also very good classification performance that does not differ from that of the usually more complex models built using the default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes, which are very common in bioinformatics.
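The central idea of entry 1, tuning a tree only by its drawn dimensions and never by a performance measure, can be illustrated with a minimal Python sketch. It is not the authors' implementation: `TreeSize`, `visually_tune`, and the caller-supplied `build_tree_size` function are hypothetical, and the candidate parameters are assumed to be ordered from least to most pruning (for example, increasing minimum leaf size).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class TreeSize:
    """Hypothetical summary of how large a tree is when drawn."""
    depth: int   # number of levels, i.e. drawing height
    leaves: int  # number of leaves, i.e. drawing width


def visually_tune(
    candidate_params: Iterable[int],
    build_tree_size: Callable[[int], TreeSize],
    max_depth: int,
    max_leaves: int,
) -> Optional[int]:
    """Return the first parameter whose tree fits within max_depth x max_leaves.

    Candidates are assumed to go from least to most pruning, so the first
    fit is the largest tree that still fits the visual boundary. No accuracy
    or other performance measure is consulted. Returns None if nothing fits.
    """
    for param in candidate_params:
        size = build_tree_size(param)
        if size.depth <= max_depth and size.leaves <= max_leaves:
            return param
    return None
```

Because the loop stops at the first candidate that fits, no validation data are used at any point; the display size alone decides.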

2.
Nguyen Quoc Khanh Le. Proteomics 2023, 23(23-24): 2300011
In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.
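Entry 2 refers to self-attention, in which every position of a variable-length input is scored against every other position. The following is a minimal single-head scaled dot-product attention sketch in plain Python, assuming small dense lists as matrices; it illustrates the general mechanism only, not any specific model covered by the review.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over a sequence x (n x d)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # n x n: each position scores every position
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)  # each output row is a weighted mix of all value rows
```

The projection matrices `wq`, `wk`, and `wv` do not depend on the sequence length n, which is why the same weights can be applied to inputs of any length, and all n rows can be computed independently (in parallel).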

3.
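From the perspective of information processing, many problems in bioinformatics and natural language processing are very similar, so some classical natural language processing methods can be applied in bioinformatics. This paper introduces problems common to natural language processing and bioinformatics, such as alignment, classification, and prediction, together with methods for solving them. An analysis of similar problems in the two fields shows that effective natural language processing techniques can also be used to solve bioinformatics problems, and that some natural language understanding techniques not yet applied in bioinformatics also have potential value there. Finally, a solution to a classification problem is presented, demonstrating how to apply the algorithm in experiments on biological data.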
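Alignment is one of the problems entry 3 says the two fields share. As an assumed illustration (not the paper's own method), the Levenshtein edit distance, computed by dynamic programming, applies unchanged to English words and to nucleotide strings:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("colour", "color")` and `edit_distance("GATTACA", "GATACA")` both return 1: the same routine treats letters of a word and bases of a sequence identically.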

4.
Challenges in bioinformatics: infrastructure, models and analytics   Total citations: 3, self-citations: 0, citations by others: 3

5.
6.
Single-point mutations are one of the most frequent causes of genetic variability in both humans and closely related species. The recent availability of different bioinformatics tools for annotating human single nucleotide polymorphisms (SNPs) has opened the possibility of using them to score SNPs from species of biomedical interest, in particular mice and other models of human disease. This ability to predict the pathogenicity of single-point mutations in one species, based on data from another, also opens the possibility of predicting the pathological character of single-point mutations in humans using data from well-characterized model systems of human disease. This could provide a valuable alternative to more traditional population genetics approaches. However, transferring prediction tools may be limited by several factors, ranging from a species bias in the training set to large sequence divergence between the proteomes of the training and target species. Here we study the conditions under which prediction tools can be transferred among species, concentrating on the case of mice. We find that for the majority of human-mouse homolog pairs, the sequence similarity is generally large enough to preserve the pathological character of mutations across species. We then establish that prediction/annotation tools developed for one organism can be used to predict the neutral/pathological character of mutations/SNPs in the other organism.

7.
Hu X, Holland EC. Mutation Research 2005, 576(1-2): 54-65
Gliomas are the most common primary tumors that arise from glial cells and their precursors in the central nervous system. Most of the genetic alterations identified in human gliomas result in signal transduction abnormalities or disruption of cell cycle arrest pathways. Over the past years, several mouse glioma models have been generated based on human genetic abnormalities, and the induced gliomas exhibit histological similarities to their human counterparts. Emerging evidence suggests that the oncogenic signaling that initiates tumorigenesis is also required for tumor maintenance; these glioma models can be used to further characterize the mechanisms of oncogenic signaling in tumor formation, as well as to identify molecular targets for preclinical trials.

8.
The average or amorphous track model uses the response of a system to gamma-rays and the radial distribution of dose about an ion’s path to describe survival and other cellular endpoints from proton, heavy ion, and neutron irradiation. This model has been used for over 30 years to successfully fit many radiobiology data sets. We review several extensions of this approach that address objections to the original model, and consider applications of interest in radiobiology and space radiation risk assessment. In light of current views on important cellular targets, the role of target size, as manifested through the relative contributions of ion-kill (intra-track) and gamma-kill (inter-track), remains a critical question in understanding the success of the amorphous track model. Several variations of the amorphous model are discussed, including ones that consider the radial distribution of event sizes rather than average electron dose, damage clusters rather than multiple targets, and a role for repair or damage processing. Received: 30 October 1998 / Accepted in revised form: 6 April 1999

9.
The three-dimensional structure of rice dwarf virus was determined to 6.8 Å resolution by single-particle electron cryomicroscopy. By integrating the structural analysis with bioinformatics, the folds of the proteins in the double-shelled capsid were derived. In the outer shell protein, the uniquely orientated upper and lower domains are composed of similar secondary structure elements but have relative orientations different from those of bluetongue virus in the same Reoviridae family. Differences in both sequence and structure between these proteins may be important in defining virus-host interactions. The inner shell protein adopts a conformation similar to other members of Reoviridae, suggesting a common ancestor that has evolved to infect hosts ranging from plants to animals. Symmetry mismatch between the two shells results in nonequivalent, yet specific, interactions that contribute to the stability of this large macromolecular machine.

10.
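Application of soft computing in ecological models   Total citations: 1, self-citations: 0, citations by others: 1
Chen Qiuwen, Arthur Mynett, Wang Fei. Acta Ecologica Sinica 2006, 26(8): 2594-2601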
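Owing to the high complexity and nonlinearity of ecosystems and the rapid development of spatial data acquisition technology, more and more soft computing methods have been applied to ecological modeling in recent years. Soft computing is a very broad field: in terms of modeling paradigms it mainly includes cellular automata, individual-based models, and box (compartment) models, while representative methods include artificial neural networks, fuzzy mathematics, genetic algorithms, and chaos theory. This paper focuses on the application of cellular automata and rule-based methods in ecological models, with specific examples including population dynamics simulation, early warning of algal blooms, and habitat simulation.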
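Cellular automata are one of the soft computing paradigms named in entry 10. The sketch below is a generic illustration under an assumed colonization rule (an empty cell becomes occupied when at least `threshold` of its eight neighbours are occupied, and occupied cells stay occupied); the rule, the threshold, and the function names are hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[int]]  # 1 = occupied cell, 0 = empty cell


def occupied_neighbours(grid: Grid, r: int, c: int) -> int:
    """Count occupied cells in the 8-cell Moore neighbourhood (no wrap-around)."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                count += grid[r + dr][c + dc]
    return count


def step(grid: Grid, threshold: int = 2) -> Grid:
    """One synchronous update: every cell reads the old grid, and a new grid is returned."""
    return [
        [1 if cell or occupied_neighbours(grid, r, c) >= threshold else 0
         for c, cell in enumerate(row)]
        for r, row in enumerate(grid)
    ]
```

Updating synchronously (building a new grid from the old one) is the usual cellular automaton convention; updating in place would let early cells influence later ones within the same time step.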

11.
12.
One of the challenges to the effective utilization of cDNA microarray analysis in mouse models of oncogenesis is the choice of a critical set of probes that are informative for human disease. Given the thousands of genes with a potential role in human oncogenesis and the hundreds of thousands of mouse sequences available for use as probes, selecting an informative set of mouse probes can be an overwhelming task. We have developed a web-based sequence mining tool using the Perl DataBase Independent (DBI) interface to annotate publicly available sequences. The Mouse Oncochip Design Tool uses the Mouse Genome Database (MGD), developed and maintained by the Jackson Laboratory, for mouse DNA sequences. There are over 380 000 sequences in their database. The output list is ordered, using a candidate set of oncogenes, so that the genes most likely to be informative in a mouse model of human cancer come first: mouse sequences representing genes homologous to a member of the human oncogene set are listed first. In addition, the tool provides a set of links to information on clone source and gene function. Contact: http://nciarray.nci.nih.gov/cgi-bin/me/mouse_design.cgi
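The ordering rule in entry 12 (mouse sequences homologous to a member of a human oncogene set are listed first) can be sketched as a stable sort. This is a hypothetical Python illustration, not the tool itself, which was built with Perl and MGD; the function name, the `mouse_to_human` homology mapping, and the (sequence ID, gene symbol) data layout are all assumptions.

```python
from typing import Dict, List, Set, Tuple

Probe = Tuple[str, str]  # hypothetical layout: (sequence_id, mouse_gene_symbol)


def order_probes(
    probes: List[Probe],
    mouse_to_human: Dict[str, str],
    human_oncogenes: Set[str],
) -> List[Probe]:
    """Put probes whose gene has a human homolog in the oncogene set first."""
    def is_candidate(probe: Probe) -> bool:
        human = mouse_to_human.get(probe[1])
        return human is not None and human in human_oncogenes

    # sorted() is stable, so the original order is kept within each group;
    # False sorts before True, hence the negation.
    return sorted(probes, key=lambda p: not is_candidate(p))
```

A stable sort keeps any prior ranking (for example, by annotation quality) intact inside the candidate and non-candidate groups.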

13.
The current article illustrates the practical advantages of some new models and statistical algorithms for codon substitution and spatial rate variation in molecular phylogeny. Our companion paper in this issue discusses at length the mathematical properties of these models for nucleotide and codon substitution, for site-to-site and branch-to-branch heterogeneity in rates of evolution, and for spatial correlation in the assignment of rates. In this study we summarize the theoretical background and apply the models and algorithms to data on beta-globin, the complete HIV genome, and the mitochondrial genome. Our complex but realistic models enhance biological interpretation of sequence data and show substantial improvements in model fit over existing models. All the new statistical algorithms applied are incorporated in our phylogeny software LINNAEUS, which is tuned for performance and modeling flexibility.

14.
Neuroimaging techniques represent powerful tools to assess disease-specific cellular, biochemical and molecular processes non-invasively in vivo. Besides providing precise anatomical localisation and quantification, the most exciting advantage of non-invasive imaging techniques is the opportunity to investigate the spatial and temporal dynamics of disease-specific functional and molecular events longitudinally in intact living organisms, so-called molecular imaging (MI). Combining neuroimaging technologies with in vivo models of neurological disorders provides unique opportunities to understand the aetiology and pathophysiology of human neurological disorders. In this way, neuroimaging in mouse models of neurological disorders not only can be used for phenotyping specific diseases and monitoring disease progression but also plays an essential role in the development and evaluation of disease-specific treatment approaches. MI is thus a key technology in translational research, helping to design improved disease models as well as experimental treatment protocols that may later be implemented in clinical routine. The most widely used imaging modalities in animal models to assess in vivo anatomical, functional and molecular events are positron emission tomography (PET), magnetic resonance imaging (MRI) and optical imaging (OI). Here, we review the application of neuroimaging in mouse models of neurodegeneration (Parkinson's disease, PD, and Alzheimer's disease, AD) and brain cancer (glioma).

15.
We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small: constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples and extract lessons from them showing how it has helped. Both examples suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.
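Entry 15 holds up make as a model for defining and maintaining dependencies. A minimal make-like sketch, assuming a dictionary that maps each target to its prerequisites and a dictionary of modification times, can use Python's standard `graphlib.TopologicalSorter` (Python 3.9+) for ordering; the function names and any file names used with it are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set


def build_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order targets so every target comes after all of its prerequisites.

    dependencies maps each target to the targets it depends on, like
    'target: prerequisites' rules in a Makefile. Raises graphlib.CycleError
    if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


def stale_targets(dependencies: Dict[str, Set[str]], mtimes: Dict[str, float]) -> Set[str]:
    """Targets to rebuild: missing, older than a prerequisite, or downstream of a stale target."""
    stale: Set[str] = set()
    for target in build_order(dependencies):
        prereqs = dependencies.get(target, set())
        if not prereqs:
            continue  # a plain source file: there is no rule to rebuild it
        if (target not in mtimes
                or any(p in stale for p in prereqs)
                or any(mtimes.get(p, 0.0) > mtimes[target] for p in prereqs)):
            stale.add(target)
    return stale
```

As in make, a target is rebuilt when it is missing, older than a prerequisite, or downstream of another rebuilt target; real make adds much more (pattern rules, phony targets, recipes) that this sketch leaves out.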

16.
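Enterovirus 71 (EV71) is one of the main pathogens of hand, foot and mouth disease (HFMD), a neglected tropical infectious disease, and has caused multiple HFMD outbreaks in the Asia-Pacific region over the past 15 years. With poliovirus now effectively controlled, EV71 has become the most important neurotropic enterovirus, and its severe neurological complications threaten children's health. Suitable animal models can help elucidate the mechanisms of EV71 neuropathogenesis and facilitate the development of effective vaccines and therapeutic drugs. This article reviews the characteristics, applications, and limitations of the three main classes of established EV71 animal models: non-human primate models, mouse-adapted models, and transgenic mouse models.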

17.
Current trends in bioinformatics   Total citations: 4, self-citations: 0, citations by others: 4

18.
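With the continuing development of deep sequencing and gene chip (microarray) technologies, genome, transcriptome, and expression profile data have accumulated in large quantities. At present, the genomes of more than 10 insects have been sequenced, and transcriptome data for more than 30 insects have been reported. Clearly, traditional biostatistical methods cannot handle such massive volumes of biological data. Quantitative change leads to qualitative change: the massive accumulation of biological data has given rise to a new discipline, bioinformatics. Bioinformatics integrates the theory and research content of disciplines such as statistics, information science, and biology, and has been widely applied in medicine, basic biology, agricultural science, and entomology. The goals of bioinformatics are data storage, data management, and data mining. Accordingly, building and maintaining biological databases, designing and developing biological software based on pattern recognition, machine learning, data mining, and related methods, and using these tools for in-depth data mining are key research topics in bioinformatics. This article first briefly introduces the history and current status of bioinformatics and its applications in entomology, then reviews research progress in insect genomics and transcriptomics, and finally discusses the prospects for applying bioinformatics in entomological research.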

19.
Bioinformatics is the name that has become associated with the theoretical and applied field of study that links computer science with modern biology. Within molecular biology specifically, bioinformatics is a generic term used to describe many of the analytical manipulations that can be carried out on sequences. Familiarity with the resources available and fundamental methods used for such analyses should be an essential part of a modern biology course, especially given the availability of WWW resources. In this article, some of these resources are summarised and their possible integration into a short practical undergraduate teaching unit is described.
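As a sketch of the kind of elementary sequence manipulation a short practical unit might begin with (an assumed example, not taken from the article), reverse complementation and GC content take only a few lines of Python:

```python
# Translation table for complementing DNA bases in either case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G); other characters pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases, ignoring case; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)
```

Ambiguity codes such as N are left unchanged by `reverse_complement` and counted as non-GC by `gc_content`, a simplification a fuller teaching exercise might revisit.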

20.
The third Heidelberg Unseminars in Bioinformatics (HUB) was held on 18th October 2012, at Heidelberg University, Germany. HUB brought together around 40 bioinformaticians from academia and industry to discuss the ‘Biggest Challenges in Bioinformatics’ in a ‘World Café’ style event.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417