Similar literature
19 similar articles found
1.
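This paper analyzes the current state and problems of database technology in medical information processing and introduces data warehouses and the data mining techniques built on them. It focuses on how to use existing medical information resources to build a medical information database based on a data warehouse model, and how to apply data mining to extract the patterns hidden in the data and so improve the utilization of medical information.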

2.
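Data mining based on a medical information data warehouse model   Total citations: 2, self-citations: 0, citations by others: 2
Using data warehouse and data mining technology, and building on the existing hospital information system (HIS) and medical information resources, a medical information data warehouse model was built on a PC running Windows with SQL Server 2005 and SQL Server 2005 Analysis Services (SSAS). Data mining was then applied to extract the patterns hidden in the data and improve the utilization of medical information. This provides an effective approach for extracting valuable decision-support information from large, complex medical information repositories.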

3.
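Application of a rough set neural network based on c-means clustering to heart disease diagnosis   Total citations: 1, self-citations: 0, citations by others: 1
The continuous condition attributes in the decision table are discretized by c-means clustering. Rough sets are applied to the discretized decision table to obtain simplified rules, and the rule set is then fed into a BP neural network (BPNN) for training and used to predict the test set. Applied to a set of heart disease diagnosis data, this model reached a prediction accuracy of 85%, whereas rough sets alone and the BPNN alone reached 76% and 82%, respectively. When the combined rough set and BPNN model instead used traditional equal-width or equal-frequency discretization of the raw data, prediction accuracy was only 53% and 77%, respectively.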
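The two baseline discretization schemes named in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' code: the function names, the choice of `k`, and the sample values are assumptions made for the example.

```python
from typing import List


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: List[float], k: int) -> List[int]:
    """Assign each value to one of k bins holding roughly equal counts."""
    # Rank the values, then cut the ranking into k equally sized slices.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = min(rank * k // len(values), k - 1)
    return bins


# Hypothetical attribute values with two large outliers.
sample = [1.0, 2.0, 3.0, 4.0, 100.0, 200.0]
width_result = equal_width_bins(sample, 2)
freq_result = equal_frequency_bins(sample, 2)
```

With skewed data such as `sample`, equal-width binning puts almost every value into the first interval, while equal-frequency binning splits the values evenly. Note that this simple rank-based version may place tied values in different bins.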

4.
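With the rapid development of high-throughput sequencing, plant genomics research has accumulated massive amounts of multi-omics data. How to develop and improve software tools that can exploit these data to uncover useful biological information has therefore become an important scientific problem in urgent need of a solution. Machine learning methods have attracted wide attention in this field for their strong capabilities in prediction, classification, data mining, and integration. This paper systematically reviews the basic principles and workflows of different types of machine learning methods and the progress made in applying them to plant genome function prediction. It summarizes applications of machine learning models to the prediction of plant molecular interactions, the prediction of important functional sites, functional annotation, and crop breeding, and discusses future directions and application prospects for the field. The review should help plant researchers quickly understand and apply machine learning methods, advancing research on the mechanisms of plant genetics and the improvement of crop traits.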

5.
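Applications of data mining in bioinformatics   Total citations: 6, self-citations: 0, citations by others: 6
Using applied mathematics and computer technology to exploit the large and growing body of biological data that urgently needs processing, and to explore the patterns within it, is a current focus of bioinformatics research in China and internationally. Data mining plays a major role in this research.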

6.
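Zhang Xiaohui (张晓慧). 《蛇志》, 2014, (1): 95-96
With the development of computer technology, medical information systems are now ubiquitous and generate large volumes of raw data, moving medicine into a data-intensive field [1]. The value of raw data lies in the unknown knowledge that may be hidden in it, which can better support decision making. The wide use of computer technology and medical imaging in modern disease diagnosis has made them indispensable in medical diagnosis. In recent years data mining has emerged as a research area in computer and information technology, and it has been applied with varying degrees of success in medicine, production and business operations, agriculture, national defense, scientific research, distribution, and finance [2]. For recent…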

7.
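Detection of P53 gene mutations using gene chip technology   Total citations: 8, self-citations: 0, citations by others: 8
Gene chip technology is one of the most important techniques for gene function analysis in the post-genomic era. Detecting P53 gene mutations with gene chips is rapid, accurate, high-throughput, and automated. This paper describes the basic principles of gene chip technology and its methods for detecting P53 gene mutations.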

8.
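The wide adoption of high-throughput technologies has made omics data of all kinds accumulate ever faster, and the resulting massive data sets contain a wealth of information on genomic variation and related function. Deeply integrating and exploiting these data will be a long-term, difficult task that requires efficient data storage, analysis, and mining. Over the past few years our group, in collaboration with groups inside and outside the institute, has explored genome assembly, annotation, comparative genomics, and population genomics analysis for several plants. We have also integrated a large amount of rice germplasm information and omics data, stored it in structured databases, and developed corresponding web tools for query, display, and data mining. This paper gives an overview of these results and their progress and outlines the next goal: building an integrated omics knowledge base to support crop functional genomics and molecular design breeding.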

9.
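Based on feature extraction from QRS complexes, an arrhythmia diagnosis-support model was built using subtractive clustering and an adaptive fuzzy neural network, and the effect of different training data sets on the model's test results was analyzed. The experiments show that the model accurately identifies different types of QRS complexes and that the choice of training data set affects the diagnostic results. The work provides a method for building more complex arrhythmia diagnosis-support models.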

10.
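Zuo Song (左嵩), Zhang Xiong (张雄), Liu Lide (刘礼德). 《现代生物医学进展》, 2013, (23): 4568-4572, 4594
Objective: As hospitals at all levels continue to strengthen their IT infrastructure, their level of informatization keeps rising. Hospitals now run mature information systems that accumulate large volumes of outpatient visit data in daily practice, but these data have long been used only at a low level; deeper analysis and processing, and support for management decisions, remain weak. Hospitals urgently need data mining and analysis tools to extract deeper, high-value information from the accumulated visit data in order to support management decisions. Methods: A clustering algorithm was used to build a data mining model, and the useful fields in one hospital's outpatient information resources were mined and analyzed. Results: Cluster analysis of the valuable fields under the data mining model yielded mining results for those fields. Conclusion: The mining results for these fields are analyzed, and their application to hospital management decision-making and medical quality management is discussed.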

11.
Conflict analysis has been used as an important tool in economic, business, governmental and political disputes, games, management negotiations, military operations, etc. Many formal mathematical models have been proposed to handle conflict situations, and one of the most popular is rough set theory. With its ability to handle vagueness in conflict data sets, rough set theory has been used successfully. However, computational time is still an issue when determining the certainty, coverage, and strength of conflict situations. In this paper, we present an alternative approach to handling conflict situations, based on ideas from soft set theory. The novelty of the proposed approach is that, unlike rough set theory, which uses decision rules, it is based on the concept of co-occurrence of parameters in soft set theory. We illustrate the proposed approach with a tutorial example of voting analysis in conflict situations. Furthermore, we apply the proposed approach to a real-world dataset of political conflict in the Indonesian Parliament. We show that the proposed approach achieves a computational time up to 3.9% lower than that of rough set theory.
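The certainty, coverage, and strength mentioned above are the standard rough set measures of a decision rule "if C then D": strength is support divided by the size of the universe, certainty is support divided by the number of objects matching C, and coverage is support divided by the number of objects matching D. The sketch below computes them for a small decision table. The table contents and the function name `rule_measures` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Hashable, List, Tuple

# One row per object: (condition-attribute values, decision value).
Row = Tuple[Hashable, Hashable]


def rule_measures(table: List[Row]) -> Dict[Row, Dict[str, Fraction]]:
    """Compute strength, certainty and coverage for every rule in the table."""
    n = len(table)
    support = Counter(table)                    # objects matching both C and D
    cond_count = Counter(c for c, _ in table)   # objects matching C
    dec_count = Counter(d for _, d in table)    # objects matching D
    return {
        (c, d): {
            "strength": Fraction(s, n),
            "certainty": Fraction(s, cond_count[c]),
            "coverage": Fraction(s, dec_count[d]),
        }
        for (c, d), s in support.items()
    }


# Hypothetical voting table: the condition is a party label, the decision a vote.
votes = [(("A",), "yes"), (("A",), "yes"), (("A",), "no"), (("B",), "no")]
measures = rule_measures(votes)
```

Exact fractions are used so the three ratios can be compared without floating-point rounding.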

12.
Classification, which is the task of assigning objects to one of several predefined categories, is a pervasive problem that encompasses many diverse applications. The decision tree classifier, a simple yet widely used classification technique, employs training data to yield decision rules; moreover, it can create thresholds and split continuous attributes into discrete intervals (Quinlan in Journal of Artificial Intelligence Research 4:77–90, 1996). Rough set theory (Pawlak in International Journal of Computer and Information Sciences 11:341–356, 1982; International Journal of Man-Machine Studies 20:469–483, 1984; Rough sets: theoretical aspects of reasoning about data. Kluwer, Dordrecht, 1991) has been applied to a wide variety of decision analysis problems for the extraction of rules from databases. This paper proposes a hybrid approach that combines decision tree and rough set classifiers and applies it to plant classification. The approach first uses a decision tree classifier (C4.5) as a preprocessing step to discretize attributes into intervals, and then uses the rough set method to extract rules. It aims to find classification rules by analyzing lamina attributes (leaf stalk, leaf width, leaf length, length/width ratio) of Cinnamomum, gathered and measured by plant specialists in the field in Taiwan. A comparison with widely used algorithms (e.g., decision tree, multilayer perceptrons, naïve Bayes, and rough sets classifier) is carried out to show the advantages of the proposed approach. Finally, for test data in which the species are unknown, the classification results are verified by consulting the relevant plant specialists.

13.
Wei LY, Huang CL, Chen CH. BMC Genetics, 2005, 6(Z1): S133
Rough set theory and decision trees are data mining methods used for dealing with vagueness and uncertainty. They have been utilized to unearth hidden patterns in complicated datasets collected for industrial processes. The Genetic Analysis Workshop 14 simulated data were generated using a system that implemented multiple correlations among four consequential layers of genetic data (disease-related loci, endophenotypes, phenotypes, and one disease trait). When information from one layer was blocked and uncertainty was created in the correlations among these layers, the correlation between the first and last layers (susceptibility genes and the disease trait in this case) was not easily detected directly. In this study, we proposed a two-stage process that applied rough set theory and decision trees to identify genes susceptible to the disease trait. During the first stage, based on phenotypes of subjects and their parents, decision trees were built to predict trait values. Phenotypes retained in the decision trees were then advanced to the second stage, where rough set theory was applied to discover the minimal subsets of genes associated with the disease trait. For comparison, decision trees were also constructed to map susceptible genes during the second stage. Our results showed that the decision trees of the first stage had accuracy rates of about 99% in predicting the disease trait. The decision trees and rough set theory failed to identify the true disease-related loci.

14.
Mining gene expression databases for association rules   Total citations: 16, self-citations: 0, citations by others: 16

15.
The work reported in this paper examines the use of principal component analysis (PCA), a multivariate statistical technique, to facilitate the extraction of meaningful diagnostic information from a data set of chromatographic traces. Two data sets mimicking archived production records were analysed using PCA. In the first, a full-factorial experimental design approach was used to generate the data. In the second, the chromatograms were generated by adjusting just one of the process variables at a time. Database mining was achieved through the generation of both gross and disjoint principal component (PC) models. PCA provided easily interpretable 2-dimensional diagnostic plots revealing clusters of chromatograms obtained under similar operating conditions. PCA methods can be used to detect and diagnose changes in process conditions; however, results show that a PCA model may require recalibration if an equipment change is made. We conclude that PCA methods may be useful for the diagnosis of subtle deviations from process specification not readily distinguishable to the operator.
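As a rough illustration of the PCA step, the sketch below finds the first principal component of a small data matrix by power iteration on the sample covariance matrix. It is a generic textbook procedure, not the authors' pipeline: the function name, the iteration count, and the toy data are assumptions, and a real analysis of chromatograms would use many more variables and at least two components for the 2-dimensional plots.

```python
import math
from typing import List


def first_principal_component(rows: List[List[float]], iters: int = 200) -> List[float]:
    """Return a unit vector along the direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration; assumes the start vector is not orthogonal to the answer.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Toy data lying exactly on the line y = 2x (hypothetical, for illustration).
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
pc1 = first_principal_component(toy)
```

Projecting each centred row onto `pc1` gives the score that would be plotted along the first axis of a diagnostic plot.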

16.
Serial analysis of gene expression (SAGE) is a powerful technique for quantifying gene expression. The huge amount of tag data in SAGE libraries is difficult to analyze with current SAGE analysis tools. Data are often not provided in a biologically meaningful form for cross-analysis and cross-comparison, which limits their application. Hence, an integrated software platform that can perform such a complex task is required. Here, we implement set theory for cross-analyzing gene expression data among SAGE libraries from different tissue sources, so that up- or down-regulated tissue-specific tags can be identified computationally. Extract-SAGE employs a genetic algorithm (GA) to reduce the number of genes among the SAGE libraries. Its representative tag mining will facilitate the discovery of candidate genes with discriminating gene expression.
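The set-theoretic cross-analysis described here can be pictured with ordinary set operations. The sketch below is an assumption about the general idea only: the abstract does not say how Extract-SAGE implements it, and the function names, tissue names, tag strings, and counts are all hypothetical.

```python
from typing import Dict, Set

# Each library maps a SAGE tag to its observed count.
Library = Dict[str, int]


def tissue_specific_tags(libraries: Dict[str, Library], tissue: str) -> Set[str]:
    """Tags observed in `tissue` and in no other library (set difference)."""
    others: Set[str] = set()
    for name, counts in libraries.items():
        if name != tissue:
            others |= set(counts)
    return set(libraries[tissue]) - others


def shared_tags(libraries: Dict[str, Library]) -> Set[str]:
    """Tags observed in every library (set intersection)."""
    tag_sets = [set(counts) for counts in libraries.values()]
    return set.intersection(*tag_sets) if tag_sets else set()


# Hypothetical libraries with made-up 10-base tags and counts.
libs = {
    "brain": {"AAAAAAAAAA": 12, "CCCCCCCCCC": 3, "GGGGGGGGGG": 7},
    "liver": {"AAAAAAAAAA": 2, "TTTTTTTTTT": 9},
}
brain_only = tissue_specific_tags(libs, "brain")
common = shared_tags(libs)
```

For tags in `common`, comparing the counts across libraries (here 12 versus 2 for the shared tag) is one simple way to flag up- or down-regulation, although a real analysis would normalize for library size first.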

17.
We consider the problem of finding the set of rankings that best represents a given group of orderings on the same collection of elements (preference lists). This problem arises in social choice and voting theory, in which each voter gives a preference on a set of alternatives, and a system outputs a single preference order based on the observed voters' preferences. In this paper, we observe that, if the given set of preference lists is not homogeneous, a unique true underlying ranking might not exist. Moreover, only the lists that share the highest amount of information should be aggregated, and thus multiple rankings might provide a more feasible solution to the problem. In this light, we propose Network Selection, an algorithm that, given a heterogeneous group of rankings, first discovers the different communities of homogeneous rankings and then combines only the rank orderings belonging to the same community into a single final ordering. Our novel approach is inspired by graph theory; indeed, our set of lists can be loosely read as the nodes of a network. As a consequence, only the lists populating the same community in the network are aggregated. To highlight the strength of our proposal, we show an application both on simulated data and on two real datasets, namely a financial and a biological dataset. Experimental results on simulated data show that Network Selection can significantly outperform existing related methods. In addition, the empirical evidence obtained on real financial data reveals that Network Selection is also able to select the most relevant variables in data mining predictive models, yielding models with clearly superior predictive power. Furthermore, we show the potential of our proposal in the bioinformatics field, providing an application to a biological microarray dataset.

18.
Yunsong Qi, Xibei Yang. Genomics, 2013, 101(1): 38-48
An important application of gene expression data is to classify samples in a variety of diagnostic fields. However, high dimensionality and a small number of noisy samples pose significant challenges to existing classification methods. Focusing on the problems of overfitting and sensitivity to noise in the classification of microarray data, we propose an interval-valued analysis method based on a rough set technique to select discriminative genes and to use these genes to classify tissue samples of microarray data. We first select a small subset of genes based on an interval-valued rough set by considering the preference-ordered domains of the gene expression data, and then classify test samples into classes according to a similarity degree. Experiments show that the proposed method is able to reach high prediction accuracies with a small number of selected genes and that its performance is robust to noise.

19.
The potential of drift tube ion mobility (IM) spectrometry in combination with high performance liquid chromatography (LC) and mass spectrometry (MS) for the metabonomic analysis of rat urine is reported. The combined LC-IM-MS approach, using quadrupole/time-of-flight mass spectrometry with electrospray ionisation, characterises gas-phase analytes by both mass-to-charge (m/z) ratio and relative gas-phase mobility (drift time) following LC separation. The technique allowed the acquisition of nested data sets, with mass spectra acquired at regular intervals (65 µs) during each IMS separation (approximately 13 ms) and several IMS spectra acquired during the elution of a single LC peak, without increasing the overall analysis time compared to LC-MS. Preliminary results indicate that spectral quality is improved when using LC-IM-MS, compared to direct injection IM-MS, for which significant ion suppression effects were observed in the electrospray ion source. The use of reversed-phase LC employing fast gradient elution reduced sample preparation to a minimum, whilst maintaining the potential for high throughput analysis. Data mining allowed information on specific analytes to be extracted from the complex metabonomic data set. LC-IM-MS based approaches may have a useful role in metabonomic analyses by introducing an additional discriminatory dimension of ion mobility (drift time).
