Similar Articles
20 similar articles found.
1.
Because gene expression data have high attribute dimensionality but low sample dimensionality, the Fisher classifier does not achieve high classification performance on such data. This paper proposes Fisher-List, an improved version of the Fisher algorithm. Its distinctive feature is that it determines a decision threshold for each class; each threshold incorporates both global sample information and information from certain individual samples that are critical for classification. Experiments show that the new algorithm outperforms Fisher, LogitBoost, AdaBoost, k-nearest neighbors, decision trees, and support vector machines on gene expression data classification.
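The abstract does not give the Fisher-List thresholding rule, so the following is only a minimal sketch of the underlying idea — per-class decision thresholds on top of a standard Fisher discriminant; the threshold choice shown here is an illustrative assumption, not the authors' rule.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Standard Fisher discriminant direction for two classes."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter
    return np.linalg.solve(Sw + 1e-6 * np.eye(Sw.shape[0]), mu1 - mu0)

def per_class_thresholds(X0, X1, w):
    """Illustrative per-class thresholds on the projected scores.

    The abstract says each class gets its own threshold combining global
    and critical individual-sample information; here we simply use each
    class's extreme projected score as a stand-in for that rule."""
    s0, s1 = X0 @ w, X1 @ w
    t0 = s0.max()   # boundary as seen from class 0's side
    t1 = s1.min()   # boundary as seen from class 1's side
    return t0, t1

def predict(x, w, t0, t1):
    s = x @ w
    if s >= t1:      # clears class 1's threshold
        return 1
    if s <= t0:      # clears class 0's threshold
        return 0
    return int(abs(s - t1) < abs(s - t0))  # ambiguous: nearer threshold wins
```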

2.
Reviewing the literature on time-on-task effects on safety shows contradictory evidence, especially with regard to 12 h shifts. It is argued that this might depend on methodological problems associated with the analysis of accident data, e.g. selectivity of samples, validity of databases, and study designs, especially for analyses at the company level. Analyses of aggregated data seem to indicate an exponential increase of accident risk with time on task beyond the normal working day. This is supported by some recent studies based on data from the Federal Republic of Germany.

3.
Uher F. Magyar Onkologia 2001, 45(1):59-66
As the Human Genome Project hurtles towards completion, DNA microarray technology offers the potential to open wide new windows into the study of genome complexity. DNA chips can be used for many different purposes, most prominently to measure levels of gene expression (messenger RNA abundance) for tens of thousands of genes simultaneously. But how much of this data is useful and is some superfluous? Can array data be used to identify a handful of critical genes that will lead to a more detailed taxonomy of haematological malignancies and can this or similar array data be used to predict clinical outcome? It is still too early to predict what the ultimate impact of DNA chips will be on our understanding of cancer biology. There are many critically important questions about this new field that are yet unaddressed. By the publication of this article, it is hoped that the technology of DNA chips will be opened up and demystified, and that additional opportunities for creative exploration will be catalysed.

4.
Journal of Physiology 2009, 103(6):315-323
The EEG is one of the most commonly used tools in brain research. Though of high relevance in research, the data obtained is very noisy and nonstationary. In the present article we investigate the applicability of a nonlinear data analysis method, the recurrence quantification analysis (RQA), to such data. The method solely rests on the natural property of recurrence which is a phenomenon inherent to complex systems, such as the brain. We show that this method is indeed suitable for the analysis of EEG data and that it might improve contemporary EEG analysis.
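A minimal sketch of the recurrence idea behind RQA, assuming a univariate signal and illustrative embedding parameters (the dimension, delay, and radius used by the study are not specified in the abstract):

```python
import numpy as np

def recurrence_matrix(x, dim=3, delay=2, radius=0.5):
    """Binary recurrence matrix of a delay-embedded signal."""
    n = len(x) - (dim - 1) * delay
    emb = np.array([x[i : i + (dim - 1) * delay + 1 : delay] for i in range(n)])
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    return (d < radius).astype(int)

def recurrence_rate(R):
    """Simplest RQA measure: fraction of recurrent points."""
    return R.mean()

# Toy example: a noisy oscillation as a stand-in for an EEG channel.
t = np.linspace(0, 10 * np.pi, 500)
x = np.sin(t) + 0.1 * np.random.randn(500)
R = recurrence_matrix(x)
print(f"recurrence rate: {recurrence_rate(R):.3f}")
```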

5.
MOTIVATION: To enhance the exploration of gene expression data in a metabolic context, one requires an application that allows the integration of this data and which represents this data in a (genome-wide) metabolic map. The layout of this metabolic map must be highly flexible to enable discoveries of biological phenomena. Moreover, it must allow the simultaneous representation of additional information about genes and enzymes. Since the layout and properties of existing maps did not fulfill our requirements, we developed a new way of representing gene expression data in metabolic charts. RESULTS: ViMAc generates user-specified (genome-wide) metabolic maps to explore gene expression data. To enhance the interpretation of these maps information such as sub-cellular localization is included. ViMAc can be used to analyse human or yeast expression data obtained with DNA microarrays or SAGE. We introduce our metabolic map method and demonstrate how it can be applied to explore DNA microarray data for yeast. Availability: ViMAc is freely available for academic institutions on request from the authors.

6.
The advent of high-throughput technologies and the concurrent advances in information sciences have led to an explosion in size and complexity of the data sets collected in biological sciences. The biggest challenge today is to assimilate this wealth of information into a conceptual framework that will help us decipher biological functions. A large and complex collection of data, usually called a data cloud, naturally embeds multi-scale characteristics and features, generically termed geometry. Understanding this geometry is the foundation for extracting knowledge from data. We have developed a new methodology, called data cloud geometry-tree (DCG-tree), to resolve this challenge. This new procedure has two main features that are keys to its success. Firstly, it derives from the empirical similarity measurements a hierarchy of clustering configurations that captures the geometric structure of the data. This hierarchy is then transformed into an ultrametric space, which is then represented via an ultrametric tree or a Parisi matrix. Secondly, it has a built-in mechanism for self-correcting clustering membership across different tree levels. We have compared the trees generated with this new algorithm to equivalent trees derived with the standard Hierarchical Clustering method on simulated as well as real data clouds from fMRI brain connectivity studies, cancer genomics, giraffe social networks, and Lewis Carroll's Doublets network. In each of these cases, we have shown that the DCG trees are more robust and less sensitive to measurement errors, and that they provide a better quantification of the multi-scale geometric structures of the data. As such, DCG-tree is an effective tool for analyzing complex biological data sets.
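For comparison with the Hierarchical Clustering baseline mentioned in the abstract, here is a minimal sketch of deriving an ultrametric (cophenetic) distance matrix from a standard linkage tree with SciPy; the DCG-tree algorithm itself is not reproduced here:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

# Toy data cloud: random points, converted to a condensed distance vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
d = squareform(np.linalg.norm(X[:, None] - X[None, :], axis=-1))

Z = linkage(d, method="average")   # standard hierarchical clustering
ultra = squareform(cophenet(Z))    # cophenetic distances form an ultrametric
print(ultra.round(2))
```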

7.
8.
A Primer on Metagenomics   Total citations: 1 (self-citations: 0, citations by others: 1)
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

9.
Global data volume is growing rapidly and has become a core engine of the digital economy, but traditional data storage media are constrained by power consumption, physical volume, and cost, and can hardly keep pace with ever-growing storage demands. Novel storage that uses deoxyribonucleic acid (DNA) molecules as the medium has attracted great attention at home and abroad; major countries have made top-level plans for its research and deployed a series of important scientific programs. However, as an emerging interdisciplinary research field, the "sources" and "currents" of DNA data storage development still present questions that require in-depth analysis. Addressing this, the article traces the origins of DNA data storage from the perspective of the convergence of information science, semiconductors, and synthetic biology; analyzes and summarizes the recent development plans of major countries and regions in the field; and reviews the layout of research programs at home and abroad, in particular the basic research projects promoted by the US "Semiconductor Synthetic Biology" alliance, the application-oriented concentrated-effort projects driven by the Defense Advanced Research Projects Agency (DARPA) and the Intelligence Advanced Research Projects Activity (IARPA), the EU's Horizon 2020 programme, and China's national key R&D programs. Through this comparison ...

10.
A comment is made on an article recently published in this journal by Borgeat, Elie, and Castonguay (1991). It is noted that the raw data contained an outlier which is shown to have had a large influence on their estimation of the regression coefficient. Analysis of the data using a nonparametric statistic that is optimal for long-tailed distributions showed that the true regression coefficient is probably smaller than that reported. Other approaches to the analysis of data containing influential outliers are discussed.
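The abstract does not name the nonparametric statistic used, but a Theil–Sen slope illustrates the general point: a single influential outlier inflates an ordinary least-squares slope far more than a rank-based estimate. A hedged sketch:

```python
import numpy as np
from scipy.stats import theilslopes, linregress

rng = np.random.default_rng(1)
x = np.arange(20.0)
y = 0.5 * x + rng.normal(scale=0.5, size=20)
y[-1] += 15.0                     # one influential outlier

ols = linregress(x, y)
ts = theilslopes(y, x)
print(f"OLS slope:       {ols.slope:.2f}")   # pulled upward by the outlier
print(f"Theil-Sen slope: {ts[0]:.2f}")       # stays close to the true 0.5
```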

11.
A cluster, consisting of a group of computers, acts as a single system that provides users with computing resources; each computer is a node of the cluster. A cluster computer is a system built from a complete set of interconnected computers. With the rapid development of computer technology, cluster computing, with its high performance-cost ratio, has been widely applied in distributed parallel computing. For the large-scale closed data of group enterprises, a heterogeneous data integration model was built in a cluster environment based on cluster computing, XML technology, and ontology theory. The model provides users with unified, transparent access interfaces. Building on cluster computing, this work solves heterogeneous data integration problems by means of ontology and XML technology, and achieves better results in application than traditional data integration models. It was further shown that the model improves the computing capacity of the system at a high performance-cost ratio, and it is therefore expected to support decision-making by enterprise managers.

12.
Allele frequency estimation from data on relatives.   Total citations: 34 (self-citations: 18, citations by others: 16)
Given genetic marker data on unrelated individuals, maximum-likelihood allele-frequency estimates and their standard errors are easily calculated from sample proportions. When marker phenotypes are observed on relatives, this method cannot be used without either discarding a subset of the data or incorrectly assuming that all individuals are unrelated. Here, I describe a method for allele frequency estimation for data on relatives that is based on standard methods of pedigree analysis. This method makes use of all available marker information while correctly taking into account the dependence between relatives. I illustrate use of the method with family data for a VNTR polymorphism near the apolipoprotein B locus.
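For the unrelated-individuals baseline the abstract starts from, the maximum-likelihood estimate and its standard error follow directly from sample proportions; a minimal sketch (the pedigree-based likelihood method itself is not reproduced here):

```python
import numpy as np

def allele_freq_mle(genotypes):
    """MLE of allele frequency for a biallelic marker in unrelated individuals.

    genotypes: counts (0/1/2) of the A allele per individual.
    Returns (p_hat, standard error)."""
    g = np.asarray(genotypes)
    n = len(g)                        # n individuals carry 2n alleles
    p = g.sum() / (2 * n)             # sample allele proportion
    se = np.sqrt(p * (1 - p) / (2 * n))
    return p, se

p, se = allele_freq_mle([0, 1, 1, 2, 0, 1, 2, 2, 1, 0])
print(f"p = {p:.3f} +/- {se:.3f}")
```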

13.
The assumption that the total abundance of RNAs in a cell is roughly the same across different cells underlies most studies based on gene expression analyses. But experiments have shown that changes in the expression of some master regulators, such as c-MYC, can cause a global shift in the expression of almost all genes in some cell types, such as cancers. Such a shift violates this assumption and can lead to wrong or biased conclusions from standard data analysis practices, such as detection of differentially expressed (DE) genes and molecular classification of tumors based on gene expression. Most existing gene expression data were generated without considering this possibility and are therefore at risk of having produced unreliable results if such a global shift effect exists in the data. To evaluate this risk, we conducted a systematic study of the possible influence of the global gene expression shift effect on differential expression analysis and on molecular classification analysis. We collected data with a known global shift effect, generated data simulating different situations of the effect based on a wide collection of real gene expression data, and conducted comparative studies of representative existing methods. We observed that some DE analysis methods are more tolerant of the global shift while others are very sensitive to it. Classification accuracy is not sensitive to the shift and can actually benefit from it, but the genes selected for classification can be greatly affected.
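A minimal sketch of how such a global shift can mislead a standard normalization step; the simulation design here is illustrative, not the authors': when most genes truly double, per-sample median scaling makes the genuinely unchanged genes look down-regulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 1000
base = rng.lognormal(mean=5, sigma=1, size=n_genes)

control = base.copy()
treated = base.copy()
treated[:900] *= 2.0              # global shift: 90% of genes truly doubled

# Standard practice: scale each sample so that the medians match.
treated_norm = treated * np.median(control) / np.median(treated)

# The 100 genes that never changed now appear ~2-fold "down-regulated".
fold = treated_norm[900:] / control[900:]
print(f"median apparent fold change of unchanged genes: {np.median(fold):.2f}")
```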

14.
Traditional crop germplasm data organization builds a separate data table for each crop species; this approach can no longer effectively meet the needs of integrated germplasm data analysis. This paper proposes a germplasm data organization method based on attribute-separated storage: a data table is built for each attribute of the germplasm, with no subordination relationships among the attributes. The method unifies data query operations, optimizes the query process, and improves analysis efficiency. It is flexible and extensible, makes it easy to integrate data relevant to germplasm analysis, and is well suited to building distributed germplasm-resource databases and related information systems.
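A minimal sketch of the attribute-separated layout, with illustrative table and column names (none are given in the abstract): one table per attribute, all keyed by a shared accession ID, so queries join only the attributes an analysis needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table per attribute, all keyed by the germplasm accession ID;
# attributes are peers, with no subordination between tables.
for attr in ("plant_height", "grain_weight", "maturity_days"):
    cur.execute(f"CREATE TABLE {attr} (accession_id TEXT PRIMARY KEY, value REAL)")

cur.execute("INSERT INTO plant_height VALUES ('ACC001', 92.5)")
cur.execute("INSERT INTO grain_weight VALUES ('ACC001', 38.1)")

# A cross-attribute query joins only the tables the analysis needs.
cur.execute("""SELECT h.accession_id, h.value, w.value
               FROM plant_height h JOIN grain_weight w USING (accession_id)""")
print(cur.fetchall())
```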

15.
Gillis J, Pavlidis P. PLoS ONE 2011, 6(2):e17258
Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
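A minimal sketch of the paper's central control, under illustrative assumptions: rank genes by degree alone (number of interaction partners) and score how well that ranking predicts membership in a function category, using no information about which gene interacts with which.

```python
import numpy as np

def degree_predictor_auc(adj, members):
    """AUC of node degree as a predictor of membership in a gene set.

    adj: symmetric 0/1 interaction matrix; members: boolean membership array."""
    degree = adj.sum(axis=0)
    pos, neg = degree[members], degree[~members]
    # AUC = P(random positive outranks random negative); ties count half.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

rng = np.random.default_rng(0)
n = 200
members = np.zeros(n, dtype=bool)
members[:20] = True                         # an annotated, "multifunctional" set
p = np.where(members, 0.15, 0.05)           # annotated genes get more partners
adj = (rng.random((n, n)) < np.add.outer(p, p) / 2).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T                           # symmetrize, no self-loops
print(f"degree-only AUC: {degree_predictor_auc(adj, members):.2f}")
```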

16.
The shortage of data on emissions from agricultural tractors means that LCA results on the environmental load of modern crop production may have high error levels and high uncertainties. The first part of this work describes measurements and calculations made in order to obtain operation-specific agricultural emission data. Calculations are based on emission data measured on a standard 70 kW tractor of a widely available make. In the second part, results from an LCI on wheat production based on traditionally used emission data are calculated and compared with results obtained when using the emission data for specific working operations derived in part one. One conclusion of the study is that the emission values, when related to the energy in the used fuel, show very large variations between different driving operations. Another conclusion is that the use of the new data results in a marked reduction of the total air emissions attributed to the wheat production chain, especially for CO and HC, but also for NOx and SO2.

17.
Proschan MA, Wittes J. Biometrics 2000, 56(4):1183-1187
Sample size calculations for a continuous outcome require specification of the anticipated variance; inaccurate specification can result in an underpowered or overpowered study. For this reason, adaptive methods whereby the sample size is recalculated using the variance of a subsample have become increasingly popular. The first proposal of this type (Stein, 1945, Annals of Mathematical Statistics 16, 243-258) used all of the data to estimate the mean difference but only the first-stage data to estimate the variance. Stein's procedure is not commonly used because many people perceive it as ignoring relevant data. This is especially problematic when the first-stage sample size is small, as would be the case if the anticipated total sample size were small. A more naive approach uses, in the denominator of the final test statistic, the variance estimate based on all of the data. Applying the Helmert transformation, we show why this naive approach underestimates the true variance and how to construct an unbiased estimate that uses all of the data. We prove that the type I error rate of our procedure cannot exceed alpha.
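A minimal sketch of the Stein (1945) two-stage logic the abstract starts from, with illustrative parameters; the Helmert-transformation-based unbiased estimator of the authors is not reproduced here.

```python
import numpy as np
from scipy import stats

def stein_total_n(stage1, half_width, alpha=0.05):
    """Stein-style two-stage rule for a fixed-width confidence interval.

    stage1: first-stage observations; half_width: desired CI half-width.
    Returns the total sample size implied by the stage-1 variance."""
    n1 = len(stage1)
    s2 = np.var(stage1, ddof=1)                  # variance from stage 1 only
    t = stats.t.ppf(1 - alpha / 2, df=n1 - 1)    # df fixed by the first stage
    return max(n1, int(np.ceil(s2 * t**2 / half_width**2)))

rng = np.random.default_rng(0)
stage1 = rng.normal(loc=0.0, scale=2.0, size=15)
print(f"total n: {stein_total_n(stage1, half_width=0.5)}")
```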

18.
Hatch JP, Prihoda TJ. Biofeedback and Self-Regulation 1992, 17(2):153-6; discussion 157-8
A comment is made on an article recently published in this journal by Borgeat, Elie, and Castonguay (1991). It is noted that the raw data contained an outlier which is shown to have had a large influence on their estimation of the regression coefficient. Analysis of the data using a nonparametric statistic that is optimal for long-tailed distributions showed that the true regression coefficient is probably smaller than that reported. Other approaches to the analysis of data containing influential outliers are discussed.

19.
Because the flood data series of small drainage basins are relatively short, the available data are often insufficient for flood risk analysis. This presents the problem of risk analysis using very small data samples. One applicable method is to regard the available small samples as fuzzy information and optimize them using information diffusion technology to yield analytical results with greater reliability. In this article a risk analysis method based on information diffusion theory is applied to create a new flood risk analysis model. Application of the model is illustrated using the Jinhuajiang and Qujiang drainage basins as examples. This is a new attempt at applying information diffusion theory to flood risk analysis. Computations based on this analytical flood risk model can yield an estimated flood damage value that is relatively accurate. This study indicates that the model exhibits fairly stable analytical results, even when using a small set of sample data. The results also indicate that information diffusion technology is highly capable of extracting useful information and therefore improves system recognition accuracy. The method is easy to apply, the analytical results are easy to understand, and they are accurate enough to act as a guide in disaster situations.
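A minimal sketch of normal information diffusion, the core technique: each observation in a small sample is spread over a grid of possible values with a Gaussian diffusion function, and exceedance probabilities are read off the diffused mass. The bandwidth rule and the data below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def diffuse(samples, grid):
    """Normal information diffusion of a small sample onto a value grid."""
    x = np.asarray(samples, dtype=float)
    n = len(x)
    # Simple empirical bandwidth; the literature offers several choices.
    h = 1.4208 * (x.max() - x.min()) / (n - 1)
    f = np.exp(-((grid[None, :] - x[:, None]) ** 2) / (2 * h**2))
    f /= f.sum(axis=1, keepdims=True)   # each observation carries unit weight
    return f.sum(axis=0)                # diffused "information" per grid point

floods = [320, 410, 560, 480, 700, 390]      # hypothetical annual peak flows
grid = np.linspace(250, 800, 12)
q = diffuse(floods, grid)
exceed = q[::-1].cumsum()[::-1] / q.sum()    # estimated P(flow >= grid value)
for g, p in zip(grid, exceed):
    print(f"{g:6.0f}  {p:.2f}")
```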

20.