Similar literature
1.
This paper presents an alternative technique for sleep stage classification based on heart rate variability (HRV). A simple subject-specific scheme and a more practical subject-independent scheme were designed to classify wake, rapid eye movement (REM) sleep and non-REM (NREM) sleep. Forty-one HRV features extracted from the RR sequences of 45 healthy subjects were trained and tested with the random forest (RF) method. Among the features, 25 were newly proposed or applied to sleep study for the first time. For the subject-independent classifier, all features were normalized with our fractile-value-based method. In addition, the importance of each feature for sleep staging was assessed by RF, and the appropriate number of features was explored. For the subject-specific classifier, a mean accuracy of 88.67% with a Cohen's kappa statistic κ of 0.7393 was achieved. The accuracy and κ dropped to 72.58% and 0.4627, respectively, when the subject-independent classifier was considered. Some newly proposed HRV features even performed more effectively than the conventional ones. The proposed method could be used as an alternative or auxiliary technique for rough and convenient sleep stage classification.
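
Accuracy and Cohen's kappa, the two figures reported in this abstract, are both computed from paired reference and predicted stage labels. The following is a minimal sketch of that calculation, not the authors' code; the function name `accuracy_and_kappa` and the epoch labels are assumptions made up for illustration.

```python
from collections import Counter


def accuracy_and_kappa(true_labels, predicted_labels):
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(true_labels)
    # Observed agreement: fraction of epochs where both labelings coincide.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    # Chance agreement: product of the marginal label frequencies, summed over classes.
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical epoch labels, invented for this example only.
reference = ["wake", "wake", "REM", "NREM", "NREM", "NREM", "REM", "NREM"]
predicted = ["wake", "NREM", "REM", "NREM", "NREM", "wake", "REM", "NREM"]
acc, kappa = accuracy_and_kappa(reference, predicted)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")
```

Kappa discounts the agreement expected by chance, which is why it can fall much further than accuracy when one stage (typically NREM) dominates the recording.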

2.
Assessing the agreement between two or more raters is an important topic in medical practice. Existing techniques, which deal with categorical data, are based on contingency tables. This is often an obstacle in practice, as it can take a long time to collect a sample of subjects large enough to construct the contingency table. In this paper, we introduce a nonparametric sequential test for assessing agreement, which can be applied as data accrue and does not require a contingency table, facilitating a rapid assessment of the agreement. The proposed test is based on the cumulative sum of the number of disagreements between the two raters and a suitable statistic representing the waiting time until the cumulative sum exceeds a predefined threshold. We treat the cases of testing two raters' agreement with respect to one or more characteristics and using two or more classification categories, the case where the two raters extremely disagree, and finally the case of testing more than two raters' agreement. The numerical investigation shows that the proposed test has excellent performance. Compared to the existing methods, the proposed method appears to require a significantly smaller sample size with equivalent power. Moreover, the proposed method is easily generalizable and brings the problem of assessing the agreement between two or more raters and one or more characteristics under a unified framework, thus providing an easy-to-use tool to medical practitioners.
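
The waiting-time statistic described in this abstract can be illustrated with a short sketch. It only counts how many subjects are rated before the cumulative number of disagreements first exceeds a threshold; the abstract does not give the null distribution or decision rule, so none is implemented here. The function name, the ratings, and the threshold values are hypothetical.

```python
def waiting_time_to_threshold(ratings_a, ratings_b, threshold):
    """Return the 1-based subject index at which the cumulative number of
    disagreements between two raters first exceeds `threshold`, or None
    if the threshold is never exceeded."""
    disagreements = 0
    for index, (a, b) in enumerate(zip(ratings_a, ratings_b), start=1):
        if a != b:
            disagreements += 1
        if disagreements > threshold:
            return index
    return None


# Hypothetical binary ratings from two raters, in order of accrual.
rater_a = [1, 0, 1, 1, 0, 1, 0, 0]
rater_b = [1, 1, 1, 0, 0, 1, 1, 0]
waiting_time = waiting_time_to_threshold(rater_a, rater_b, threshold=2)
print(waiting_time)
```

Because the statistic is updated one subject at a time, it can be evaluated as data accrue, which is the property the abstract emphasizes over contingency-table methods.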

3.
Recognising and classifying movement patterns correctly can be a difficult task, yet movement analysts face it on a daily basis. We have therefore developed and evaluated a method that performs the classification using contact forces during hopping in a sledge system. Experiments showed that the reaction forces of different subjects on a sliding sledge could be divided into four major types: symmetric mono-modal (type I), positive mono-modal (type II), negative mono-modal (type III), and multi-modal associated with plateau formation (type IV). Up until now, an exact determination of these types was not possible. The new method approximates the four types with well-established mathematical functions, so that the measured reaction force is reproduced by particular coefficients. The coefficients are then subjected to a discriminant analysis. The result is a three-dimensional function coefficient, which allows the actual force pattern to be classified into one of the four types.

4.
MOTIVATION: Time-course microarray experiments are designed to study biological processes in a temporal fashion. Longitudinal gene expression data arise when biological samples taken from the same subject at different time points are used to measure the gene expression levels. It has been observed that the gene expression patterns of samples of a given tumor measured at different time points are likely to be much more similar to each other than are the expression patterns of tumor samples of the same type taken from different subjects. In statistics, this phenomenon is called the within-subject correlation of repeated measurements on the same subject, and the resulting data are called longitudinal data. It is well known in other applications that valid statistical analyses have to appropriately take account of the possible within-subject correlation in longitudinal data. RESULTS: We apply estimating equation techniques to construct a robust statistic, which is a variant of the robust Wald statistic and accounts for the potential within-subject correlation of longitudinal gene expression data, to detect genes with temporal changes in expression. We associate significance levels to the proposed statistic by either incorporating the idea of the significance analysis of microarrays method or using the mixture model method to identify significant genes. The utility of the statistic is demonstrated by applying it to an important study of osteoblast lineage-specific differentiation. Using simulated data, we also show pitfalls in drawing statistical inference when the within-subject correlation in longitudinal gene expression data is ignored.

5.
The neutral theory of molecular evolution predicts that the ratio of polymorphisms to fixed differences should be fairly uniform across a region of DNA sequence. Significant heterogeneity in this ratio can indicate the effects of balancing selection, selective sweeps, mildly deleterious mutations, or background selection. Comparing an observed heterogeneity statistic with simulations of the heterogeneity resulting from random phylogenetic and sampling variation provides a test of the statistical significance of the observed pattern. When simulated data sets containing heterogeneity in the polymorphism-to-divergence ratio are examined, different statistics are most powerful for detecting different patterns of heterogeneity. The number of runs is most powerful for detecting patterns containing several peaks of polymorphism; the Kolmogorov-Smirnov statistic is most powerful for detecting patterns in which one end of the gene has high polymorphism and the other end has low polymorphism; and a newly developed statistic, the mean sliding G statistic, is most powerful for detecting patterns containing one or two peaks of polymorphism with reduced polymorphism on either side. Nine out of 27 genes from the Drosophila melanogaster subgroup exhibit heterogeneity that is significant under at least one of these three tests, with five of the nine remaining significant after a correction for multiple comparisons, suggesting that detectable evidence for the effects of some kind of selection is fairly common.

6.
Several stratagems are used in protein bioinformatics for the classification of proteins based on sequence, structure or function. We explore the concept of a minimal signature embedded in a sequence that defines the likely position of a protein in a classification. Specifically, we address the derivation of sparse profiles for the G-protein coupled receptor (GPCR) clan of integral membrane proteins. We present an evolutionary algorithm (EA) for the derivation of sparse profiles (signatures) without the need to supply a multiple alignment. We also apply an evolution strategy (ES) to the problem of pattern and profile refinement. Patterns were derived for the GPCR 'superfamily' and GPCR families 1-3 individually from starting populations of randomly generated signatures, using a database of integral membrane protein sequences and an objective function using a modified receiver operator characteristic (ROC) statistic. The signature derived for the family 1 GPCR sequences was shown to perform very well in a stringent cross-validation test, detecting 76% of unseen GPCR sequences at 5% error. Application of the ES refinement method to a signature developed by a previously described method [Sadowski, M.I., Parish, J.H., 2003. Automated generation and refinement of protein signatures: case study with G-protein coupled receptors. Bioinformatics 19, 727-734] resulted in a 6% increase of coverage for 5% error as measured in the validation test. We note that there might be a limit to this or any classification of proteins based on patterns or schemata.

7.
Haplotype-based risk models can lead to powerful methods for detecting the association of a disease with a genomic region of interest. In population-based studies of unrelated individuals, however, the haplotype status of some subjects may not be discernible without ambiguity from available locus-specific genotype data. A score test for detecting haplotype-based association using genotype data has been developed in the context of generalized linear models for analysis of data from cross-sectional and retrospective studies. In this article, we develop a test for association using genotype data from cohort and nested case-control studies where subjects are prospectively followed until disease incidence or censoring (end of follow-up) occurs. Assuming a proportional hazard model for the haplotype effects, we derive an induced hazard function of the disease given the genotype data, and hence propose a test statistic based on the associated partial likelihood. The proposed test procedure can account for differential follow-up of subjects, can adjust for possibly time-dependent environmental co-factors and can make efficient use of valuable age-at-onset information that is available on cases. We provide an algorithm for computing the test statistic using readily available statistical software. Utilizing simulated data in the context of two genomic regions GPX1 and GPX3, we evaluate the validity of the proposed test for small sample sizes and study its power in the presence and absence of missing genotype data.

8.
Linkage disequilibrium testing when linkage phase is unknown (cited by: 2; self-citations: 0; citations by others: 2)
Schaid DJ. Genetics, 2004, 166(1): 505-512
Linkage disequilibrium, the nonrandom association of alleles from different loci, can provide valuable information on the structure of haplotypes in the human genome and is often the basis for evaluating the association of genomic variation with human traits among unrelated subjects. However, the linkage phase of genetic markers measured on unrelated subjects is typically unknown, and so measurement of linkage disequilibrium, and testing whether it differs significantly from the null value of zero, requires statistical methods that can account for the ambiguity of unobserved haplotypes. A common method to test whether linkage disequilibrium differs significantly from zero is the likelihood-ratio statistic, which assumes Hardy-Weinberg equilibrium of the marker phenotype proportions. We show, by simulations, that this approach can be grossly biased, with either extremely conservative or liberal type I error rates. In contrast, we use simulations to show that a composite statistic, proposed by Weir and Cockerham, maintains the correct type I error rates, and, when comparisons are appropriate, has power similar to that of the likelihood-ratio statistic. We extend the composite statistic to allow for more than two alleles per locus, providing a global composite statistic, which is a strong competitor to the usual likelihood-ratio statistic.

9.
A global data set on forest cover change was recently published and made freely available for use (Hansen et al. 2013. Science 342: 850–853). Although this data set has been criticized for inaccuracies in distinguishing vegetation types at the local scale, it remains a valuable source of forest cover information for areas where local data is severely lacking. Masoala National Park, in northeastern Madagascar, is an example of a region for which very little spatially explicit forest cover information is available. Yet, this extremely diverse tropical humid forest is undergoing a dramatic rate of forest degradation and deforestation through illegal selective logging of rosewood and ebony, slash-and-burn agriculture, and damage due to cyclones. All of these processes result in relatively diffuse and small-scale changes in forest cover. In this paper, we examine to what extent Hansen et al.'s global forest change data set captures forest loss within Masoala National Park by comparing its performance to a locally calibrated, object-oriented classification approach. We verify both types of classification with substantial ground truthing. We find that both the global and local classifications perform reasonably well in detecting small-scale slash-and-burn agriculture, but neither performs adequately in detecting selective logging. We conclude that since the use of the global forest change data set requires very little technical and financial investment, and performs almost as well as the more resource-demanding, locally calibrated classification, it may be advantageous to use the global forest change data set even for local conservation purposes.

10.
Statistical methods used in the analysis of point patterns are discussed, with special attention to their application in ecological research. Some new procedures are presented that seem better suited to the needs of the ecologist. It is pointed out that patterns can usually be described in terms of an appropriate trend surface as well as in terms of mutual interactions. This circumstance restricts the value of point pattern analysis for ecological research in tracing the mechanisms connected with the distribution of individuals. After the current sampling designs for point patterns have been discussed, the estimation of the local intensity is treated. Although the so-called distance method has received considerable attention in this respect, it is stated that this method is not very appropriate for this purpose. For two sampling designs, it is illustrated how to estimate functions that describe density variation in the field. Further, a procedure is proposed that estimates the covariance curve as well as the total amount of interaction in the pattern. The relation of the statistic with the covariance curve has been pointed out. An improvement of the well-known Greig-Smith method, i.e. the estimation of the variance curve, has been proposed. The estimation procedures proposed have been illustrated by three examples from the field, i.e. dispersal patterns of barnacles, anemones and glassworts, all belonging to low-structured communities. They are presented in the Appendix. Monte Carlo methods are used to study the properties of some statistical procedures. This paper has been part of a Ph.D. thesis, State Univ. Leiden, May 1977.

11.
Land cover data represent a fundamental data source for various types of scientific research. The classification of land cover based on satellite data is a challenging task, and an efficient classification method is needed. In this study, an automatic scheme is proposed for the classification of land use using multispectral remote sensing images based on change detection and a semi-supervised classifier. The satellite image can be automatically classified using only the prior land cover map and existing images; human involvement is therefore reduced to a minimum, ensuring the operability of the method. The method was tested in the Qingpu District of Shanghai, China. Using Environment Satellite 1 (HJ-1) images of 2009 with 30 m spatial resolution, the areas were classified into five main types of land cover based on previous land cover data and spectral features. The results agreed well with the validation land cover maps, with a Kappa value of 0.79 and area biases of less than 6% in proportion. This study proposed a simple semi-automatic approach for land cover classification using prior maps with satisfactory accuracy, which integrated the accuracy of visual interpretation and the performance of automatic classification methods. The method can conveniently be used for land cover mapping in areas lacking ground reference information or for identifying regions of rapid land cover change (such as rapid urbanization).

12.
ABSTRACT: BACKGROUND: Early classification of time series is beneficial for biomedical informatics problems including, but not limited to, disease change detection. Early classification can be of tremendous help by identifying the onset of a disease before it has time to fully take hold. In addition, extracting patterns from the original time series helps domain experts to gain insights into the classification results. This problem has been studied recently using time series segments called shapelets. In this paper, we present a method, which we call Multivariate Shapelets Detection (MSD), that allows for early and patient-specific classification of multivariate time series. The method extracts time series patterns, called multivariate shapelets, from all dimensions of the time series that distinctly manifest the target class locally. The time series were classified by searching for the earliest closest patterns. RESULTS: The proposed early classification method for multivariate time series has been evaluated on eight gene expression datasets from viral infection and drug response studies in humans. In our experiments, the MSD method outperformed the baseline methods, achieving highly accurate classification by using as little as 40%-64% of the time series. The obtained results provide evidence that using conventional classification methods on short time series is not as accurate as using the proposed methods specialized for early classification. CONCLUSION: For the early classification task, we proposed a method called Multivariate Shapelets Detection (MSD), which extracts patterns from all dimensions of the time series. We showed that the MSD method can classify the time series early by using as little as 40%-64% of the time series' length.
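
The idea of classifying a series as soon as a close shapelet match appears can be sketched with sliding-window matching. This is a univariate simplification under stated assumptions, not the MSD method itself, which works on all dimensions of a multivariate series; the function names, the distance threshold, and the data are hypothetical.

```python
import math


def window_distance(window, shapelet):
    """Euclidean distance between a window and a shapelet of equal length."""
    return math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))


def shapelet_distance(series, shapelet):
    """Minimum distance between the shapelet and any equal-length window."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    return min(
        window_distance(series[start:start + m], shapelet)
        for start in range(len(series) - m + 1)
    )


def earliest_match(series, shapelet, max_distance):
    """Length of the shortest prefix of `series` that already contains a
    window within `max_distance` of the shapelet, or None if none exists."""
    m = len(shapelet)
    for end in range(m, len(series) + 1):
        if window_distance(series[end - m:end], shapelet) <= max_distance:
            return end
    return None


# Hypothetical expression-like series and shapelet, invented for illustration.
series = [0.0, 0.1, 0.0, 1.0, 2.0, 1.0, 0.0, 0.1]
shapelet = [1.0, 2.0, 1.0]
prefix_length = earliest_match(series, shapelet, max_distance=0.5)
print(prefix_length, len(series))
```

In this toy case the match is found after 6 of 8 points, which mirrors the abstract's point that a decision can be made from only part of the series.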

13.
The study of functional brain connectivity alterations induced by neurological disorders and their analysis from resting state functional Magnetic Resonance Imaging (rfMRI) is generally considered to be a challenging task. The main challenge lies in determining and interpreting the large-scale connectivity of brain regions when studying neurological disorders such as epilepsy. We tackle this challenging task by studying the cortical region connectivity using a novel approach for clustering the rfMRI time series signals and by identifying discriminant functional connections using a novel difference statistic measure. The proposed approach is then used in conjunction with the difference statistic to conduct automatic classification experiments for epileptic and healthy subjects using the rfMRI data. Our results show that the proposed difference statistic measure has the potential to extract promising discriminant neuroimaging markers. The extracted neuroimaging markers yield 93.08% classification accuracy on unseen data as compared to 80.20% accuracy on the same dataset by a recent state-of-the-art algorithm. The results demonstrate that for epilepsy the proposed approach confirms known functional connectivity alterations between cortical regions, reveals some new connectivity alterations, suggests potential neuroimaging markers, and predicts epilepsy with high accuracy from rfMRI scans.

14.
In this paper we present a procedure to measure the degree of imbalance of an unbalanced data set. The procedure is based on choosing an appropriate loglinear model for the subclass frequencies of the data. A measure of imbalance is then introduced as some function of the chi-squared statistic used in the goodness-of-fit test for the loglinear model. The proposed procedure can also be used to measure departures from certain types of balance, such as proportionality of subclass frequencies, partial balance, and last-stage uniformity.
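
As a rough illustration of the ingredient this abstract builds on, the sketch below computes the Pearson chi-squared goodness-of-fit statistic for subclass frequencies against the simplest model of complete balance (equal expected counts in every subclass). The abstract does not state which function of the statistic defines the imbalance measure, so no such mapping is attempted; the function name and the counts are assumptions.

```python
def chi_squared_against_balance(counts):
    """Pearson chi-squared statistic for observed subclass counts against
    equal expected counts (a completely balanced design)."""
    if not counts or sum(counts) == 0:
        raise ValueError("counts must be non-empty with a positive total")
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Hypothetical subclass frequencies for a one-way layout with four cells.
balanced = [10, 10, 10, 10]
unbalanced = [5, 15, 10, 10]
print(chi_squared_against_balance(balanced))
print(chi_squared_against_balance(unbalanced))
```

The statistic is zero for a perfectly balanced layout and grows as the subclass frequencies drift apart, which is what makes it a natural starting point for an imbalance measure.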

15.
The most widely used statistical methods for finding differentially expressed genes (DEGs) are essentially univariate. In this study, we present a new T² statistic for analyzing microarray data. We implemented our method using a multiple forward search (MFS) algorithm that is designed for selecting a subset of feature vectors in high-dimensional microarray datasets. The proposed T² statistic is a corollary to that originally developed for multivariate analyses and possesses two prominent statistical properties. First, our method takes into account the multidimensional structure of microarray data. The utilization of the information hidden in gene interactions allows for finding genes whose differential expressions are not marginally detectable in univariate testing methods. Second, the statistic has a close relationship to discriminant analyses for classification of gene expression patterns. Our search algorithm sequentially maximizes gene expression difference/distance between two groups of genes. Including such a set of DEGs into initial feature variables may increase the power of classification rules. We validated our method by using a spike-in HGU95 dataset from Affymetrix. The utility of the new method was demonstrated by application to the analyses of gene expression patterns in human liver cancers and breast cancers. Extensive bioinformatics analyses and cross-validation of DEGs identified in the application datasets showed the significant advantages of our new algorithm.

16.
The study of local adaptation is a main focus of evolutionary biology since it may help explain current species diversity. Genomic scan procedures make it possible, for the first time, to study the connection between specific DNA patterns and processes such as natural selection, genetic drift, recombination, mutation and gene flow. Accordingly, as the information on genomes from non-model organisms increases, interest in detecting the signal of natural selection in the DNA sequences of different populations also rises. The main goal of the present work is to explore a sequence-based method for detecting natural selection in divergent populations connected by migration. In doing so, we rely on a recently published statistic based upon the definition of haplotype allelic classes (HAC). The original measure was modified to be more sensitive to intermediate frequencies in non-model species. A linkage-disequilibrium-based method was also assayed, and individual-based simulations were performed to test the methods. The results suggest that the HAC-based methods and, specifically, the newly proposed method are quite powerful for detecting the footprint of moderate divergent selection. They are also robust to reasonable model misspecification. One obvious advantage of the new algorithm is that it does not require knowledge of the allelic state.

17.
Proportional hazards regression for cancer studies (cited by: 1; self-citations: 0; citations by others: 1)
Ghosh D. Biometrics, 2008, 64(1): 141-148
Summary. There has been some recent work in the statistical literature for modeling the relationship between the size of cancers and probability of detecting metastasis, i.e., aggressive disease. Methods for assessing covariate effects in these studies are limited. In this article, we formulate the problem as assessing covariate effects on a right-censored variable subject to two types of sampling bias. The first is the length-biased sampling that is inherent in screening studies; the second is the two-phase design in which a fraction of tumors are measured. We construct estimation procedures for the proportional hazards model that account for these two sampling issues. In addition, a Nelson–Aalen type estimator is proposed as a summary statistic. Asymptotic results for the regression methodology are provided. The methods are illustrated by application to data from an observational cancer study as well as to simulated data.
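
For readers unfamiliar with the summary statistic mentioned here, the following sketch shows the standard Nelson–Aalen cumulative hazard estimator for right-censored data. It is the textbook estimator only and does not include the adjustments for length-biased sampling or the two-phase design that the article proposes; the function name and the data are hypothetical.

```python
def nelson_aalen(times, events):
    """Standard Nelson-Aalen estimator.

    `times` are observed times; `events` are 1 for an observed event and
    0 for a censored observation. Returns a list of
    (event_time, cumulative_hazard) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    cumulative = 0.0
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Increment is (events at t) / (number at risk just before t).
            cumulative += deaths / at_risk
            result.append((t, cumulative))
        at_risk -= leaving
    return result


# Hypothetical follow-up data: event at 2, event and censoring at 3,
# event at 5, censoring at 8.
estimate = nelson_aalen([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(estimate)
```

Each increment divides the number of events at a time point by the number still at risk, so censored observations shrink the risk set without adding to the hazard.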

18.
Every year about one million people die from diseases transmitted by mosquitoes. The infection is transmitted to a person when an infected mosquito bites, injecting its saliva into the human body. To date, the best way to prevent a mosquito-borne infection is to protect humans from exposure to mosquito bites. This study proposes a Machine Learning (ML) and Deep Learning based system to detect the presence of two critical disease-spreading classes of mosquitoes, Aedes and Culex. The proposed system can aid epidemiology in designing evidence-based policies and decisions by analyzing risks and transmission. The study proposes an effective methodology for the classification of mosquitoes using ML and CNN models. A novel technique, RIFS, is introduced, which integrates two types of feature selection techniques: ROI-based image filtering and the wrapper-based FFS technique. A comparative analysis of various ML and deep learning models was performed to determine the most appropriate model based on performance metrics as well as computational needs. The results show that ETC outperformed all other applied ML models with an accuracy of 0.992, while VGG16 outperformed the other CNN models with an accuracy of 0.986.

19.
20.
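Community-scale classification and mapping of marsh wetlands supported by high-resolution imagery (cited by: 2; self-citations: 0; citations by others: 2)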
李娜 (Li Na), 周德民 (Zhou Demin), 赵魁义 (Zhao Kuiyi). 《生态学报》 (Acta Ecologica Sinica), 2011, 31(22): 6717-6726
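Wetlands are habitats for many wild animals and plants and serve important functions such as stabilizing the environment and conserving species' genetic resources. However, the complex land-water transitional habitat of wetlands and practical constraints such as difficult access make wetland research very hard. Remote sensing, as an important tool for acquiring parameters of surface eco-environmental processes, therefore plays an important role in wetland science today, particularly as the performance and application of high-spatial-resolution imagery continue to improve. Taking the Honghe National Nature Reserve in the Sanjiang Plain, Heilongjiang, which remains in a natural state, as the study area, we used a high-spatial-resolution camera system mounted on an airship to acquire imagery with a ground resolution of 0.13 m and, relying mainly on an object-oriented classification method, carried out classification and mapping at the wetland plant community scale. The results show that: (1) because the airship imagery captures fine features such as plant form and texture in great detail, marsh vegetation types, meadow vegetation types, and the various tree and shrub vegetation types can all be extracted with a suitable remote sensing classification method, with an overall classification accuracy of 91.77%; (2) in a comparison between the object-oriented classification method for high-resolution imagery and the traditional maximum likelihood classification method, the former achieved very high accuracy while the latter performed poorly, indicating that the choice of remote sensing classification method is very important for community-scale wetland plant classification and mapping; (3) the classification and mapping results show that the distribution pattern of wetland plant communities in the study area is jointly controlled by the moisture gradient and microtopography, and exhibits an alternating ring-and-belt pattern.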
