Similar literature
 Found 20 similar articles; search took 31 ms
1.
Inferring the structure of populations has many applications in genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular, but their underlying complexity and high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. In this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy that allows a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and uses the gap statistic to estimate the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, namely data from the HapMap and the Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets, both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the most consistent with the population labels or with those produced by the Admixture program.
The performance of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use, makes this method a promising solution for inferring fine-scale genetic patterns.

2.
Two methods for single-trial analysis were compared, an established parametric template approach and a recently proposed non-parametric method based on complex bandpass filtering. The comparison was carried out by means of pseudo-real simulations based on magnetoencephalography measurements of cortical responses to auditory signals. The comparison focused on amplitude and latency estimation of the M100 response. The results show that both methods are well suited for single-trial analysis of the auditory evoked M100. While both methods performed similarly with respect to latency estimation, the non-parametric approach was observed to be more robust for amplitude estimation. The non-parametric approach can thus be recommended as an additional valuable tool for single-trial analysis.

3.
Species dispersal studies provide valuable information in biological research. Restricted dispersal may give rise to a non-random distribution of genotypes in space. Detection of spatial genetic structure may therefore provide valuable insight into dispersal. Spatial structure has been treated via autocorrelation analysis with several univariate statistics, for which results could depend on sampling designs. New geostatistical approaches (variogram-based analysis) have been proposed to overcome this problem. However, modelling parametric variograms can be difficult in practice. We introduce a non-parametric variogram-based method for autocorrelation analysis between DNA samples that have been genotyped by means of multilocus-multiallele molecular markers. The method addresses two important aspects of fine-scale spatial genetic analyses: the identification of a non-random distribution of genotypes in space, and the estimation of the magnitude of any non-random structure. The method uses a plot of the squared Euclidean genetic distances vs. spatial distances between pairs of DNA samples as an empirical variogram. The underlying spatial trend in the plot is fitted by non-parametric smoothing (LOESS, local regression). Finally, the predicted LOESS values are explained by segmented regressions (SR) to obtain classical spatial values such as the extent of autocorrelation. For illustration we use multivariate and single-locus genetic distances calculated from a microsatellite data set for which autocorrelation was previously reported. The LOESS/SR method produced a good fit, yielding an autocorrelation value similar to the published one for these data. The fit by LOESS/SR was simpler to obtain than the parametric analysis, since initial parameter values are not required during the trend estimation process. The LOESS/SR method offers a new alternative for spatial analysis.

4.
In clinical trials examining the incidence of pneumonia it is common practice to measure infection via both invasive and non-invasive procedures. In the context of a recently completed randomized trial comparing two treatments, the invasive procedure was only utilized in certain scenarios due to the added risk involved, and only when the level of the non-invasive procedure surpassed a given threshold. Hence, what was observed was bivariate data with a pattern of missingness in the invasive variable dependent upon the value of the non-invasive observation within a given pair. In order to compare two treatments with bivariate observed data exhibiting this pattern of missingness, we developed a semi-parametric methodology utilizing the density-based empirical likelihood approach to provide a non-parametric approximation to Neyman-Pearson-type test statistics. This novel empirical likelihood approach has both parametric and non-parametric components. The non-parametric component utilizes the observations for the non-missing cases, while the parametric component is utilized to tackle the case where observations are missing with respect to the invasive variable. The method is illustrated through its application to the actual data obtained in the pneumonia study and is shown to be an efficient and practical method.

5.
We describe a non-parametric optimal design as a theoretical gold standard for dose finding studies. Its purpose is analogous to the Cramer-Rao bound for unbiased estimators, i.e. it provides a bound beyond which improvements are not generally possible. The bound applies to the class of non-parametric designs where the data are not assumed to be generated by any known parametric model. Whenever parametric assumptions really hold it may be possible to do better than the optimal non-parametric design. The goal is to be able to compare any potential dose finding scheme with the optimal non-parametric benchmark. This paper makes precise what is meant by optimal in this context and also why the procedure is described as non-parametric.

6.
This paper presents a synergistic parametric and non-parametric modeling study of short-term plasticity (STP) in the Schaffer collateral to hippocampal CA1 pyramidal neuron (SC) synapse. Parametric models in the form of sets of differential and algebraic equations have been proposed on the basis of the current understanding of biological mechanisms active within the system. Non-parametric Poisson–Volterra models are obtained herein from broadband experimental input–output data. The non-parametric model is shown to provide better prediction of the experimental output than a parametric model with a single set of facilitation/depression (FD) processes. The parametric model is then validated in terms of its input–output transformational properties using the non-parametric model, since the latter constitutes a canonical and more complete representation of the synaptic nonlinear dynamics. Furthermore, discrepancies between the experimentally derived non-parametric model and the equivalent non-parametric model of the parametric model suggest the presence of multiple FD processes in the SC synapses. Inclusion of an additional set of FD processes in the parametric model makes it better replicate the characteristics of the experimentally derived non-parametric model. This improved parametric model in turn provides the requisite biological interpretability that the non-parametric model lacks.

7.
This research provides a new way to measure error in microarray data in order to improve gene expression analysis. Microarray data contains many sources of error. In order to glean information about mRNA expression levels, the true signal must first be segregated from noise. This research focuses on the variation that can be captured at the spot level in cDNA microarray images. Variation at other levels, due to differences at the array, dye, and block levels, can be corrected for by a variety of existing normalization procedures. Two signal quality estimates that capture the reliability of each spot printed on a microarray are described. A parametric estimate of within-spot variance, referred to here as σ²_spot, assumes that pixels follow a normal distribution and are spatially correlated. A non-parametric estimate of error, called the mean square prediction error (MSPE), assumes that spots of high quality possess pixels that are similar to their neighbors. This paper will provide a framework to use either spot quality measure in downstream analysis, specifically as weights in regression models. Using these spot quality estimates as weights can result in greater efficiency, in a statistical sense, when modeling microarray data.

8.
Simulating signal transduction in cellular signaling networks provides predictions of network dynamics by quantifying the changes in concentration and activity level of the individual proteins. Since numerical values of kinetic parameters might be difficult to obtain, it is imperative to develop non-parametric approaches that combine the connectivity of a network with the response of individual proteins to signals which travel through the network. The activity levels of signaling proteins computed through existing non-parametric modeling tools do not show significant correlations with the values observed in experimental results. In this work we developed a non-parametric computational framework to describe the profile of the evolving process and the time course of the proportion of the active form of molecules in signal transduction networks. The model is also capable of incorporating perturbations. The model was validated on four signaling networks, showing that it can effectively uncover the activity levels and trends of response during the signal transduction process.

9.
DNA microarray experiments have generated large amounts of gene expression measurements across different conditions. One crucial step in the analysis of these data is to detect differentially expressed genes. Some parametric methods, including the two-sample t-test (T-test) and variations of it, have been used. Alternatively, a class of non-parametric algorithms, such as the Wilcoxon rank sum test (WRST), significance analysis of microarrays (SAM) of Tusher et al. (2001), the empirical Bayesian (EB) method of Efron et al. (2001), etc., have been proposed. Most available popular methods are based on the t-statistic. Due to the quality of the statistic that they use to describe the difference between groups of data, there are situations in which these methods are inefficient, especially when the data follow multi-modal distributions. For example, some genes may display different expression patterns in the same cell type, say, tumor or normal, to form some subtypes. Most available methods are likely to miss these genes. We developed a new non-parametric method for selecting differentially expressed genes by relative entropy, called SDEGRE, which combines relative entropy and kernel density estimation and can detect all types of differences between two groups of samples. The significance of whether a gene is differentially expressed or not can be estimated by resampling-based permutations. We illustrate our method on two data sets from Golub et al. (1999) and Alon et al. (1999). Comparing the results with those of the T-test, the WRST and the SAM, we identified novel differentially expressed genes that previous biological studies have shown to be of biological significance but that were not detected by the other three methods. The results also show that the genes selected by SDEGRE have a better capability to distinguish the two cell types.

10.
11.
Tracy L Bergemann. 《遗传学报》, 2010, 37(4): 265-279
This research provides a new way to measure error in microarray data in order to improve gene expression analysis. Microarray data contains many sources of error. In order to glean information about mRNA expression levels, the true signal must first be segregated from noise. This research focuses on the variation that can be captured at the spot level in cDNA microarray images. Variation at other levels, due to differences at the array, dye, and block levels, can be corrected for by a variety of existing normalization procedures. Two signal quality estimates that capture the reliability of each spot printed on a microarray are described. A parametric estimate of within-spot variance, referred to here as σ²_spot, assumes that pixels follow a normal distribution and are spatially correlated. A non-parametric estimate of error, called the mean square prediction error (MSPE), assumes that spots of high quality possess pixels that are similar to their neighbors. This paper will provide a framework to use either spot quality measure in downstream analysis, specifically as weights in regression models. Using these spot quality estimates as weights can result in greater efficiency, in a statistical sense, when modeling microarray data.
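The use of spot quality estimates as regression weights can be illustrated with a minimal weighted least-squares sketch. It is only an assumption-laden illustration, not the paper's framework: the function name `weighted_linear_fit`, the single-predictor model, and the choice of weight = 1 / (within-spot variance) are all hypothetical.

```python
def weighted_linear_fit(xs, ys, weights):
    """Fit y = intercept + slope * x by weighted least squares."""
    total = sum(weights)
    # Weighted means of the predictor and the response.
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted sums of squares and cross-products around those means.
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data: the last spot is noisy and has a large variance estimate.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 20.0]
spot_variances = [0.1, 0.1, 0.1, 50.0]           # assumed quality estimates
weights = [1.0 / v for v in spot_variances]      # assumed weighting scheme
print(weighted_linear_fit(xs, ys, weights))
print(weighted_linear_fit(xs, ys, [1.0] * 4))    # unweighted fit for comparison
```

With weights like these, a spot with a large variance estimate pulls the fitted line far less than it does in the unweighted fit.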

12.
This paper reviews a general framework for the modelling of longitudinal data with random measurement times based on marked point processes and presents a worked example. We construct quite general regression models for longitudinal data, which may in particular include censoring that depends only on the past and on outside random variation, and dependencies between measurement times and measurements. The modelling also generalises statistical counting process models. We review a non-parametric Nadaraya-Watson kernel estimator of the regression function, and a parametric analysis that is based on a conditional least squares (CLS) criterion. The parametric analysis presented is a conditional version of the generalised estimation equations of LIANG and ZEGER (1986). We conclude that the usual non-parametric and parametric regression modelling can be applied to this general set-up, with some modifications. The presented framework provides an easily implemented and powerful tool for model building for repeated measurements.
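For orientation, the textbook form of the Nadaraya-Watson estimator is a kernel-weighted average of the observed responses. The sketch below shows only that basic form; it is not the marked-point-process version reviewed in the paper, and the Gaussian kernel, the function names, and the example data are assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Estimate the regression function at x0 as a kernel-weighted mean of ys."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations close enough to x0 for this bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical irregular measurement times and measured values.
times = [0.2, 0.9, 1.5, 2.8, 3.1, 4.7]
values = [1.0, 1.4, 2.1, 2.9, 3.2, 4.8]
print(nadaraya_watson(2.0, times, values, bandwidth=0.5))
```

A smaller bandwidth follows the data more closely, while a larger one gives a smoother estimate.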

13.
The purpose of this review was to integrate recent evidence supporting the reliability of noninvasive measures of parasympathetic and sympathetic activity. Literature concerning spectral analysis of heart period (HP) variability is reviewed, with special emphasis on works revealing neural mediation of the high-frequency and mid-frequency components of the HP power spectrum and suggesting their use as a tool to assess autonomic balance. Problems in deriving autonomic indices based on impedance cardiography and HP variance analysis are discussed. Advantages of parametric time series (autoregressive, AR) models are described with the objective of providing an informed basis for choosing among methodological alternatives. Two original approaches developed in our laboratory are outlined, namely the algorithms for systolic time interval assessment based on the impedance cardiogram and the AR method developed for heart period power spectral density estimation.

14.
MOTIVATION: Estimation of misclassification error has received increasing attention in clinical diagnosis and bioinformatics studies, especially in small sample studies with microarray data. Current error estimation methods are not satisfactory because they either have large variability (such as leave-one-out cross-validation) or large bias (such as resubstitution and leave-one-out bootstrap). While small sample size remains one of the key features of costly clinical investigations or of microarray studies that have limited resources in funding, time and tissue materials, accurate and easy-to-implement error estimation methods for small samples are desirable and will be beneficial. RESULTS: A bootstrap cross-validation method is studied. It achieves accurate error estimation through a simple procedure with bootstrap resampling and only costs computer CPU time. Simulation studies and applications to microarray data demonstrate that it performs consistently better than its competitors. This method possesses several attractive properties: (1) it is implemented through a simple procedure; (2) it performs well for small samples, with sample sizes as small as 16; (3) it is not restricted to any particular classification rule and thus applies to many parametric or non-parametric methods.
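One possible reading of the bootstrap cross-validation idea is sketched below: draw bootstrap resamples of the data, run cross-validation inside each resample, and average the resulting error estimates. The abstract does not specify the cross-validation scheme or the classifier, so the leave-one-out scheme, the one-dimensional nearest-centroid rule, and all names here are assumptions made for illustration.

```python
import random


def nearest_centroid_predict(train, x):
    """Predict the label whose training mean is closest to x (1-D feature)."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return min(sums, key=lambda label: abs(x - sums[label] / counts[label]))


def loo_cv_error(sample):
    """Leave-one-out cross-validation error rate on one sample (size >= 2)."""
    errors = 0
    for i, (value, label) in enumerate(sample):
        train = sample[:i] + sample[i + 1:]
        if nearest_centroid_predict(train, value) != label:
            errors += 1
    return errors / len(sample)


def bootstrap_cv_error(data, n_boot, rng):
    """Average the cross-validation error over n_boot bootstrap resamples."""
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # n draws with replacement
        estimates.append(loo_cv_error(resample))
    return sum(estimates) / n_boot


# Hypothetical small sample: 8 observations per class, 16 in total.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), "normal") for _ in range(8)]
data += [(rng.gauss(1.5, 1.0), "tumor") for _ in range(8)]
print(bootstrap_cv_error(data, n_boot=50, rng=rng))
```

Because any classification rule can replace `nearest_centroid_predict`, the sketch reflects the abstract's point that the procedure is not tied to a particular classifier.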

15.
In QTL analysis of non-normally distributed phenotypes, non-parametric approaches have been proposed as an alternative to the use of parametric tests on mathematically transformed data. The non-parametric interval mapping test uses random ranking to deal with ties. Another approach is to assign to each tied individual the average of the tied ranks (midranks). This approach is implemented and compared to the random ranking approach in terms of statistical power and accuracy of the QTL position. Non-normal phenotypes such as bacteria counts showing high numbers of zeros are simulated (0-80% zeros). We show that, for low proportions of zeros, the power estimates are similar but, for high proportions of zeros, the midrank approach is superior to the random ranking approach. For example, with a QTL accounting for 8% of the total phenotypic variance, a gain from 8% to 11% of power can be obtained. Furthermore, the accuracy of the estimated QTL location is increased when using midranks. Therefore, if non-parametric interval mapping is chosen, the midrank approach should be preferred. This test might be especially relevant for the analysis of disease resistance phenotypes such as those observed when mapping QTLs for resistance to infectious diseases.
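The difference between the two tie-handling schemes can be made concrete with a small sketch. It covers only the ranking step, not the interval mapping test itself; the function names and the example counts are hypothetical.

```python
import random


def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def random_ranks(values, rng):
    """Rank values 1..n, breaking ties in a random order."""
    indices = list(range(len(values)))
    rng.shuffle(indices)                   # random order among tied values
    indices.sort(key=lambda i: values[i])  # stable sort keeps that random order
    ranks = [0] * len(values)
    for rank, index in enumerate(indices, start=1):
        ranks[index] = rank
    return ranks


counts = [0, 0, 0, 5, 12, 0, 7]  # hypothetical bacteria counts with many zeros
print(midranks(counts))          # the four zeros all receive rank 2.5
print(random_ranks(counts, random.Random(1)))
```

With midranks, identical phenotypes always receive identical ranks, whereas random ranking assigns the ranks 1 to 4 to the four zeros in an arbitrary order that changes from run to run.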

16.
Parametric and non-parametric modeling methods are combined to study the short-term plasticity (STP) of synapses in the central nervous system (CNS). The nonlinear dynamics of STP are modeled by means of: (1) previously proposed parametric models based on mechanistic hypotheses and/or specific dynamical processes, and (2) non-parametric models (in the form of Volterra kernels) that transform the presynaptic signals into postsynaptic signals. In order to use the two approaches synergistically, we estimate the Volterra kernels of the parametric models of STP for four types of synapses using synthetic broadband input–output data. Results show that the non-parametric models accurately and efficiently replicate the input–output transformations of the parametric models. Volterra kernels provide a general and quantitative representation of the STP.

17.
Humans utilize facial appearance, gender, expression, aging pattern, and other ancillary information to recognize individuals. It is interesting to observe how humans perceive facial age. Analyzing these properties can help in understanding the phenomenon of facial aging, and incorporating the findings can help in designing effective algorithms. Such a study has two components: facial age estimation and age-separated face recognition. Age estimation involves predicting the age of an individual given his/her facial image. On the other hand, age-separated face recognition consists of recognizing an individual given his/her age-separated images. In this research, we investigate which facial cues are utilized by humans for estimating the age of people belonging to various age groups, and we analyze the effect of one's gender, age, and ethnicity on age estimation skills. We also analyze how various facial regions such as the binocular and mouth regions influence age estimation and recognition capabilities. Finally, we propose an age-invariant face recognition algorithm that incorporates the knowledge learned from these observations. Key observations of our research are: (1) the age group of newborns and toddlers is easiest to estimate, (2) gender and ethnicity do not affect the judgment of age group estimation, (3) the face as a global feature is essential to achieve good performance in age-separated face recognition, and (4) the proposed algorithm yields improved recognition performance compared to existing algorithms and also outperforms a commercial system in the young-image-as-probe scenario.

18.
Many algorithms have been described in the literature for estimating amplitude, frequency variables and conduction velocity of the surface EMG signal detected during voluntary contractions. They have been used in different application areas for the non-invasive assessment of muscle functions. Although many studies have focused on the comparison of different methods for information extraction from surface EMG signals, they have been carried out under different conditions and a complete comparison is not available. It is the purpose of this paper to briefly review the most frequently used algorithms for EMG variable estimation, compare them using computer-generated as well as real signals, and outline the advantages and drawbacks of each. In particular, the paper focuses on the issue of EMG amplitude estimation with and without pre-whitening of the signal, mean and median frequency estimation with periodogram and autoregressive based algorithms in both stationary and non-stationary conditions, and delay estimation for the calculation of muscle fiber conduction velocity.
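As a small illustration of the frequency variables mentioned above, the sketch below computes the mean and median frequency from a power spectrum that is assumed to have been estimated already (for example by a periodogram or an AR model). The function names and the example spectrum are hypothetical, and the median is resolved only to the nearest frequency bin, without interpolation.

```python
def mean_frequency(freqs, power):
    """Power-weighted average frequency of a spectrum with positive total power."""
    total = sum(power)
    return sum(f * p for f, p in zip(freqs, power)) / total


def median_frequency(freqs, power):
    """Lowest frequency bin at which cumulative power reaches half the total."""
    half = sum(power) / 2.0
    cumulative = 0.0
    for f, p in zip(freqs, power):
        cumulative += p
        if cumulative >= half:
            return f
    raise ValueError("empty spectrum")


# Hypothetical spectrum: frequency bins in Hz and power in arbitrary units.
freqs = [50.0, 100.0, 150.0, 200.0, 250.0]
power = [1.0, 2.0, 4.0, 2.0, 1.0]
print(mean_frequency(freqs, power))    # 150.0 for this symmetric spectrum
print(median_frequency(freqs, power))  # 150.0
```

For a symmetric spectrum the two values coincide; they move apart when the spectrum is skewed.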

19.
We report on our research efforts towards developing efficient equipment for the automatic recognition of insects using only the acoustic modality. Specifically, we deal with three groups of insects, namely the crickets, cicadas and katydids. Inspired by well-documented tactics of speech processing, the signal processing employed in the present work is elaborated further with respect to the sound production mechanisms of insects. In order to improve the practical efficacy of our equipment, we adopt a score-level fusion of classifiers with non-parametric (probabilistic neural network) and parametric (Gaussian mixture models) estimation of the probability density function. An efficient hierarchic classification scheme is introduced, where the identification of unlabelled input takes place at various levels of hierarchy, such as suborder, family, subfamily, genus and species. We evaluate the practical significance of our approach on a large and well-documented catalogue of recordings of crickets, cicadas and katydids. For the hierarchic classification scheme, we report identification accuracy that exceeds 99% at suborder and family levels. In the straight classification scheme, we report accuracy of 90% for 307 species.

20.
The increasing use of cDNA microarrays necessitates the development of methods for extracting quality data. Here, we set forth hurdles to overcome in image analysis of microarrays. We emphasize the importance of objective data extraction methods resulting in reliable signal estimates. Based on statistical principles, we describe a method for automated grid alignment, spot detection, background estimation, flagging, and signal extraction. A software application that we call SignalViewer has been implemented for this method. We identify areas where we improved upon current methods used for array image analysis at each step in the process. Finally, we give examples to illustrate the performance of our algorithms on raw data.
