Similar literature
 20 similar documents found
1.
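Helcionellids are a common group of Cambrian molluscs with a global distribution. They occur mostly in carbonate strata and have rarely been reported from Burgess Shale-type biotas. The Kaili Biota of Jianhe, Guizhou, is a typical exceptionally preserved fossil deposit that has yielded more than 700 helcionellid specimens, which is rare among contemporaneous Burgess Shale-type biotas worldwide. This paper presents a systematic study of helcionellid fossil specimens from the Cambrian Kaili Formation at Balang, Jianhe, Guizhou, and quantifies them using canonical variate analysis from geometric morphometrics. Geometric morphometrics is a quantitative approach that uses landmarks or outlines to describe an organism's form or to mark characteristic parts and organs, reducing morphological features to variation in data. Canonical variate analysis (CVA) is an important method for discriminant analysis within multivariate analysis and can be used to discriminate among multiple groups of data. The CVA results show that the lateral-view shell outline discriminates the three species of the genus Dorispira with 92% accuracy, supporting the taxonomic validity of the fossil species Dorispira accordinonata, D. taijiangensis and D. cf. pearylandica. The study shows that even for groups with fairly similar shell morphology, CVA can quantify the differences between them with reasonable accuracy, ...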

2.
This article considers global tests of differences between paired vectors of binomial probabilities, based on data from two dependent multivariate binary samples. Difference is defined as either an inhomogeneity in the marginal distributions or asymmetry in the joint distribution. For detecting the first type of difference, we propose a multivariate extension of McNemar's test and show that it is a generalized score test under a generalized estimating equations (GEE) approach. Univariate features such as the relationship between the Wald and score tests and the dropout of pairs with the same response carry over to the multivariate case and the test does not depend on the working correlation assumption among the components of the multivariate response. For sparse or imbalanced data, such as occurs when the number of variables is large or the proportions are close to zero, the test is best implemented using a bootstrap, and if this is computationally too complex, a permutation distribution. We apply the test to safety data for a drug, in which two doses are evaluated by comparing multiple responses by the same subjects to each one of them.
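As a point of reference for the abstract above, the following is a minimal sketch of the classical univariate McNemar test, the building block that the authors extend. It is not their multivariate GEE-based procedure. The function name `mcnemar` and the input layout (a list of paired binary responses) are assumptions made for illustration. The sketch shows the "dropout" property mentioned in the abstract: pairs with the same response do not enter the statistic.

```python
import math


def mcnemar(pairs):
    """Classical McNemar chi-square test (no continuity correction).

    pairs: iterable of (x, y) binary responses from the same subject
    under two conditions. Hypothetical helper, for illustration only.
    Returns (statistic, p_value).
    """
    # Only discordant pairs contribute; concordant pairs drop out.
    b = sum(1 for x, y in pairs if x == 1 and y == 0)
    c = sum(1 for x, y in pairs if x == 0 and y == 1)
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    data = [(1, 0)] * 10 + [(0, 1)] * 2 + [(1, 1)] * 5 + [(0, 0)] * 7
    print(mcnemar(data))
```

For the sparse or imbalanced settings the abstract describes, the chi-square approximation used here would presumably be replaced by a bootstrap or permutation distribution.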

3.
Summary: As most georeferenced data sets are multivariate and concern variables of different types, spatial mapping methods must be able to deal with such data. The main difficulties are the prediction of non-Gaussian variables and the modeling of the dependence between processes. The aim of this article is to present a new hierarchical Bayesian approach that permits simultaneous modeling of dependent Gaussian, count, and ordinal spatial fields. This approach is based on spatial generalized linear mixed models. We use a moving average approach to model the spatial dependence between the processes. The method is first validated through a simulation study. We show that the multivariate model has better predictive abilities than the univariate one. Then the multivariate spatial hierarchical model is applied to a real data set collected in French Guiana to predict topsoil patterns.

4.
5.
Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low-cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.

6.
Selecting relevant features is a common task in most OMICs data analysis, where the aim is to identify a small set of key features to be used as biomarkers. To this end, two alternative but equally valid methods are mainly available, namely the univariate (filter) or the multivariate (wrapper) approach. The stability of the selected lists of features is an often neglected but very important requirement. If the same features are selected in multiple independent iterations, they are more likely to be reliable biomarkers. In this study, we developed and evaluated the performance of a novel method for feature selection and prioritization, aiming at generating robust and stable sets of features with high predictive power. The proposed method uses fuzzy logic for a first unbiased feature selection and a Random Forest built from conditional inference trees to prioritize the candidate discriminant features. Analyzing several multi-class gene expression microarray data sets, we demonstrate that our technique provides equal or better classification performance and greater stability compared with other Random Forest-based feature selection methods.

7.
A Bayesian network classification methodology for gene expression data.   Total citations: 5 (self-citations: 0, citations by others: 5)
We present new techniques for the application of a Bayesian network learning framework to the problem of classifying gene expression data. The focus on classification permits us to develop techniques that address in several ways the complexities of learning Bayesian nets. Our classification model reduces the Bayesian network learning problem to the problem of learning multiple subnetworks, each consisting of a class label node and its set of parent genes. We argue that this classification model is more appropriate for the gene expression domain than are other structurally similar Bayesian network classification models, such as Naive Bayes and Tree Augmented Naive Bayes (TAN), because our model is consistent with prior domain experience suggesting that a relatively small number of genes, taken in different combinations, is required to predict most clinical classes of interest. Within this framework, we consider two different approaches to identifying parent sets which are supported by the gene expression observations and any other currently available evidence. One approach employs a simple greedy algorithm to search the universe of all genes; the second approach develops and applies a gene selection algorithm whose results are incorporated as a prior to enable an exhaustive search for parent sets over a restricted universe of genes. Two other significant contributions are the construction of classifiers from multiple, competing Bayesian network hypotheses and algorithmic methods for normalizing and binning gene expression data in the absence of prior expert knowledge. Our classifiers are developed under a cross validation regimen and then validated on corresponding out-of-sample test sets. The classifiers attain a classification rate in excess of 90% on out-of-sample test sets for two publicly available datasets. We present an extensive compilation of results reported in the literature for other classification methods run against these same two datasets. Our results are comparable to, or better than, any we have found reported for these two sets, when a train-test protocol as stringent as ours is followed.

8.
Yang X, Belin TR, Boscardin WJ. Biometrics 2005, 61(2): 498-506
Across multiply imputed data sets, variable selection methods such as stepwise regression and other criterion-based strategies that include or exclude particular variables typically result in models with different selected predictors, thus presenting a problem for combining the results from separate complete-data analyses. Here, drawing on a Bayesian framework, we propose two alternative strategies to address the problem of choosing among linear regression models when there are missing covariates. One approach, which we call "impute, then select" (ITS) involves initially performing multiple imputation and then applying Bayesian variable selection to the multiply imputed data sets. A second strategy is to conduct Bayesian variable selection and missing data imputation simultaneously within one Gibbs sampling process, which we call "simultaneously impute and select" (SIAS). The methods are implemented and evaluated using the Bayesian procedure known as stochastic search variable selection for multivariate normal data sets, but both strategies offer general frameworks within which different Bayesian variable selection algorithms could be used for other types of data sets. A study of mental health services utilization among children in foster care programs is used to illustrate the techniques. Simulation studies show that both ITS and SIAS outperform complete-case analysis with stepwise variable selection and that SIAS slightly outperforms ITS.

9.
We previously developed an integrated model of the brain within a single cortical area for functional Magnetic Resonance Imaging (fMRI), electroencephalography (EEG), and magnetoencephalography (MEG) using an extended neural mass model (ENMM). We then extended the ENMM from a single-area to a multi-area model to develop a neural mass model of the entire brain. To this end, we derived a nonlinear state-space representation of the multi-area model. In these two companion papers (henceforth called Part I and Part II), we develop and evaluate a variational Bayesian expectation maximization (VBEM) method to estimate the parameters of the multi-area ENMM (MEN) using E/MEG data. In Part I, we derive a state-space representation of MEN and use the VBEM method for model inversion (parameter estimation). We evaluate and validate the performance of the VBEM method for model inversion of MEN using simulation studies at various signal-to-noise ratios. Details of the VBEM method are presented in Part II. The proposed approach provides a useful technique for analyzing effective connectivity using non-invasive EEG and MEG methods.

10.
Linear discriminant analysis (LDA) is a multivariate classification technique frequently applied to morphometric data in various biomedical disciplines. Canonical variate analysis (CVA), the generalization of LDA for multiple groups, is often used in the exploratory style of an ordination technique (a low-dimensional representation of the data). In the rare case when all groups have the same covariance matrix, maximum likelihood classification can be based on these linear functions. Both LDA and CVA require full-rank covariance matrices, which is usually not the case in modern morphometrics. When the number of variables is close to the number of individuals, groups appear separated in a CVA plot even if they are samples from the same population. Hence, reliable classification and assessment of group separation require many more organisms than variables. A simple alternative to CVA is the projection of the data onto the principal components of the group averages (between-group PCA). In contrast to CVA, these axes are orthogonal and can be computed even when the data are not of full rank, such as for Procrustes shape coordinates arising in samples of any size, and when covariance matrices are heterogeneous. In evolutionary quantitative genetics, the selection gradient is identical to the coefficient vector of a linear discriminant function between the populations before vs. after selection. When the measured variables are Procrustes shape coordinates, discriminant functions and selection gradients are vectors in shape space and can be visualized as shape deformations. Except for applications in quantitative genetics and in classification, however, discriminant functions typically offer no interpretation as biological factors.
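The between-group PCA described in the abstract above can be sketched in a few lines. This is an illustrative reading of "projection of the data onto the principal components of the group averages", not the authors' code. The function name `between_group_pca` is hypothetical, and centering on the unweighted mean of the group means is an assumption (a sample-size-weighted grand mean is another common choice). NumPy is assumed to be available.

```python
import numpy as np


def between_group_pca(X, labels):
    """Project observations onto the principal axes of the group means.

    X: (n_samples, n_variables) array; labels: length-n group labels.
    Hypothetical helper for illustration. Returns (scores, axes), where
    the rows of `axes` are orthonormal directions in variable space.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    # One mean vector per group.
    means = np.vstack([X[labels == g].mean(axis=0) for g in groups])
    # Assumption: center on the unweighted mean of the group means.
    center = means.mean(axis=0)
    # SVD works even when variables outnumber individuals, because no
    # covariance matrix has to be inverted.
    _, _, vt = np.linalg.svd(means - center, full_matrices=False)
    k = min(len(groups) - 1, X.shape[1])
    axes = vt[:k]
    scores = (X - center) @ axes.T
    return scores, axes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(15, 20))      # 15 individuals, 20 variables
    labels = np.repeat([0, 1, 2], 5)   # three groups of five
    scores, axes = between_group_pca(X, labels)
    print(scores.shape, axes.shape)
```

Unlike CVA, nothing here requires a full-rank within-group covariance matrix, which is why the example runs with more variables than individuals.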

11.
A novel method for the qualification of reduced scale models (RSMs) was illustrated using data from both a 250-ml advanced microscale bioreactor (ambr) and a 5-L bioreactor RSM for a 2,000-L manufacturing scale process using a CHO cell line to produce a recombinant monoclonal antibody. The example study showed how the method was used to identify process performance attributes and product quality attributes that capture important aspects of the RSM qualification process. The method uses two novel statistical approaches: multivariate dimension reduction and data visualization techniques, via partial least squares discriminant analysis (PLS-DA), and Bayesian multivariate linear modeling for inferential analysis. Bayesian multivariate linear modeling allows for individual probability distributions of the differences of the mean of each attribute for each scale, as well as joint probability statements on the differences of the means for multiple attributes. Depending on the results of this inferential procedure, PLS-DA is used to identify the process performance outputs at the different scales which have the greatest negative impact on the multivariate Bayesian joint probabilities. Experience with that particular process can then be leveraged to adjust operating conditions to minimize these differences, and then equivalence can be reassessed using the multivariate linear model.

12.
Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only due to large sample sizes. These methods partition big data sets into subsets and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulation data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications and will assist future progress in this rapidly developing field.
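To make the idea of combining subset posterior samples concrete, here is a minimal sketch of one simple combination rule: draw-by-draw averaging of subposterior samples with inverse-variance weights, for a single scalar parameter. The abstract above does not name the four techniques in the package, so this rule is an assumption chosen for illustration, and the Python function `combine_subposteriors` is hypothetical and unrelated to the R package's interface.

```python
import statistics


def combine_subposteriors(subsamples):
    """Combine scalar subposterior draws by inverse-variance weighting.

    subsamples: list of M lists, each holding MCMC draws of the same
    scalar parameter from one data subset. Hypothetical helper for
    illustration. Returns a list of combined draws.
    """
    # Weight each subset by the precision of its own draws.
    weights = [1.0 / statistics.variance(s) for s in subsamples]
    total = sum(weights)
    # Use only as many draws as the shortest chain provides.
    n_draws = min(len(s) for s in subsamples)
    return [
        sum(w * s[t] for w, s in zip(weights, subsamples)) / total
        for t in range(n_draws)
    ]


if __name__ == "__main__":
    chains = [[0.0, 2.0], [10.0, 14.0]]
    print(combine_subposteriors(chains))
```

As in the abstract, the sketch assumes the subposterior draws were produced beforehand by separate MCMC runs; it only performs the combination step.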

13.
Bayesian network models are commonly used to model gene expression data. Some applications require a comparison of the network structure of a set of genes between varying phenotypes. In principle, separately fit models can be directly compared, but it is difficult to assign statistical significance to any observed differences. There would therefore be an advantage to the development of a rigorous hypothesis test for homogeneity of network structure. In this paper, a generalized likelihood ratio test based on Bayesian network models is developed, with significance level estimated using permutation replications. To make the test computationally feasible, a number of algorithms are introduced. First, a method for approximating multivariate distributions due to Chow and Liu (1968) is adapted, permitting the polynomial-time calculation of a maximum likelihood Bayesian network with maximum indegree of one. Second, sequential testing principles are applied to the permutation test, allowing significant reduction of computation time while preserving reported error rates used in multiple testing. The method is applied to gene-set analysis, using two sets of experimental data, and some advantage to a pathway modelling approach to this problem is reported.
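The Chow and Liu (1968) approximation mentioned in the abstract above builds a tree over the variables by taking a maximum-weight spanning tree on pairwise mutual information. The sketch below shows that textbook construction for discrete data. It is not the authors' adapted algorithm, and the names `mutual_information` and `chow_liu_edges` are hypothetical. Orienting the resulting undirected edges away from any chosen root yields a network in which every node has at most one parent.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def chow_liu_edges(rows):
    """Undirected edges of a maximum mutual-information spanning tree.

    rows: list of equal-length tuples of discrete values (one per sample).
    Hypothetical helper for illustration; uses Kruskal's algorithm.
    """
    cols = list(zip(*rows))
    d = len(cols)
    candidates = sorted(
        ((mutual_information(cols[i], cols[j]), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest over variables

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    print(chow_liu_edges(data))
```

The pairwise scan is quadratic in the number of variables, which is consistent with the polynomial-time claim in the abstract.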

14.
Particle classification is an important component of multivariate statistical analysis methods that has been used extensively to extract information from electron micrographs of single particles. Here we describe a new Bayesian Gibbs sampling algorithm for the classification of such images. This algorithm, which is applied after dimension reduction by correspondence analysis or by principal components analysis, dynamically learns the parameters of the multivariate Gaussian distributions that characterize each class. These distributions describe tilted ellipsoidal clusters that adaptively adjust shape to capture differences in the variances of factors and the correlations of factors within classes. A novel Bayesian procedure to objectively select factors for inclusion in the classification models is a component of this procedure. A comparison of this algorithm with hierarchical ascendant classification of simulated data sets shows improved classification over a broad range of signal-to-noise ratios.

15.
In this paper, we introduce a Bayesian statistical model for the analysis of functional data observed at several time points. Examples of such data include the Michigan growth study where we wish to characterize the shape changes of human mandible profiles. The form of the mandible is often used by clinicians as an aid in predicting the mandibular growth. However, whereas many studies have demonstrated the changes in size that may occur during the period of pubertal growth spurt, shape changes have been less well investigated. Considering a group of subjects presenting normal occlusion, in this paper we thus describe a Bayesian functional ANOVA model that provides information about where and when the shape changes of the mandible occur during different stages of development. The model is developed by defining the notion of predictive process models for Gaussian process (GP) distributions used as priors over the random functional effects. We show that the predictive approach is computationally appealing and that it is useful to analyze multivariate functional data with unequally spaced observations that differ among subjects and times. Graphical posterior summaries show that our model is able to provide a biological interpretation of the morphometric findings and that they comprehensively describe the shape changes of the human mandible profiles. Compared with classical cephalometric analysis, this paper represents a significant methodological advance for the study of mandibular shape changes in two dimensions.

16.
17.
In the antisaccade task, subjects are requested to suppress a reflexive saccade towards a visual target and to perform a saccade towards the opposite side. In addition, in order to reproduce an accurate saccadic amplitude, the visual saccade vector (i.e., the distance between a central fixation point and the peripheral target) must be exactly inverted from one visual hemifield to the other. Results from recent studies using a correlational approach (i.e., fMRI, MEG) suggest that not only the posterior parietal cortex (PPC) but also the frontal eye field (FEF) might play an important role in such a visual vector inversion process. In order to assess whether the FEF contributes to visual vector inversion, we applied an interference approach with continuous theta burst stimulation (cTBS) during a memory-guided antisaccade task. In 10 healthy subjects, one train of cTBS was applied over the right FEF prior to a memory-guided antisaccade task. In comparison to the performance without stimulation or with sham stimulation, cTBS over the right FEF induced a hypometric gain for rightward but not leftward antisaccades. These results obtained with an interference approach confirm that the FEF is also involved in the process of visual vector inversion.

18.
Recent experimental advances facilitate the collection of time series data that indicate which genes in a cell are expressed. This information can be used to understand the genetic regulatory network that generates the data. Typically, Bayesian analysis approaches are applied which neglect the time series nature of the experimental data, have difficulty in determining the direction of causality, and do not perform well on networks with tight feedback. To address these problems, this paper presents a method to learn genetic network connectivity which exploits the time series nature of experimental data to achieve better causal predictions. This method first breaks up the data into bins. Next, it determines an initial set of potential influence vectors for each gene based upon the probability of the gene's expression increasing in the next time step. These vectors are then combined to form new vectors with better scores. Finally, these influence vectors are competed against each other to determine the final influence vector for each gene. The result is a directed graph representation of the genetic network's repression and activation connections. Results are reported for several synthetic networks with tight feedback showing significant improvements in recall and runtime over Yu's dynamic Bayesian approach. Promising preliminary results are also reported for an analysis of experimental data for genes involved in the yeast cell cycle.

19.
The 10 species of Galaxias in Tasmania, G. olidus from mainland Australia and the four species of Paragalaxias were studied using principal co-ordinates analysis (PCOA) and cluster analysis of a standardized Euclidean distance matrix based upon variate means, and by canonical variate analysis (CVA) conducted as a stepwise multiple discriminant analysis. Thirty-five variables comprising 30 morphometric and five meristic characters were analysed. The meristic characters were not included in the CVA. Excellent separation of the two genera was achieved in all analyses. The multivariate analyses were repeated on each genus separately to see if relationships suggested by the overall analysis remain stable. When the resultant groupings of species are compared for the different analyses, no consistent, distinct groupings of species within each genus are apparent. Despite the absence of distinct groupings, some trends in the affinities of some species are evident. In particular, species affinities as indicated by the CVA are more consistent with established opinions of species relationships. From the results of the study it is suggested that caution be exercised in the application of multivariate statistical analyses of morphological data to ichthyological systematics and phylogeny.

20.
Clinical electroencephalographic (EEG) recordings of the transition into generalised epileptic seizures show a sudden onset of spike-wave dynamics from a low-amplitude irregular background. In addition, non-trivial and variable spatio-temporal dynamics are widely reported in combined EEG/fMRI studies on the scale of the whole cortex. It is unknown whether these characteristics can be accounted for in a large-scale mathematical model with fixed heterogeneous long-range connectivities. Here, we develop a modelling framework with which to investigate such EEG features. We show that a neural field model composed of a few coupled compartments can serve as a low-dimensional prototype for the transition between irregular background dynamics and spike-wave activity. This prototype then serves as a node in a large-scale network with long-range connectivities derived from human diffusion-tensor imaging data. We examine multivariate properties in 42 clinical EEG seizure recordings from 10 patients diagnosed with typical absence epilepsy and 50 simulated seizures from the large-scale model using 10 DTI connectivity sets from humans. The model can reproduce the clinical feature of stereotypy where seizures are more similar within a patient than between patients, essentially creating a patient-specific fingerprint. We propose the approach as a feasible technique for the investigation of patient-specific large-scale epileptic features in space and time.
