Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Micro-blogging services, such as Twitter, offer opportunities to analyse user behaviour. Discovering and distinguishing behavioural patterns in micro-blogging services is valuable. However, it is challenging to distinguish users and to track the temporal development of collective attention within distinct user groups on Twitter. In this paper, we formulate this problem as tracking matrices decomposed by Nonnegative Matrix Factorisation (NMF) for time-sequential matrix data, and propose a novel extension of NMF, which we refer to as Time Evolving Nonnegative Matrix Factorisation (TENMF). In our method, we describe the users and the words posted in a given time interval by a matrix, and use several such matrices as time-sequential data. We then apply TENMF to these time-sequential matrices. TENMF can decompose time-sequential matrices and track the connection among the decomposed matrices, whereas conventional NMF decomposes each matrix into two lower-dimensional matrices arbitrarily, which might lose the time-sequential connection. Our proposed method performs adequately on artificial data. Moreover, we present several results and insights from experiments using real data from Twitter.
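
As a rough illustration of the idea of keeping factorisations of successive time slices connected, the following sketch runs plain NMF with the standard multiplicative updates (Lee and Seung) and warm-starts each slice from the previous factors. This is not TENMF itself, whose details the abstract does not give; the warm start, the matrices `v_t0` and `v_t1`, and all function names are assumptions made for illustration.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iterations=200, init=None, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates; `init` optionally supplies (W, H)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    if init is None:
        w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
        h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    else:
        w = [row[:] for row in init[0]]
        h = [row[:] for row in init[1]]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical user-by-word count matrices for two consecutive time intervals.
v_t0 = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
v_t1 = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 4, 4], [0, 0, 4, 2]]

w0, h0 = nmf(v_t0, rank=2)
# Assumption: warm-starting from (w0, h0) keeps component k at time t1
# aligned with component k at time t0, instead of an arbitrary ordering.
w1, h1 = nmf(v_t1, rank=2, init=(w0, h0))
```

One caveat of this design: multiplicative updates cannot revive entries that have reached zero, so a warm start can prevent a component from picking up users or words that were absent in the previous interval.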

2.
The goal of this article is to model multisubject task‐induced functional magnetic resonance imaging (fMRI) response among predefined regions of interest (ROIs) of the human brain. Conventional approaches to fMRI analysis only take into account temporal correlations, but do not rigorously model the underlying spatial correlation due to the complexity of estimating and inverting the high dimensional spatio‐temporal covariance matrix. Other spatio‐temporal modeling approaches estimate the covariance matrix with the assumption of stationary time series, which is not always feasible. To address these limitations, we propose a double‐wavelet approach for modeling the spatio‐temporal brain process. Working with wavelet coefficients simplifies temporal and spatial covariance structure because under regularity conditions, wavelet coefficients are approximately uncorrelated. Different wavelet functions were used to capture different correlation structures in the spatio‐temporal model. The main advantages of the wavelet approach are that it is scalable and that it deals with nonstationarity in brain signals. Simulation studies showed that our method could reduce false‐positive and false‐negative rates by taking into account spatial and temporal correlations simultaneously. We also applied our method to fMRI data to study activation in prespecified ROIs in the prefrontal cortex. Data analysis showed that the result using the double‐wavelet approach was more consistent than the conventional approach when sample size decreased.

3.
Influenza viruses have been responsible for large losses of lives around the world and continue to present a great public health challenge. Antigenic characterization based on hemagglutination inhibition (HI) assay is one of the routine procedures for influenza vaccine strain selection. However, HI assay is only a crude experiment reflecting the antigenic correlations among testing antigens (viruses) and reference antisera (antibodies). Moreover, antigenic characterization is usually based on more than one HI dataset. The combination of multiple datasets results in an incomplete HI matrix with many unobserved entries. This paper proposes a new computational framework for constructing an influenza antigenic cartography from this incomplete matrix, which we refer to as Matrix Completion-Multidimensional Scaling (MC-MDS). In this approach, we first reconstruct the HI matrices with viruses and antibodies using low-rank matrix completion, and then generate the two-dimensional antigenic cartography using multidimensional scaling. Moreover, for influenza HI tables with herd immunity effect (such as those from human influenza viruses), we propose a temporal model to reduce the inherent temporal bias of HI tables caused by herd immunity. By applying our method to HI datasets containing H3N2 influenza A viruses isolated from 1968 to 2003, we identified eleven clusters of antigenic variants, representing all major antigenic drift events in these 36 years. Our results showed that both the completed HI matrix and the antigenic cartography obtained via MC-MDS are useful in identifying influenza antigenic variants and thus can be used to facilitate influenza vaccine strain selection. The webserver is available at http://sysbio.cvm.msstate.edu/AntigenMap.

4.
We deal here with the issue of complex network evolution. The analysis of the topological evolution of complex networks plays a crucial role in predicting their future. While an impressive amount of work has been done on the issue, very little attention has so far been devoted to investigating how information theory quantifiers can be applied to characterize network evolution. With the objective of dynamically capturing the topological changes of a network's evolution, we propose a model able to quantify and reproduce several characteristics of a given network, by using the square root of the Jensen-Shannon divergence in combination with the mean degree and the clustering coefficient. To support our hypothesis, we test the model by copying the evolution of well-known models and real systems. The results show that the methodology was able to mimic the test networks. By using this copycat model, the user is able to analyze a network's behavior over time and to conjecture about the main drivers of its evolution; the model also provides a framework to predict that evolution.
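
The quantifier named in this abstract can be sketched as follows. The abstract does not say which probability distributions are compared, so applying it to the degree distributions of two network snapshots is an assumption, as are the degree sequences `deg_t0` and `deg_t1` and the function names.

```python
import math
from collections import Counter

def degree_distribution(degrees, max_degree):
    """Return P(k) for k = 0..max_degree as a list of probabilities."""
    counts = Counter(degrees)
    n = len(degrees)
    return [counts.get(k, 0) / n for k in range(max_degree + 1)]

def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs, range [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors

# Hypothetical degree sequences of one network at two snapshots.
deg_t0 = [1, 1, 2, 2, 2, 3]
deg_t1 = [1, 2, 2, 3, 3, 3]
k_max = max(deg_t0 + deg_t1)
p0 = degree_distribution(deg_t0, k_max)
p1 = degree_distribution(deg_t1, k_max)
distance = js_distance(p0, p1)
```

Unlike the divergence itself, its square root satisfies the triangle inequality, which makes it usable as a distance between snapshots.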

5.
Matrix population models are among the most common mathematical models in ecology; they describe the dynamics of stage-structured populations and provide many population statistics. One of these statistics, the elasticity of the population growth rate, is frequently used and represents the relative impact of life history parameters on the population growth rate. Because elasticities are useful for cross-taxonomic comparisons, Silvertown and his coauthors have published multiple papers reporting the relationship between elasticities and life forms (or life history) in multiple plant species, using a triangle map (called a “ternary plot”). To understand why their elasticities are located in specific regions of the ternary plot, we constructed four archetypes of population matrices, from which we simulated 24,000 randomly generated population matrices and obtained the resulting elasticities. We found a large discrepancy when comparing our results to those in Silvertown et al.'s study (Conserv Biol 10:591–597, 1996): for our simulated matrices where rapid transitions were not allowed (e.g., trees), the elasticity distribution formed a line across the ternary plot. We provided the mathematical proof for this result, and found that its slope depends on matrix dimension. We also used 1230 matrices from the COMPADRE Plant Matrix Database and calculated the elasticities. Our simulated results were validated with field data from COMPADRE: two straight lines appeared in the ternary plot. Furthermore, we answered several questions, such as, “Is there any special elasticity distribution in matrices with high population growth rates?” and “Why are the elasticities of natural populations concentrated in the upper half of the ternary plot?”.
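
For readers unfamiliar with elasticities, the standard formula is e_ij = (a_ij / λ) · v_i w_j / ⟨v, w⟩, where λ is the dominant eigenvalue of the projection matrix A, w its right eigenvector (stable stage distribution) and v its left eigenvector (reproductive values). The sketch below computes it by power iteration; the 3-stage matrix `A` is hypothetical and not taken from the study.

```python
def mat_vec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dominant_eigen(a, iterations=500):
    """Power iteration; returns (eigenvalue, eigenvector scaled to sum to 1)."""
    n = len(a)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        y = mat_vec(a, x)
        lam = sum(y)            # x sums to 1, so sum(A x) estimates lambda
        x = [value / lam for value in y]
    return lam, x

def elasticities(a):
    """Return (lambda, matrix of elasticities e_ij) for projection matrix a."""
    n = len(a)
    lam, w = dominant_eigen(a)             # stable stage distribution
    _, v = dominant_eigen(transpose(a))    # reproductive values
    vw = sum(vi * wi for vi, wi in zip(v, w))
    e = [[a[i][j] * v[i] * w[j] / (lam * vw) for j in range(n)]
         for i in range(n)]
    return lam, e

# Hypothetical 3-stage projection matrix (first row: fecundities).
A = [[0.0, 1.5, 3.0],
     [0.5, 0.2, 0.0],
     [0.0, 0.3, 0.8]]
growth_rate, e = elasticities(A)
total = sum(sum(row) for row in e)
```

Elasticities sum to one over the whole matrix, which is what allows them to be grouped into three components and placed on a ternary plot.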

6.
Many evolutionary processes can lead to a change in the correlation between continuous characters over time or on different branches of a phylogenetic tree. Shifts in genetic or functional constraint, in the selective regime, or in some combination thereof can influence both the evolution of continuous traits and their relation to each other. These changes can often be mapped on a phylogenetic tree to examine their influence on multivariate phenotypic diversification. We propose a new likelihood method to fit multiple evolutionary rate matrices (also called evolutionary variance–covariance matrices) to species data for two or more continuous characters and a phylogeny. The evolutionary rate matrix is a matrix containing the evolutionary rates for individual characters on its diagonal, and the covariances between characters (of which the evolutionary correlations are a function) elsewhere. To illustrate our approach, we apply the method to an empirical dataset consisting of two features of feeding morphology sampled from 28 centrarchid fish species, as well as to data generated via phylogenetic numerical simulations. We find that the method has appropriate type I error, power, and parameter estimation. The approach presented herein is the first to allow for the explicit testing of how and when the evolutionary covariances between characters have changed in the history of a group.

7.
Finite mixture models can provide insights into behavioral patterns as a source of heterogeneity in the dynamics of time course gene expression data, by reducing the high dimensionality and clarifying the major components of the underlying structure of the data in terms of unobservable latent variables. The latent structure of the dynamic transition process of gene expression changes over time can be represented by Markov processes. This paper addresses key problems in the analysis of large gene expression data sets that describe systemic temporal response cascades and dynamic changes to therapeutic doses in multiple tissues, such as liver, skeletal muscle, and kidney from the same animals. A Bayesian Finite Markov Mixture Model with a Dirichlet prior is developed for the identification of differentially expressed time-related genes and dynamic clusters. The deviance information criterion is applied to determine the number of components for model comparison and selection. The proposed Bayesian models are applied to multiple tissue polygenetic temporal gene expression data and compared to a Bayesian model‐based clustering method, named CAGED. Results show that our proposed Bayesian Finite Markov Mixture model captures the dynamic changes and patterns of irregular complex temporal data well (© 2009 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

8.
9.
A structural approach to temporality in distributional data for use in palaeobiogeography is described herein. Pre-established areas in the distributional data matrix are split temporally, allowing a single geographical space to have multiple iterations [e.g. Area A (Lower Devonian), Area A (Middle Devonian)]. The resulting temporal matrix will allow the representation and capture of any differing relationships through time. Designed primarily for Parsimony Analysis of Endemicity (PAE) and biotic similarity analyses, this approach simply structures distributional data within a temporal partition, meaning that numerical methods can be used to assess relationships between areas to find a branching diagram. Created through the application of the temporal matrix to a given analysis, Temporal Area Approach (TAAp) is a structural approach that facilitates exploration of the data rather than being a hypothesis-driven model following analysis. Understanding the behaviour of non-phylogenetic palaeobiogeographical data and reducing the prevalence of temporal artefacts will lead to more robust area classifications.

10.
11.
MOTIVATION: Pairwise local sequence alignment is commonly used to search databases for sequences related to some query sequence. Alignments are obtained using a scoring matrix that takes into account the different frequencies of occurrence of the various types of amino acid substitutions. Software like BLAST provides the user with a set of scoring matrices to choose from, and in the literature it is sometimes recommended to try several scoring matrices on the sequences of interest. The significance of an alignment is usually assessed by looking at E-values and p-values. While sequence lengths and database sizes enter the standard calculations of significance, it is much less common to take the use of several scoring matrices on the same sequences into account. Altschul proposed corrections of the p-value that account for the simultaneous use of an infinite number of PAM matrices. Here we consider the more realistic situation where the user may choose from a finite set of popular PAM and BLOSUM matrices, in particular the ones available in BLAST. It turns out that the significance of a result can be considerably overestimated if a set of substitution matrices is used in an alignment problem and the most significant alignment is then quoted. RESULTS: Based on extensive simulations, we study the multiple testing problem that occurs when several scoring matrices for local sequence alignment are used. We consider a simple Bonferroni correction of the p-values and investigate its accuracy. Finally, we propose a more accurate correction based on extreme value distributions fitted to the maximum of the normalized scores obtained from different scoring matrices. For various sets of matrices we provide correction factors which can be easily applied to adjust p- and E-values reported by software packages.
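
The simple Bonferroni correction mentioned above can be sketched in a few lines: the smallest p-value found across k scoring matrices is multiplied by k and capped at 1. The p-values below are made up for illustration, and the more accurate extreme-value correction proposed by the authors is not reproduced here. The E-value conversion uses the usual Poisson relation p = 1 − exp(−E).

```python
import math

def evalue_to_pvalue(e_value):
    """Probability of at least one chance hit this good: 1 - exp(-E)."""
    return -math.expm1(-e_value)

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical p-values for the same alignment problem under several matrices.
p_by_matrix = {
    "BLOSUM45": 2e-4,
    "BLOSUM62": 5e-5,
    "BLOSUM80": 8e-4,
    "PAM70": 3e-4,
}

# Quoting only the best result ignores that four matrices were tried.
best_matrix = min(p_by_matrix, key=p_by_matrix.get)
adjusted = bonferroni(p_by_matrix[best_matrix], len(p_by_matrix))
```

Because scores obtained with different matrices on the same sequences are strongly correlated, Bonferroni tends to be conservative here, which is presumably why the authors go on to fit a dedicated correction.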

12.
Matrix population models, elasticity analysis and loop analysis can potentially provide powerful techniques for the analysis of life histories. Data from a capture–recapture study on a population of southern highland water skinks (Eulamprus tympanum) were used to construct a matrix population model. Errors in elasticities were calculated by using the parametric bootstrap technique. Elasticity and loop analyses were then conducted to identify the life history stages most important to fitness. The same techniques were used to investigate the relative importance of fast versus slow growth, and rapid versus delayed reproduction. Mature water skinks were long‐lived, but there was high immature mortality. The most sensitive life history stage was the subadult stage. It is suggested that life history evolution in E. tympanum may be strongly affected by predation, particularly by birds. Because our population declined over the study, slow growth and delayed reproduction were the optimal life history strategies over this period. Although the techniques of evolutionary demography provide a powerful approach for the analysis of life histories, there are formidable logistical obstacles in gathering enough high‐quality data for robust estimates of the critical parameters.

13.
User preference plays a prominent role in many fields, including electronic commerce, social opinion, and Internet search engines. In recommender systems in particular, it directly influences the accuracy of the recommendation. Though many methods have been presented, most of them have focused only on how to improve the recommendation results. In this paper, we present an empirical study of user preferences based on a set of rating data about movies. We develop a simple statistical method to investigate the characteristics of user preferences. We find that the movies have potential characteristics of closure, which results in the formation of numerous cliques with a power-law size distribution. We also find that a user related to a small clique always has similar opinions on the movies in this clique. We then suggest a user preference model that can eliminate the predictions that are considered to be impracticable. Numerical results show that the model can reflect user preference with remarkable accuracy when data elimination is allowed, and that random factors in the rating data make prediction error inevitable. In further research, we will investigate many other rating data sets to examine the universality of our findings.

14.
Knowledge of the genetic variances and covariances of traits (the G‐matrix) is fundamental for the understanding of evolutionary dynamics of populations. Despite its essential importance in evolutionary studies, empirical tests of the temporal stability of the G‐matrix in natural populations are few. We used a 25‐year‐long individual‐based field study on almost 7000 breeding attempts of the collared flycatcher (Ficedula albicollis) to estimate the stability of the G‐matrix over time. Using animal models to estimate G for several time periods, we show that the structure of the time‐specific G‐matrices changed significantly over time. The temporal changes in the G‐matrix were unpredictable, and the structure at one time period was not indicative of the structure at the next time period. Moreover, we show that the changes in the time‐specific G‐matrices were not related to changes in mean trait values or due to genetic drift. Selection, differences in acquisition/allocation patterns or environment‐dependent allelic effects are therefore likely explanations for the patterns observed, probably in combination. Our result cautions against assuming constancy of the G‐matrix and indicates that even short‐term evolutionary predictions in natural populations can be very challenging.

15.
Advances in molecular “omics” technologies have motivated new methodologies for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (e.g., multiple cohorts measured on multiple platforms), which are increasingly common in large-scale biomedical studies. In this paper, we propose bidimensional integrative factorization (BIDIFAC) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes data into (a) globally shared, (b) row-shared, (c) column-shared, and (d) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation, we use a penalized objective function that extends the nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from random matrix theory to choose tuning parameters. We apply our method to integrate two genomics platforms (messenger RNA and microRNA expression) across two sample cohorts (tumor samples and normal tissue samples) using the breast cancer data from the Cancer Genome Atlas. We provide R code for fitting BIDIFAC, imputing missing values, and generating simulated data.

16.
On gene ranking using replicated microarray time course data   Cited by: 1 (self-citations: 0, citations by others: 1)
Tai YC, Speed TP. Biometrics 2009, 65(1):40-51
Summary. Consider the ranking of genes using data from replicated microarray time course experiments, where there are multiple biological conditions, and the genes of interest are those whose temporal profiles differ across conditions. We derive a multisample multivariate empirical Bayes' statistic for ranking genes in the order of differential expression, from both longitudinal and cross-sectional replicated developmental microarray time course data. Our longitudinal multisample model assumes that time course replicates are independent and identically distributed multivariate normal vectors. On the other hand, we construct a cross-sectional model using a normal regression framework with any appropriate basis for the design matrices. In both cases, we use natural conjugate priors in our empirical Bayes' setting which guarantee closed form solutions for the posterior odds. The simulations and two case studies using published worm and mouse microarray time course datasets indicate that the proposed approaches perform satisfactorily.

17.
Phylogenies are fundamental to comparative biology as they help to identify independent events on which statistical tests rely. Two groups of phylogenetic comparative methods (PCMs) can be distinguished: those that take phylogenies into account by introducing explicit models of evolution and those that only consider phylogenies as a statistical constraint and aim at partitioning trait values into a phylogenetic component (phylogenetic inertia) and one or multiple specific components related to adaptive evolution. The way phylogenetic information is incorporated into the PCMs depends on the method used. For the first group of methods, phylogenies are converted into variance-covariance matrices of traits following a given model of evolution such as Brownian motion (BM). For the second group of methods, phylogenies are converted into distance matrices that are subsequently transformed into Euclidean distances to perform principal coordinate analyses. Here, we show that simply taking the elementwise square root of a distance matrix extracted from a phylogenetic tree ensures having a Euclidean distance matrix. This is true for any type of distances between species (patristic or nodal) and also for trees harboring multifurcating nodes. Moreover, we illustrate that this simple transformation using the square root imposes less geometric distortion than more complex transformations classically used in the literature such as the Cailliez method. Given the Euclidean nature of the elementwise square root of phylogenetic distance matrices, the positive semidefiniteness of the phylogenetic variance-covariance matrix of a trait following a BM model, or related models of trait evolution, can be established. In that way, we build a bridge between the two groups of statistical methods widely used in comparative analysis. These results should be of great interest for ecologists and evolutionary biologists performing statistical analyses incorporating phylogenies.
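
The square-root result can be checked numerically on a small case. The sketch below uses the classical criterion that a distance matrix is Euclidean exactly when the Gram matrix built from its squared distances, taken relative to one base point, is positive semidefinite. The four-tip star tree (branch lengths 0.1, 1, 1, 1) and all function names are illustrative assumptions, not material from the paper.

```python
import math

def gram_from_distances(d):
    """Gram matrix of points 1..n-1 relative to point 0, from distances d."""
    n = len(d)
    return [[(d[0][i] ** 2 + d[0][j] ** 2 - d[i][j] ** 2) / 2
             for j in range(1, n)] for i in range(1, n)]

def is_psd(g, tol=1e-9):
    """Positive semidefiniteness via a Cholesky attempt that allows zero pivots."""
    n = len(g)
    low = [[0.0] * n for _ in range(n)]  # strictly lower Cholesky factor
    for j in range(n):
        pivot = g[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if pivot < -tol:
            return False
        for i in range(j + 1, n):
            rest = g[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if pivot > tol:
                low[i][j] = rest / math.sqrt(pivot)
            elif abs(rest) > tol:
                return False  # zero pivot with a nonzero column: not PSD
    return True

def is_euclidean(d):
    return is_psd(gram_from_distances(d))

# Patristic distances between the tips T, A, B, C of a hypothetical star tree
# with branch lengths 0.1 (T) and 1.0 (A, B, C).
d = [[0.0, 1.1, 1.1, 1.1],
     [1.1, 0.0, 2.0, 2.0],
     [1.1, 2.0, 0.0, 2.0],
     [1.1, 2.0, 2.0, 0.0]]
sqrt_d = [[math.sqrt(x) for x in row] for row in d]

raw_ok = is_euclidean(d)
sqrt_ok = is_euclidean(sqrt_d)
```

In this example the raw distances cannot be embedded: A, B and C form an equilateral triangle of side 2, whose circumradius is about 1.155, so no point can lie at distance 1.1 from all three. After the square root the same tips embed without distortion.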

18.
Understanding drivers of temporal variation in demographic parameters is a central goal of mark-recapture analysis. To estimate the survival of migrating animal populations in migration corridors, space-for-time mark–recapture models employ discrete sampling locations in space to monitor marked populations as they move past monitoring sites, rather than the standard practice of using fixed sampling points in time. Because these models focus on estimating survival over discrete spatial segments, model parameters are implicitly integrated over the temporal dimension. Furthermore, modeling the effect of time-varying covariates on model parameters is complicated by unknown passage times for individuals that are not detected at monitoring sites. To overcome these limitations, we extended the Cormack–Jolly–Seber (CJS) framework to estimate temporally stratified survival and capture probabilities by including a discretized arrival time process in a Bayesian framework. We allow for flexibility in the model form by including temporally stratified covariates and hierarchical structures. In addition, we provide tools for assessing model fit and comparing among alternative structural models for the parameters. We demonstrate our framework by fitting three competing models to estimate daily survival, capture, and arrival probabilities at four hydroelectric dams for over 200 000 individually tagged migratory juvenile salmon released into the Snake River, USA.

19.
I explore the use of multiple regression on distance matrices (MRM), an extension of partial Mantel analysis, in spatial analysis of ecological data. MRM involves a multiple regression of a response matrix on any number of explanatory matrices, where each matrix contains distances or similarities (in terms of ecological, spatial, or other attributes) between all pair-wise combinations of n objects (sample units); tests of statistical significance are performed by permutation. The method is flexible in terms of the types of data that may be analyzed (counts, presence–absence, continuous, categorical) and the shapes of response curves. MRM offers several advantages over traditional partial Mantel analysis: (1) separating environmental distances into distinct distance matrices allows inferences to be made at the level of individual variables; (2) nonparametric or nonlinear multiple regression methods may be employed; and (3) spatial autocorrelation may be quantified and tested at different spatial scales using a series of lag matrices, each representing a geographic distance class. The MRM lag matrices model may be parameterized to yield very similar inferences regarding spatial autocorrelation as the Mantel correlogram. Unlike the correlogram, however, the lag matrices model may also include environmental distance matrices, so that spatial patterns in species abundance distances (community similarity) may be quantified while controlling for the environmental similarity between sites. Examples of spatial analyses with MRM are presented.
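
The permutation step that MRM relies on can be sketched for the simplest case of a single explanatory matrix, which reduces to a simple Mantel test. Full MRM would instead fit a multiple regression on several unfolded matrices; that part is not shown. The objects of the response matrix are permuted jointly in rows and columns so that each permuted matrix is still a valid distance matrix. The site coordinates and the community distances below are invented for illustration.

```python
import random

def upper_triangle(m):
    """Unfold the upper triangle of a square matrix into a flat list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permute_objects(m, order):
    """Relabel objects: rows and columns are permuted together."""
    return [[m[i][j] for j in order] for i in order]

def mantel_test(response, explanatory, n_perm=999, seed=0):
    """One-sided permutation test of the matrix correlation."""
    rng = random.Random(seed)
    x = upper_triangle(explanatory)
    observed = pearson(upper_triangle(response), x)
    order = list(range(len(response)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        r = pearson(upper_triangle(permute_objects(response, order)), x)
        if r >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)

# Hypothetical sites on a transect; community distance grows with geography.
coords = [0.0, 1.0, 3.0, 6.0, 10.0]
geo = [[abs(a - b) for b in coords] for a in coords]
comm = [[abs(a - b) ** 0.5 for b in coords] for a in coords]
r_obs, p_value = mantel_test(comm, geo)
```

Permuting whole objects rather than individual matrix cells matters because the n(n−1)/2 pairwise distances are not independent observations.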

20.
This issue of Matrix Biology commemorates Dick Heinegård and his exceptional contributions to identifying extracellular matrix molecules and the interactions through which they form cartilage matrices. This tribute to him demonstrates the development of his cartilage matrix model and how this model relates to the articles in this Matrix Biology Cartilage issue.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP registration: 京ICP备09084417号