Similar Documents
Found 20 similar documents (search time: 15 ms).
1.
Summary: Using a new type of array technology, the reverse phase protein array (RPPA), we measure time-course protein expression for a set of selected markers that are known to coregulate biological functions in a pathway structure. To accommodate the complex dependent nature of the data, including temporal correlation and pathway dependence for the protein markers, we propose a mixed effects model with temporal and protein-specific components. We develop a sequence of random probability measures (RPM) to account for the dependence in time of the protein expression measurements. Marginally, for each RPM we assume a Dirichlet process model. The dependence is introduced by defining multivariate beta distributions for the unnormalized weights of the stick-breaking representation. We also acknowledge the pathway dependence among proteins via a conditionally autoregressive model. Applying our model to the RPPA data, we reveal a pathway-dependent functional profile for the set of proteins as well as marginal expression profiles over time for individual markers.
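As background for the stick-breaking representation mentioned above, the sketch below draws truncated stick-breaking weights for a single (marginal) Dirichlet process in Python; the paper's dependent version would replace the independent beta draws with multivariate beta vectors shared across time points. The truncation level and parameter values are illustrative assumptions.

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    v = rng.beta(1.0, alpha, size=K)                      # stick fractions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining                                  # w_k = v_k * prod_{j<k}(1 - v_j)

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, K=50, rng=rng)
atoms = rng.normal(0.0, 1.0, size=50)                     # draws from the base measure G0
sample = rng.choice(atoms, size=1000, p=w / w.sum())      # renormalize the truncation
```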

2.
Generalized Spatial Dirichlet Process Models (total citations: 1; self-citations: 0; cited by others: 1)
Many models for the study of point-referenced data explicitly introduce spatial random effects to capture residual spatial association. These spatial effects are customarily modelled as a zero-mean stationary Gaussian process. The spatial Dirichlet process introduced by Gelfand et al. (2005) produces a random spatial process which is neither Gaussian nor stationary. Rather, it varies about a process that is assumed to be stationary and Gaussian. The spatial Dirichlet process arises as a probability-weighted collection of random surfaces. This can be limiting for modelling and inferential purposes since it insists that a process realization must be one of these surfaces. We introduce a random distribution for the spatial effects that allows different surface selection at different sites. Moreover, we can specify the model so that the marginal distribution of the effect at each site still comes from a Dirichlet process. The development is offered constructively, providing a multivariate extension of the stick-breaking representation of the weights. We then introduce mixing using this generalized spatial Dirichlet process. We illustrate with a simulated dataset of independent replications and note that we can embed the generalized process within a dynamic model specification to eliminate the independence assumption.
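A toy illustration of the central idea, different surface selection at different sites, under stated assumptions (a small truncation level, and moving-average smoothing standing in for a Gaussian process):

```python
import numpy as np

rng = np.random.default_rng(1)
sites = np.linspace(0.0, 1.0, 40)
K = 5                                              # truncation: K candidate surfaces
surfaces = np.array([np.sin(2 * np.pi * (k + 1) * sites) for k in range(K)])

# Smooth latent fields drive which surface "wins" locally, so nearby sites
# tend to select the same surface while distant sites need not.
latent = rng.normal(size=(K, len(sites)))
latent = np.apply_along_axis(
    lambda z: np.convolve(z, np.ones(7) / 7, mode="same"), 1, latent)
choice = latent.argmax(axis=0)                     # selected surface at each site
effect = surfaces[choice, np.arange(len(sites))]   # spatial random effect
```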

3.
We consider the problem of estimating the marginal mean of an incompletely observed variable and develop a multiple imputation approach. Using fully observed predictors, we first establish two working models: one predicts the missing outcome variable, and the other predicts the probability of missingness. The predictive scores from the two models are used to measure the similarity between the incomplete and observed cases. Based on the predictive scores, we construct a set of kernel weights for the observed cases, with higher weights indicating more similarity. Missing data are imputed by sampling from the observed cases with probability proportional to their kernel weights. The proposed approach can produce reasonable estimates for the marginal mean and has a double robustness property, provided that one of the two working models is correctly specified. It also shows some robustness against misspecification of both models. We demonstrate these patterns in a simulation study. In a real-data example, we analyze the total helicopter response time from injury in the Arizona emergency medical service data.
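A minimal sketch of the imputation scheme as described, assuming scikit-learn for the two working models; the Gaussian kernel and `bandwidth` value are illustrative choices rather than the paper's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def kernel_weight_impute(X, y, rng, bandwidth=1.0):
    """One kernel-weighted hot-deck imputation pass for y with NaNs."""
    miss = np.isnan(y)
    # Working model 1: predict the outcome from fully observed covariates.
    s1 = LinearRegression().fit(X[~miss], y[~miss]).predict(X)
    # Working model 2: predict the probability of missingness.
    s2 = LogisticRegression().fit(X, miss.astype(int)).predict_proba(X)[:, 1]
    scores = np.column_stack([s1, s2])
    y_imp = y.copy()
    for i in np.where(miss)[0]:
        d2 = ((scores[~miss] - scores[i]) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))     # similarity in predictive scores
        y_imp[i] = rng.choice(y[~miss], p=w / w.sum())
    return y_imp
```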

4.
The fixation of advantageous mutations in a population has the effect of reducing variation in the DNA sequence near that mutation. Kaplan et al. (1989) used a three-phase simulation model to study the effect of selective sweeps on genealogies. However, most subsequent work has simplified their approach by assuming that the number of individuals with the advantageous allele follows the logistic differential equation. We show that the impact of a selective sweep can be accurately approximated by a random partition created by a stick-breaking process. Our simulation results show that ignoring the randomness when the number of individuals with the advantageous allele is small can lead to substantial errors.
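To make the contrast concrete, here is a sketch of the deterministic logistic trajectory next to a stochastic Wright-Fisher trajectory with selection; parameter values are placeholders. The stochastic path departs from the logistic curve precisely while copy numbers are small, which is the regime the abstract flags.

```python
import numpy as np

def logistic_sweep(s, p0, T):
    """Deterministic logistic frequency of the advantageous allele."""
    t = np.arange(T)
    return p0 * np.exp(s * t) / (1.0 - p0 + p0 * np.exp(s * t))

def wright_fisher_sweep(N, s, p0, T, rng):
    """Stochastic trajectory; randomness dominates while counts are small."""
    p = np.empty(T)
    p[0] = p0
    for t in range(1, T):
        mean = p[t - 1] * (1 + s) / (1 + s * p[t - 1])   # post-selection expectation
        p[t] = rng.binomial(N, mean) / N
    return p

rng = np.random.default_rng(2)
det = logistic_sweep(s=0.05, p0=1e-4, T=400)
sto = wright_fisher_sweep(N=10_000, s=0.05, p0=1e-4, T=400, rng=rng)
```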

5.
One way to describe the spread of an infection on a network is by approximating the network by a random graph. However, the usual way of constructing a random graph does not give any control over the number of triangles in the graph, while these triangles will naturally arise in many networks (e.g. in social networks). In this paper, random graphs with a given degree distribution and a given expected number of triangles are constructed. By using these random graphs we analyze the spread of two types of infection on a network: infections with a fixed infectious period and infections for which an infective individual will infect all of its susceptible neighbors or none. These two types of infection can be used to give upper and lower bounds for R0, the probability of extinction, and other measures of the dynamics of infections with more general infectious periods.
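A hedged sketch of one way to construct such graphs, following the clustered configuration-model idea (independent single-edge stubs plus triangle corners); the Poisson stub counts are an assumption for illustration.

```python
import numpy as np

def clustered_random_graph(n, mean_edges, mean_triangles, rng):
    """Random graph with a tunable expected number of triangles."""
    s = rng.poisson(mean_edges, n)            # single-edge stubs per node
    t = rng.poisson(mean_triangles, n)        # triangle-corner stubs per node
    edges = set()
    stubs = np.repeat(np.arange(n), s)
    rng.shuffle(stubs)
    stubs = stubs[: 2 * (len(stubs) // 2)]
    for a, b in stubs.reshape(-1, 2):         # pair up single-edge stubs
        if a != b:
            edges.add((min(a, b), max(a, b)))
    corners = np.repeat(np.arange(n), t)
    rng.shuffle(corners)
    corners = corners[: 3 * (len(corners) // 3)]
    for a, b, c in corners.reshape(-1, 3):    # close each triple into a triangle
        for u, v in ((a, b), (b, c), (a, c)):
            if u != v:
                edges.add((min(u, v), max(u, v)))
    return edges

edges = clustered_random_graph(1000, 2.0, 0.5, np.random.default_rng(3))
```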

6.
Comparisons between mass-action or "random" network models and empirical networks have produced mixed results. Here we seek to discover whether the spread of a simulated disease through randomly constructed networks can be coerced to model the spread in empirical networks by altering a single disease parameter: the probability of infection. A stochastic model for disease spread through herds of cattle is utilised to model the passage of an SEIR (susceptible–latent–infected–resistant) infection through five networks. The first network is an empirical network of recorded contacts, drawn from one of four available datasets, and the other four networks are constructed from randomly distributed contacts based on increasing amounts of information from the recorded network. A numerical study adjusting the value of the probability of infection was conducted for the four random network models. We found that relative percentage reductions in the probability of infection of between 5.6% and 39.4% in the random network models produced results that most closely mirrored the results from the empirical contact networks. In all cases tested, reducing the differences between the two models required a reduction in the probability of infection in the random network.
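A minimal chain-binomial sketch of the kind of simulation involved: a fixed-period SEIR epidemic on a static contact network whose per-contact infection probability p can be scanned downward for the random-network models. All parameter values are placeholders.

```python
import numpy as np

def final_size(edges, n, p, rng, latent=2, infectious=3, seed_node=0):
    """Final epidemic size of a chain-binomial SEIR model on a network."""
    state = np.zeros(n, dtype=int)            # 0=S, 1=E, 2=I, 3=R
    timer = np.zeros(n, dtype=int)
    state[seed_node], timer[seed_node] = 2, infectious
    nbrs = {i: [] for i in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    while (state == 1).any() or (state == 2).any():
        new_e = {v for i in np.where(state == 2)[0]
                 for v in nbrs[i] if state[v] == 0 and rng.random() < p}
        timer[state > 0] -= 1
        state[(state == 2) & (timer <= 0)] = 3            # I -> R
        moved = (state == 1) & (timer <= 0)               # E -> I
        state[moved], timer[moved] = 2, infectious
        for v in new_e:                                   # S -> E
            state[v], timer[v] = 1, latent
    return int((state == 3).sum())

# Calibration loop (sketch): shrink p on the random network until mean final
# sizes match those from the empirical contact network.
```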

7.
In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.
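A sketch of the approach as described, assuming scikit-learn and a simple (1+lambda)-style evolution strategy over the simplex of kernel weights; the paper's exact evolutionary algorithm and validation scheme may differ.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

def auc_for_weights(weights, kernels, y, train, test):
    """AUC of an SVM using the weighted sum of precomputed kernel matrices."""
    K = sum(w * k for w, k in zip(weights, kernels))
    clf = SVC(kernel="precomputed").fit(K[np.ix_(train, train)], y[train])
    return roc_auc_score(y[test], clf.decision_function(K[np.ix_(test, train)]))

def evolve_weights(kernels, y, train, test, rng, gens=50, pop=20, sigma=0.1):
    best = np.ones(len(kernels)) / len(kernels)
    best_fit = auc_for_weights(best, kernels, y, train, test)
    for _ in range(gens):
        for _ in range(pop):
            cand = np.abs(best + rng.normal(0.0, sigma, len(kernels)))
            cand /= cand.sum()                 # stay on the weight simplex
            fit = auc_for_weights(cand, kernels, y, train, test)
            if fit > best_fit:
                best, best_fit = cand, fit
    return best, best_fit
```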

8.
Recreational travel is a recognized vector for the spread of invasive species in North America. However, there has been little quantitative analysis of the risks posed by such travel and the associated transport of firewood. In this study, we analyzed the risk of forest insect spread with firewood and estimated related dispersal parameters for application in geographically explicit invasion models. Our primary data source was the U.S. National Recreation Reservation Service database, which records camper reservations at >2,500 locations nationwide. For >7 million individual reservations made between 2004 and 2009 (including visits from Canada), we calculated the distance between visitor home address and campground location. We constructed an empirical dispersal kernel (i.e., the probability distribution of the travel distances) from these "origin-destination" data, and then fitted the data with various theoretical distributions. We found the data to be strongly leptokurtic (fat-tailed) and fairly well fit by the unbounded Johnson and lognormal distributions. Most campers (approximately 53%) traveled <100 km, but approximately 10% traveled >500 km (and as far as 5,500 km). Additionally, we examined the impact of geographic region, specific destinations (major national parks), and specific origin locations (major cities) on the shape of the dispersal kernel, and found that mixture distributions (i.e., theoretical distribution functions composed of multiple univariate distributions) may fit better in some circumstances. Although only a limited amount of all transported firewood is likely to be infested by forest insects, this still represents a considerable increase in dispersal potential beyond the insects' natural spread capabilities.
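A short sketch of the kernel-fitting step, assuming SciPy; the placeholder distances stand in for the reservation-derived origin-destination data, and `johnsonsu` is SciPy's unbounded Johnson distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
distances = stats.lognorm.rvs(s=1.2, scale=120.0, size=10_000, random_state=rng)

print("fraction < 100 km:", (distances < 100).mean())
print("fraction > 500 km:", (distances > 500).mean())

# Compare candidate dispersal kernels by log-likelihood (higher is better).
for dist in (stats.lognorm, stats.johnsonsu, stats.expon):
    params = dist.fit(distances)
    print(dist.name, dist.logpdf(distances, *params).sum())
```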

9.
Parametric and nonparametric kernel methods dominate studies of animal home ranges and space use. Most existing methods are unable to incorporate information about the underlying physical environment, leading to poor performance in excluding areas that are not used. Using radio-telemetry data from sea otters, we developed and evaluated a new algorithm for estimating home ranges (hereafter Permissible Home Range Estimation, or “PHRE”) that reflects habitat suitability. We began by transforming sighting locations into relevant landscape features (for sea otters, coastal position and distance from shore). Then, we generated a bivariate kernel probability density function in landscape space and back-transformed this to geographic space in order to define a permissible home range. Compared to two commonly used home range estimation methods, kernel densities and local convex hulls, PHRE better excluded unused areas and required a smaller sample size. Our PHRE method is applicable to species whose ranges are restricted by complex physical boundaries or environmental gradients and will improve understanding of habitat-use requirements and, ultimately, aid in conservation efforts.
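A minimal sketch of the PHRE idea, assuming SciPy: fit a bivariate kernel in landscape coordinates, then suppress impermissible cells when mapping back to geographic space. The placeholder data and mask are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
coastal_pos = rng.normal(10.0, 2.0, 300)          # km along the coastline
dist_shore = np.abs(rng.normal(0.5, 0.3, 300))    # km offshore, non-negative

kde = gaussian_kde(np.vstack([coastal_pos, dist_shore]))

def permissible_density(points_landscape, permissible_mask):
    """Kernel density in landscape space, zeroed on impermissible cells
    (e.g., on land for sea otters)."""
    return kde(points_landscape) * permissible_mask
```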

10.
This paper is concerned with a class of population growth processes in discrete time; the simple epidemic process is considered as a specific example. A Markov chain model is constructed and standard Markov methods are used to study the main biological concepts. A simple and explicit formula is obtained for the transient distribution of the population size. Then, the cost of the process is defined and the joint probability generating function of its components is derived. Finally, the results are extended to the case where the inter-transition periods are bounded i.i.d. random variables.
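The transient distribution of a finite Markov chain follows directly from powers of the transition matrix; below is a sketch with a toy simple-epidemic chain whose transition probabilities are illustrative rather than the paper's.

```python
import numpy as np

def transient_distribution(P, pi0, n):
    """Distribution of the chain after n steps given initial distribution pi0."""
    return pi0 @ np.linalg.matrix_power(P, n)

N, beta = 20, 0.1                                  # population size, contact rate
P = np.zeros((N + 1, N + 1))                       # state = number infected
for i in range(N + 1):
    p_up = beta * i * (N - i) / N if i < N else 0.0
    P[i, min(i + 1, N)] += p_up
    P[i, i] += 1.0 - p_up

pi0 = np.zeros(N + 1)
pi0[1] = 1.0                                       # start with one infective
print(transient_distribution(P, pi0, 30))
```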

11.
B. F. Manly, Biometrics, 1983, 39(1):13-27.
A correlation between the distribution of an organism and features of its environment can be taken as indirect evidence of natural selection. Biologists may therefore collect samples from polymorphic populations at a number of locations, classify the locations into habitat types, and consider whether the distribution of morphs varies with the habitat. Statistical aspects of this type of study are discussed in this paper. A randomization test for habitat effects is proposed and a negative binomial model is suggested for the distribution of morphs from random locations within one type of habitat. Data on the distribution of Cepaea hortensis and C. nemoralis snails in southern England provide an example. For both species there is clear evidence of differences between habitats, although the morph distributions are rather variable within habitats. The negative binomial model suggests that, for the snail data, variation in morph proportions is mainly due to location differences. The binomial sampling error is relatively unimportant unless the sample size at a location is very small. Therefore it is reasonable to analyse morph proportions by standard methods without giving different weights to data from different locations. The snail data are analysed in this way. Discriminant function analyses are used to test for habitat effects. The relationships between C. hortensis and C. nemoralis morph frequencies within one habitat are examined by a canonical correlation analysis.
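A minimal sketch of a randomization test for habitat effects: permute habitat labels across locations and compare a habitat-effect statistic with its permutation distribution. The statistic shown is one simple choice and not necessarily Manly's.

```python
import numpy as np

def randomization_test(props, habitat, stat, n_perm=9999, rng=None):
    """One-sided permutation p-value for a habitat effect on morph proportions."""
    rng = rng if rng is not None else np.random.default_rng()
    observed = stat(props, habitat)
    perms = np.array([stat(props, rng.permutation(habitat)) for _ in range(n_perm)])
    return (1 + (perms >= observed).sum()) / (n_perm + 1)

def between_habitat_variance(props, habitat):
    """Variance of habitat-mean morph proportions (larger = stronger effect)."""
    return np.var([props[habitat == h].mean() for h in np.unique(habitat)])
```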

12.
13.
In this paper we present a parallel artificial cortical network, inspired by the human visual system, that enhances the salient contours of an image. The network consists of independent processing elements organized into hypercolumns, which concurrently process the distinct orientations of all the edges of the image. These processing elements are a new set of orientation kernels appropriate for the discrete lattice of the hypercolumns. The Gestalt laws of proximity and continuity that describe the process of saliency extraction in the human brain are encoded by means of weights. These weights interconnect the kernels according to a novel connection pattern based on co-exponentiality. The output of every kernel is modulated by the outputs of its neighboring kernels according to a new affinity function, which takes into account the degree of difference between the facilitation of the two lobes of the kernel. Saliency enhancement results from the local interactions between the kernels. The network was tested on real and synthetic images and displays promising results for both. Comparisons with other methods of the same scope demonstrate that the proposed method performs adequately. Furthermore, it exhibits O(N) complexity, with execution times lower than any previously reported, even though it is executed on a conventional PC.
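A heavily simplified sketch of the ingredients, assuming NumPy and SciPy: oriented ridge filters for each hypercolumn orientation, with each channel then facilitated by co-aligned neighbouring responses. The paper's actual kernels, co-exponential connection pattern, and affinity function are more elaborate.

```python
import numpy as np
from scipy.signal import convolve2d

def rotated_gaussian(theta, size, sx, sy):
    ax = np.arange(size) - size // 2
    X, Y = np.meshgrid(ax, ax)
    xr = X * np.cos(theta) + Y * np.sin(theta)
    yr = -X * np.sin(theta) + Y * np.cos(theta)
    return np.exp(-(xr ** 2 / (2 * sx ** 2) + yr ** 2 / (2 * sy ** 2))), yr, sy

def saliency(image, n_orient=8, gain=0.5):
    """Max over orientation channels, each boosted by co-aligned neighbours."""
    channels = []
    for theta in np.pi * np.arange(n_orient) / n_orient:
        g, yr, sy = rotated_gaussian(theta, 9, 3.0, 1.0)
        ridge = (yr ** 2 / sy ** 2 - 1.0) * g              # oriented ridge filter
        resp = np.abs(convolve2d(image, ridge - ridge.mean(), mode="same"))
        smooth, _, _ = rotated_gaussian(theta, 15, 6.0, 1.5)
        facil = convolve2d(resp, smooth / smooth.sum(), mode="same")
        channels.append(resp * (1.0 + gain * facil))       # local facilitation
    return np.max(channels, axis=0)
```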

14.
We study the problem of estimating the density of a random variable G, given observations of a random variable Y = G + E. The random variable E is independent of G and its probability distribution function is considered as known. We build a family of estimators of the density of G using characteristic functions. We then derive a family of estimators of the density of Y based on the model for Y. The estimators are shown to be asymptotically unbiased and consistent. Simulations show that these estimators are better, as measured by integrated squared error, than the standard kernel estimators. Finally, we give an example of the use of this method for the detection of major genes in animal populations.
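A sketch of a deconvolution density estimator built from characteristic functions, assuming NumPy; the damping kernel with characteristic function (1 − (ht)²)³ is one common choice and may differ from the authors' estimator family.

```python
import numpy as np

def deconvolution_density(y, phi_E, x_grid, h=0.2):
    """Estimate the density of G from Y = G + E with known noise cf phi_E."""
    t = np.linspace(-1.0 / h, 1.0 / h, 1001)
    phi_Y_hat = np.exp(1j * np.outer(t, y)).mean(axis=1)   # empirical cf of Y
    damp = (1.0 - (h * t) ** 2) ** 3                       # damping kernel cf
    integrand = phi_Y_hat / phi_E(t) * damp
    dens = np.real(np.exp(-1j * np.outer(x_grid, t)) @ integrand)
    return np.clip(dens * (t[1] - t[0]) / (2 * np.pi), 0.0, None)

rng = np.random.default_rng(6)
g = rng.gamma(4.0, 1.0, 2000)
y = g + rng.normal(0.0, 0.5, 2000)                         # sd-0.5 normal noise
x = np.linspace(0.0, 12.0, 200)
f_hat = deconvolution_density(y, lambda t: np.exp(-0.125 * t ** 2), x)
```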

15.
The analysis of animal movement within different landscapes may increase our understanding of how landscape features affect the perceptual range of animals. Perceptual range is linked to an animal's movement probability via a dispersal kernel, which is generally treated as spatially invariant although it may vary with space. We hypothesize that spatial plasticity of an animal's dispersal kernel could greatly modify its distribution in time and space. After radio tracking the movements of walking insects (Cosmopolites sordidus) in banana plantations, we considered the movements of individuals as states of a Markov chain whose transition probabilities depended on the habitat characteristics of current and target locations. Combining a likelihood procedure and pattern-oriented modelling, we tested the hypothesis that the dispersal kernel depends on habitat features. Our results were consistent with the concept that the animal dispersal kernel depends on habitat features. Recognizing the plasticity of animal movement probabilities will provide insight into landscape-level ecological processes.
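A toy sketch of movement as a Markov chain whose transition probabilities depend on the habitat of the current and target cells (a 1-D lattice for brevity; the `p_move` table is hypothetical).

```python
import numpy as np

p_move = np.array([[0.2, 0.8],     # hypothetical weight of moving from habitat
                   [0.1, 0.9]])    # h_current (row) into h_target (column)

def simulate_walk(habitat, steps, p_move, start, rng):
    """Habitat-dependent random walk on a bounded 1-D lattice."""
    n = len(habitat)
    pos = [start]
    for _ in range(steps):
        x = pos[-1]
        candidates = [max(x - 1, 0), x, min(x + 1, n - 1)]
        w = np.array([p_move[habitat[x], habitat[c]] for c in candidates])
        pos.append(rng.choice(candidates, p=w / w.sum()))
    return np.array(pos)

rng = np.random.default_rng(7)
habitat = rng.integers(0, 2, 50)   # e.g., 0 = bare soil, 1 = crop residue
track = simulate_walk(habitat, steps=200, p_move=p_move, start=25, rng=rng)
```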

16.
M. C. Wu and D. A. Follmann, Biometrics, 1999, 55(1):75-84.
We discuss how to apply the conditional informative missing model of Wu and Bailey (1989, Biometrics 45, 939-955) to the setting where the probability of missing a visit depends on the random effects of the primary response in a time-dependent fashion. This includes the case where the probability of missing a visit depends on the true value of the primary response. Summary measures for missingness that are weighted sums of the indicators of missed visits are derived for these situations. These summary measures are then incorporated as covariates in a random effects model for the primary response. This approach is illustrated by analyzing data collected from a trial of heroin addicts where missed visits are informative about drug test results. Simulations of realistic experiments indicate that these time-dependent summary measures also work well under a variety of informative censoring models. These summary measures can achieve large reductions in estimation bias and mean squared errors relative to those obtained by using other summary measures.
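A sketch of the summary-measure idea, assuming statsmodels: form a weighted sum of missed-visit indicators per subject and enter it as a covariate in a random-effects model. The weights, data, and formula are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n_subj, n_visit = 100, 8
missed = rng.integers(0, 2, size=(n_subj, n_visit))   # missed-visit indicators
weights = np.linspace(0.5, 1.5, n_visit)              # up-weight late visits (assumption)
summary = (missed * weights).sum(axis=1)              # one value per subject

df = pd.DataFrame({"id": np.repeat(np.arange(n_subj), n_visit),
                   "visit": np.tile(np.arange(n_visit), n_subj),
                   "y": rng.normal(size=n_subj * n_visit),
                   "summary": np.repeat(summary, n_visit)})

# Random-effects model for the primary response with the summary covariate.
result = smf.mixedlm("y ~ visit + summary + visit:summary", df,
                     groups=df["id"]).fit()
```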

17.
Doguwa (1990) proposed kernel-based pair-correlation function estimators for point processes. These estimators transport part of the probability mass to the left of the hard-core distance, into the region where no inter-point distances can occur. The reflection technique is used here to provide an alternative estimator of the pair-correlation function, which drastically reduces the bias inherent in these estimators for random and clustered point patterns. However, this drastic reduction in bias comes at the cost of a much larger variance.
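A minimal sketch of the reflection technique for densities of inter-point distances with a hard-core distance r0: each observation is mirrored about r0 so that no estimated mass falls below it.

```python
import numpy as np

def reflected_kde(r, distances, r0, h):
    """Gaussian kernel density with reflection at the hard-core distance r0."""
    gauss = lambda z: np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    u = (r[:, None] - distances[None, :]) / h
    v = (r[:, None] - (2.0 * r0 - distances[None, :])) / h   # reflected copies
    dens = (gauss(u) + gauss(v)).mean(axis=1) / h
    dens[r < r0] = 0.0                                       # no mass below r0
    return dens
```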

18.
Retroviral insertional mutagenesis screens, which identify genes involved in tumor development in mice, have yielded a substantial number of retroviral integration sites, and this number is expected to grow substantially due to the introduction of high-throughput screening techniques. The data of various retroviral insertional mutagenesis screens are compiled in the publicly available Retroviral Tagged Cancer Gene Database (RTCGD). Jointly analyzing these screens for the presence of common insertion sites (CISs, i.e., regions in the genome that have been hit by viral insertions in multiple independent tumors significantly more often than expected by chance) requires an approach that corrects for the increased probability of finding false CISs as the amount of available data increases. Moreover, significance estimates of CISs should be established taking into account both the noise, arising from the random nature of the insertion process, and the bias, stemming from preferential insertion sites present in the genome and the data retrieval methodology. We introduce a framework, the kernel convolution (KC) framework, to find CISs in a noisy and biased environment using a predefined significance level while controlling the family-wise error (FWE), the probability of detecting false CISs. Where previous methods use one, two, or three predetermined fixed scales, our method is capable of operating at any biologically relevant scale. This creates the possibility of analyzing the CISs in a scale space by varying the width of the CISs, providing new insights into the behavior of CISs across multiple scales. Our method also allows models for background bias to be included. Using simulated data, we evaluate the KC framework with three kernel functions: the Gaussian, triangular, and rectangular kernels. We applied the Gaussian KC to the data from the combined set of screens in the RTCGD and found that 53% of the CISs do not reach the significance threshold in this combined setting. Still, with the FWE under control, application of our method resulted in the discovery of eight novel CISs, each of which has a probability of less than 5% of being a false detection.
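A small-scale sketch of a Gaussian kernel convolution scan with an FWE-controlling threshold taken from permutation-null maxima; the uniform null below ignores the background-bias models the framework supports, and all names are illustrative.

```python
import numpy as np

def cis_scan(insertions, region_len, scale, n_perm=1000, fwe=0.05, rng=None):
    """Positions whose smoothed insertion density exceeds the FWE threshold."""
    rng = rng if rng is not None else np.random.default_rng()
    grid = np.arange(0.0, region_len, scale / 4.0)

    def smooth(pos):
        d = (grid[:, None] - pos[None, :]) / scale
        return np.exp(-0.5 * d ** 2).sum(axis=1)            # Gaussian convolution

    observed = smooth(insertions)
    null_max = np.array([
        smooth(rng.uniform(0.0, region_len, len(insertions))).max()
        for _ in range(n_perm)])                            # null of scan maxima
    threshold = np.quantile(null_max, 1.0 - fwe)
    return grid[observed > threshold], threshold
```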

19.
MOTIVATION: Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS: We introduce two classes of kernel functions that are constructed by combining sequence profiles with new and existing approaches for determining the similarity between pairs of protein sequences. These kernels are constructed directly from these explicit protein similarity measures and employ effective profile-to-profile scoring schemes for measuring the similarity between pairs of proteins. Experiments with remote homology detection and fold recognition problems show that these kernels are capable of producing results that are substantially better than those produced by all of the existing state-of-the-art SVM-based methods. In addition, the experiments show that these kernels, even when used in the absence of profiles, produce results that are better than those produced by existing non-profile-based schemes. AVAILABILITY: The programs for computing the various kernel functions are available on request from the authors.
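One standard way to turn explicit pairwise similarity scores into a valid SVM kernel is the empirical kernel map K = S Sᵀ, sketched below; whether this matches the authors' exact construction is an assumption.

```python
import numpy as np

def empirical_kernel(S):
    """K = S S^T is positive semidefinite for any real score matrix S,
    so raw profile-to-profile similarities become a usable kernel."""
    return S @ S.T

# S[i, j]: profile-to-profile alignment score between proteins i and j,
# computed elsewhere; random placeholder scores here.
S = np.random.default_rng(9).normal(size=(50, 50))
K = empirical_kernel(0.5 * (S + S.T))      # symmetrize the scores first
```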

20.
A diagnostic cut-off point of a biomarker measurement is needed for classifying a random subject as either diseased or healthy. However, the cut-off point is usually unknown and needs to be estimated by some optimization criterion. One important criterion is the Youden index, which has been widely adopted in practice. The Youden index, which is defined as the maximum of (sensitivity + specificity − 1), directly measures the largest total diagnostic accuracy a biomarker can achieve. Therefore, it is desirable to estimate the optimal cut-off point associated with the Youden index. Sometimes, taking the actual measurements of a biomarker is very difficult and expensive, while ranking them without the actual measurement can be relatively easy. In such cases, ranked set sampling can give more precise estimation than simple random sampling, as ranked set samples are more likely to span the full range of the population. In this study, kernel density estimation is utilized to numerically solve for an estimate of the optimal cut-off point. The asymptotic distributions of the kernel estimators based on the two sampling schemes are derived analytically and we prove that the estimators based on ranked set sampling are relatively more efficient than those based on simple random sampling, and that both estimators are asymptotically unbiased. Furthermore, the asymptotic confidence intervals are derived. Intensive simulations are carried out to compare the proposed method using ranked set sampling with simple random sampling, with the proposed method outperforming simple random sampling in all cases. A real data set is analyzed for illustrating the proposed method.
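A sketch of the kernel-density route to the Youden-optimal cut-off, assuming SciPy and that diseased subjects tend to have larger biomarker values; ranked set sampling would change only how the two samples are collected.

```python
import numpy as np
from scipy.stats import gaussian_kde

def youden_cutoff(healthy, diseased, n_grid=1000):
    """Cut-off maximizing J(c) = sensitivity(c) + specificity(c) - 1."""
    lo = min(healthy.min(), diseased.min())
    hi = max(healthy.max(), diseased.max())
    grid = np.linspace(lo, hi, n_grid)
    f0, f1 = gaussian_kde(healthy), gaussian_kde(diseased)
    dx = grid[1] - grid[0]
    spec = np.cumsum(f0(grid)) * dx            # P(healthy marker <= c)
    sens = 1.0 - np.cumsum(f1(grid)) * dx      # P(diseased marker > c)
    j = sens + spec - 1.0
    return grid[j.argmax()], j.max()
```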
