Similar articles
 20 similar articles found (search time: 31 ms)
1.
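Using 555 accessions of Miscanthus sinensis germplasm as the study material and data on 26 phenotypic traits, the accessions were grouped by geographic origin, by floristic region, and by single trait. The number of samples taken within each group was determined by the simple proportion method, the square-root method, and the diversity index method, and individuals within each group were then selected either by clustering or at random. These schemes yielded 19 representative candidate core collections of M. sinensis. The 19 construction schemes were compared using four evaluation indices: mean similarity coefficient, trait coincidence rate, coefficient of variation of quantitative traits, and genetic diversity index. The best scheme for building a preliminary core collection of M. sinensis was found to be "grouping by floristic region + sample size determined by diversity index + individuals selected by clustering". The preliminary core collection built in this way contains 83 accessions, 14.95% of the total collection, and its trait coincidence rate with the total collection reached 100%.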
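
The three within-group sample-size rules named in this abstract (proportion, square root, diversity index) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption: the function names `allocate` and `shannon_index`, the group labels, the group sizes, and the use of the Shannon index as the diversity measure are all hypothetical. Only the totals (555 accessions, 83 selected) come from the abstract.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over non-empty trait classes."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def allocate(group_sizes, total, strategy, diversity=None):
    """Split `total` core-collection slots across groups.

    strategy: "proportional" (weight = group size),
              "sqrt"         (weight = square root of group size),
              "diversity"    (weight = per-group diversity index).
    Each group gets at least one slot. Because of rounding, the
    allocations may not sum exactly to `total`.
    """
    if strategy == "proportional":
        weights = {g: float(n) for g, n in group_sizes.items()}
    elif strategy == "sqrt":
        weights = {g: math.sqrt(n) for g, n in group_sizes.items()}
    elif strategy == "diversity":
        if diversity is None:
            raise ValueError("diversity weights are required")
        weights = dict(diversity)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    weight_sum = sum(weights.values())
    return {g: max(1, round(total * w / weight_sum)) for g, w in weights.items()}


# Hypothetical grouping of 555 accessions into three groups.
groups = {"A": 400, "B": 100, "C": 55}
print(allocate(groups, 83, "proportional"))
print(allocate(groups, 83, "sqrt"))
```

Compared with the proportional rule, the square-root rule shifts slots from the largest group towards the smaller ones, which is the usual reason for considering it when group sizes are very uneven.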

2.
A method is proposed that aims at identifying clusters of individuals that show similar patterns when observed repeatedly. We consider linear‐mixed models that are widely used for the modeling of longitudinal data. In contrast to the classical assumption of a normal distribution for the random effects a finite mixture of normal distributions is assumed. Typically, the number of mixture components is unknown and has to be chosen, ideally by data driven tools. For this purpose, an EM algorithm‐based approach is considered that uses a penalized normal mixture as random effects distribution. The penalty term shrinks the pairwise distances of cluster centers based on the group lasso and the fused lasso method. The effect is that individuals with similar time trends are merged into the same cluster. The strength of regularization is determined by one penalization parameter. For finding the optimal penalization parameter a new model choice criterion is proposed.  相似文献   

3.
MOTIVATION: There is a growing interest in extracting statistical patterns from gene expression time-series data, in which a key challenge is the development of stable and accurate probabilistic models. Currently popular models, however, would be computationally prohibitive unless some independence assumptions are made to describe large-scale data. We propose an unsupervised conditional random fields (CRF) model to overcome this problem by progressively infusing information into the labelling process through a small variable voting pool. RESULTS: An unsupervised CRF model is proposed for efficient analysis of gene expression time series and is successfully applied to gene class discovery and class prediction. The proposed model treats each time series as a random field and assigns an optimal cluster label to each time series, so as to partition the time series into clusters without a priori knowledge about the number of clusters and the initial centroids. Another advantage of the proposed method is the relaxation of independence assumptions.  相似文献   

4.
Naturally occurring strains of Candida albicans are opportunistic pathogens that lack a sexual cycle and that are usually diploids with eight pairs of chromosomes. C. albicans spontaneously gives rise to a high frequency of colonial morphology mutants with altered electrophoretic karyotypes, involving one or more of their chromosomes. However, the most frequent changes involve chromosome VIII, which contains the genes coding for ribosomal DNA (rDNA) units. We have used restriction fragment lengths to analyze the number and physical array of the rDNA units on chromosome VIII in four normal clinical strains and seven morphological mutants derived spontaneously from one of the clinical isolates. HindIII does not cleave the rDNA repeats and liberates the tandem rDNA cluster from each homolog of chromosome VIII as a single fragment, whereas the cleavage at a single site by NotI reveals the size of the single rDNA unit. All clinical strains and morphological mutants differed greatly in the number of rDNA units per cluster and per cell. The four clinical isolates differed additionally among themselves by the size of the single rDNA unit. For a total of 25 chromosome VIII homologs in a total of 11 strains considered, the variability of chromosome VIII was exclusively due to the length of rDNA clusters (or the number of rDNA units) in approximately 92% of the cases, whereas the others involved other rearrangements of chromosome VIII. Only slight variations in the number of rDNA units were observed among 10 random C. albicans subclones and 10 random Saccharomyces cerevisiae subclones grown for a prolonged time at 22 °C. However, when grown faster at optimal temperatures of 37 and 30 °C, respectively, both fungi accumulated higher numbers of rDNA units, suggesting that this condition is selected for in rapidly growing cells. The morphological mutants, in comparison with the C. albicans subclones, contained a markedly wider distribution of the number of rDNA units, suggesting that a distinct process may be involved in altering the number of rDNA units in these mutants.

5.
We propose an algorithm that builds and maintains clusters over a network subject to mobility. This algorithm is fully decentralized and makes all the different clusters grow concurrently. The algorithm uses circulating tokens that collect data and move according to a random walk traversal scheme. Each token's task consists of (i) creating a cluster from the nodes it discovers and (ii) managing the cluster's expansion; all decisions affecting the cluster are taken only by a node that owns the token. The size of each cluster is kept above m nodes (m is a parameter of the algorithm). The obtained clustering is locally optimal in the sense that, with only a local view of each cluster, it computes the largest possible number of clusters (i.e. the sizes of the clusters are as close to m as possible). This algorithm is designed as a decentralized control algorithm for large scale networks and is mobility-adaptive: after a series of topological changes, the algorithm converges to a clustering. This recomputation only affects nodes in clusters where topological changes happened, and in adjacent clusters.

6.
Gong Y  Gu S  Woodruff RC 《Human heredity》2005,60(3):150-155
Based on the hypothesis that rare alleles are in mutation and random loss equilibrium, mutation rate can be indirectly estimated by measuring the number of rare variants and the average existing time of a mutant allele. This method can be applied to estimate the mutation rate in humans. However, this estimation of mutation rate is affected by the presence of premeiotic clusters of mutation. Mutation clusters change both the number of initial mutants and the average existing time of a mutant allele. As a result, the formula indirectly estimating mutation rate should be modified. The influence of premeiotic clusters is more obvious when the population size is small or the average cluster size is big. For example, if the population size is 3,000 and average cluster size is two, instead of one, the mutation rate is increased by about 9.4%.  相似文献   

7.
This paper studies the cluster synchronization of directed complex networks with time delays. Unlike in undirected networks, the coupling configuration matrix of a directed network cannot be assumed to be symmetric or irreducible. To achieve cluster synchronization, this paper uses an adaptive controller on each node and an adaptive feedback strategy on the nodes whose in-degree is zero. A numerical example is provided to show the effectiveness of the main theory. The method is also effective when the number of clusters is unknown. Thus, it can be used for community recognition in directed complex networks.

8.
Jia YH  Liu XP  Feng YC  Hu CQ 《AAPS PharmSciTech》2011,12(2):738-745
The purpose of this article is to propose an empirical solution to the problem of how many clusters of complex samples should be selected to construct the training set for a universal near infrared quantitative model based on the Næs method. The sample spectra were hierarchically classified into clusters by Ward’s algorithm and Euclidean distance. If the sample spectra were classified into two clusters, the 1/50 of the largest Heterogeneity value in the cluster with larger variation was set as the threshold to determine the total number of clusters. One sample was then randomly selected from each cluster to construct the training set, and the number of samples in training set equaled the number of clusters. In this study, 98 batches of rifampicin capsules with API contents ranging from 50.1% to 99.4% were studied with this strategy. The root mean square errors of cross validation and prediction were 2.54% and 2.31% for the model for rifampicin capsules, respectively. Then, we evaluated this model in terms of outlier diagnostics, accuracy, precision, and robustness. We also used the strategy of training set sample selection to revalidate the models for cefradine capsules, roxithromycin tablets, and erythromycin ethylsuccinate tablets, and the results were satisfactory. In conclusion, all results showed that this training set sample selection strategy assisted in the quick and accurate construction of quantitative models using near-infrared spectroscopy.  相似文献   

9.
MOTIVATION: Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional datasets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method to set optimal values of algorithm parameters. Wrong parameter values may either lead to the inclusion of purely random fluctuations in the results or ignore potentially important data. The optimal solution has parameter values for which the clustering does not yield any results for a purely random dataset but which detects cluster formation with maximum resolution on the edge of randomness. RESULTS: Estimation of the optimal parameter values is achieved by evaluation of the results of the clustering procedure applied to randomized datasets. In this case, the optimal value of the fuzzifier follows common rules that depend only on the main properties of the dataset. Taking the dimension of the set and the number of objects as input values instead of evaluating the entire dataset allows us to propose a functional relationship determining the fuzzifier directly. This result speaks strongly against using a predefined fuzzifier as typically done in many previous studies. Validation indices are generally used for the estimation of the optimal number of clusters. A comparison shows that the minimum distance between the centroids provides results that are at least equivalent or better than those obtained by other computationally more expensive indices.  相似文献   
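
To make the role of the fuzzifier concrete, here is a minimal Python sketch of the standard fuzzy c-means membership update, u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)), together with the minimum centroid distance mentioned as a validation index. It does not reproduce the functional relationship for the fuzzifier that the abstract proposes, which is not given here. The function names and the sample points are illustrative assumptions.

```python
import math
from itertools import combinations


def fcm_memberships(point, centroids, m=2.0):
    """Membership of `point` in each cluster for fuzzifier m > 1.

    Standard fuzzy c-means update:
        u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))
    where d_i is the Euclidean distance to centroid i.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on one or more centroids belongs only to those.
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        hits = sum(zeros)
        return [1.0 / hits if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


def min_centroid_distance(centroids):
    """Smallest pairwise Euclidean distance between centroids."""
    return min(math.dist(a, b) for a, b in combinations(centroids, 2))


# Illustrative data: one point and two centroids on a line.
centroids = [(1.0, 0.0), (3.0, 0.0)]
print(fcm_memberships((0.0, 0.0), centroids, m=2.0))
print(fcm_memberships((0.0, 0.0), centroids, m=11.0))
```

As m grows, the memberships drift towards equal shares across clusters, which is why a badly chosen fuzzifier can either blur real structure or let random fluctuations appear as clusters.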

10.
In the last decade, numerous efforts have been devoted to designing efficient algorithms for clustering wireless mobile ad-hoc networks (MANETs) that take the network mobility characteristics into account. However, existing algorithms assume that the mobility parameters of the network are fixed, whereas in fact they are stochastic and vary with time. Therefore, the previously proposed clustering algorithms do not scale well in realistic MANETs, where the mobility parameters of the hosts change freely and randomly at any time. Finding the optimal solution to the cluster formation problem is extremely difficult if we assume that the movement direction and mobility speed of the hosts are random variables. This becomes harder still when the probability distribution function of these random variables is assumed to be unknown. In this paper, we propose a learning automata-based weighted cluster formation algorithm called MCFA in which the mobility parameters of the hosts are assumed to be random variables with unknown distributions. In the proposed clustering algorithm, the expected relative mobility of each host with respect to all its neighbors is estimated by sampling its mobility parameters in various epochs. MCFA is a fully distributed algorithm in which each mobile host independently chooses the neighboring host with the minimum expected relative mobility as its cluster-head. This is done based solely on the local information each host receives from its neighbors, and the hosts need not be synchronized. The experimental results show the superiority of MCFA over the best existing mobility-based clustering algorithms in terms of the number of clusters, cluster lifetime, reaffiliation rate, and control message overhead.
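
The cluster-head choice described above can be sketched in a few lines of Python. This is only an illustration of the final selection step. It assumes that each host already holds a list of relative-mobility samples per neighbor and estimates the expectation by a plain sample mean. The learning automata that MCFA uses, and the way relative mobility is measured, are not described in the abstract and are not modeled here. The name `choose_cluster_head` and the host identifiers are hypothetical.

```python
from statistics import mean


def choose_cluster_head(samples):
    """Pick the neighbor with the smallest estimated expected relative mobility.

    samples: dict mapping neighbor id -> list of relative-mobility values
             sampled over several epochs (assumed input format).
    Neighbors without any samples are ignored.
    """
    expected = {host: mean(values) for host, values in samples.items() if values}
    if not expected:
        raise ValueError("no neighbor has mobility samples")
    return min(expected, key=expected.get)


# Hypothetical samples collected locally by one host.
neighbor_samples = {
    "h1": [3.0, 5.0],
    "h2": [1.0, 2.0, 3.0],
    "h3": [4.0],
}
print(choose_cluster_head(neighbor_samples))
```

Each host runs this selection on its own local data, which matches the abstract's point that no synchronization between hosts is needed.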

11.
Richard F. Green 《Oikos》2006,112(2):274-284
Oaten's (1977) stochastic model for optimal foraging in patches has been solved for a number of particular cases. A few cases, such as Poisson prey distribution and either systematic or random search, are easy to solve. In other cases, such as binomial prey distribution and random search, the form of the optimal strategy may be found using a theorem of McNamara, although more work is required to find which particular rule of the proper form is actually best. More generally (but not completely generally), optimal strategies may be found using dynamic programming. This requires that the number of prey found up to a particular time is a sufficient statistic for the number of prey remaining in a patch. This requirement cannot be dispensed with, but other simplifying assumptions that were used in the past are not necessary. In particular, it is not necessary, even for the sake of convenience, to assume that prey distribution has a form convenient for Bayesian analysis, such as a beta mixture of binomials or a gamma mixture of Poissons. Any prey distribution may be used if whatever prey are in a patch are located at random, and if search either is systematic for discrete time or for continuous time, or is random for continuous time. In earlier work, some pains had to be taken to find the rate of finding prey achieved by a given candidate strategy, but this is not necessary if expected gains and expected times are calculated routinely for each potential stopping point during dynamic programming. A new, simple method of finding optimal strategies is illustrated for discrete time and systematic search. This paper is based on a talk given at the Fifth Hans Kristiansson Symposium held in Lund, Sweden in August, 2003. The subject of the symposium was Bayesian foraging.  相似文献   

12.
Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.  相似文献   

13.
A method is introduced to compare results of a clustering technique at different levels of abstraction, or of different clustering techniques. The method emphasizes within cluster homogeneity as well as discontinuities between clusters. It has been derived from Hogeweg's method with some important changes. First each cluster is handled separately to determine the ratio between homogeneity and similarity to the nearest neighbour cluster. For a given clustering a weighted average value is computed over all clusters. This average value is standardized using an expected average value for a cluster configuration with the same number of clusters having the same sizes. A low level of the ratio between expected and observed values is supposed to indicate an optimal clustering. A derivation of the criterion is given and results from three sets of data with different properties are evaluated.  相似文献   

14.
Models have shown that population cycles might be driven by time lags resulting from positive feedback between kin structure and population change, coupled with negative feedback between density and population change. One such model operates through kin favouritism facilitating the recruitment of young cock red grouse. We investigated whether recruitment by young cocks depended on the presence and spatial arrangement of elder relatives in the territorial population. We used molecular genetic estimates of relatedness, and checked for effects of covariates including natal territory size, hatching date, body size, parasite burdens and local density. Philopatric recruitment by cock red grouse led to the formation of clusters of contiguous territories owned by kin. The probability that an individual young cock would establish a territory increased with the number of kin in his father's cluster. This pattern might have been due to genetic quality determining both recruitment success and the size of the paternal cluster. If so, there should have been a positive correlation between a young cock's probability of recruitment and the number of his relatives in the population, irrespective of their spatial distribution. This did not occur and so the effect of cluster size is unlikely to have been confounded by genetic quality. The only morphological measure correlated with recruitment success was supraorbital comb size. The results are consistent with the prediction that kin tolerance affects recruitment but were at the level of the individual within years, rather than the population among years. Hence an experimental test of the kin favouritism hypothesis for population cycles, by manipulation of relatedness in populations among years, is now required.  相似文献   

15.
We study how correlations in the random fitness assignment may affect the structure of fitness landscapes, in three classes of fitness models. The first is a phenotype space in which individuals are characterized by a large number n of continuously varying traits. In a simple model of random fitness assignment, viable phenotypes are likely to form a giant connected cluster percolating throughout the phenotype space provided the viability probability is larger than 1/2^n. The second model explicitly describes genotype-to-phenotype and phenotype-to-fitness maps, allows for neutrality at both phenotype and fitness levels, and results in a fitness landscape with tunable correlation length. Here, phenotypic neutrality and correlation between fitnesses can reduce the percolation threshold, and correlations at the point of phase transition between local and global are most conducive to the formation of the giant cluster. In the third class of models, particular combinations of alleles or values of phenotypic characters are "incompatible" in the sense that the resulting genotypes or phenotypes have zero fitness. This setting can be viewed as a generalization of the canonical Bateson-Dobzhansky-Muller model of speciation and is related to K-SAT problems, prominent in computer science. We analyze the conditions for the existence of viable genotypes, their number, as well as the structure and the number of connected clusters of viable genotypes. We show that analysis based on expected values can easily lead to wrong conclusions, especially when fitness correlations are strong. We focus on pairwise incompatibilities between diallelic loci, but we also address multiple alleles, complex incompatibilities, and continuous phenotype spaces. In the case of diallelic loci, the number of clusters is stochastically bounded and each cluster contains a very large sub-cube. Finally, we demonstrate that the discrete NK model shares some signature properties of models with high correlations.

16.
Consider the scenario of common gene clusters of closely related species where the cluster sizes could be as large as 400 from an alphabet of 25,000 genes. This paper addresses the problem of computing the statistical significance of such large clusters, whose individual elements occur with very low frequency (of the order of the number of species in this case) and the alphabet set of the elements is relatively large. We present a model where we study the structure of the clusters in terms of smaller nested (or otherwise) sub-clusters contained within the cluster. We give a probability estimation based on the expected cluster structure for such clusters (rather than some form of the product of individual probabilities of the elements). We also give an exact probability computation based on a dynamic programming algorithm, which runs in polynomial time.  相似文献   

17.
Adhesive molecules are suggested to play an important role when a single tissue is separated into two in developmental processes, illustrated by tissue-specific cadherins in the neural tube formation of amphibians. In this paper, we study the possibility for tissue separation to be carried out only by differential cell adhesion and random cell movement without any other morphogenetic mechanisms. We consider a two-dimensional regular triangular lattice filled with cells of three types (black, white, and gray). In the initial state, a cluster of black cells and a cluster of white cells are in contact and are surrounded by gray cells. Nearest-neighbor cells exchange their location at random, but the movement occurs faster if it increases the total adhesion. We considered separation to be successful if, in the final state, black cells and white cells kept their clusters but two clusters lost their direct contact with each other as gray cells are inserted between them. The maximum total adhesion (MTA) rule conjectures that the spatial pattern achieving maximum total adhesion might be that obtained in the final state. In the computer simulation, the runs for successful separation satisfied the condition predicted by the MTA rule. However, the condition for successful separation was more restricted than that predicted by the MTA rule. For some combinations of adhesions, it took an extremely long time to accomplish tissue separation. Finally, we discuss the role of homophilic adhesion molecules (such as cadherins) in the tissue separation processes, and show that the new expression of homophilic adhesion molecules cannot perform tissue separation without the change in other morphogenetic processes.  相似文献   

18.
One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.  相似文献   

19.
MOTIVATION: An important goal of microarray studies is to discover genes that are associated with clinical outcomes, such as disease status and patient survival. While a typical experiment surveys gene expressions on a global scale, there may be only a small number of genes that have significant influence on a clinical outcome. Moreover, expression data have cluster structures and the genes within a cluster have correlated expressions and coordinated functions, but the effects of individual genes in the same cluster may be different. Accordingly, we seek to build statistical models with the following properties. First, the model is sparse in the sense that only a subset of the parameter vector is non-zero. Second, the cluster structures of gene expressions are properly accounted for. RESULTS: For gene expression data without pathway information, we divide genes into clusters using commonly used methods, such as K-means or hierarchical approaches. The optimal number of clusters is determined using the Gap statistic. We propose a clustering threshold gradient descent regularization (CTGDR) method, for simultaneous cluster selection and within cluster gene selection. We apply this method to binary classification and censored survival analysis. Compared to the standard TGDR and other regularization methods, the CTGDR takes into account the cluster structure and carries out feature selection at both the cluster level and within-cluster gene level. We demonstrate the CTGDR on two studies of cancer classification and two studies correlating survival of lymphoma patients with microarray expressions. AVAILABILITY: R code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

20.
To more effectively use a network of high performance computing clusters, allocating multi-process jobs across multiple connected clusters becomes an attractive possibility. This allocation process entails dividing the processes of a job among several clusters, which we refer to as co-allocation. Co-allocation offers the possibility of more efficient use of computer resources, reduced turn-around time and computations using numbers of processes larger than processes on any single cluster. In order to realize these possibilities, effective co-allocation, ultimately, depends on the inter-cluster communication cost. In this paper, we introduce a scalable co-allocation strategy called the Maximum Bandwidth Adjacent cluster Set (MBAS) strategy. The strategy makes use of two thresholds to control allocation: one to control the limit on bandwidth on usable inter-cluster communication links and another to control how jobs are split. A simulator that can simulate the dynamic behavior of jobs running across multiple clusters was developed and used to examine the performance of the MBAS co-allocation strategy. Our results indicate that by adjusting the thresholds for link level control and chunk size control in splitting jobs, the MBAS co-allocation strategy can significantly improve both user satisfaction and system utilization.  相似文献   
