Found 20 similar articles (search time: 15 ms)
1.
A prediction-based resampling method for estimating the number of clusters in a dataset (Total citations: 10; self-citations: 0; citations by others: 10)
Background
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are: to estimate the number of clusters, if any, in a dataset; and to allocate tumor samples to these clusters, and assess the confidence of cluster assignments for individual samples. Here we address the first of these problems.
2.
Background
Clustering techniques are routinely used in gene expression data analysis to organize the massive data. Clustering techniques arrange a large number of genes or assays into a few clusters while maximizing the intra-cluster similarity and inter-cluster separation. While clustering of genes facilitates learning the functions of un-characterized genes using their association with known genes, clustering of assays reveals the disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. A wrong specification of the number of clusters generally leads to either failure to detect novel clusters (disease subtypes) or unnecessary splitting of natural clusters.
3.
B Essex 《Proceedings of the Royal Society of London. Series B, Containing papers of a Biological character. Royal Society (Great Britain)》1980,209(1174):89-96
The village health worker has two basic tasks: (1) to prevent health problems; (2) to identify and provide effective management of illness in the village. The village health worker has limited education and the length of basic health training is usually 12 weeks. This training can only be considered appropriate if it enables the village health worker to practise effectively within the cultural, social, economic and educational constraints of the village. How far does the training help this worker to work with other members of the village community to prevent illness? These others include mothers, children, school teachers, village leaders, religious leaders, traditional birth attendants, and traditional healers; training needs to be problem-oriented. The management decisions that have to be made in situations of shortage of resources are complex to analyse. A W.H.O. research project has been undertaken to determine the feasibility of developing and using flow charts to provide alternative and more appropriate methods to help the village health worker to provide optimal management in suboptimal situations. Some examples of these new methods are presented.
4.
Models for longitudinal data: a generalized estimating equation approach (Total citations: 84; self-citations: 0; citations by others: 84)
This article discusses extensions of generalized linear models for the analysis of longitudinal data. Two approaches are considered: subject-specific (SS) models in which heterogeneity in regression parameters is explicitly modelled; and population-averaged (PA) models in which the aggregate response for the population is the focus. We use a generalized estimating equation approach to fit both classes of models for discrete and continuous outcomes. When the subject-specific parameters are assumed to follow a Gaussian distribution, simple relationships between the PA and SS parameters are available. The methods are illustrated with an analysis of data on mother's smoking and children's respiratory disease.
5.
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this article, we consider the use of S as an auxiliary variable and examine the information recovery from using S for estimating the treatment effect on T, when S is completely observed and T is partially observed. In an ideal but often unrealistic setting, when S satisfies Prentice's definition for perfect surrogacy, there is the potential for substantial gain in precision by using data from S to estimate the treatment effect on T. When S is not close to a perfect surrogate, it can provide substantial information only under particular circumstances. We propose to use a targeted shrinkage regression approach that data-adaptively takes advantage of the potential efficiency gain yet avoids the need to make a strong surrogacy assumption. Simulations show that this approach strikes a balance between bias and efficiency gain. Compared with competing methods, it has better mean squared error properties and can achieve substantial efficiency gain, particularly in a common practical setting when S captures much but not all of the treatment effect and the sample size is relatively small. We apply the proposed method to a glaucoma data example.
6.
When we plan for long-range goals, proximal information cannot be exploited in a blindly myopic way, as relevant future information must also be considered. But when a subgoal must be resolved first, irrelevant future information should not interfere with the processing of more proximal, subgoal-relevant information. We explore the idea that decision making in both situations relies on the flexible modulation of the degree to which different pieces of information under consideration are weighted, rather than explicitly decomposing a problem into smaller parts and solving each part independently. We asked participants to find the shortest goal-reaching paths in mazes and modeled their initial path choices as a noisy, weighted information integration process. In a base task where choosing the optimal initial path required weighting starting-point and goal-proximal factors equally, participants did take both constraints into account, with participants who made more accurate choices tending to exhibit more balanced weighting. The base task was then embedded as an initial subtask in a larger maze, where the same two factors constrained the optimal path to a subgoal, and the final goal position was irrelevant to the initial path choice. In this more complex task, participants’ choices reflected predominant consideration of the subgoal-relevant constraints, but also some influence of the initially-irrelevant final goal. More accurate participants placed much less weight on the optimality-irrelevant goal and again tended to weight the two initially-relevant constraints more equally. These findings suggest that humans may rely on a graded, task-sensitive weighting of multiple constraints to generate approximately optimal decision outcomes in both hierarchical and non-hierarchical goal-directed tasks.
7.
Elad Shtilerman Colin J. Thompson Lewi Stone Michael Bode Mark Burgman 《Proceedings. Biological sciences / The Royal Society》2014,281(1779)
Ecologists are often required to estimate the number of species in a region or designated area. A number of diversity indices are available for this purpose and are based on sampling the area using quadrats or other means, and estimating the total number of species from these samples. In this paper, a novel theory and method for estimating the number of species is developed. The theory involves the use of the Laplace method for approximating asymptotic integrals. The method is shown to be successful by testing random simulated datasets. In addition, several real survey datasets are tested, including forests that contain a large number (tens to hundreds) of tree species, and an aquatic system with a large number of fish species. The method is shown to give accurate results, and in almost all cases found to be superior to existing tools for estimating diversity.
8.
9.
Determining the structure of data without prior knowledge of the number of clusters or any information about their composition is a problem of interest in many fields, such as image analysis, astrophysics, biology, etc. Partitioning a set of n patterns in a p-dimensional feature space must be done such that those in a given cluster are more similar to each other than the rest. As there are approximately $K^n/K!$ possible ways of partitioning the patterns among K clusters, finding the best solution is very hard when n is large. The search space is increased when we have no a priori number of partitions. Although the self-organizing feature map (SOM) can be used to visualize clusters, the automation of knowledge discovery by SOM is a difficult task. This paper proposes region-based image processing methods to post-process the U-matrix obtained after the unsupervised learning performed by SOM. Mathematical morphology is applied to identify regions of neurons that are similar. The number of regions and their labels are automatically found and they are related to the number of clusters in a multivariate data set. New data can be classified by labeling it according to the best match neuron. Simulations using data sets drawn from finite mixtures of p-variate normal densities are presented as well as related advantages and drawbacks of the method.
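The partition count quoted in the abstract above can be checked numerically. The sketch below is not from the paper: it reads the approximate count as K^n/K! and compares it with the exact number of ways to split n labelled patterns into K non-empty clusters, which is the Stirling number of the second kind. The function names `stirling2` and `approx_partitions` are illustrative assumptions.

```python
from math import comb, factorial


def stirling2(n: int, k: int) -> int:
    """Exact count of partitions of n labelled items into k non-empty clusters.

    Uses the inclusion-exclusion formula
    S(n, k) = (1/k!) * sum_{j=0..k} (-1)^j * C(k, j) * (k - j)^n.
    """
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)  # the sum is always divisible by k!


def approx_partitions(n: int, k: int) -> float:
    """Leading term K^n / K! of the formula above (assumed reading of the abstract)."""
    return k ** n / factorial(k)


if __name__ == "__main__":
    # Hypothetical (n, K) pairs chosen only to show how the ratio approaches 1.
    for n, k in [(10, 3), (25, 4), (50, 5)]:
        exact = stirling2(n, k)
        approx = approx_partitions(n, k)
        print(f"n={n:3d} K={k}: exact={exact:.4e} approx={approx:.4e} "
              f"ratio={approx / exact:.6f}")
```

The approximation keeps only the j = 0 term of the sum, so it slightly overestimates the exact count, and the gap shrinks quickly as n grows relative to K. Either way, the count grows exponentially in n, which is why exhaustive search over partitions is impractical.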
10.
11.
Background
The wide scale permeation of health care by the shared decision making concept (SDM) reflects its relevance and advanced stage of development. An increasing number of studies evaluating the efficacy of SDM use instruments based on various sub-constructs administered from different viewpoints. However, as the concept has never been captured in operable core definition it is quite difficult to link these parts of evidence. This study aims at investigating interrelations of SDM indicators administered from different perspectives.
Method
A comprehensive inventory was developed mapping judgements from different perspectives (observer, doctor, patient) and constructs (behavior, perception) referring to three units (doctor, patient, doctor-patient-dyad) and an identical set of SDM-indicators. The inventory adopted the existing approaches, but added additional observer foci (patient and doctor-patient-dyad) and relevant indicators hitherto neglected by existing instruments. The complete inventory comprising a doctor-patient-questionnaire and an observer-instrument was applied to 40 decision consultations from 10 physicians from different medical fields. Convergent validities were calculated on the basis of Pearson correlation coefficients.
Results
Reliabilities for all scales were high to excellent. No correlations were found between observer and patients or physicians neither for means nor for single items. Judgements of doctors and patients were moderately related. Correlations between the observer scales and within the subjective perspectives were high. Inter-perspective agreement was not related to SDM performance or patient activity.
Conclusion
The study demonstrates the contribution to involvement made by each of the relevant perspectives and emphasizes the need for an inter-subjective approach regarding SDM measurement.
12.
13.
A likelihood method is introduced that jointly estimates the number of loci and the additive effect of alleles that account for the genetic variance of a normally distributed quantitative character in a randomly mating population. The method assumes that measurements of the character are available from one or both parents and an arbitrary number of full siblings. The method uses the fact, first recognized by Karl Pearson in 1904, that the variance of a character among offspring depends on both the parental phenotypes and on the number of loci. Simulations show that the method performs well provided that data from a sufficient number of families (on the order of thousands) are available. This method assumes that the loci are in Hardy–Weinberg and linkage equilibrium but does not assume anything about the linkage relationships. It performs equally well if all loci are on the same non-recombining chromosome provided they are in linkage equilibrium. The method can be adapted to take account of loci already identified as being associated with the character of interest. In that case, the method estimates the number of loci not already known to affect the character. The method applied to measurements of crown–rump length in 281 family trios in a captive colony of African green monkeys (Chlorocebus aethiops sabaeus) estimates the number of loci to be 112 and the additive effect to be 0.26 cm. A parametric bootstrap analysis shows that a rough confidence interval has a lower bound of 14 loci.
14.
15.
Sahely Bhadra Chiranjib Bhattacharyya Nagasuma R Chandra I Saira Mian 《Algorithms for molecular biology : AMB》2009,4(1):1-15
Background
One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays this is recognized as a crucial step for the global performance of the data analysis from raw intensities to expression values.
Results
We propose here an algorithm for background estimation based on a model in which the cost function is quadratic in a set of fitting parameters such that minimization can be performed through linear algebra. The model incorporates two effects: 1) Correlated intensities between neighboring features in the chip and 2) sequence-dependent affinities for non-specific hybridization fitted by an extended nearest-neighbor model.
Conclusion
The algorithm has been tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values for different experiments as well as between the free-energy parameters and their counterparts in aqueous solution indicate that the model captures a significant part of the underlying physical chemistry.
16.
Cullen-McEwen LA Armitage JA Nyengaard JR Moritz KM Bertram JF 《American journal of physiology. Renal physiology》2011,300(6):F1448-F1453
Low glomerular (nephron) endowment has been associated with an increased risk of cardiovascular and renal disease in adulthood. Nephron endowment in humans is determined by 36 wk of gestation, while in rats and mice nephrogenesis ends several days after birth. Specific genes and environmental perturbations have been shown to regulate nephron endowment. Until now, a design-based method for estimating nephron number in developing kidneys was unavailable. This was due in part to the difficulty associated with unambiguously identifying developing glomeruli in histological sections. Here, we describe a method that uses lectin histochemistry to identify developing glomeruli and the physical disector/fractionator principle to provide unbiased estimates of total glomerular number (N(glom)). We have characterized N(glom) throughout development in kidneys from 76 rats and model this development with a 5-parameter logistic equation to predict N(glom) from embryonic day 17.25 to adulthood (r² = 0.98). This approach represents the first design-based method with which to estimate N(glom) in the developing kidney.
17.
18.
Neuroeconomics is the study of the neurobiological and computational basis of value-based decision making. Its goal is to provide a biologically based account of human behaviour that can be applied in both the natural and the social sciences. This Review proposes a framework to investigate different aspects of the neurobiology of decision making. The framework allows us to bring together recent findings in the field, highlight some of the most important outstanding problems, define a common lexicon that bridges the different disciplines that inform neuroeconomics, and point the way to future applications.
19.
20.