Similar Literature
20 similar records found (search time: 15 ms)
1.
Machine learning and statistical model-based classifiers are increasingly applied to the complex, high-dimensional biological data produced by high-throughput technologies. Understanding how the characteristics of large, complex microarray datasets affect the predictive performance of classifiers is computationally intensive and under-investigated, yet vital for determining the optimal number of biomarkers for classification tasks aimed at improved detection, diagnosis and therapeutic monitoring of diseases. We use simulation studies to investigate how microarray data characteristics affect the predictive performance of several classification rules. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that classifier performance is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking all of these factors into account. We built a database of average generalization errors for various combinations of these factors; it can be used to estimate the optimal number of biomarkers for a given level of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those from simulated data with corresponding data characteristics. An R package, optBiomarker, implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
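A minimal sketch of the kind of simulation study described above, assuming two classes whose p biomarkers are independent Gaussians differing by a constant shift on the log scale (`log_fold_change`), with one standard deviation `sd` standing in for combined biological and technical variability. Only k-nearest neighbour is shown, and all function names and parameters are hypothetical; this is not the optBiomarker API.

```python
import math
import random


def simulate_class(n, p, mean, sd, rng):
    """Draw n samples of p independent Gaussian biomarkers (hypothetical model)."""
    return [[rng.gauss(mean, sd) for _ in range(p)] for _ in range(n)]


def knn_predict(train_x, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    neighbours = sorted(
        (math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y)
    )[:k]
    votes_for_1 = sum(label for _, label in neighbours)
    return 1 if 2 * votes_for_1 > k else 0


def generalization_error(n_train, p, log_fold_change, sd,
                         n_test=200, reps=5, seed=1):
    """Average test error of 3-NN over `reps` independently simulated datasets."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        # Balanced training set: n_train samples per class.
        train_x = (simulate_class(n_train, p, 0.0, sd, rng)
                   + simulate_class(n_train, p, log_fold_change, sd, rng))
        train_y = [0] * n_train + [1] * n_train
        # Fresh test samples drawn from the same two distributions.
        test0 = simulate_class(n_test, p, 0.0, sd, rng)
        test1 = simulate_class(n_test, p, log_fold_change, sd, rng)
        wrong = sum(knn_predict(train_x, train_y, x) != 0 for x in test0)
        wrong += sum(knn_predict(train_x, train_y, x) != 1 for x in test1)
        errors.append(wrong / (2 * n_test))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Error curve as a function of training set size, other factors fixed.
    for n_train in (5, 10, 20):
        err = generalization_error(n_train, p=5, log_fold_change=1.0, sd=1.0)
        print(f"n_train={n_train:2d}  error={err:.3f}")
```

Tabulating `generalization_error` over a grid of training sizes, fold changes and variabilities gives a small-scale analogue of the error database the abstract describes.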

2.

Background  

Determining a suitable sample size is an important step in the planning of microarray experiments. Increasing the number of arrays gives more statistical power, but adds to the total cost of the experiment. Several approaches for sample size determination have been developed for expression array studies, but so far none has been proposed for array comparative genomic hybridization (aCGH).
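To make the power-versus-cost trade-off concrete, here is the textbook normal-approximation sample size for comparing one probe's mean between two groups, with an optional Bonferroni correction over `m` probes. This is a generic sketch, not an aCGH-specific method; the effect size `delta`, standard deviation `sigma` and the Bonferroni choice are illustrative assumptions.

```python
import math
from statistics import NormalDist


def arrays_per_group(delta, sigma, alpha=0.05, power=0.8, m=1):
    """Arrays needed per group for a two-sided, two-sample z-test.

    n = 2 * (z_{1 - alpha/(2m)} + z_{power})**2 * sigma**2 / delta**2,
    where alpha/m is a Bonferroni-adjusted per-probe significance level.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / (2 * m))
    z_beta = z(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


print(arrays_per_group(delta=1.0, sigma=1.0))          # one probe → 16
print(arrays_per_group(delta=1.0, sigma=1.0, m=1000))  # 1000 probes, Bonferroni
```

Each extra array raises power but also cost; testing many probes at once pushes the required number of arrays up further.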

3.
Summary The bristle pattern of the second-leg basitarsus in Drosophila melanogaster was studied as a function of the number and size of the cells on this segment in well-fed and starved wild-type flies, in triploid flies, and in two mutants (dachs and four-jointed) that have abnormally short basitarsi. The second-leg basitarsi of well-fed, wild-type flies from 22 other Drosophila species were studied in a similar manner. There are typically 8 longitudinal rows of evenly spaced bristles on the second-leg basitarsus; in each row the number of bristles was consistently found to vary in proportion to the estimated number of cells along the segment, and the interval between bristles in proportion to the average cell diameter on the segment. These correlations are interpreted to mean that the spacing of the bristles within each row is controlled developmentally, whereas the number of bristles is not. The interval between bristles is evidently measured either as a fixed number of cells or as a distance that depends indirectly on cell diameter.

4.
It has been suggested that the mammalian memory system has both familiarity and recollection components. Recently, a high-capacity network to store familiarity has been proposed. Here we derive analytically the optimal learning rule for such a familiarity memory using a signal-to-noise ratio analysis. We find that in the limit of large networks the covariance rule, known to be the optimal local, linear learning rule for pattern association, is also the optimal learning rule for familiarity discrimination. In this limit, the capacity is independent of the sparseness of the patterns, and the corresponding information capacity is 0.057 bits per synapse, somewhat less than is typically found for associative networks.
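A small sketch of the covariance learning rule applied to familiarity, assuming sparse binary patterns with exactly k = f·N active units and a quadratic read-out x·Wx as the familiarity signal. The read-out and all parameter values are illustrative assumptions; the network analysed in the abstract may differ.

```python
import random


def sparse_pattern(n, k, rng):
    """Binary pattern of length n with exactly k active units."""
    active = set(rng.sample(range(n), k))
    return [1 if i in active else 0 for i in range(n)]


def covariance_weights(patterns, f):
    """Covariance rule: W_ij = sum over stored patterns of (x_i - f)(x_j - f)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for x in patterns:
        centred = [xi - f for xi in x]
        for i in range(n):
            ci, row = centred[i], w[i]
            for j in range(n):
                row[j] += ci * centred[j]
    return w


def familiarity(w, x):
    """Assumed read-out x^T W x; for binary x this sums W over active pairs."""
    active = [i for i, xi in enumerate(x) if xi]
    return sum(w[i][j] for i in active for j in active)


rng = random.Random(0)
N, F, P = 200, 0.1, 20
K = round(F * N)
stored = [sparse_pattern(N, K, rng) for _ in range(P)]
novel = [sparse_pattern(N, K, rng) for _ in range(P)]
W = covariance_weights(stored, F)
stored_scores = [familiarity(W, x) for x in stored]
novel_scores = [familiarity(W, x) for x in novel]
print(min(stored_scores), max(novel_scores))
```

For a stored pattern the read-out contains the term (k(1 - f))² from the pattern itself, while for a novel pattern every term is only a chance overlap, which is the signal-to-noise gap the analysis quantifies.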

5.
A measure of fitness applicable to herring gull (Larus argentatus) life histories is introduced and related to short-term risks that depend on behaviour. This allows the relationship between fitness and behaviour to be calculated if one can measure the risks an animal takes by behaving in particular ways in specified situations. The method is used to evaluate the fitness of herring gulls during the incubation period at Walney Island, U.K. Two causes of egg mortality are considered, death by predation and death by exposure, and the latter is shown to be less important than the former. The chance of death by predation depends on the number of adults present on the territory and is estimated at approximately 0.045 eggs per h if no parent is present on the territory, 3.5 × 10⁻⁴ eggs per h if one parent is present and 6.7 × 10⁻⁴ eggs per h if two parents are present. Adult death by starvation is assumed to occur when energy reserves are exhausted; however, a fitness cost is incurred if an animal maintains excess energy reserves. To keep the number of behaviour sequences examined manageable, attention is restricted to the sequences that result from decision rules in which feeding occurs when energy reserves fall below a set point. Under this restriction, optimal behaviour is characterized by (1) energy reserves maintained between 500 and 1200 kcal; (2) complementarity of mates' feeding preferences such that at least one of the pair feeds at a refuse tip; (3) parental desertion of offspring if energy reserves fall below 200 kcal; (4) parents spending the minimum possible time together on the territory. Sensitivity analysis shows these results to be insensitive to parameter errors in the model. Treating these characteristics of optimal behaviour as predictions about actual behaviour reveals that predictions 1 and 2 are correct and prediction 4 is wrong.
An experiment was designed to test prediction 3 by estimating the fat reserves of live birds and measuring how long each would continue incubating in the absence of its mate. Fatter birds did continue incubating for longer, though the quantitative details did not match prediction 3 exactly. Thus two aspects of behaviour tally with our calculation of what is optimal, but one does not. This conclusion is discussed.
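The predation rates above can be turned into expected egg losses for a given allocation of time. This sketch assumes the rates are constant over time, so expected losses are simply rate × hours summed over attendance states; the daily schedules are hypothetical.

```python
# Egg predation rates (eggs lost per hour) by number of parents on the territory,
# as reported above.
PREDATION_RATE = {0: 0.045, 1: 3.5e-4, 2: 6.7e-4}


def expected_egg_losses(hours_by_parents_present):
    """Expected eggs lost over a period, assuming each rate is constant in time."""
    return sum(PREDATION_RATE[parents] * hours
               for parents, hours in hours_by_parents_present.items())


# Three hypothetical ways of allocating a 24-hour day.
schedules = {
    "one parent 22 h, both 2 h": {1: 22, 2: 2},
    "one parent 18 h, both 6 h": {1: 18, 2: 6},
    "unattended 1 h, one parent 23 h": {0: 1, 1: 23},
}
for label, hours in schedules.items():
    print(f"{label}: {expected_egg_losses(hours):.4f} eggs")
```

A single unattended hour costs more than the rest of the day combined, and time with both parents present costs more than time with one, which is why the model predicts that mates should minimize time together on the territory.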

6.
The species-area relationship of the island biogeography theory was calculated for macroinvertebrates in 22 coastal, adjacent streams. A z-value of 0.19 was obtained. The low z-value was probably a consequence of the short distances between streams as well as high dispersal rates. In addition, a cluster analysis based on the dissimilarity of species assemblages showed that stream size was of prime importance in categorizing the streams. To a smaller extent water quality affected the community structure in the streams.
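The z-value is the slope of the species-area relationship S = cA^z on log-log axes. A minimal least-squares sketch follows; the data are hypothetical and noise-free, generated with z = 0.19 rather than taken from the study, and what "area" means for a stream (for example catchment area or stream size) is left as an assumption.

```python
import math


def fit_species_area(areas, species):
    """Least-squares fit of log S = log c + z log A; returns (c, z)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(s) for s in species]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    z = sxy / sxx
    c = math.exp(my - z * mx)
    return c, z


# Hypothetical noise-free data from S = 10 * A**0.19 (not the study's data).
areas = [1, 5, 20, 100, 500]
species = [10 * a ** 0.19 for a in areas]
c, z = fit_species_area(areas, species)
print(f"c = {c:.2f}, z = {z:.2f}")  # c = 10.00, z = 0.19
```

A low z means species richness rises only slowly with area, consistent with the abstract's reading that nearby streams exchange colonists readily.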

7.
Small sample issues for microarray-based classification (total citations: 2; self-citations: 0; citations by others: 2)
In order to study the molecular biological differences between normal and diseased tissues, it is desirable to perform classification among diseases and stages of disease using microarray-based gene-expression values. Owing to the limited number of microarrays typically used in these studies, serious issues arise with respect to the design, performance and analysis of classifiers based on microarray data. This paper reviews some fundamental issues facing small-sample classification: classification rules, constrained classifiers, error estimation and feature selection. It discusses both unconstrained and constrained classifier design from sample data, and the contributions to classifier error from constrained optimization and lack of optimality owing to design from sample data. The difficulty with estimating classifier error when confined to small samples is addressed, particularly estimating the error from training data. The impact of small samples on the ability to include more than a few variables as classifier features is explained.
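Why estimating error from the training data is hazardous can be seen with a 1-nearest-neighbour rule: its resubstitution error is always zero because each sample is its own nearest neighbour, while leave-one-out gives a more honest estimate. The classifier choice and the pure-noise dataset are assumptions made for illustration.

```python
import math
import random


def nn1_predict(train, x):
    """1-nearest-neighbour label; train is a list of (vector, label) pairs."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def resubstitution_error(data):
    """Error measured on the same samples used to design the classifier."""
    return sum(nn1_predict(data, x) != y for x, y in data) / len(data)


def leave_one_out_error(data):
    """Each sample is classified by a 1-NN rule designed without it."""
    wrong = 0
    for i, (x, y) in enumerate(data):
        wrong += nn1_predict(data[:i] + data[i + 1:], x) != y
    return wrong / len(data)


# Small-sample setting with no real class difference (labels are pure noise).
rng = random.Random(42)
data = [([rng.gauss(0, 1) for _ in range(10)], i % 2) for i in range(30)]
print(resubstitution_error(data), leave_one_out_error(data))
```

With no real signal, the true error is 0.5, yet resubstitution reports a perfect classifier.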

8.
9.
To address the class imbalance problem in classifying pre-miRNAs with the ab initio method, we developed a novel sample selection method based on the characteristics of pre-miRNAs. Real pre-miRNAs are clustered by stem similarity and pseudo pre-miRNAs by their distribution in the high-dimensional sample space. The training samples are then selected according to the sample density of each cluster. The results are validated by cross-validation and on further test datasets composed of human real and pseudo pre-miRNAs. Compared with the previous method, microPred, our classifier miRNAPred is nearly 12% more accurate. The selected training samples can also be used to train other SVM classifiers, such as triplet-SVM, MiPred, miPred and microPred, to improve their classification performance. The sample selection algorithm is useful for building more efficient classifiers that distinguish real pre-miRNAs from pseudo hairpin sequences.
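One simple way to select training samples according to cluster density is to allocate the selection budget to clusters in proportion to their size and sample within each cluster. The sketch below assumes cluster labels are already computed and uses largest-remainder rounding; it is a hypothetical illustration, not miRNAPred's exact selection criterion.

```python
import math
import random
from collections import defaultdict


def select_by_cluster_density(cluster_of, n_select, seed=0):
    """Pick n_select samples, allocating to clusters in proportion to their size.

    cluster_of maps sample id -> cluster id; assumes n_select <= number of samples.
    Returns (selected sample ids, per-cluster allocation).
    """
    clusters = defaultdict(list)
    for sample_id, cluster_id in cluster_of.items():
        clusters[cluster_id].append(sample_id)
    total = len(cluster_of)
    quotas = {cid: n_select * len(members) / total
              for cid, members in clusters.items()}
    alloc = {cid: math.floor(q) for cid, q in quotas.items()}
    # Largest-remainder rounding so the allocations sum to n_select.
    leftover = n_select - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda cid: quotas[cid] - alloc[cid],
                          reverse=True)
    for cid in by_remainder[:leftover]:
        alloc[cid] += 1
    rng = random.Random(seed)
    selected = []
    for cid, members in clusters.items():
        selected.extend(rng.sample(sorted(members), alloc[cid]))
    return selected, alloc


# Hypothetical majority-class samples in three clusters of sizes 50, 30 and 20.
cluster_of = {f"s{i}": ("a" if i < 50 else "b" if i < 80 else "c")
              for i in range(100)}
selected, alloc = select_by_cluster_density(cluster_of, 10)
print(alloc)
```

Undersampling the majority class this way keeps every dense region represented while balancing the training set.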

10.
Summary The relation of worker size to ommatidia number was examined in the polymorphic ant Camponotus pennsylvanicus (DeGeer). Linear regression described this relationship as Y = 260.9 + 113.6X, where Y is ommatidia number and X is head width. A log-log regression described this relationship as Y = 323.5 + 286.9 log X (r² = 0.98). This analysis indicated an allometric relation of ommatidia number to head width, in which ommatidia number increases at a slower rate than head width. This relationship is discussed in terms of ethotypes associated with worker morphotypes and the possible mechanisms regulating polymorphic development.
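"Ommatidia number increases at a slower rate than head width" can be checked with the elasticity (dY/Y)/(dX/X) of each fitted equation: a value below 1 means ommatidia number grows proportionally more slowly than head width. The sketch assumes a base-10 logarithm in the second equation and head width in millimetres, neither of which the abstract states.

```python
import math


def linear_fit(x):
    """Ommatidia number from the linear regression reported above."""
    return 260.9 + 113.6 * x


def semilog_fit(x):
    """Ommatidia number from the logarithmic regression (base-10 log assumed)."""
    return 323.5 + 286.9 * math.log10(x)


def elasticity_linear(x):
    """Relative growth (dY/Y)/(dX/X) for the linear fit."""
    return 113.6 * x / linear_fit(x)


def elasticity_semilog(x):
    """Same quantity for the log fit, using dY/dX = 286.9 / (x ln 10)."""
    return 286.9 / (math.log(10) * semilog_fit(x))


for head_width in (1.0, 2.0, 3.0):
    print(head_width, round(elasticity_linear(head_width), 3),
          round(elasticity_semilog(head_width), 3))
```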

11.
The size and number of flowers displayed together on an inflorescence (floral display) influences pollinator attraction and pollen transfer and receipt, and is integral to plant reproductive success and fitness. Life history theory predicts that the evolution of floral display is constrained by trade-offs between the size and number of flowers and inflorescences. Indeed, a trade-off between flower size and flower number is a key assumption of models of inflorescence architecture and the evolution of floral display. Surprisingly, however, empirical evidence for the trade-off is limited. In particular, there is a lack of phylogenetic evidence for a trade-off between flower size and number. Analyses of phylogenetic independent contrasts (PICs) of 251 angiosperm species spanning 63 families yielded a significant negative correlation between flower size and flower number. At smaller phylogenetic scales, analyses of individual genera did not always find evidence of a trade-off, a result consistent with previous studies that have examined the trade-off for a single species or genus. Ours is the first study to support an angiosperm-wide trade-off between flower size and number and supports the theory that life history constraints have influenced the evolution of floral display.
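Phylogenetic independent contrasts can be sketched with Felsenstein's recursive algorithm on a fully bifurcating tree, followed by a correlation of contrasts through the origin. The four-species tree, branch lengths and log trait values below are hypothetical; real analyses of hundreds of species would use dedicated phylogenetics software.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    branch_length: float                # length of the branch above this node
    name: Optional[str] = None          # species name for tips
    children: Tuple["Node", ...] = ()   # exactly two children for internal nodes


def contrasts(node: Node, trait: Dict[str, float]) -> Tuple[float, float, List[float]]:
    """Felsenstein's independent contrasts for a fully bifurcating tree.

    Returns (node value, extra branch length to add above the node, contrasts).
    """
    if not node.children:
        return trait[node.name], 0.0, []
    left, right = node.children
    x1, extra1, c1 = contrasts(left, trait)
    x2, extra2, c2 = contrasts(right, trait)
    v1 = left.branch_length + extra1
    v2 = right.branch_length + extra2
    standardized = (x1 - x2) / math.sqrt(v1 + v2)
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    return value, v1 * v2 / (v1 + v2), c1 + c2 + [standardized]


def contrast_correlation(tree: Node, x: Dict[str, float], y: Dict[str, float]) -> float:
    """Correlation of the two traits' contrasts, computed through the origin."""
    cx = contrasts(tree, x)[2]
    cy = contrasts(tree, y)[2]
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))


tree = Node(0.0, children=(
    Node(1.0, children=(Node(1.0, "A"), Node(1.0, "B"))),
    Node(1.0, children=(Node(1.0, "C"), Node(1.0, "D"))),
))
log_flower_size = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 7.0}
log_flower_number = {"A": 3.0, "B": 2.5, "C": 1.0, "D": 0.2}
print(contrast_correlation(tree, log_flower_size, log_flower_number))
```

Contrasts remove the non-independence of related species, so a negative correlation among them points to a trade-off rather than shared ancestry.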

12.
Exercise dilated the pupil while the exercise took place, whereas the pupil constricted after the exercise period. The exercise-induced change in pupil size was minimal. Pupillary dilation was greatest under maximal exercise. During exercise at a constant load, pupillary dilation increased as exercise time was prolonged. Under lower lighting, exercise produced virtually no dilation.

13.
Clustering methods have been used extensively to unravel cryptic population genetic structure. We investigated the effect of the number of individuals sampled in each location on the resulting number of clusters. Our study was motivated by recent results in Arabidopsis thaliana: studies in which more than one individual was sampled per location apparently have led to a much higher number of clusters than studies where only one individual was sampled in each location, as is generally done in this species. We show, using computer simulations and microsatellite data in A. thaliana, that the number of sampled individuals indeed has a strong impact on the number of resulting clusters. This effect is smaller if the sampled populations have a hierarchical structure. In most cases, sampling 5–10 individuals per population should be enough. The results argue for abandoning the concept of ‘accessions’ in partially selfing organisms.

14.
15.
E. J. Feuer, L. G. Kessler. Biometrics, 1989, 45(2): 629-636
McNemar's (1947, Psychometrika 12, 153-157) test of marginal homogeneity is generalized to a two-sample situation in which the hypothesis of interest is that the marginal changes in two independently sampled tables are equal. This situation arises especially with two cohorts (a control and an intervention cohort), each measured on a binary outcome variable at baseline and after the intervention. Some assumptions that are often realistic in this situation simplify the sample size calculation. The calculation of sample size is demonstrated for a study designed to increase utilization of breast cancer screening.
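The quantity being compared, the marginal change within each cohort's paired table, can be sketched with a large-sample Wald z-test. This is a generic construction in the same spirit, not necessarily the exact statistic or sample size formula of Feuer and Kessler, and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def marginal_change(b, c, n):
    """Change in the proportion 'yes' from baseline to follow-up in one cohort.

    b = yes at baseline, no at follow-up; c = no at baseline, yes at follow-up;
    n = number of subjects in the paired table. Returns (change, variance estimate).
    """
    d = (c - b) / n
    var = ((b + c) / n - d * d) / n
    return d, var


def compare_marginal_changes(cohort1, cohort2):
    """Wald z-test that two independent cohorts have equal marginal changes."""
    d1, v1 = marginal_change(*cohort1)
    d2, v2 = marginal_change(*cohort2)
    z = (d1 - d2) / math.sqrt(v1 + v2)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical counts (b, c, n): intervention cohort vs control cohort.
z, p = compare_marginal_changes((10, 30, 100), (10, 10, 100))
print(f"z = {z:.3f}, p = {p:.4f}")
```

Only the discordant cells b and c carry information about change, just as in McNemar's one-sample test.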

16.
17.
Existing optimality models of propagule size and number are not appropriate for many organisms. First, existing models assume a monotonically increasing offspring fitness/propagule size relationship. However, offspring survival during certain stages may decrease with increasing propagule size, generating a peaked offspring fitness/propagule size function (e.g., egg size in oxygen-limited aquatic environments). Second, existing models typically do not consider maternal effects on total reproductive output and the expression of offspring survival/propagule size relationships. However, larger females often have greater total egg production and may provide better habitats for their offspring. We develop a specific optimality model that incorporates these effects and test its predictions using data from salmonid fishes. We then outline a general model without assuming specific functional forms and test its predictions using data from freshwater fishes. Our theoretical and empirical results illustrate that, when offspring survival is negatively correlated with propagule size, optimal propagule size is larger in better habitats. When larger females provide better habitats, their optimal propagule size is larger. Nevertheless, propagule number should increase more rapidly than propagule size for a given increase in maternal size. In the absence of density dependence, females with greater relative reproductive output (i.e., for a given body size) should produce more but not larger propagules.
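The baseline that such models extend can be sketched as a Smith-Fretwell-style optimization: with total reproductive output R split into propagules of size s, maternal fitness is W(s) = (R / s) f(s), and the optimum maximizes f(s)/s. The offspring fitness function here is a hypothetical monotone one; a peaked f(s) could be passed in through the same `fitness` argument.

```python
import math


def offspring_fitness(s, s_min=1.0):
    """Hypothetical monotone offspring fitness: zero up to s_min, saturating above."""
    return 0.0 if s <= s_min else 1.0 - math.exp(-(s - s_min))


def optimal_propagule_size(reproductive_output, fitness=offspring_fitness,
                           sizes=None):
    """Grid search maximizing maternal fitness W(s) = (R / s) * f(s).

    Returns (optimal size, corresponding propagule number R / s).
    """
    if sizes is None:
        sizes = [1.0 + 0.001 * i for i in range(1, 5000)]
    best = max(sizes, key=lambda s: reproductive_output / s * fitness(s))
    return best, reproductive_output / best


for r in (10.0, 100.0):
    size, number = optimal_propagule_size(r)
    print(f"R = {r:5.1f}: size = {size:.3f}, number = {number:.1f}")
```

In this baseline, a larger reproductive output changes only propagule number, not size, which matches the abstract's point that, without density dependence, females with greater relative output should produce more but not larger propagules.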

18.
R. L. Moldow, R. S. Yalow. Life Sciences, 1978, 22(20): 1859-1864
Radioimmunoassay of thyrotropin in the brains of rats and humans reveals that the dimensions within which the hormone is found are the same for rodents and primates, but that the anatomical regions in which it is found depend on brain size. Thyrotropin is widely distributed in the rat brain but is found only in the hypothalamic regions of the human brain. These findings suggest that the pituitary is the sole site of synthesis of this hormone.

19.

Background  

Supervised learning for cancer classification uses a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust, with good generalization performance on new examples, or at least that it performs better than random guessing. One suggested approach is to obtain a confidence interval for the error rate using repeated design and test sets selected from the available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set leads to a large bias in the estimate of the true variance between design sets. Other methods for small-sample performance estimation, such as the recently proposed Repeated Random Sampling (RSS) procedure, are therefore also expected to give heavily biased estimates, which in turn translate into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT).
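The kind of naive procedure whose bias is examined here can be sketched as repeated random design/test splits of one dataset followed by a percentile interval over the test error rates. This is not RIDT; the nearest-centroid classifier, the 30% test fraction and the simulated data are assumptions.

```python
import math
import random


def centroid_classifier(train):
    """Nearest-centroid rule; returns a predict(x) function."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: math.dist(centroids[y], x))


def repeated_random_sampling(data, test_fraction=0.3, reps=100, seed=0):
    """Mean error and naive 95% percentile interval from repeated random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(data)))
    errors = []
    for _ in range(reps):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, design = shuffled[:n_test], shuffled[n_test:]
        predict = centroid_classifier(design)
        errors.append(sum(predict(x) != y for x, y in test) / n_test)
    errors.sort()
    lo = errors[int(0.025 * (reps - 1))]
    hi = errors[int(0.975 * (reps - 1))]
    return sum(errors) / reps, (lo, hi), errors


# Hypothetical two-class data: 20 samples per class, 5 features, mean shift 1.
gen = random.Random(7)
data = [([gen.gauss(float(y), 1.0) for _ in range(5)], y)
        for y in (0, 1) for _ in range(20)]
mean_err, (lo, hi), errs = repeated_random_sampling(data)
print(f"mean error {mean_err:.3f}, naive 95% interval [{lo:.3f}, {hi:.3f}]")
```

Because every split reuses the same 40 samples, the design and test sets overlap across repetitions, so the spread of these error rates does not reflect the true variability between independent designs.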

20.
Summary Warm-up rates and cooling constants were measured in several groups of insects over a wide range of thoracic weights. Vertebrate heterotherms show an inverse dependence of warm-up rate on body weight, but in insects warm-up rate increases with size over the range studied (Figures 1-4, 8). Equations derived from known or estimated relations of heat loss and heat production to body weight predict warm-up rates in insects and mammals with reasonable accuracy. Both weight-specific heat production and heat loss increase with decreasing body size, but heat loss increases more rapidly. In the insect size range, heat loss is so rapid that metabolism cannot fully compensate, so warm-up rate stays constant or decreases as size diminishes.
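The qualitative argument can be sketched as a heat balance: net warm-up rate = (weight-specific production minus weight-specific loss) / specific heat, where loss scales more steeply with mass than production. The exponents, constants and mass units below are hypothetical and are not the paper's derived equations; they only show how one expression can rise with mass at small sizes and fall at large sizes.

```python
def warm_up_rate(mass, p0=1.0, k0=0.1, delta_t=10.0, c=1.0, a=-0.25, b=-0.5):
    """Net warm-up rate of a body of the given mass (arbitrary units).

    Production p0 * M**a minus loss k0 * M**b * delta_t, divided by specific heat c.
    Hypothetical exponents; b < a encodes loss rising faster than production
    as body size falls.
    """
    production = p0 * mass ** a
    loss = k0 * mass ** b * delta_t
    return (production - loss) / c


for m in (0.5, 1.0, 4.0, 16.0, 64.0, 256.0):
    print(f"M = {m:6.1f}: warm-up rate = {warm_up_rate(m):+.3f}")
```

With these assumed values the rate peaks at M = 16: below it, rate increases with size (the insect pattern in the abstract); above it, rate declines with size (the inverse dependence seen in vertebrates).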


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP registration No. 09084417)