20 similar documents found.
1.
Mathematical models are an essential tool in systems biology, linking the behaviour of a system to the interactions between its components. Parameters in empirical mathematical models must be determined using experimental data, a process called regression. Because experimental data are noisy and incomplete, diagnostics that test the structural identifiability and validity of models and the significance and determinability of their parameters are needed to ensure that the proposed models are supported by the available data.
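The idea of parameter significance and determinability can be illustrated on the simplest regression model. The sketch below assumes a straight-line model y = a + bx with independent, constant-variance errors. It fits both parameters by least squares and reports their standard errors and t-statistics, a basic check of how well the data determine each parameter. It also refuses to estimate a slope the data cannot identify. `fit_line` is a hypothetical helper, not a method from the abstract.

```python
import math

def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and report standard errors and
    t-statistics (estimate / standard error) for both parameters."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual variance")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All x values are equal, so the data cannot determine the slope.
        raise ValueError("slope is not identifiable: all x values are equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    se_a = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    se_b = math.sqrt(s2 / sxx)
    # A t-statistic is undefined (nan) for a perfect fit, where the SE is zero.
    t_a = a / se_a if se_a > 0 else float("nan")
    t_b = b / se_b if se_b > 0 else float("nan")
    return {"a": a, "b": b, "se_a": se_a, "se_b": se_b, "t_a": t_a, "t_b": t_b}
```

A large |t| suggests that the parameter is well supported by the data. A small one suggests that the data, or the model structure, cannot pin it down.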
2.
MOTIVATION: Using gene expression data to classify (or predict) tumor types has received much research attention recently. Due to some special features of gene expression data, several new methods have been proposed, including the weighted voting scheme of Golub et al., the compound covariate method of Hedenfalk et al. (originally proposed by Tukey), and the shrunken centroids method of Tibshirani et al. These methods look different and are more or less ad hoc. RESULTS: We point out a close connection of the three methods with a linear regression model. Casting the classification problem in the general framework of linear regression naturally leads to new alternatives, such as partial least squares (PLS) methods and penalized PLS (PPLS) methods. Using two real data sets, we show the competitive performance of our new methods when compared with the other three methods.
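The linear structure these methods share shows up clearly in a compound covariate predictor. The predictor is a weighted sum of expression values compared with a threshold, which is the same form as a fitted linear regression score. The sketch below makes three assumptions: two classes coded 1/0, pooled-variance two-sample t-statistics as gene weights, and a threshold at the midpoint of the class means of the score. It is an illustration only, not the exact procedure of any of the cited papers. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

def t_statistic(group1, group0):
    """Pooled-variance two-sample t-statistic for one gene (group1 minus group0)."""
    n1, n0 = len(group1), len(group0)
    pooled = ((n1 - 1) * stdev(group1) ** 2 + (n0 - 1) * stdev(group0) ** 2) / (n1 + n0 - 2)
    return (mean(group1) - mean(group0)) / math.sqrt(pooled * (1 / n1 + 1 / n0))

def fit_compound_covariate(X, y):
    """X: samples (lists of expression values per gene); y: class labels 1 or 0.
    Returns per-gene weights (t-statistics) and a decision threshold."""
    weights = []
    for j in range(len(X[0])):
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        # Fails (division by zero) if a gene has zero variance in both groups.
        weights.append(t_statistic(g1, g0))
    scores = [sum(w * v for w, v in zip(weights, row)) for row in X]
    m1 = mean(s for s, label in zip(scores, y) if label == 1)
    m0 = mean(s for s, label in zip(scores, y) if label == 0)
    # Each weight has the sign of (mean1 - mean0) for its gene, so m1 >= m0;
    # the threshold is the midpoint between the two class means of the score.
    return weights, (m1 + m0) / 2

def predict_class(weights, threshold, row):
    """Linear score (weighted sum of expression values) compared with a threshold."""
    return 1 if sum(w * v for w, v in zip(weights, row)) > threshold else 0
```

Replacing the t-statistic weights with another per-gene statistic changes the method but not its linear form. Golub et al.'s weighted voting, for example, uses a signal-to-noise weight.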
3.
A new type of supervised learning algorithm for estimating multidimensional functions is considered. These methods, based on support vector machines (SVMs), are widely used because they can handle high-dimensional and large datasets and are flexible in modeling diverse sources of data. SVMs and related kernel methods are highly effective at solving prediction problems in computational biology. Background on statistical learning theory and kernel feature spaces is given, including practical and algorithmic considerations.
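The sketch below is a heavily simplified example of what an SVM optimizes: a linear (no kernel) soft-margin SVM with hinge loss and an L2 penalty. It is trained with a Pegasos-style stochastic subgradient method that cycles through the data in a fixed order instead of sampling at random. That solver is a swapped-in choice; it is not the quadratic-programming formulation usually presented in SVM background material. The function names and the `lam` and `epochs` defaults are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, epochs=20):
    """Pegasos-style stochastic subgradient descent for a linear SVM (hinge loss + L2).
    X: list of feature vectors; y: labels +1/-1. A constant 1.0 is appended to each
    vector so the last weight acts as a (regularized) bias term."""
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:  # fixed order keeps this sketch reproducible
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (subgradient of the L2 penalty) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... and, if the hinge loss is active, move toward the correct side.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def svm_predict(w, x):
    """Sign of the linear decision function (with the appended bias feature)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A kernelized version would replace the explicit weight vector with a weighted sum of kernel evaluations against the training points, which is where kernel feature spaces come in.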
4.
Pang H, Lin A, Holford M, Enerson BE, Lu B, Lawton MP, Floyd E, Zhao H. Bioinformatics (Oxford, England), 2006, 22(16): 2028-2036
MOTIVATION: Although numerous methods have been developed to better capture biological information from microarray data, commonly used single-gene-based methods neglect interactions among genes and leave room for novel approaches. For example, most classification and regression methods for microarray data are based on the whole set of genes and do not make use of pathway information. Pathway-based analysis in microarray studies may yield more informative and relevant knowledge for biological researchers. RESULTS: In this paper, we describe a pathway-based classification and regression method that uses Random Forests to analyze gene expression data. The proposed methods allow researchers to rank important pathways from externally available databases, discover important genes, find pathway-based outlying cases and make full use of a continuous outcome variable in the regression setting. We also compared Random Forests with other machine learning methods on several datasets and found that the Random Forests classification error rates were either the lowest or the second-lowest. By combining pathway information with novel statistical methods, this procedure represents a promising computational strategy for dissecting pathways and can provide biological insight into microarray data. AVAILABILITY: Source code written in R is available from http://bioinformatics.med.yale.edu/pathway-analysis/rf.htm.
5.
Holladay LA. Biophysical chemistry, 1979, 10(2): 183-185
Data obtained at early times during the transient period of sedimentation equilibrium experiments are analyzed using an approximate solution to the Lamm equation to estimate s/D. The concentration versus radius (C_r versus r) data obtained at several times during the approach to equilibrium are analyzed using a nonlinear least-squares algorithm and Fujita's approximate solution. The procedure was tested using D-Ser13-somatostatin, ribonuclease, and ovalbumin. The results demonstrate that, for monodisperse samples, s/D can be estimated rapidly and reliably with this method.
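A standard result not stated in the abstract explains why s/D is a useful quantity to estimate. For an ideal, monodisperse solute, the Svedberg equation links the ratio to the molar mass M, where \bar{v} is the partial specific volume, \rho the solvent density, R the gas constant and T the absolute temperature:

```latex
\frac{s}{D} = \frac{M\,(1 - \bar{v}\rho)}{RT}
\qquad\Longrightarrow\qquad
M = \frac{RT}{1 - \bar{v}\rho}\,\frac{s}{D}
```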
6.
7.
A program package is described with which vegetation data can be objectively classified and analysed. Classification is based on minimum entropy. Compared with TWINSPAN, the package improves the relevé sequence in terms of community variation. Furthermore, TWINSPAN classifications are shown to depend on the particular order in which relevés are input.
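The abstract does not give the package's exact entropy criterion. The sketch below assumes one common measure for presence/absence data: for each group of relevés, sum over species of n·H(p), where n is the number of relevés in the group, p is the fraction that contain the species, and H is the binary Shannon entropy in nats. A minimum-entropy classification then prefers partitions with a lower total. Function names are hypothetical.

```python
import math

def group_entropy(releves):
    """Information content (nats) of a group of presence/absence relevés:
    sum over species of n * H(p), with p the species' frequency in the group."""
    n = len(releves)
    total = 0.0
    for j in range(len(releves[0])):
        a = sum(r[j] for r in releves)  # number of relevés containing species j
        # -a*log(a/n) - (n-a)*log((n-a)/n) equals n*H(a/n); zero-count terms vanish.
        for k in (a, n - a):
            if 0 < k < n:
                total -= k * math.log(k / n)
    return total

def partition_entropy(groups):
    """Total within-group entropy of a partition; lower means more homogeneous groups."""
    return sum(group_entropy(g) for g in groups)
```

A classification program would search over partitions, for example by successive divisions, to minimize `partition_entropy`. That search is not shown here.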
8.
Several methods of analysis are applied to data consisting of weight measurements, taken at specified post-treatment times, of thyroids harvested from rats given one of four treatments. Previous studies of this type of data indicated that growth is initially rapid and that a second phase of slower growth is followed by a final phase in which little additional growth occurs. The data are further characterized by variance that increases over time. The primary purpose of the analysis is to study the effect of the treatments at the end of the study period. One-way analyses of variance among groups are performed for each day, but the results are not particularly informative. However, results from two-way analyses of variance (over subsets of days and groups) are consistent with the three-phase model and indicate significant group differences during each phase. Finally, maximum likelihood methods are used to fit a three-part segmented linear regression model.
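The three-part segmented linear regression can be sketched under stated assumptions. The three segments join continuously at two unknown breakpoints, and both breakpoints are searched over the observed times. For each candidate pair, the coefficients are fitted by weighted least squares. This coincides with maximum likelihood for Gaussian errors with known variances (weight = 1/variance), so increasing variance over time can be represented by weights that decrease with time. The paper's exact likelihood and parameterisation are not given in the abstract. All names are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def design_row(t, c1, c2):
    # Intercept, slope, and the slope changes after each breakpoint (hinge basis).
    return [1.0, t, max(0.0, t - c1), max(0.0, t - c2)]

def fit_three_phase(ts, ys, weights=None):
    """Grid-search both breakpoints over interior observed times; for each pair, fit
    the coefficients by weighted least squares. Returns (wrss, c1, c2, coefficients)."""
    w = weights if weights is not None else [1.0] * len(ts)
    cands = sorted(set(ts))[1:-1]
    best = None
    for i, c1 in enumerate(cands):
        for c2 in cands[i + 1:]:
            X = [design_row(t, c1, c2) for t in ts]
            XtX = [[sum(wi * r[a] * r[b] for wi, r in zip(w, X)) for b in range(4)]
                   for a in range(4)]
            Xty = [sum(wi * r[a] * yi for wi, r, yi in zip(w, X, ys)) for a in range(4)]
            try:
                beta = solve(XtX, Xty)
            except ValueError:
                continue  # skip breakpoint pairs the data cannot resolve
            wrss = sum(wi * (yi - sum(bk * xk for bk, xk in zip(beta, r))) ** 2
                       for wi, r, yi in zip(w, X, ys))
            if best is None or wrss < best[0]:
                best = (wrss, c1, c2, beta)
    return best
```

Growth rates in the three phases are beta[1], beta[1] + beta[2] and beta[1] + beta[2] + beta[3].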
9.
Fernie AR, Trethewey RN, Krotzky AJ, Willmitzer L. Nature reviews. Molecular cell biology, 2004, 5(9): 763-769
The concept of metabolite profiling has been around for several decades, but only recent technical innovations have allowed it to be carried out on a large scale, with respect to both the number of metabolites measured and the number of experiments carried out. As a result, the power of metabolite profiling as a technology platform for diagnostics, and for the research areas of gene-function analysis and systems biology, is now beginning to be fully realized.
10.
Shen L, Tan EC. IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 2005, 2(2): 166-175
The use of penalized logistic regression for cancer classification with microarray expression data is presented. Each of two dimension-reduction methods is combined with penalized logistic regression so that both classification accuracy and computational speed are improved. Two other machine-learning methods, support vector machines and least-squares regression, were chosen for comparison. Our methods are shown to achieve results equal to or better than these. They also have the advantages that the output class probability is given explicitly and that the regression coefficients are easier to interpret. Several other aspects pertinent to applying our methods to cancer classification, such as the selection of penalty parameters and components, are also discussed.
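The abstract does not specify the penalty or the solver. The sketch below assumes an L2 (ridge) penalty, an unpenalized intercept and plain batch gradient descent, and it omits the dimension-reduction step. It illustrates the advantage mentioned above: the model outputs a class probability directly. Function names and the `lam`, `lr` and `iters` values are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_penalized_logistic(X, y, lam=0.1, lr=0.1, iters=5000):
    """Minimize mean log-loss + (lam/2)*||w||^2 by batch gradient descent.
    X: list of feature vectors; y: labels 0/1. The intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]  # gradient of the ridge penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    """Explicit probability of class 1 for feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

A larger `lam` shrinks the coefficients further toward zero. In the paper's setting, `lam` is one of the penalty parameters whose selection the abstract mentions.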
11.
Local polynomial regression analysis of clustered data
12.
MOTIVATION: Methods for analyzing cancer microarray data often face two distinct challenges: the models they infer need to perform well when classifying new tissue samples while also providing insight into the patterns and gene interactions hidden in the data. State-of-the-art supervised data mining methods often address only one of these aspects well, which motivates the development of methods whose predictive models combine solid classification performance with easy communication to the domain expert. RESULTS: Data visualization may provide an excellent approach to knowledge discovery and analysis of class-labeled data. We have previously developed an approach called VizRank that can score and rank point-based visualizations according to the degree of separation between data instances of different classes. Here we extend VizRank with techniques to uncover outliers, score features (genes) and perform classification, and we demonstrate that the proposed approach is well suited for cancer microarray analysis. Using VizRank and radviz visualization on a set of previously published cancer microarray data sets, we were able to find simple, interpretable data projections that include only a small subset of genes yet clearly differentiate among different cancer types. We also report that our approach to classification through visualization achieves performance comparable to that of state-of-the-art supervised data mining techniques. AVAILABILITY: VizRank and radviz are implemented as part of the Orange data mining suite (http://www.ailab.si/orange). SUPPLEMENTARY INFORMATION: Supplementary data are available from http://www.ailab.si/supp/bi-cancer.
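The sketch below shows a radviz projection together with a simple separation score. Each feature gets an anchor evenly spaced on the unit circle. Each sample is placed at the average of the anchors, weighted by its min-max-scaled feature values. The abstract does not define VizRank's score, so leave-one-out k-nearest-neighbour accuracy in the projected plane is used here as a stand-in, not as Orange's implementation. Function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def radviz(X):
    """Project rows of X to 2-D radviz coordinates. Feature j has an anchor at angle
    2*pi*j/m on the unit circle; a point is the average of the anchors weighted by
    its min-max-scaled feature values."""
    m = len(X[0])
    lo = [min(r[j] for r in X) for j in range(m)]
    hi = [max(r[j] for r in X) for j in range(m)]
    anchors = [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
               for j in range(m)]
    points = []
    for r in X:
        s = [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(m)]
        total = sum(s)
        if total == 0:
            points.append((0.0, 0.0))  # all features at their minimum: place at centre
            continue
        points.append((sum(sj * ax for sj, (ax, _) in zip(s, anchors)) / total,
                       sum(sj * ay for sj, (_, ay) in zip(s, anchors)) / total))
    return points

def loo_knn_accuracy(points, labels, k=3):
    """Leave-one-out k-NN accuracy in the projected plane, a simple proxy for how
    well a projection separates the classes (ties are broken arbitrarily)."""
    correct = 0
    for i, (p, label) in enumerate(zip(points, labels)):
        neigh = sorted((math.dist(p, q), labels[j])
                       for j, q in enumerate(points) if j != i)[:k]
        votes = [l for _, l in neigh]
        correct += max(set(votes), key=votes.count) == label
    return correct / len(points)
```

A search in the spirit of VizRank would score many candidate gene subsets and anchor orderings this way and keep the highest-scoring projections.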
13.
Logistic regression analysis of sample survey data
14.
15.
Distribution-free regression analysis of grouped survival data
Methods based on regression models for the logarithm of the hazard function (Cox models) are given for the analysis of grouped and censored survival data. By making an approximation, it is possible to obtain an explicit likelihood function involving only the regression parameters. This likelihood function is a convenient analog of Cox's partial likelihood for ungrouped data. The method is applied to data from a toxicological experiment.
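The grouped-data approximation itself is not spelled out in the abstract. The sketch below therefore shows the ungrouped analog it is compared with: Cox's partial likelihood for a single covariate, assuming no tied event times, maximized by Newton's method with finite-difference derivatives. Function names and the numerical settings are hypothetical.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times.
    Each observed event i contributes beta*x_i minus the log of the summed
    exp(beta*x_j) over subjects still at risk (times[j] >= times[i])."""
    ll = 0.0
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue  # censored subjects enter only through the risk sets
        risk = sum(math.exp(beta * xj) for tj, xj in zip(times, x) if tj >= ti)
        ll += beta * xi - math.log(risk)
    return ll

def fit_beta(times, events, x, beta=0.0, iters=25, h=1e-4):
    """Maximize the log partial likelihood by Newton's method with numerical derivatives."""
    def f(b):
        return cox_log_partial_likelihood(b, times, events, x)
    for _ in range(iters):
        g = (f(beta + h) - f(beta - h)) / (2 * h)               # first derivative
        H = (f(beta + h) - 2 * f(beta) + f(beta - h)) / h ** 2  # second derivative
        if H >= 0:
            break  # not locally concave: stop rather than step the wrong way
        beta -= g / H
    return beta
```

Because only the order of event times enters through the risk sets, the baseline hazard drops out. This is the property that makes a likelihood involving only the regression parameters possible.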
16.
17.
Ostrich, 2013, 84(3): 265-268
During the analysis of moult records from the SAFRING database it was found that for some datasets the records were not evenly distributed temporally and the proportion of moulting to non-moulting birds was not what would be expected from random sampling. In an attempt to balance these data, the records of non-moulting birds were subsampled with different sample sizes prior to moult regression analysis, and the resulting moult estimates were then compared. The results suggest that subsampling non-moulting birds such that they occur in the expected proportion to actively moulting birds, based on the duration of moult, provides the best estimates of moult.
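The subsampling step can be sketched under a stated assumption. If records were spread uniformly over a year of `year_days` days and moult lasts `moult_days`, then the expected ratio of non-moulting to moulting records is (year_days - moult_days) : moult_days. The abstract does not give the exact formula used, so this ratio, the function name and the fixed random seed are all assumptions.

```python
import random

def subsample_non_moulting(moulting, non_moulting, moult_days, year_days=365, seed=0):
    """Keep all moulting records plus a random subset of non-moulting records sized
    so that non-moulting : moulting matches (year_days - moult_days) : moult_days."""
    if not 0 < moult_days < year_days:
        raise ValueError("moult_days must lie strictly between 0 and year_days")
    target = round(len(moulting) * (year_days - moult_days) / moult_days)
    target = min(target, len(non_moulting))  # cannot keep more records than exist
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return moulting + rng.sample(non_moulting, target)
```

Repeating the call with different seeds and comparing the resulting moult estimates would show how sensitive they are to the particular subsample drawn.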
18.
Jones L, Holt CA, Beynon MJ. Computer methods in biomechanics and biomedical engineering, 2008, 11(1): 31-40
There are several major obstacles to using motion analysis as an aid to clinical decision making. These include the difficulty of comprehending large amounts of both corroborating and conflicting information, the subjectivity of data interpretation, the need for visualization, and the quantitative comparison of temporal waveform data. This paper seeks to overcome these obstacles with a hybrid approach to motion analysis data that combines principal component analysis (PCA), the Dempster-Shafer (DS) theory of evidence and simplex plots. Specifically, the approach is used to characterise the differences between osteoarthritic (OA) and normal (NL) knee function data and to produce a hierarchy of the variables that are most discriminatory in the classification process. The results obtained with the hybrid approach are compared with those from artificial neural network analyses.
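In DS theory, evidence from separate variables is pooled with Dempster's rule of combination. The sketch below applies the rule to the two-class frame {OA, NL}. A mass function maps focal elements (frozensets of hypotheses) to masses summing to 1, and mass on {OA, NL} represents ignorance. The abstract does not describe how each variable's value is converted into a mass function, so that step is not shown. The function name is hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule. Each maps a frozenset of
    hypotheses (a focal element) to its mass. Returns (combined masses, conflict K)."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # product mass falling on contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: the evidence cannot be combined")
    # Renormalize the non-conflicting mass so the result sums to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict
```

With a two-class frame, the three combined masses (OA, NL, and {OA, NL}) sum to 1. A simplex plot displays exactly this kind of three-part mass function as a point in a triangle.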
19.