Similar articles
Found 20 similar articles (search time: 31 ms)
1.
The present study was carried out in order to obtain a numerical classifier for the assessment of the malignancy in astrocytomas including glioblastomas ('astrocytomas grade 4'). The attempt resulted in 'TESTAST 268', a classifier based on a reference sample of 268 tumours, 67 in each of four malignancy classes. TESTAST 268 aids the identification of astrocytomas with one of four malignancy classes by means of eight classification variables, five histologic and three non-histologic. Identification is achieved with the aid of linear discriminant functions, both according to Bayes' decision rule (BAYTEST) and by canonical discriminant analysis (CANTEST) using the squared Mahalanobis distance. The discriminant functions with the calibration of the reference sample of the 268 tumours may be implemented on personal and even small pocket computers for practical application.
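As a rough illustration of the distance-based assignment described above, the following sketch assigns a case to the class whose centroid has the smallest squared Mahalanobis distance. It is not the TESTAST 268 classifier: the two features, the two toy classes, and every function name are assumptions made for the example, and a pooled within-class covariance matrix is assumed so that the resulting rule is linear.

```python
# Minimal sketch: nearest-centroid assignment by squared Mahalanobis distance.
# Data, class labels and helper names are illustrative assumptions only.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance_2d(groups):
    """Pooled within-class covariance for two features (2x2 matrix)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        dof += len(rows) - 1
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def squared_mahalanobis(x, mean, cov_inv):
    d = [x[0] - mean[0], x[1] - mean[1]]
    return sum(d[a] * cov_inv[a][b] * d[b] for a in range(2) for b in range(2))

def classify(x, means, cov_inv):
    """Return (index of nearest class, list of squared distances)."""
    dists = [squared_mahalanobis(x, m, cov_inv) for m in means]
    return dists.index(min(dists)), dists

# Hypothetical reference sample: two classes, two variables per case.
class_a = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [2.5, 2.5]]
class_b = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.0], [6.5, 6.5]]
means = [mean_vector(class_a), mean_vector(class_b)]
cov_inv = inverse_2x2(pooled_covariance_2d([class_a, class_b]))

label, distances = classify([2.0, 2.0], means, cov_inv)
print(label, distances)
```

With equal prior probabilities, picking the smallest squared Mahalanobis distance under a pooled covariance gives the same decision as a linear discriminant rule, which is why the two are often presented together.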

2.
For practical construction of complex synthetic genetic networks able to perform elaborate functions, it is important to have a pool of relatively simple modules with different functionality that can be combined. To complement the engineering of very different existing synthetic genetic devices such as switches, oscillators or logical gates, we propose and develop here a design of a synthetic multi-input classifier based on a recently introduced distributed classifier concept. A heterogeneous population of cells acts as a single classifier, whose output is obtained by summarizing the outputs of individual cells. The learning ability is achieved by pruning the population, instead of tuning parameters of an individual cell. The present paper is focused on evaluating two possible schemes of multi-input gene classifier circuits. We demonstrate their suitability for implementing a multi-input distributed classifier capable of separating data which are inseparable for single-input classifiers, and characterize performance of the classifiers by analytical and numerical results. The simpler scheme implements a linear classifier in a single cell and is targeted at separable classification problems with simple class borders. A hard learning strategy is used to train a distributed classifier by removing from the population any cell answering incorrectly to at least one training example. The other scheme implements a circuit with a bell-shaped response in a single cell to allow a potentially arbitrary shape of the classification border in the input space of a distributed classifier. Inseparable classification problems are addressed using a soft learning strategy, characterized by a probabilistic decision to keep or discard a cell at each training iteration. We expect that our classifier design contributes to the development of robust and predictable synthetic biosensors, which have the potential to affect applications in many fields, including medicine and industry.
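The hard learning strategy can be sketched in a few lines. This is only a toy model under stated assumptions: each "cell" is represented by a hypothetical triple of random numbers (two input weights and a bias), the population output is taken to be the fraction of responding cells, and none of the names below come from the paper.

```python
import random

def make_cell(rng):
    """A hypothetical cell: a two-input linear classifier with random parameters."""
    return (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))

def cell_output(cell, x):
    w1, w2, b = cell
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

def hard_learning(population, examples):
    """Remove every cell that answers at least one training example incorrectly."""
    return [c for c in population
            if all(cell_output(c, x) == y for x, y in examples)]

def population_output(population, x):
    """Summarize individual outputs as the fraction of cells that respond."""
    return sum(cell_output(c, x) for c in population) / len(population)

rng = random.Random(0)
population = [make_cell(rng) for _ in range(5000)]

# Toy training set: (inputs, class label).
examples = [((0.1, 0.2), 0), ((0.2, 0.1), 0),
            ((0.8, 0.9), 1), ((0.9, 0.8), 1)]

survivors = hard_learning(population, examples)
print(len(survivors), "of", len(population), "cells kept")
print(population_output(survivors, (0.85, 0.85)))
```

Note that no parameter of any individual cell is tuned: learning happens only by discarding cells, which matches the pruning idea in the abstract. The soft strategy would replace the `all(...)` filter with a probabilistic keep-or-discard decision at each iteration.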

3.
A computational method is presented for minimizing the weighted sum of squares of the differences between observed and expected pairwise distances between species, where the expectations are generated by an additive tree model. The criteria of Fitch and Margoliash (1967, Science 155:279-284) and Cavalli-Sforza and Edwards (1967, Evolution 21:550-570) are both weighted least squares, with different weights. The method presented iterates lengths of adjacent branches in the tree three at a time. The weighted sum of squares never increases during the process of iteration, and the iterates approach a stationary point on the surface of the sum of squares. This iterative approach makes it particularly easy to maintain the constraint that branch lengths never become negative, although negative branch lengths can also be allowed. The method is implemented in a computer program, FITCH, which has been distributed since 1982 as part of the PHYLIP package of programs for inferring phylogenies, and is also implemented in PAUP*. The present method is compared, using some simulated data sets, with an implementation of the method of De Soete (1983, Psychometrika 48:621-626); it is slower than De Soete's method but more effective at finding the least squares tree. The relationship of this method to the neighbor-joining method is also discussed.
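The criterion being minimized can be written compactly. The notation is assumed here rather than taken from the abstract: $D_{ij}$ is the observed distance between species $i$ and $j$, $d_{ij}$ is the expected distance under the additive tree (the sum of branch lengths on the path joining $i$ and $j$), and $w_{ij}$ is the weight.

```latex
Q \;=\; \sum_{i<j} w_{ij}\,\bigl(D_{ij} - d_{ij}\bigr)^{2},
\qquad
w_{ij} =
\begin{cases}
1 / D_{ij}^{2} & \text{(Fitch and Margoliash)}\\[2pt]
1 & \text{(Cavalli-Sforza and Edwards)}
\end{cases}
```

Because each $d_{ij}$ is linear in the branch lengths, $Q$ is quadratic in them for a fixed tree topology, which is consistent with the statement that the sum of squares never increases when a few branch lengths are re-solved at a time.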

4.
Huang J, Ma S, Xie H. Biometrics 2006, 62(3): 813-820
We consider two regularization approaches, the LASSO and the threshold-gradient-directed regularization, for estimation and variable selection in the accelerated failure time model with multiple covariates based on Stute's weighted least squares method. The Stute estimator uses Kaplan-Meier weights to account for censoring in the least squares criterion. The weighted least squares objective function makes the adaptation of this approach to multiple covariate settings computationally feasible. We use V-fold cross-validation and a modified Akaike's Information Criterion for tuning parameter selection, and a bootstrap approach for variance estimation. The proposed method is evaluated using simulations and demonstrated on a real data example.
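To make the role of the Kaplan-Meier weights concrete, here is a minimal sketch that computes them as the jumps of the Kaplan-Meier estimator and plugs them into a LASSO-penalized weighted least squares objective for a single covariate. The function names, the absence of tied times, the 1/2 scaling of the squared-error term, and the one-covariate form are all assumptions for illustration; the paper's exact formulation may differ.

```python
def kaplan_meier_weights(times, events):
    """Kaplan-Meier jump sizes used as weights (assumes no tied times).

    times  -- observed (possibly censored) times
    events -- 1 if the time is an observed failure, 0 if censored
    Censored observations receive weight 0.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    weights = [0.0] * n
    surv = 1.0  # running product over earlier uncensored observations
    for rank, idx in enumerate(order, start=1):
        weights[idx] = events[idx] * surv / (n - rank + 1)
        if events[idx]:
            surv *= (n - rank) / (n - rank + 1)
    return weights

def weighted_lasso_objective(intercept, slope, xs, log_times, weights, lam):
    """0.5 * sum_i w_i (y_i - a - b x_i)^2 + lam * |b|, with y_i = log time."""
    sse = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(weights, xs, log_times))
    return 0.5 * sse + lam * abs(slope)

# With no censoring every observation gets weight 1/n.
print(kaplan_meier_weights([3.0, 1.0, 2.0], [1, 1, 1]))
# A censored middle observation passes its mass to later failures.
print(kaplan_meier_weights([1.0, 2.0, 3.0], [1, 0, 1]))
```

Because the weights depend only on the ordering of the times and the censoring indicators, they can be computed once and reused for every value of the tuning parameter during cross-validation.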

5.
A general Akaike-type criterion for model selection in robust regression (cited by: 2 total, 0 self-citations, 2 by others)
Burman P., Nolan D. Biometrika 1995, 82(4): 877-886
Akaike's procedure (1970) for selecting a model minimises an estimate of the expected squared error in predicting new, independent observations. This selection criterion was designed for models fitted by least squares. A different model-fitting technique, such as least absolute deviation regression, requires an appropriate model selection procedure. This paper presents a general Akaike-type criterion applicable to a wide variety of loss functions for model fitting. It requires only that the function be convex with a unique minimum, and twice differentiable in expectation. Simulations show that the estimators proposed here approximate their respective prediction errors well.

6.
7.
We have investigated how observers learn to classify compound Gabor signals as a function of their differentiating frequency components. Performance appears to be consistent with decision processes based upon the least squares minimum distance classifier (LSMDC) operating over a Cartesian feature space consisting of the real (even) and imaginary (odd) components of the signals. The LSMDC model assumes observers form prototype signals, or adaptive filters, for each signal class in the learning phase, and classify as a function of their degree of match to each prototype. The underlying matching process can be modelled in terms of cross-correlation between prototype images and the input sample. Study supported by Deutsche Forschungsgemeinschaft grants Re 337/3-3 and Po 121/13-1. Deutsche Forschungsgemeinschaft Guest-Professor (Mu 93/103-1).

8.
Single-nucleotide polymorphisms (SNPs), believed to determine human differences, are widely used to predict risk of diseases. Typically, clinical samples are limited and/or the sampling cost is high. Thus, it is essential to determine an adequate sample size needed to build a classifier based on SNPs. Such a classifier would facilitate correct classifications, while keeping the sample size to a minimum, thereby making the studies cost-effective. For coded SNP data from 2 classes, an optimal classifier and an approximation to its probability of correct classification (PCC) are derived. A linear classifier is constructed and an approximation to its PCC is also derived. These approximations are validated through a variety of Monte Carlo simulations. A sample size determination algorithm based on the criterion, which ensures that the difference between the 2 approximate PCCs is below a threshold, is given and its effectiveness is illustrated via simulations. For the HapMap data on Chinese and Japanese populations, a linear classifier is built using 51 independent SNPs, and the required total sample sizes are determined using our algorithm, as the threshold varies. For example, when the threshold value is 0.05, our algorithm determines a total sample size of 166 (83 for Chinese and 83 for Japanese) that satisfies the criterion.

9.
The appropriate operation of a radial basis function (RBF) neural network depends mainly upon an adequate choice of the parameters of its basis functions. The simplest approach to train an RBF network is to assume fixed radial basis functions defining the activation of the hidden units. Once the RBF parameters are fixed, the optimal set of output weights can be determined straightforwardly by using a linear least squares algorithm, which generally means a reduction in the learning time as compared to the determination of all RBF network parameters using supervised learning. The main drawback of this strategy is the requirement of an efficient algorithm to determine the number, position, and dispersion of the RBFs. The approach proposed here is inspired by models derived from the vertebrate immune system, which will be shown to perform unsupervised cluster analysis. The algorithm is introduced and its performance is compared to that of random and k-means center selection procedures and to other results from the literature. By automatically defining the number of RBF centers, their positions and dispersions, the proposed method leads to parsimonious solutions. Simulation results are reported concerning regression and classification problems.
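The second training stage mentioned above, solving for the output weights once the basis functions are fixed, can be sketched as an ordinary linear least squares problem. The sketch below does not implement the immune-inspired center selection: the centers and the common width are simply assumed, the target function is a toy sine curve, and all names are hypothetical. The normal equations are solved with a small hand-written elimination routine to keep the example dependency-free.

```python
import math

def gaussian_rbf(x, center, width):
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def design_matrix(xs, centers, width):
    # One column per fixed RBF plus a trailing bias column.
    return [[gaussian_rbf(x, c, width) for c in centers] + [1.0] for x in xs]

def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_output_weights(xs, ys, centers, width):
    """Least squares output weights from (Phi^T Phi) w = Phi^T y."""
    phi = design_matrix(xs, centers, width)
    k = len(phi[0])
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve_linear_system(ata, aty)

def predict(x, centers, width, weights):
    feats = [gaussian_rbf(x, c, width) for c in centers] + [1.0]
    return sum(f * w for f, w in zip(feats, weights))

# Toy regression problem with assumed, evenly spaced centers.
xs = [i / 10.0 for i in range(63)]
ys = [math.sin(x) for x in xs]
centers = [0.0, 1.5, 3.0, 4.5, 6.0]
width = 1.0
weights = fit_output_weights(xs, ys, centers, width)

mse = sum((predict(x, centers, width, weights) - y) ** 2
          for x, y in zip(xs, ys)) / len(xs)
print("training MSE:", mse)
```

Only the output layer is fitted here, so the cost is that of one small linear solve. The quality of the fit then depends entirely on how the centers and widths were chosen, which is the problem the abstract's method addresses.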

10.
A new manifold learning method, called parameter-free semi-supervised local Fisher discriminant analysis (pSELF), is proposed to map the gene expression data into a low-dimensional space for tumor classification. Motivated by the fact that semi-supervised and parameter-free are two desirable and promising characteristics for dimension reduction, a new difference-based optimization objective function with unlabeled samples has been designed. The proposed method preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The semi-supervised method has an analytic form of the globally optimal solution, which can be computed efficiently by eigen decomposition. Experimental results on synthetic data and SRBCT, DLBCL, and Brain Tumor gene expression data sets demonstrate the effectiveness of the proposed method.

11.

Background

The functions of proteins are closely related to their subcellular locations. In the post-genomics era, the amount of gene and protein data grows exponentially, which necessitates the prediction of subcellular localization by computational means.

Results

This paper proposes mitigating the computational burden of alignment-based approaches to subcellular localization prediction by a cascaded fusion of cleavage site prediction and profile alignment. Specifically, the informative segments of protein sequences are identified by a cleavage site predictor using the information in their N-terminal sorting signals. Then, the sequences are truncated at the cleavage site positions, and the shortened sequences are passed to PSI-BLAST for computing their profiles. Subcellular locations are subsequently predicted by a profile-to-profile alignment support-vector-machine (SVM) classifier. To further reduce the training and recognition time of the classifier, the SVM classifier is replaced by a new kernel method based on the perturbational discriminant analysis (PDA).

Conclusions

Experimental results on a new dataset based on Swiss-Prot Release 57.5 show that the method can make use of the best property of signal- and homology-based approaches and can attain an accuracy comparable to that achieved by using full-length sequences. Analysis of profile-alignment score matrices suggests that both profile creation time and profile alignment time can be reduced without significant reduction in subcellular localization accuracy. It was found that PDA enjoys a short training time as compared to the conventional SVM. We advocate that the method will be important for biologists to conduct large-scale protein annotation or for bioinformaticians to perform preliminary investigations on new algorithms that involve pairwise alignments.

12.

Background

We present a simple method to train a potential function for the protein folding problem which, even though trained using a small number of proteins, is able to place a significantly large number of native conformations near a local minimum. The training relies on generating decoys by energy minimization of the native conformations using the current potential and using a physically meaningful objective function (derivative of energy with respect to torsion angles at the native conformation) during the quadratic programming to place the native conformation near a local minimum.

Results

We also compare the performance of three different types of energy functions and find that while the pairwise energy function is trainable, a solvation energy function by itself is untrainable if decoys are generated by minimizing the current potential starting at the native conformation. The best results are obtained when a pairwise interaction energy function is used with solvation energy function.

Conclusions

We are able to train a potential function using six proteins which places a total of 42 native conformations within ~4 Å rmsd and 71 native conformations within ~6 Å rmsd of a local minimum out of a total of 91 proteins. Furthermore, the threading test using the same 91 proteins ranks 89 native conformations to be first and the other two as second.

13.
The assessment of data analysis methods in 1H NMR based metabolic profiling is hampered owing to a lack of knowledge of the exact sample composition. In this study, an artificial complex mixture design comprising two artificially defined groups designated normal and disease, each containing 30 samples, was implemented using 21 metabolites at concentrations typically found in human urine and having a realistic distribution of inter-metabolite correlations. These artificial mixtures were profiled by 1H NMR spectroscopy and used to assess data analytical methods in the task of differentiating the two conditions. When metabolites were individually quantified, volcano plots provided an excellent method to track the effect size and significance of the change between conditions. Interestingly, the Welch t test detected a similar set of metabolites changing between classes in both quantified and spectral data, suggesting that differential analysis of 1H NMR spectra using a false discovery rate correction, taking into account fold changes, is a reliable approach to detect differential metabolites in complex mixture studies. Various multivariate regression methods based on partial least squares (PLS) were applied in discriminant analysis mode. The most reliable methods in quantified and spectral 1H NMR data were PLS and RPLS linear and logistic regression respectively. A jackknife based strategy for variable selection was assessed on both quantified and spectral data and results indicate that it may be possible to improve on the conventional Orthogonal-PLS methodology in terms of accuracy and sensitivity. A key improvement of our approach consists of objective criteria to select significant signals associated with a condition that provides a confidence level on the discoveries made, which can be implemented in metabolic profiling studies.

14.
We present open-source software able to automatically mutate any residue positions and find the best amino acids in an arbitrary protein structure without requiring pairwise approximations. Our software, PROTDES, is based on CHARMM and it searches automatically for mutations optimizing a protein folding free energy. PROTDES allows the integration of molecular dynamics within the protein design. We have implemented a heuristic optimization algorithm that iteratively searches the best amino acids and their conformations for an arbitrary set of positions within a structure. Our software allows CHARMM users to perform protein design calculations and to create their own procedures for protein design using their own energy functions. We show this by implementing three different energy functions based on different solvent treatments: surface area accessibility, generalized Born using molecular volume and an effective energy function. PROTDES, a tutorial, parameter sets, configuration tools and examples are freely available at http://soft.synth-bio.org/protdes.html.

15.
In the study of in silico functional genomics, improving the performance of protein function prediction is the ultimate goal for identifying proteins associated with defined cellular functions. The classical prediction approach is to employ pairwise sequence alignments. However, this method often faces difficulties when no statistically significant homologous sequences are identified. An alternative way is to predict protein function from sequence-derived features using machine learning. In this case the choice of possible features which can be derived from the sequence is of vital importance to ensure adequate discrimination to predict function. In this paper we have successfully selected biologically significant features for protein function prediction. This was performed using a new feature selection method (FrankSum) that avoids data distribution assumptions, uses a data independent measurement (p-value) within the feature, identifies redundancy between features and uses an appropriate ranking criterion for feature selection. We have shown that classifiers generated from features selected by FrankSum outperform classifiers generated from full feature sets, randomly selected features and features selected from the Wrapper method. We have also shown the features are concordant across all species and top ranking features are biologically informative. We conclude that feature selection is vital for successful protein function prediction and FrankSum is one of the feature selection methods that can be applied successfully to such a domain.

16.
We first discuss quantitative rules for determining the protein structural classes based on their secondary structures. Then we propose a modification of the least Mahalanobis distance method for prediction of protein classes. It is a generalization of a quadratic discriminant function to the case of degenerate covariance matrices. The resubstitution tests and leave-one-out tests are carried out to compare several methods. When the class sample sizes or the covariance matrices of different classes are significantly different, the modified method should be used to replace the least Mahalanobis distance method. Two lemmas for the derivation of our new algorithm are proved in an appendix.

17.
The weights used in iterative weighted least squares (IWLS) regression are usually estimated parametrically using a working model for the error variance. When the variance function is misspecified, the IWLS estimates of the regression coefficients β are still asymptotically consistent but there is some loss in efficiency. Since second moments can be quite hard to model, it makes sense to estimate the error variances nonparametrically and to employ weights inversely proportional to the estimated variances in computing the WLS estimate for β. Surprisingly, this approach had not received much attention in the literature. The aim of this note is to demonstrate that such a procedure can be implemented easily in S-plus using standard functions with default options making it suitable for routine applications. The particular smoothing method that we use is local polynomial regression applied to the logarithm of the squared residuals but other smoothers can be tried as well. The proposed procedure is applied to data on the use of two different assay methods for a hormone. Efficiency calculations based on the estimated model show that the nonparametric IWLS estimates are more efficient than the parametric IWLS estimates based on three different plausible working models for the variance function. The proposed estimators also perform well in a simulation study using both parametric and nonparametric variance functions as well as normal and gamma errors.
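The procedure can be sketched for a single predictor as follows. This is an illustration in Python rather than the S-plus code the note refers to, and it makes several assumptions: a straight-line mean model, a hand-written local linear smoother with a Gaussian kernel and a fixed bandwidth as the local polynomial step, a small floor inside the logarithm, a fixed number of iterations, and synthetic data. All names are hypothetical.

```python
import math

def wls_line(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def local_linear_smooth(xs, zs, bandwidth):
    """Local linear smoother (Gaussian kernel), evaluated at each x."""
    fitted = []
    for x0 in xs:
        ks = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        a, b = wls_line(xs, zs, ks)
        fitted.append(a + b * x0)
    return fitted

def nonparametric_iwls(xs, ys, bandwidth=1.0, iterations=5):
    """Alternate WLS fits with a smoothed estimate of the error variance."""
    ws = [1.0] * len(xs)  # start from ordinary least squares
    for _ in range(iterations):
        a, b = wls_line(xs, ys, ws)
        # Smooth log squared residuals; the floor avoids log(0).
        log_r2 = [math.log((y - (a + b * x)) ** 2 + 1e-12)
                  for x, y in zip(xs, ys)]
        log_var = local_linear_smooth(xs, log_r2, bandwidth)
        # Weights inversely proportional to the estimated variances.
        ws = [math.exp(-v) for v in log_var]
    return wls_line(xs, ys, ws), ws

# Synthetic data: true line 1 + 2x with error spread growing in x.
xs = [0.5 * i for i in range(20)]
ys = [1.0 + 2.0 * x + ((-1) ** i) * 0.1 * (1.0 + x) for i, x in enumerate(xs)]

(intercept, slope), weights = nonparametric_iwls(xs, ys)
print(intercept, slope)
```

Smoothing on the log scale keeps the estimated variances positive after exponentiation, so the weights are always well defined. Any constant bias introduced by the log transform rescales all weights equally and therefore does not change the WLS estimate.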

18.
Li Y, Wang N, Perkins EJ, Zhang C, Gong P. PLoS ONE 2010, 5(10): e13715
Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied to independently refine the classifier, producing smaller subsets of 39 and 30 classifier genes, respectively, with 11 common genes being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracy of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high dimensional datasets and generate classification models of acceptable precision for multiple classes.

19.
Chemotaxonomic data for strains of Actinobacillus, Haemophilus and Pasteurella spp. were analysed using three multivariate statistical strategies: principal components, partial least squares discriminant, and soft independent modelling of class analogy. The species comprised Actinobacillus actinomycetemcomitans, Haemophilus aphrophilus, H. paraphrophilus, H. influenzae, Pasteurella multocida, P. haemolytica and P. ureae. Strains were characterized by cell sugar and fatty acid composition, lysis kinetics during EDTA and EDTA plus lysozyme treatment, and methylene blue reduction. In total 23 quantitative variables were compiled from chemotaxonomic analyses of 25 strains. A. actinomycetemcomitans and H. aphrophilus formed distinct classes which differed from those of H. paraphrophilus, H. influenzae and Pasteurella spp. All characterization variables, except those describing fatty acid content, contributed significantly to inter-species discrimination.

20.
A method to determine the mechanical time-constant distribution of the lung during a forced expiration manoeuvre is proposed. The method is based on a least squares algorithm constrained to give reasonably smooth non-negative solutions. The smoothing constraint was imposed by minimizing the second derivative of the distribution function in accordance with the physiological meaning of the time-constant distribution. Nevertheless, the obtained solution depends greatly on the relative weights of the two terms in the objective function to be minimized i.e., the error on the fit of the volume signal and the smoothness of the distribution function. To select the optimum smoothing weight, a criterion based on the stability of the reconstructed distribution shape was defined. The performance of the algorithm and that of the defined criterion were evaluated by using simulated signals of forced expired volume. The error of reconstructed distributions was quantified by means of the area enclosed between this distribution and the original one used to generate the simulated volume signal. The results obtained showed that for all the analyzed signals: (1) There is a value of the weight of the smoothing constraint which gives rise to a solution that is optimum in a least squares sense. (2) The proposed stabilization criterion enables us to approach this optimum solution from experimental signals.
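The two-term objective described above can be written in a generic discretized form. The notation is assumed for illustration and is not taken from the paper: $v$ is the sampled expired-volume signal, $g$ is the time-constant distribution on a grid $\tau_1, \dots, \tau_m$, $A$ is the matrix mapping a distribution to a volume signal, $D_2$ is the second-difference operator, and $\lambda$ is the smoothing weight.

```latex
\min_{g \,\ge\, 0} \; J(g)
  \;=\; \underbrace{\lVert v - A g \rVert_2^{2}}_{\text{fit of the volume signal}}
  \;+\; \lambda \, \underbrace{\lVert D_2\, g \rVert_2^{2}}_{\text{smoothness of the distribution}},
\qquad
(D_2\, g)_j = g_{j-1} - 2 g_j + g_{j+1}
```

The abstract does not state the form of $A$. One plausible choice, offered only as an assumption, is a sum of first-order emptying compartments, $A_{ij} = 1 - e^{-t_i/\tau_j}$. Under any such linear model, small $\lambda$ favours a close fit with an irregular $g$ and large $\lambda$ favours a smooth $g$ with a poorer fit, which is why a criterion for choosing $\lambda$ is needed.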
