Similar literature
 20 similar articles found (search time: 46 ms)
1.
Helgason CM  Jobe TH 《PloS one》2008,3(4):e1909
BACKGROUND: It has been shown that the clinical state of one patient can be represented by known measured variables of interest, each of which then forms an element of a fuzzy set represented as a point in the unit hypercube. We hypothesized that precise comparison of a single patient with the average patient of a large double-blind randomized controlled study is possible using fuzzy theory. METHODS/PRINCIPAL FINDINGS: The sets-as-points geometry of the unit hypercube allows fuzzy subsethood to define, in measures of fuzzy cardinality, different conditions, similarity, and comparison between fuzzy sets. A fuzzy measure of prediction is defined from fuzzy measures of similarity and comparison. It is a measure of the degree to which fuzzy set A is similar to fuzzy set B when different conditions are taken into account and removed from the comparison. When represented as a fuzzy set, that is, as a point in the unit hypercube, a clinical patient can be compared to an average patient of a large group study in a precise manner. This comparison is expressed by the fuzzy prediction measure. This measure in itself is not a probability. Once the patient is thus precisely matched to the average patient of a large group study, risk reduction is calculated by multiplying the measured similarity of the clinical patient by the risk of the average trial patient. CONCLUSION/SIGNIFICANCE: Otherwise not precisely translatable to the single case, the result of group statistics can be applied to the single case through the use of fuzzy subsethood and measured in fuzzy cardinality. This measure is an alternative to a Bayesian or other probability-based statistical approach.
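
The fuzzy quantities named in this abstract can be illustrated with a minimal sketch. It uses the standard sets-as-points definitions (fuzzy cardinality as the sum of membership values, subsethood as M(A ∩ B) / M(A)) and a common overlap-based similarity ratio; the abstract does not give the authors' exact prediction measure, so that measure is not reproduced here. The function names and the membership values are hypothetical.

```python
def cardinality(a):
    """Fuzzy cardinality (sigma-count): the sum of membership values."""
    return sum(a)


def subsethood(a, b):
    """Degree to which fuzzy set a is contained in b: M(a ∩ b) / M(a)."""
    m_a = cardinality(a)
    if m_a == 0:
        return 1.0  # the empty set is a subset of every set
    overlap = [min(x, y) for x, y in zip(a, b)]  # fuzzy intersection
    return cardinality(overlap) / m_a


def similarity(a, b):
    """Overlap-based similarity: M(a ∩ b) / M(a ∪ b) (an assumed choice)."""
    union = cardinality([max(x, y) for x, y in zip(a, b)])
    if union == 0:
        return 1.0  # two empty sets are identical
    overlap = cardinality([min(x, y) for x, y in zip(a, b)])
    return overlap / union


# Hypothetical membership values for three measured variables.
patient = [0.8, 0.4, 0.6]
trial_average = [0.6, 0.5, 0.6]

print(subsethood(patient, trial_average))
print(similarity(patient, trial_average))
```

Each list is one point in the unit hypercube, with one coordinate per measured variable, so a patient and a trial average are directly comparable as long as they use the same variables in the same order.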

2.
MOTIVATION: In our previous approach, we proposed a hybrid method for protein secondary structure prediction called HYPROSP, which combined our proposed knowledge-based prediction algorithm PROSP and PSIPRED. The knowledge base constructed for PROSP contains small peptides together with their secondary structural information. The hybrid strategy of HYPROSP uses a global quantitative measure, match rate, to determine whether PROSP or PSIPRED is to be used for the prediction of a target protein. HYPROSP made slight improvement of Q(3) over PSIPRED because PROSP predicted well for proteins with match rate >80%. As the portion of proteins with match rate >80% is quite small and as the performance of PSIPRED also improves, the advantage of HYPROSP is diluted. To overcome this limitation and further improve the hybrid prediction method, we present in this paper a new hybrid strategy HYPROSP II that is based on a new quantitative measure called local match rate. RESULTS: Local match rate indicates the amount of structural information that each amino acid can extract from the knowledge base. With the local match rate, we are able to define a confidence level of the PROSP prediction results for each amino acid. Our new hybrid approach, HYPROSP II, is proposed as follows: for each amino acid in a target protein, we combine the prediction results of PROSP and PSIPRED using a hybrid function defined on their respective confidence levels. Two datasets in nrDSSP and EVA are used to perform a 10-fold cross validation. The average Q(3) of HYPROSP II is 81.8% and 80.7% on nrDSSP and EVA datasets, respectively, which is 2.0% and 1.1% better than that of PSIPRED. For local structures with match rate >80%, the average Q(3) improvement is 4.4% on the nrDSSP dataset. The use of local match rate improves the accuracy better than global match rate. There has been a long history of attempts to improve secondary structure prediction. 
We believe that HYPROSP II has greatly utilized the power of the peptide knowledge base and raised the prediction accuracy to a new high. The method we developed in this paper could have a profound effect on the general use of knowledge base techniques for various prediction algorithms. AVAILABILITY: The Linux executable file of HYPROSP II, as well as both nrDSSP and EVA datasets can be downloaded from http://bioinformatics.iis.sinica.edu.tw/HYPROSPII/.
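
The Q(3) figures quoted in this abstract are three-state per-residue accuracies. A minimal sketch of how such a score is usually computed follows; the state letters (H, E, C) and the example strings are assumptions for illustration and are not taken from the HYPROSP II datasets.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state.

    Both arguments are strings over the three states H (helix),
    E (strand), and C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 10-residue example: 8 of 10 states agree.
print(q3("HHHECCCEEC", "HHHHCCCEEE"))
```

An average Q(3) over a dataset, as reported above, would be the mean of this value across the target proteins.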

3.
Local structure prediction can facilitate ab initio structure prediction, protein threading, and remote homology detection. However, the accuracy of existing methods is limited. In this paper, we propose a knowledge-based prediction method that assigns a measure called the local match rate to each position of an amino acid sequence to estimate the confidence of our method. Empirically, the accuracy of the method correlates positively with the local match rate; therefore, we employ it to predict the local structures of positions with a high local match rate. For positions with a low local match rate, we propose a neural network prediction method. To better utilize the knowledge-based and neural network methods, we design a hybrid prediction method, HYPLOSP (HYbrid method to Protein LOcal Structure Prediction) that combines both methods. To evaluate the performance of the proposed methods, we first perform cross-validation experiments by applying our knowledge-based method, a neural network method, and HYPLOSP to a large dataset of 3,925 protein chains. We test our methods extensively on three different structural alphabets and evaluate their performance by two widely used criteria, Maximum Deviation of backbone torsion Angle (MDA) and Q(N), which is similar to Q(3) in secondary structure prediction. We then compare HYPLOSP with three previous studies using a dataset of 56 new protein chains. HYPLOSP shows promising results in terms of MDA and Q(N) accuracy and demonstrates its alphabet-independent capability.

4.
Four statistical methods are presented to determine the practical clinical value of measurements made from malignant tumors and to translate these measurements into a prediction of survival for each patient: the Cox statistical model, which must be derived from a data base of cases with known outcome; the null-rank test, a modified rank-sum test that provides an overall measure of the effectiveness of the Cox model; the predicted survival curve, an estimate of survival derived for each new patient from measurements of the primary tumor; and the standard error of measurement, an empirical method for estimating the variability introduced into predicted survival by errors in measurement. The value of these statistical methods was demonstrated by application to 200 cases of human intraocular melanoma, with the two predictive morphometric measurements used being the standard deviation of nucleolar area (SDNA) and the largest tumor dimension (LTD) derived from a single histologic slide of each tumor. Sufficient references and mathematical details are provided to allow anyone with moderate skills as a computer programmer to construct or obtain all of the relevant algorithms.

5.
OBJECTIVE: To develop an approach to the prediction of survival in patients with colorectal cancer using nearest neighbor analysis and case-based reasoning. STUDY DESIGN: A total of 216 patients with full clinicopathologic records and five-year follow-up were the subjects of this study. They were divided into a core database of 162 cases and a test group of 54 cases, with follow-up on all patients. When the patient was still alive at the end of the follow-up period, censored survival time was used. For each of the test cases, the four closest neighbors from the database were retrieved and their median survival time recorded and used as the predicted estimate of survival. Case matching was based on a Euclidean multivariate distance measure for the three best predictor variables: patient age, Dukes stage and tubule configuration. Cases with the smallest distance from the test case were considered to be the most similar. The predicted survival times for the test cases were compared with the actual, observed survival in the test cases to determine the success of this approach. RESULTS: The results showed reasonable concordance between observed and predicted survival figures, although there was a large degree of spread. Classification of cases into ≤ 60 and > 60 months' survival showed a correct classification rate of 63%. For the prediction of survival time, the distribution of differences between observed and predicted survival times for the uncensored test cases had a median value of −5 months but also showed a wide dispersion of values. Correlation of observed and predicted survival times, while not reaching statistical significance at P < .05, did show a strong positive association. CONCLUSION: Case-based approaches to the prediction of survival times in cancer patients are important. The results of the current study illustrate the difficulties in applying this approach to survival data and highlight the complexity of patient information and the inability to accurately predict patient outcome on a small subset of clinicopathologic features. While extensive work needs to be carried out to improve prediction power, this study illustrates the potential for case-based analyses. The ability to retrieve feature-matched cases from hospital patient databases has clear, independent advantages in patient management, but the ability to provide reliable, targeted prognostic estimates on individual cases should be a common goal in medical research.
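
The case-matching procedure described in this abstract (four nearest neighbors by Euclidean distance, median survival as the prediction) can be sketched as follows. The numeric coding and scaling of age, Dukes stage, and tubule configuration are not given in the abstract, so the feature values below are hypothetical, as are the function name and the six-case database.

```python
import math
from statistics import median


def predict_survival(test_case, database, k=4):
    """Predict survival as the median survival of the k closest database cases.

    test_case: tuple of numeric features (age, Dukes stage, tubule
               configuration), coded on some common scale.
    database:  list of (features, survival_months) pairs.
    """
    # Rank database cases by Euclidean distance to the test case.
    ranked = sorted(database, key=lambda rec: math.dist(rec[0], test_case))
    return median(survival for _, survival in ranked[:k])


# Hypothetical, pre-scaled features; survival in months.
database = [
    ((0.2, 0.0, 0.0), 70),
    ((0.3, 0.0, 0.5), 64),
    ((0.4, 0.5, 0.5), 48),
    ((0.5, 0.5, 1.0), 40),
    ((0.9, 1.0, 1.0), 12),
    ((0.8, 1.0, 0.5), 20),
]

print(predict_survival((0.35, 0.5, 0.5), database))
```

Because Euclidean distance is sensitive to units, the three variables would need to be put on comparable scales before matching; how the study handled this is not stated in the abstract.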

6.
A grand challenge in the proteomics and structural genomics era is the prediction of protein structure, including identification of those proteins that are partially or wholly unstructured. A number of predictors for identification of intrinsically disordered proteins (IDPs) have been developed over the last decade, but none can be taken as fully reliable on its own. Using a single model for prediction is typically inadequate because prediction based on only the most accurate model ignores model uncertainty. In this paper, we present an empirical method to specify and measure uncertainty associated with disorder predictions. In particular, we analyze the uncertainty in the reference model itself and the uncertainty in data. This is achieved by training a set of models and developing several meta-predictors on top of them. The best meta-predictor achieved comparable or better results than any other single model, suggesting that incorporating different aspects of protein disorder prediction is important for the disorder prediction task. In addition, the best meta-predictor had more balanced sensitivity and specificity than any individual model. We also assessed the effects of changes in disorder prediction as a function of changes in the protein sequence. For collections of homologous sequences, we found that mutations caused many of the predicted disordered residues to be flipped to be predicted as ordered residues, while the reverse was observed much less frequently. These results suggest that disorder tendencies are more sensitive to allowed mutations than structure tendencies and the conservation of disorder is indeed less stable than conservation of structure. Availability: five meta-predictors and four single models developed for this study will be publicly freely accessible for non-commercial use.

7.
For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template‐based modeling of protein structure and have been incorporated into fragment‐based assembly methods. Our previous homology‐free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of φ,ψ backbone dihedral angles that are obtained from a Protein Data Bank‐based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position‐resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.

8.
《MABS-AUSTIN》2013,5(5):1178-1189
The development of biosimilar products is expected to grow rapidly over the next five years as a large number of approved biologics reach patent expiry. The pathway to regulatory approval requires that similarity of the biosimilar to the reference product be demonstrated through physicochemical and structural characterization, as well as in in vivo studies that compare the safety and efficacy profiles of the products. To support nonclinical and clinical studies, pharmacokinetic (PK) assays are required to measure the biosimilar and reference products with comparable precision and accuracy. The optimal approach is to develop a single PK assay, using a single analytical standard, for quantitative measurement of the biosimilar and reference products in serum matrix. Use of a single PK assay for quantification of multiple products requires a scientifically sound testing strategy to evaluate bioanalytical comparability of the test products within the method, and provide a solid data package to support the conclusions. To meet these objectives, a comprehensive approach with scientific rigor was applied to the development and characterization of PK assays that are used in support of biosimilar programs. Herein we describe the bioanalytical strategy and testing paradigm that has been used across several programs to determine bioanalytical comparability of the biosimilar and reference products. Data from one program are presented, with statistical results demonstrating that the biosimilar and reference products were bioanalytically equivalent within the method. The cumulative work has established a framework for future biosimilar PK assay development.

10.
In medical statistics, many alternative strategies are available for building a prediction model based on training data. Prediction models are routinely compared by means of their prediction performance in independent validation data. If only one data set is available for training and validation, then rival strategies can still be compared based on repeated bootstraps of the same data. Often, however, the overall performance of rival strategies is similar and it is thus difficult to decide on one model. Here, we investigate the variability of the prediction models that results when the same modelling strategy is applied to different training sets. For each modelling strategy we estimate a confidence score based on the same repeated bootstraps. A new decomposition of the expected Brier score is obtained, as well as the estimates of population average confidence scores. The latter can be used to distinguish rival prediction models with similar prediction performances. Furthermore, on the subject level a confidence score may provide useful supplementary information for new patients who want to base a medical decision on predicted risk. The ideas are illustrated and discussed using data from cancer studies, also with high-dimensional predictor space.
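
For orientation, the Brier score mentioned in this abstract is the mean squared difference between predicted risks and observed binary outcomes. The sketch below computes it, along with a simple spread of one subject's predicted risk across models trained on different bootstrap samples. The spread is only an assumed stand-in for model variability; it is not the confidence score or the decomposition proposed in the paper, neither of which is specified in the abstract. All names and numbers are hypothetical.

```python
from statistics import pstdev


def brier_score(predicted_risks, outcomes):
    """Mean squared error between predicted risks (0..1) and outcomes (0 or 1)."""
    if len(predicted_risks) != len(outcomes) or not outcomes:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_risks, outcomes)]
    return sum(squared_errors) / len(outcomes)


def prediction_spread(bootstrap_risks):
    """Standard deviation of one subject's predicted risk across bootstrap models."""
    return pstdev(bootstrap_risks)


# Hypothetical predicted risks and observed events for three patients.
print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))

# Hypothetical risks for one patient from models fit to five bootstrap samples.
print(prediction_spread([0.30, 0.35, 0.40, 0.33, 0.37]))
```

Two strategies with nearly equal Brier scores can still differ in how much an individual patient's predicted risk moves from one training set to another, which is the kind of variability the abstract sets out to quantify.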

11.
Shao L  Fan X  Cheng N  Wu L  Xiong H  Fang H  Ding D  Shi L  Cheng Y  Tong W 《PloS one》2012,7(1):e29534
The era of personalized medicine for cancer therapeutics has taken an important step forward in making accurate prognoses for individual patients with the adoption of high-throughput microarray technology. However, microarray technology in cancer diagnosis or prognosis has been primarily used for the statistical evaluation of patient populations, and thus excludes inter-individual variability and patient-specific predictions. Here we propose a metric called clinical confidence that serves as a measure of prognostic reliability to facilitate the shift from population-wide to personalized cancer prognosis using microarray-based predictive models. The performance of sample-based models predicted with different clinical confidences was evaluated and compared systematically using three large clinical datasets studying the following cancers: breast cancer, multiple myeloma, and neuroblastoma. Survival curves for patients, with different confidences, were also delineated. The results show that the clinical confidence metric separates patients with different prediction accuracies and survival times. Samples with high clinical confidence were likely to have accurate prognoses from predictive models. Moreover, patients with high clinical confidence would be expected to live for a notably longer or shorter time if their prognosis was good or grim based on the models, respectively. We conclude that clinical confidence could serve as a beneficial metric for personalized cancer prognosis prediction utilizing microarrays. Ascribing a confidence level to prognosis with the clinical confidence metric provides the clinician an objective, personalized basis for decisions, such as choosing the severity of the treatment.

12.
The abundance of computer software for different types of prediction in DNA and protein sequence analyses raises the problem of adequate ranking of prediction program quality. A single measure of success of predictor software, which adequately ranks the predictors, does not exist. A typical example of such an incomplete measure is the so-called correlation coefficient. This paper provides an overview and short analysis of several different measures of prediction quality. Frequently, some of these measures give results contradictory to each other even when they relate to the same prediction scores. This may lead to confusion. In order to overcome some of the problems, a few new measures are proposed including some variants of a 'generalised distance from the ideal predictor score'; these are based on topological properties, rather than on statistics. In order to provide a sort of a balanced ranking, the averaged score measure (ASM) is introduced. The ASM provides a possibility for the selection of the predictor that probably has the best overall performance. The method presented in the paper applies to the ranking problem of any prediction software whose results can be properly represented in a true positive-false positive framework, thus providing a natural set-up for linear biological sequence analysis.
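
Two measures in the true positive/false positive framework can be sketched as follows. The first is the correlation coefficient as it is commonly defined for binary predictions (the Matthews form), which is presumably the one the abstract refers to. The second is a Euclidean distance from the ideal point (sensitivity 1, specificity 1), offered only as an assumed illustration of a "distance from the ideal predictor"; the paper's generalised distance variants and the ASM are not defined in the abstract and are not reproduced. The counts are hypothetical.

```python
import math


def correlation_coefficient(tp, fp, tn, fn):
    """Correlation between predicted and true labels from a 2x2 confusion table."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # undefined when any marginal total is zero; 0 by convention
    return (tp * tn - fp * fn) / denominator


def distance_from_ideal(tp, fp, tn, fn):
    """Euclidean distance to the ideal point (sensitivity = 1, specificity = 1)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.hypot(1.0 - sensitivity, 1.0 - specificity)


# Hypothetical counts for one predictor on 100 test positions.
counts = dict(tp=40, fp=10, tn=45, fn=5)
print(correlation_coefficient(**counts))
print(distance_from_ideal(**counts))
```

Ranking several predictors by each function separately is a quick way to see the kind of disagreement between measures that the abstract describes.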

13.
He H  McAllister G  Smith TF 《Proteins》2002,48(4):654-663
We have constructed, in a completely automated fashion, a new structure template library for threading that represents 358 distinct SCOP folds where each model is mathematically represented as a Hidden Markov model (HMM). Because the large number of models in the library can potentially dilute the prediction measure, a new triage method for fold prediction is employed. In the first step of the triage method, the most probable structural class is predicted using a set of manually constructed, high-level, generalized structural HMMs that represent seven general protein structural classes: all-alpha, all-beta, alpha/beta, alpha+beta, irregular small metal-binding, transmembrane beta-barrel, and transmembrane alpha-helical. In the second step, only those fold models belonging to the determined structural class are selected for the final fold prediction. This triage method gave more predictions as well as more correct predictions compared with a simple prediction method that lacks the initial classification step. Two different schemes of assigning Bayesian model priors are presented and discussed.

14.
Cross-validation based point estimates of prediction accuracy are frequently reported in microarray class prediction problems. However, these point estimates can be highly variable, particularly for small sample numbers, and it would be useful to provide confidence intervals of prediction accuracy. We performed an extensive study of existing confidence interval methods and compared their performance in terms of empirical coverage and width. We developed a bootstrap case cross-validation (BCCV) resampling scheme and defined several confidence interval methods using BCCV with and without bias-correction. The widely used approach of basing confidence intervals on an independent binomial assumption of the leave-one-out cross-validation errors results in serious under-coverage of the true prediction error. Two split-sample based methods previously proposed in the literature tend to give overly conservative confidence intervals. Using BCCV resampling, the percentile confidence interval method was also found to be overly conservative without bias-correction, while the bias corrected accelerated (BCa) interval method of Efron returns substantially anti-conservative confidence intervals. We propose a simple bias reduction on the BCCV percentile interval. The method provides mildly conservative inference under all circumstances studied and outperforms the other methods in microarray applications with small to moderate sample sizes.
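
The percentile step of a bootstrap confidence interval can be sketched as below. This shows only how an interval is read off a set of replicate error estimates; the resampling of cases, the cross-validation run inside each replicate, and the bias reduction the authors propose are not described in enough detail in the abstract to reproduce, so they are left out. The function name and the replicate values are hypothetical.

```python
import math


def percentile_interval(replicates, level=0.95):
    """Percentile confidence interval from bootstrap replicate estimates.

    Uses nearest-rank indices that round outward, so the interval is
    never narrower than the nominal level on the sorted replicates.
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    ordered = sorted(replicates)
    alpha = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[math.floor(alpha * last)]
    upper = ordered[math.ceil((1.0 - alpha) * last)]
    return lower, upper


# Hypothetical cross-validated error rates from ten bootstrap replicates.
errors = [0.18, 0.22, 0.25, 0.20, 0.30, 0.27, 0.19, 0.24, 0.21, 0.26]
print(percentile_interval(errors, level=0.80))
```

In practice many more replicates would be drawn than in this toy list, since the tail percentiles are poorly determined by a handful of values.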

15.

Background

Predicting the response to a drug for cancer patients based on genomic information is an important problem in modern clinical oncology. This problem occurs in part because many available drug sensitivity prediction algorithms do not consider better-quality cancer cell lines and the adoption of new feature representations; both lead to the accurate prediction of drug responses. By predicting accurate drug responses to cancer, oncologists gain a more complete understanding of the effective treatments for each patient, which is a core goal in precision medicine.

Results

In this paper, we model cancer drug sensitivity as a link prediction problem, an approach that is shown to be effective. We evaluate our proposed link prediction algorithms and compare them with an existing drug sensitivity prediction approach based on clinical trial data. The experimental results based on the clinical trial data show the stability of our link prediction algorithms, which yield the highest area under the ROC curve (AUC) and are statistically significant.

Conclusions

We propose a link prediction approach to obtain new feature representation. Compared with an existing approach, the results show that incorporating the new feature representation to the link prediction algorithms has significantly improved the performance.

16.
The purpose of this narrative review is to provide a critical reflection of how analytical machine learning approaches could provide the platform to harness variability of patient presentation to enhance clinical prediction. The review includes a summary of current knowledge on the physiological adaptations present in people with spinal pain. We discuss how contemporary evidence highlights the importance of not relying on single features when characterizing patients given the variability of physiological adaptations present in people with spinal pain. The advantages and disadvantages of current analytical strategies in contemporary basic science and epidemiological research are reviewed and we consider how analytical machine learning approaches could provide the platform to harness the variability of patient presentations to enhance clinical prediction of pain persistence or recurrence. We propose that machine learning techniques can be leveraged to translate a potentially heterogeneous set of variables into clinically useful information with the potential to enhance patient management.

17.
We describe a supervised prediction method for diagnosis of acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data driven approach with machine learning methods to train a computational model that takes in flow cytometry measurements from a single patient and gives a confidence score of the patient being AML-positive. Our solution is based on a regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data driven and no prior biological knowledge is used. The described solution scored a 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model performance and further improve and simplify our original method showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit in our prediction method compared to other solutions with similar performance is that our model only uses a small fraction of the flow cytometry measurements making our solution highly economical.

18.
A new protein fold recognition method is described which is both fast and reliable. The method uses a traditional sequence alignment algorithm to generate alignments which are then evaluated by a method derived from threading techniques. As a final step, each threaded model is evaluated by a neural network in order to produce a single measure of confidence in the proposed prediction. The speed of the method, along with its sensitivity and very low false-positive rate makes it ideal for automatically predicting the structure of all the proteins in a translated bacterial genome (proteome). The method has been applied to the genome of Mycoplasma genitalium, and analysis of the results shows that as many as 46 % of the proteins derived from the predicted protein coding regions have a significant relationship to a protein of known structure. In some cases, however, only one domain of the protein can be predicted, giving a total coverage of 30 % when calculated as a fraction of the number of amino acid residues in the whole proteome.

19.
The capability of predicting folding and conformation of a protein from its primary structure is probably one of the main goals of modern biology. An accurate prediction of solvent accessibility is an intermediate step along this way. A new method for predicting solvent accessibility from single sequence and multiple alignment data is described. The method is based on probability profiles calculated on an amino acid sequence centred on the residue whose accessibility has to be predicted. A profile is constructed for each exposure category considered so as to calculate the probability of a sequence being generated by the different profiles. Prediction accuracy was tested on a variety of protein sets with two- and three-state models. Different thresholds were used according to those adopted by the authors proposing the data sets. The prediction accuracy is significantly improved over existing methods.
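
The profile idea in this abstract (one probability profile per exposure category, then asking which profile most probably generated a sequence window) can be sketched as follows. The window width, the pseudocount, the two category names, and the training windows are all assumptions for illustration; the abstract does not specify how the authors build or score their profiles, and multiple-alignment input is not handled here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(windows, pseudocount=1.0):
    """Position-specific residue probabilities from equal-length windows."""
    width = len(windows[0])
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for position in range(width):
        counts = Counter(window[position] for window in windows)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_likelihood(window, profile):
    """Log-probability that the profile generated this window."""
    return sum(math.log(column[aa]) for aa, column in zip(window, profile))


def predict_state(window, profiles):
    """Return the exposure category whose profile best explains the window."""
    return max(profiles, key=lambda state: log_likelihood(window, profiles[state]))


# Hypothetical 5-residue training windows, centred on the residue of interest.
profiles = {
    "buried": build_profile(["LIVAL", "VILLA", "AVLIV"]),
    "exposed": build_profile(["KEDSK", "DKESE", "EKDKS"]),
}

print(predict_state("LVLIA", profiles))
print(predict_state("KDEKS", profiles))
```

A three-state model would simply add a third entry to `profiles`; the pseudocount keeps residues unseen in training from producing a zero probability.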

20.
Although Arabidopsis (Arabidopsis thaliana) is the best studied plant species, the biological role of one-third of its proteins is still unknown. We developed a probabilistic protein function prediction method that integrates information from sequences, protein-protein interactions, and gene expression. The method was applied to proteins from Arabidopsis. Evaluation of prediction performance showed that our method has improved performance compared with single source-based prediction approaches and two existing integration approaches. An innovative feature of our method is that it enables transfer of functional information between proteins that are not directly associated with each other. We provide novel function predictions for 5,807 proteins. Recent experimental studies confirmed several of the predictions. We highlight these in detail for proteins predicted to be involved in flowering and floral organ development.
