Similar literature
1.
Radiologists' interpretation of screening mammograms is measured by accuracy indices such as sensitivity and specificity. The hypothesis that radiologists' interpretation of screening mammograms is constant across time can be tested by measuring overdispersion. However, small sample sizes are problematic for the accuracy of asymptotic approaches. In this article, we propose an exact conditional distribution for testing overdispersion of the binomial assumption that is assumed for the accuracy indices. An exact p-value can be defined from the developed distribution. We also describe an algorithm for computing this exact test. This proposed method is applied to data from a study in reading screening mammograms in a population of US radiologists (Beam et al., 2003). The exact method is compared analytically with a currently available method based on large sample approximations.

2.
Gromiha MM  Suwa M 《Proteins》2006,63(4):1031-1037
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying OMPs from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the performance of different methods, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees, etc. for discriminating OMPs. We found that most of the machine learning techniques discriminate OMPs with similar accuracy. The neural network-based method could discriminate the OMPs from other proteins [globular/transmembrane helical (TMH)] at the fivefold cross-validation accuracy of 91.0% in a dataset of 1,088 proteins. The accuracy of discriminating globular proteins is 88.8% and that of TMH proteins is 93.7%. Further, the neural network method is tested with globular proteins belonging to 30 different folding types and it could successfully exclude 95% of the considered proteins. The proteins with SAM domain such as knottins, rubredoxin, and thioredoxin folds are eliminated with 100% accuracy. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.

3.
Distance-based methods for phylogeny reconstruction are the fastest and easiest to use, and their popularity is accordingly high. They are also the only known methods that can cope with huge datasets of thousands of sequences. These methods rely on evolutionary distance estimation and are sensitive to errors in such estimations. In this study, a novel Bayesian method for estimation of evolutionary distances is developed. The proposed method enables the use of a sophisticated evolutionary model that better accounts for among-site rate variation (ASRV), thereby improving the accuracy of distance estimation. Rate variations are estimated within a Bayesian framework by extracting information from the entire dataset of sequences, unlike standard methods that can only use one pair of sequences at a time. We compare the accuracy of a cascade of distance estimation methods, starting from commonly used methods and moving towards the more sophisticated novel method. Simulation studies show significant improvements in the accuracy of distance estimation by the novel method over the commonly used ones. We demonstrate the effect of the improved accuracy on tree reconstruction using both real and simulated protein sequence alignments. An implementation of this method is available as part of the SEMPHY package.

4.
The receiver operating characteristic curve is a popular tool to characterize the capabilities of diagnostic tests with continuous or ordinal responses. One common design for assessing the accuracy of diagnostic tests involves multiple readers and multiple tests, in which all readers read all test results from the same patients. This design is most commonly used in a radiology setting, where the results of diagnostic tests depend on a radiologist's subjective interpretation. The most widely used approach for analyzing data from such a study is the Dorfman-Berbaum-Metz (DBM) method (Dorfman et al., 1992) which utilizes a standard analysis of variance (ANOVA) model for the jackknife pseudovalues of the area under the ROC curves (AUCs). Although the DBM method has performed well in published simulation studies, there is no clear theoretical basis for this approach. In this paper, focusing on continuous outcomes, we investigate its theoretical basis. Our result indicates that the DBM method does not satisfy the regular assumptions for standard ANOVA models, and thus might lead to erroneous inference. We then propose a marginal model approach based on the AUCs which can adjust for covariates as well. Consistent and asymptotically normal estimators are derived for regression coefficients. We compare our approach with the DBM method via simulation and by an application to data from a breast cancer study. The simulation results show that both our method and the DBM method perform well when the accuracy of tests under the study is the same and that our method outperforms the DBM method for inference on individual AUCs when the accuracy of tests is not the same. The marginal model approach can be easily extended to ordinal outcomes.
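
As an illustration of the quantities this abstract refers to, the following is a minimal sketch of an empirical AUC and its jackknife pseudovalues for a single reader and a single test. The function names and the toy scores are hypothetical, and the pseudovalue formula (n times the full estimate minus n − 1 times the leave-one-out estimate, with n the total number of cases) is the textbook jackknife form; it is assumed here rather than taken from the paper, and neither the ANOVA step nor the proposed marginal model is attempted.

```python
from itertools import product


def auc(diseased, healthy):
    """Empirical AUC: fraction of (diseased, healthy) pairs ranked correctly.

    Ties contribute one half, as in the Mann-Whitney statistic.
    """
    wins = sum(
        1.0 if d > h else 0.5 if d == h else 0.0
        for d, h in product(diseased, healthy)
    )
    return wins / (len(diseased) * len(healthy))


def jackknife_pseudovalues(diseased, healthy):
    """One pseudovalue per case: n * AUC - (n - 1) * AUC with that case removed."""
    n = len(diseased) + len(healthy)
    full = auc(diseased, healthy)
    pseudo = []
    # Leave out each diseased case in turn.
    for i in range(len(diseased)):
        loo = auc(diseased[:i] + diseased[i + 1:], healthy)
        pseudo.append(n * full - (n - 1) * loo)
    # Leave out each healthy case in turn.
    for j in range(len(healthy)):
        loo = auc(diseased, healthy[:j] + healthy[j + 1:])
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


# Hypothetical test scores for three diseased and three healthy cases.
diseased_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.5, 0.3, 0.2]
full_auc = auc(diseased_scores, healthy_scores)
pseudovalues = jackknife_pseudovalues(diseased_scores, healthy_scores)
```

In a multi-reader, multi-test study, pseudovalues of this kind would be computed for every reader and test combination before any variance modelling.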

5.
The process of inferring phylogenetic trees from molecular sequences almost always starts with a multiple alignment of these sequences but can also be based on methods that do not involve multiple sequence alignment. Very little is known about the accuracy with which such alignment-free methods recover the correct phylogeny or about the potential for increasing their accuracy. We conducted a large-scale comparison of ten alignment-free methods, among them one new approach that does not calculate distances and a faster variant of our pattern-based approach; all distance-based alignment-free methods are freely available from http://www.bioinformatics.org.au (as Python package decaf+py). We show that most methods exhibit a higher overall reconstruction accuracy in the presence of high among-site rate variation. Under all conditions that we considered, variants of the pattern-based approach were significantly better than the other alignment-free methods. The new pattern-based variant achieved a speed-up of an order of magnitude in the distance calculation step, accompanied by a small loss of tree reconstruction accuracy. A method of Bayesian inference from k-mers did not improve on classical alignment-free (and distance-based) methods but may still offer other advantages due to its Bayesian nature. We found the optimal word length k of word-based methods to be stable across various data sets, and we provide parameter ranges for two different alphabets. The influence of these alphabets was analyzed to reveal a trade-off in reconstruction accuracy between long and short branches. We have mapped the phylogenetic accuracy for many alignment-free methods, among them several recently introduced ones, and increased our understanding of their behavior in response to biologically important parameters. In all experiments, the pattern-based approach emerged as superior, at the expense of higher resource consumption. 
Nonetheless, no alignment-free method that we examined recovers the correct phylogeny as accurately as does an approach based on maximum-likelihood distance estimates of multiply aligned sequences.

6.

Background

Many cell lines currently used in medical research, such as cancer cells or stem cells, grow in confluent sheets or colonies. The biology of individual cells provides valuable information, thus the separation of touching cells in these microscopy images is critical for counting, identification and measurement of individual cells. Over-segmentation of single cells continues to be a major problem for methods based on morphological watershed due to the high level of noise in microscopy cell images. There is a need for a new segmentation method that is robust over a wide variety of biological images and can accurately separate individual cells even in challenging datasets such as confluent sheets or colonies.

Results

We present a new automated segmentation method called FogBank that accurately separates cells when confluent and touching each other. This technique is successfully applied to phase contrast, bright field, fluorescence microscopy and binary images. The method is based on morphological watershed principles with two new features to improve accuracy and minimize over-segmentation. First, FogBank uses histogram binning to quantize pixel intensities which minimizes the image noise that causes over-segmentation. Second, FogBank uses a geodesic distance mask derived from raw images to detect the shapes of individual cells, in contrast to the more linear cell edges that other watershed-like algorithms produce. We evaluated the segmentation accuracy against manually segmented datasets using two metrics. FogBank achieved segmentation accuracy on the order of 0.75 (1 being a perfect match). We compared our method with other available segmentation techniques in terms of achieved performance over the reference data sets. FogBank outperformed all related algorithms. The accuracy has also been visually verified on data sets with 14 cell lines across 3 imaging modalities leading to 876 segmentation evaluation images.

Conclusions

FogBank produces single cell segmentation from confluent cell sheets with high accuracy. It can be applied to microscopy images of multiple cell lines and a variety of imaging modalities. The code for the segmentation method is available as open-source and includes a Graphical User Interface for user friendly execution.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0431-x) contains supplementary material, which is available to authorized users.

7.
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the influence of physico-chemical, energetic and conformational properties of amino acid residues for discriminating outer membrane proteins using different machine learning algorithms, such as Bayes rules, logistic functions, neural networks, support vector machines, decision trees, etc. We observed that most of the properties have discriminated the OMPs with similar accuracy. The neural network method using the free energy change property could discriminate the OMPs from other folding types of globular and membrane proteins at the 5-fold cross-validation accuracy of 94.4% in a dataset of 1,088 proteins, which is better than that obtained with amino acid composition. The accuracy of discriminating globular proteins is 94.3% and that of transmembrane helical (TMH) proteins is 91.8%. Further, the neural network method is tested with globular proteins belonging to 30 major folding types and it could successfully exclude 99.4% of the considered 1612 non-redundant proteins. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.

8.
Automated function prediction (AFP) methods increasingly use knowledge discovery algorithms to map sequence, structure, literature, and/or pathway information about proteins whose functions are unknown into functional ontologies, typically (a portion of) the Gene Ontology (GO). While there are a growing number of methods within this paradigm, the general problem of assessing the accuracy of such prediction algorithms has not been seriously addressed. We present first an application for function prediction from protein sequences using the POSet Ontology Categorizer (POSOC) to produce new annotations by analyzing collections of GO nodes derived from annotations of protein BLAST neighborhoods. We then also present hierarchical precision and hierarchical recall as new evaluation metrics for assessing the accuracy of any predictions in hierarchical ontologies, and discuss results on a test set of protein sequences. We show that our method provides substantially improved hierarchical precision (measure of predictions made that are correct) when applied to the nearest BLAST neighbors of target proteins, as compared with simply imputing that neighborhood's annotations to the target. Moreover, when our method is applied to a broader BLAST neighborhood, hierarchical precision is enhanced even further. In all cases, such increased hierarchical precision performance is purchased at a modest expense of hierarchical recall (measure of all annotations that get predicted at all).
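
To make the idea of hierarchical precision and recall concrete, here is a rough sketch of one common set-based formulation: predicted and true annotations are each expanded with all of their ancestors in the ontology, and precision and recall are computed on the expanded sets. This formulation, the function names, and the toy ontology are assumptions for illustration; the abstract does not give the authors' exact definition, which may differ.

```python
def ancestors(node, parents):
    """Return the node together with all of its ancestors.

    `parents` maps each child to an iterable of its direct parents (a DAG).
    """
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def hierarchical_pr(predicted, true, parents):
    """Hierarchical precision and recall on ancestor-expanded annotation sets."""
    pred_up = set().union(*(ancestors(p, parents) for p in predicted))
    true_up = set().union(*(ancestors(t, parents) for t in true))
    overlap = len(pred_up & true_up)
    precision = overlap / len(pred_up) if pred_up else 0.0
    recall = overlap / len(true_up) if true_up else 0.0
    return precision, recall


# Hypothetical toy ontology: root -> {b, c}, b -> {d, e}.
toy_parents = {"b": ["root"], "c": ["root"], "d": ["b"], "e": ["b"]}

# Predicting the parent term "b" when the true annotation is "d":
# every predicted node is correct, but the most specific node is missed.
hp, hr = hierarchical_pr({"b"}, {"d"}, toy_parents)
```

Under this formulation, a prediction that is too general keeps full precision but loses recall, which matches the trade-off described in the abstract.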

9.
《Genomics》2019,111(6):1574-1582
Given the vast amount of genomic data, alignment-free sequence comparison methods are required due to their low computational complexity. k-mer based methods can improve comparison accuracy by extracting an effective feature of the genome sequences. The aim of this paper is to extract k-mer intervals of a sequence as a feature of a genome for high comparison accuracy. In the proposed method, we calculated the distance between genome sequences by comparing the distribution of k-mer intervals. Then, we identified the classification results using phylogenetic trees. We used viral, mitochondrial (MT), microbial and mammalian genome sequences to perform classification for various genome sets. We confirmed that the proposed method provides a better classification result than other k-mer based methods. Furthermore, the proposed method could efficiently be applied to long sequences such as human and mouse genomes.
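
The following sketch shows one plausible reading of "comparing the distribution of k-mer intervals". It is an assumption, not the paper's algorithm: an interval is taken to be the gap between consecutive occurrences of the same k-mer, the feature is the normalized histogram of those gaps, and the distance is the L1 difference between two histograms. All names are hypothetical.

```python
from collections import Counter


def kmer_interval_distribution(seq, k):
    """Normalized histogram of gaps between consecutive occurrences of each k-mer."""
    last_seen = {}
    gap_counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            gap_counts[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    total = sum(gap_counts.values())
    if total == 0:
        return {}
    return {gap: count / total for gap, count in gap_counts.items()}


def interval_distance(seq_a, seq_b, k=2):
    """L1 distance between the k-mer interval distributions of two sequences."""
    dist_a = kmer_interval_distribution(seq_a, k)
    dist_b = kmer_interval_distribution(seq_b, k)
    gaps = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(g, 0.0) - dist_b.get(g, 0.0)) for g in gaps)


# A period-2 repeat versus a period-3 repeat share no interval lengths.
d = interval_distance("ACACAC", "ACGACG", k=2)
```

A pairwise matrix of such distances could then be passed to any distance-based tree-building method, which is how the abstract describes obtaining classification results.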

10.
Chen GB  Xu Y  Xu HM  Li MD  Zhu J  Lou XY 《PloS one》2011,6(2):e16981
Detection of interacting risk factors for complex traits is challenging. The choice of an appropriate method, sample size, and allocation of cases and controls are serious concerns. To provide empirical guidelines for planning such studies and data analyses, we investigated the performance of the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) methods under various experimental scenarios. We developed the mathematical expectation of accuracy and used it as an indicator parameter to perform a gene-gene interaction study. We then examined the statistical power of GMDR and MDR within the plausible range of accuracy (0.50~0.65) reported in the literature. The GMDR with covariate adjustment had a power of >80% in a case-control design with a sample size of ≥2000, with theoretical accuracy ranging from 0.56 to 0.62. However, when the accuracy was <0.56, a sample size of ≥4000 was required to have sufficient power. In our simulations, the GMDR outperformed the MDR under all models with accuracy ranging from 0.56 to 0.62 for a sample size of 1000~2000. However, the two methods performed similarly when the accuracy was outside this range or the sample was significantly larger. We conclude that with adjustment of a covariate, GMDR performs better than MDR and a sample size of 1000~2000 is reasonably large for detecting gene-gene interactions in the range of effect size reported by the current literature, whereas a larger sample size is required for more subtle interactions with accuracy <0.56.

11.

Background

Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0,1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood.

Results

We compared MRP and SuperFine+MRP with MRL and SuperFine+MRL on simulated and biological datasets. We examined the MRP and MRL scores of each method on a wide range of datasets, as well as the resulting topological accuracy of the trees. Our experimental results show that MRL, coupled with a very good ML heuristic such as RAxML, produced more accurate trees than MRP, and MRL scores were more strongly correlated with topological accuracy than MRP scores.

Conclusions

SuperFine+MRP, when based upon a good MP heuristic, such as TNT, produces among the best scores for both MRP and MRL, and is generally faster and more topologically accurate than other supertree methods we tested.
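
The MRP matrix described in the Background of this abstract can be sketched as follows. The input format (each source tree given as its leaf set plus a list of clades) and the function name are assumptions made for illustration. Each clade yields one column: taxa inside the clade get 1, other taxa of that source tree get 0, and taxa absent from the source tree get ?. The parsimony or likelihood search that MRP and MRL run on the matrix is not shown.

```python
def mrp_matrix(source_trees):
    """Encode source trees as an MRP matrix over {0, 1, ?}.

    `source_trees` is a list of (leaf_set, clades) pairs, where `clades`
    is a list of sets of taxa, one per internal edge of the tree.
    Returns a dict mapping each taxon to its row as a string.
    """
    all_taxa = sorted(set().union(*(leaves for leaves, _ in source_trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for leaves, clades in source_trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon not in leaves:
                    rows[taxon].append("?")  # taxon missing from this source tree
                elif taxon in clade:
                    rows[taxon].append("1")  # taxon inside the clade
                else:
                    rows[taxon].append("0")  # taxon in the tree, outside the clade
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Two hypothetical source trees on overlapping taxon sets.
trees = [
    ({"A", "B", "C"}, [{"A", "B"}]),
    ({"B", "C", "D"}, [{"C", "D"}]),
]
matrix = mrp_matrix(trees)
```

Both MRP and MRL start from a matrix of this kind; they differ only in whether it is analyzed under maximum parsimony or 2-state maximum likelihood.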

12.
Wavelet change-point prediction of transmembrane proteins   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: A non-parametric method, based on a wavelet data-dependent threshold technique for change-point analysis, is applied to predict location and topology of helices in transmembrane proteins. A new propensity scale generated from a transmembrane helix database is proposed. RESULTS: We show that wavelet change-point performs well for smoothing hydropathy and transmembrane profiles generated using different scales. We investigate which wavelet bases and threshold functions are overall most appropriate to detect transmembrane segments. Prediction accuracy is based on the analysis of two data sets used as standard benchmarks for transmembrane prediction algorithms. The analysis of a test set of 83 proteins results in accuracy per segment equal to 98.2%; the analysis of a 48-protein blind-test set, i.e. containing proteins not used to generate the propensity scales, results in accuracy per segment equal to 97.4%. We believe that this method can also be applied to the detection of boundaries of other patterns such as G + C isochores and dot-plots. AVAILABILITY: The transmembrane database, TMALN and source code are available upon request from the authors.

13.
Zheng Y  Cai T  Feng Z 《Biometrics》2006,62(1):279-287
The rapid advancement in molecular technology has led to the discovery of many markers that have potential applications in disease diagnosis and prognosis. In a prospective cohort study, information on a panel of biomarkers as well as the disease status for a patient are routinely collected over time. Such information is useful to predict patients' prognosis and select patients for targeted therapy. In this article, we develop procedures for constructing a composite test with optimal discrimination power when there are multiple markers available to assist in prediction and characterize the accuracy of the resulting test by extending the time-dependent receiver operating characteristic (ROC) curve methodology. We employ a modified logistic regression model to derive optimal linear composite scores such that their corresponding ROC curves are maximized at every false positive rate. We provide theoretical justification for using such a model for prognostic accuracy. The proposed method allows for time-varying marker effects and accommodates censored failure time outcome. When the effects of markers are approximately constant over time, we propose a more efficient estimating procedure under such models. We conduct numerical studies to evaluate the performance of the proposed procedures. Our results indicate the proposed methods are both flexible and efficient. We contrast these methods with an application concerning the prognostic accuracies of expression levels of six genes.

14.
We provide an approach to testing whether the accuracy of a binary diagnostic test, which we define as the sum of sensitivity and specificity, is significantly better than chance. We derive an exact confidence interval of size at least 1 − α for the observed accuracy of the test. In addition, we develop tests to compare the accuracy of two such tests applied to the same subjects. These results offer a method for assessing the accuracy of a test at a single test criterion, in contrast to the standard approach of evaluating the total receiver-operating characteristic (ROC) curve for a test.
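
The accuracy measure defined in this abstract, the sum of sensitivity and specificity, can be computed from a 2×2 table as in the sketch below. The function name and the example counts are hypothetical. A test that classifies at random has an expected sum of 1, so values above 1 indicate better-than-chance performance; the exact confidence interval and the comparison tests from the paper are not reproduced here.

```python
def sensitivity_plus_specificity(tp, fn, tn, fp):
    """Sum of sensitivity and specificity from the counts of a 2x2 table.

    tp, fn: diseased subjects testing positive / negative.
    tn, fp: healthy subjects testing negative / positive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity


# Hypothetical counts: sensitivity 40/50 = 0.8, specificity 30/50 = 0.6.
observed = sensitivity_plus_specificity(tp=40, fn=10, tn=30, fp=20)

# A coin-flip test on the same subjects sits at the chance value of 1.
chance = sensitivity_plus_specificity(tp=25, fn=25, tn=25, fp=25)
```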

15.
Organotypic, three dimensional (3D) cell culture models of epithelial tumour types such as prostate cancer recapitulate key aspects of the architecture and histology of solid cancers. Morphometric analysis of multicellular 3D organoids is particularly important when additional components such as the extracellular matrix and tumour microenvironment are included in the model. The complexity of such models has so far limited their successful implementation. There is a great need for automatic, accurate and robust image segmentation tools to facilitate the analysis of such biologically relevant 3D cell culture models. We present a segmentation method based on Markov random fields (MRFs) and illustrate our method using 3D stack image data from an organotypic 3D model of prostate cancer cells co-cultured with cancer-associated fibroblasts (CAFs). The 3D segmentation output suggests that these cell types are in physical contact with each other within the model, which has important implications for tumour biology. Segmentation performance is quantified using ground truth labels and we show how each step of our method increases segmentation accuracy. We provide the ground truth labels along with the image data and code. Using independent image data we show that our segmentation method is also more generally applicable to other types of cellular microscopy and not only limited to fluorescence microscopy.

16.
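Knowledge-base-driven land cover change detection using the spectral vector similarity of image objects   Total citations: 1, self-citations: 0, citations by others: 1
宋翔 (Song Xiang)  颜长珍 (Yan Changzhen) 《生态学报》 (Acta Ecologica Sinica) 2014,34(24):7175-7180
Detecting land use/cover change is an important part of global change research in China and abroad, and choosing an appropriate change detection method for studying land use/cover change in Northwest China is of great significance to the "Ecological Decade Project" (生态十年项目). We selected the area covered by TM path/row 134033, which is typical of Northwest China, as the test area for validating the change detection method. Using two Landsat TM images from 2005 and 2010, and working in eCognition Developer 8.64, we detected change with an image-object-based spectral feature vector similarity method, classified the changed areas using 2010 land cover data as a prior knowledge base, extracted the land use/cover change information, and analyzed the change results quantitatively. The results show that, for mapping land use/cover change in the test area, the image-object-based spectral feature vector similarity method is fast and achieves high detection accuracy, and that it is suitable for detecting land use/cover change both in the test area and across Northwest China. Finally, land use/cover classification maps of Northwest China covering the roughly 10 years from 2000 to 2010 were produced using this method together with post-classification comparison.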

17.
Vocal individuality has been documented in a variety of mammalian species and it has been proposed that this individuality can be used as a vocal fingerprint to monitor individuals. Here we provide and test a classification method using Mel-frequency cepstral coefficients (MFCCs) to extract features from Bornean gibbon female calls. Our method is semi-automated as it requires manual pre-processing to identify and extract calls from the original recordings. We compared two methods of MFCC feature extraction: (1) averaging across all time windows and (2) creating a standardized number of time windows for each call. We analysed 376 calls from 33 gibbon females and, using linear discriminant analysis, found that we were able to improve female identification accuracy from 95.7% with spectrogram features to 98.4% accuracy when averaging MFCCs across time windows, and 98.9% accuracy when using a standardized number of windows. We divided our data randomly into training and test data-sets, and tested the accuracy of support vector machine (SVM) predictions over 100 iterations. We found that we could predict female identity in the test data-set with a 98.8% accuracy. Using SVM on our entire data-set, we were able to predict female identity with 99.5% accuracy (validated by leave-one-out cross-validation). Lastly, we used the method presented here to classify four females recorded during three or more recording seasons using SVM with limited success. We provide evidence that MFCC feature extraction is effective for distinguishing between female Bornean gibbons, and make suggestions for future vocal fingerprinting applications.

18.
Pei J  Grishin NV 《Proteins》2004,56(4):782-794
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. 
This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. Mixtures of predicted structural frequencies and evolutionary frequencies improve the quality of local profile-to-profile alignment by COMPASS.

19.
20.
In areas such as drug development, clinical diagnosis and biotechnology research, acquiring details about the kinetic parameters of enzymes is crucial. The correct design of an experiment is critical to collecting data suitable for analysis, modelling and deriving the correct information. As classical design methods are not targeted to the more complex kinetics being frequently studied, attention is needed to estimate parameters of such models with low variance. We demonstrate that a Bayesian approach (the use of prior knowledge) can produce major gains quantifiable in terms of information, productivity and accuracy of each experiment. Developing the use of Bayesian Utility functions, we have used a systematic method to identify the optimum experimental designs for a number of kinetic model data sets. This has enabled the identification of trends between kinetic model types, sets of design rules and the key conclusion that such designs should be based on some prior knowledge of K(M) and/or the kinetic model. We suggest an optimal and iterative method for selecting features of the design such as the substrate range, number of measurements and choice of intermediate points. The final design collects data suitable for accurate modelling and analysis and minimises the error in the parameters estimated.
