Similar articles
20 similar articles found
1.
Robust density estimation using distance methods (total citations: 2; self-citations: 0; citations by others: 2)
Diggle PJ. Biometrika 1975;62(1):39-48

2.
Species- and individual-specific animal calls can be used for identification, as verified in playback experiments and in analyses of features extracted from these signals. The use of machine-learning methods and acoustic features borrowed from human speech recognition to identify animals at the species and individual level has increased recently. To date, few studies have compared the performance of these methods and features for call-type-independent species and individual identification. We compared the performance of four machine-learning classifiers, using two acoustic features, in the identification of ten passerine species and in individual identification for three passerines. The methods did not require us to pre-categorize the component syllables in call-type-independent species and individual identification systems. Our results indicated that support vector machines (SVM) generally performed best regardless of which acoustic feature was used; that linear predictive coefficients (LPCs) greatly increased the recognition accuracy of hidden Markov models (HMM); and that the most appropriate classifiers for LPCs and Mel-frequency cepstral coefficients (MFCCs) were HMM and SVM, respectively. This study will assist researchers in selecting classifiers and features for future species and individual recognition studies.

3.
Wu S, Zhang Y. PLoS ONE 2008;3(10):e3400
We developed a composite machine-learning-based algorithm, called ANGLOR, to predict real-valued protein backbone torsion angles from amino acid sequences. The input features of ANGLOR include sequence profiles, predicted secondary structure and solvent accessibility. In a large-scale benchmarking test, the mean absolute error (MAE) of the phi/psi prediction is 28°/46°, which is approximately 10% lower than that of software reported in the literature. The prediction is statistically different from a random predictor (or a purely secondary-structure-based predictor), with p < 1.0 × 10⁻³⁰⁰ (or p < 1.0 × 10⁻¹⁴⁸) by the Wilcoxon signed-rank test. For some residues (ILE, LEU, PRO and VAL), and especially for residues in helices and buried regions, the MAE of phi angles is much smaller (10–20°) than in other environments. Thus, although the average accuracy of ANGLOR predictions is still low, the accurately predicted portion of the dihedral angles may be useful in assisting protein fold recognition and ab initio 3D structure modeling.
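The abstract reports the MAE of phi/psi predictions but does not say how angle periodicity is handled. A minimal sketch, assuming errors are measured along the shortest arc (so predicting 170° for an observed −170° counts as a 20° error, not 340°); the function name `angular_mae` is hypothetical and not part of ANGLOR:

```python
import numpy as np

def angular_mae(predicted_deg, observed_deg):
    """Mean absolute error between torsion angles in degrees, assuming shortest-arc differences."""
    pred = np.asarray(predicted_deg, dtype=float)
    obs = np.asarray(observed_deg, dtype=float)
    diff = np.abs(pred - obs) % 360.0
    # The shortest arc between two angles is at most 180 degrees.
    diff = np.minimum(diff, 360.0 - diff)
    return float(diff.mean())

print(angular_mae([170.0, -60.0], [-170.0, -40.0]))
```

Without the wrap-around step, a plain absolute difference would overstate errors for angles near ±180°.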

4.

Background  

There are several techniques for fitting risk prediction models to high-dimensional data arising from microarrays. However, biological knowledge about relations between genes is only rarely taken into account. One recent approach incorporates pathway information, available, e.g., from the KEGG database, by augmenting the penalty term in Lasso estimation for continuous response models.

5.
Classification is one of the most widely applied tasks in ecology. Ecologists have to deal with noisy, high-dimensional data that are often non-linear and do not meet the assumptions of conventional statistical procedures. To overcome this problem, machine-learning methods have been adopted for ecological classification. We compared five machine-learning-based classification techniques (classification trees, random forests, artificial neural networks, support vector machines, and automatically induced rule-based fuzzy models) in a biological conservation context. The case study was the ocellated turkey (Meleagris ocellata), a bird endemic to the Yucatan peninsula that has suffered considerable decreases in local abundance and distributional area during the last few decades. On a grid of 10 × 10 km cells superimposed on the peninsula, we analysed relationships between environmental and social explanatory variables and changes in ocellated turkey abundance between 1980 and 2000. Abundance change was expressed both as three classes (decrease, no change, and increase) and as 14 more detailed classes. Modelling performance varied considerably between methods, with random forests and classification trees being the most efficient, as measured by overall classification error and the normalised mutual information index. Artificial neural networks yielded the worst results, along with linear discriminant analysis, which was included as a conventional statistical approach. We evaluated not only classification accuracy but also characteristics such as time effort, classifier comprehensibility and method intricacy, aspects that determine the success of a classification technique among ecologists and conservation biologists as well as in communication with managers and decision makers.
We recommend the combined use of classification trees and random forests because of the easy interpretability of the classifiers and the high comprehensibility of the method.

6.
Huang Y, Pepe MS. Biometrika 2009;96(4):991-997
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods.
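For intuition about the quantity being estimated, here is a minimal sketch of an empirical predictiveness curve for cohort data, assuming it is defined as R(v), the v-th quantile of the population risk distribution. This is not the semiparametric case-control method described above; the simulated Beta-distributed risks, the low/high thresholds and the function names are illustrative assumptions only:

```python
import numpy as np

def predictiveness_curve(risks, v):
    """Empirical predictiveness curve: R(v) is the v-th quantile of the risk distribution."""
    return np.quantile(np.asarray(risks, dtype=float), v)

def risk_stratification(risks, low, high):
    """Proportions of subjects below the low-risk and above the high-risk thresholds."""
    risks = np.asarray(risks, dtype=float)
    return float(np.mean(risks < low)), float(np.mean(risks > high))

rng = np.random.default_rng(0)
# Simulated risks for a hypothetical cohort of 1,000 subjects (illustration only).
risks = rng.beta(2.0, 8.0, size=1000)
v = np.linspace(0.0, 1.0, 11)
curve = predictiveness_curve(risks, v)
for vi, ri in zip(v, curve):
    print(f"v = {vi:.1f}   R(v) = {ri:.3f}")
print(risk_stratification(risks, low=0.05, high=0.40))
```

A wider risk distribution gives a steeper curve and larger proportions in both tails, which is the sense of "better risk stratification" used in the abstract.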

7.
Bioprocess and Biosystems Engineering - An accurate and reliable forecast of biosurfactant production with minimum error is useful in any bioprocess engineering. Bacterial isolate FKOD36 capable of...

8.
Contact-tracing data (CTD) collected from disease outbreaks has received relatively little attention in the epidemic modeling literature because it is thought to be unreliable: infection sources might be wrongly attributed, or data might be missing due to resource constraints in the questionnaire exercise. Nevertheless, these data might provide a rich source of information on the disease transmission rate. This paper presents a novel methodology for combining CTD with rate-based contact network data to improve posterior precision, and therefore predictive accuracy. We present an advancement in Bayesian inference for epidemics that assimilates these data and is robust to partial contact tracing. Using a simulation study based on the British poultry industry, we show how the presence of CTD improves posterior predictive accuracy and can directly inform a more effective control strategy.

9.
OBJECTIVE: To study the reliability of volume parameters measured on tissue sections using different sampling, measurement and calculation methods. STUDY DESIGN: Images of the largest nuclear profiles of primary spermatocytes and spherical spermatoblasts were captured under a 100x, NA 1.30 oil immersion objective on 11-micron-thick seminiferous tubule sections, and tissue images were captured under a 20x objective on 4-micron sections. Volumes were measured and calculated by the five methods provided by the Technology for Image and Graphics Engineering Research cell image analysis system. RESULTS: The nuclear volumes obtained by the nucleator and by the area-equivalent diameter on the largest nuclear profile image were almost the same; this held for binary images measured with both the automated and the manual interactive nucleator, and for grey-scale images measured with the manual interactive nucleator only. Nuclear volumes calculated from the random Feret diameter and from the equivalent diameter of the perimeter (the minimal circumference) of the largest nuclear profile binary image were clearly larger than those obtained with the nucleator and the area-equivalent diameter. Because nuclear sections of different sizes are captured in the same tissue section, nuclear volumes from the seminiferous tubule tissue images were strikingly lower than those from the largest nuclear profile image. The shape factors of primary spermatocyte and spherical spermatoblast nuclei under the 100x and 20x objectives were approximately the same. CONCLUSION: Sample preparation, sampling methods and calculation formulas suited to nuclear shape are necessary to obtain reproducible volume parameters.
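The abstract reports that nucleator and area-equivalent-diameter volumes on the largest nuclear profile were nearly identical. A minimal sketch of two textbook formulas, assuming a sphere model for the area-equivalent diameter (V = πd³/6 with d = 2√(A/π)) and the classical nucleator estimator V = (4π/3)·mean(l³) over centre-to-boundary distances l; the function names are hypothetical and are not functions of the image analysis system named above:

```python
import math

def volume_from_area_equivalent_diameter(profile_area):
    """Sphere volume from the diameter of a circle whose area equals the profile area."""
    d = 2.0 * math.sqrt(profile_area / math.pi)
    return math.pi * d ** 3 / 6.0

def volume_from_nucleator(lengths):
    """Classical nucleator estimate: (4*pi/3) times the mean of cubed centre-to-boundary distances."""
    return 4.0 * math.pi / 3.0 * sum(l ** 3 for l in lengths) / len(lengths)

# A perfectly circular profile of radius 5 um: both estimates give the same sphere volume.
r = 5.0
print(volume_from_area_equivalent_diameter(math.pi * r ** 2))
print(volume_from_nucleator([r, r, r, r]))
```

For non-circular profiles the two estimates diverge, which is one reason the choice of formula should match nuclear shape.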

10.
11.
The results of the cluster analysis of fermentation data are used for supervision and on-line state estimation. The results of the classification are presented as the average over all fermentation runs belonging to each class, together with the standard deviation. Using the class information, the on-line fermentation is assigned to the best-matching class. Faults in the data, such as spikes or total sensor failure, are detected because the class information automatically supplies tolerance regions for the measurements. When a fault occurs, a reliable extrapolation can be calculated for the duration of the fault. The approach is implemented in the real-time expert system tool G2 and is applied to carbon dioxide evolution rate (CER) data from an industrial antibiotic fermentation process.
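A minimal sketch of the class-based supervision idea follows. It rests on several assumptions: tolerance regions of class mean ± k standard deviations per time step (k = 3 here is an arbitrary choice), assignment of the on-line run to the class with the smallest RMS distance, and replacement of faulty points by the class mean as a simple stand-in for the extrapolation. The data and function names are hypothetical, and this is Python rather than the G2 implementation:

```python
import numpy as np

def class_profiles(runs_by_class):
    """Per-class mean and standard deviation over all runs, at each time step."""
    return {label: (runs.mean(axis=0), runs.std(axis=0, ddof=1))
            for label, runs in runs_by_class.items()}

def best_class(online, profiles):
    """Class whose mean trajectory is closest (RMS distance) to the on-line run so far."""
    n = len(online)
    def rms(label):
        mean, _ = profiles[label]
        return float(np.sqrt(np.mean((online - mean[:n]) ** 2)))
    return min(profiles, key=rms)

def flag_faults(online, mean, std, k=3.0):
    """True where a measurement falls outside the tolerance region mean +/- k*std."""
    n = len(online)
    return np.abs(online - mean[:n]) > k * std[:n]

# Hypothetical CER trajectories (arbitrary units) for two classes of historical runs.
runs_by_class = {
    "fast": np.array([[0, 2.0, 4.0, 6.0, 8.0], [0, 2.2, 4.1, 6.2, 7.9], [0, 1.8, 3.9, 5.8, 8.1]]),
    "slow": np.array([[0, 1.0, 2.0, 3.0, 4.0], [0, 1.1, 2.1, 2.9, 4.2], [0, 0.9, 1.9, 3.1, 3.8]]),
}
profiles = class_profiles(runs_by_class)
online = np.array([0.0, 2.1, 40.0, 6.1])  # the third value is a sensor spike
label = best_class(online, profiles)
mean, std = profiles[label]
faults = flag_faults(online, mean, std)
# Substitute the class mean for faulty points as a simple stand-in for extrapolation.
repaired = np.where(faults, mean[: len(online)], online)
print(label, faults, repaired)
```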

12.

Background  

Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities.

13.
The detailed spectral changes observed in the absorption, circular dichroism (CD) and magnetic circular dichroism (MCD) spectra upon addition of Cd²⁺ to rat liver Cd,Zn-metallothionein (MT) are reported. Results from dialysis experiments clearly demonstrate that up to 8.6 mole equivalents of Cd²⁺ can be bound to this protein. The excess Cd²⁺ ions bound appear to have lower binding constants than the first seven Cd²⁺ ions bound. Red blood cell (RBC) hemolysate can compete with metallothionein for all Cd²⁺ bound in excess of seven mole equivalents. Thus the RBC hemolysate method of estimating protein concentrations is shown to be correct when based upon complete loading of all binding sites in MT with Cd²⁺.

14.
Membrane protein prediction methods (total citations: 13; self-citations: 0; citations by others: 13)
We survey computational approaches that tackle membrane protein structure and function prediction. While describing the main ideas that have led to the development of the most relevant and novel methods, we also discuss pitfalls, provide practical hints and highlight the challenges that remain. The methods covered include sequence alignment, motif search, functional residue identification, transmembrane segment and protein topology prediction, and homology and ab initio modeling. In general, predictions of functional and structural features of membrane proteins are improving, although progress is hampered by the limited amount of high-resolution experimental information available. While predictions of transmembrane segments and protein topology rank among the most accurate methods in computational biology, more attention and effort will be required in the future to improve database search, homology and ab initio modeling.

15.
Generalised absolute risk models were fitted to the latest Japanese atomic bomb survivor cancer incidence data using Bayesian Markov chain Monte Carlo methods, taking account of random errors in the DS86 dose estimates. The resulting uncertainty distributions in the relative risk model parameters were used to derive uncertainties in population cancer risks for a current UK population. Because of evidence for irregularities in the low-dose dose response, flexible dose-response models were used, consisting of a linear-quadratic-exponential model for the high-dose part of the dose response, together with piecewise-linear adjustments for the two lowest dose groups. Following an assumed administered dose of 0.001 Sv, lifetime radiation-induced leukaemia incidence risks were estimated with this model to be 1.11 × 10⁻² Sv⁻¹ (95% Bayesian CI −0.61, 2.38). Following the same assumed dose, lifetime radiation-induced solid cancer incidence risks were calculated with this model to be 7.28 × 10⁻² Sv⁻¹ (95% Bayesian CI −10.63, 22.10). Overall, the cancer incidence risks predicted by Bayesian Markov chain Monte Carlo methods are similar to those derived by classical likelihood-based methods, which form the basis of established estimates of radiation-induced cancer risk.

16.
EMG-driven models can be used to estimate muscle force in biomechanical systems. Collected and processed EMG readings are used as the input of a dynamic system, which is integrated numerically. This approach requires the definition of a reasonably large set of parameters. Some of these vary widely among subjects, and slight inaccuracies in such parameters can lead to large model output errors. One of these parameters is the maximum voluntary contraction force (Fom). This paper proposes an approach to find Fom by estimating muscle physiological cross-sectional area (PCSA) using ultrasound (US) and multiplying it by a realistic value of maximum muscle specific tension. Ultrasound is used to measure muscle thickness, from which muscle volume is determined through regression equations. Soleus, gastrocnemius medialis and gastrocnemius lateralis PCSAs are estimated using published volume proportions among leg muscles, which also requires US measurements of muscle fiber length and pennation angle. Fom values obtained by this approach and from data widely cited in the literature were used to comparatively test a Hill-type EMG-driven model of the ankle joint. The model uses three EMGs (soleus, gastrocnemius medialis and gastrocnemius lateralis) as inputs and joint torque as the output. The EMG signals were obtained in a series of experiments with 8 adult male subjects, who performed an isometric contraction protocol consisting of 10 s step contractions at 20% and 60% of the maximum voluntary contraction level. Isometric torque was simultaneously collected using a dynamometer. A statistically significant reduction in the root mean square error was observed when US-obtained Fom was used, compared with Fom from the literature.
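The force estimate described above can be written as Fom = PCSA × σ, where σ is the specific tension. A minimal sketch, assuming the common architectural definition PCSA = V·cos(θ)/Lf (muscle volume V, fiber length Lf, pennation angle θ). The abstract does not give its exact formula, and the specific tension and other input values in the example are placeholders, not the paper's values:

```python
import math

def pcsa_cm2(volume_cm3, fiber_length_cm, pennation_deg):
    """Physiological cross-sectional area from muscle volume, fiber length and pennation angle."""
    return volume_cm3 * math.cos(math.radians(pennation_deg)) / fiber_length_cm

def max_isometric_force_n(volume_cm3, fiber_length_cm, pennation_deg, specific_tension_n_per_cm2):
    """Fom = PCSA * specific tension (placeholder tension supplied by the caller)."""
    return pcsa_cm2(volume_cm3, fiber_length_cm, pennation_deg) * specific_tension_n_per_cm2

# Hypothetical soleus-like inputs; replace with subject-specific ultrasound measurements.
print(max_isometric_force_n(volume_cm3=300.0, fiber_length_cm=4.0, pennation_deg=25.0,
                            specific_tension_n_per_cm2=25.0))
```

Because Fom scales linearly with both PCSA and σ, an error in either input propagates proportionally into the model's force estimate.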

17.
Plant promoter prediction with confidence estimation (total citations: 10; self-citations: 0; citations by others: 10)

18.
Estimating species richness through extrapolation is becoming increasingly important for conservation decision making. We present the results of a first test of four abundance-based estimation procedures, ACE, Chao1, Lognormal and Poisson lognormal, based on single-sample museum collection data consisting of more than 150,000 specimens of 47 families of Danish Diptera. All four estimators considerably underestimate true species richness as assessed by species distributions, expert opinions, and a species–area curve. In our samples 3326 species were represented. The different estimators predicted the Danish fauna to consist of 3490–3805 species, although at least 4361 are already known from the literature. Expert opinion and the species–area curve indicate that the Danish fauna likely contains 5400–5800 species. The Poisson lognormal method displays rather erratic behavior but nonetheless performs slightly better than the other estimators. We discuss the inherent problems concerning the use of collection data in this context, as well as the influence of patchy distributions and sample size on estimator performance. We conclude that abundance-based estimators should preferably be applied to almost complete samples of randomly distributed organisms.
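Of the four estimators, Chao1 is simple enough to show directly. A minimal sketch of the classic form S = S_obs + F1²/(2·F2), where F1 and F2 are the numbers of species represented by exactly one and exactly two specimens. The abstract does not state which Chao1 variant was used, so treat this as illustrative; the abundance list is hypothetical:

```python
def chao1(abundances):
    """Classic Chao1 richness estimate from per-species specimen counts."""
    counts = [a for a in abundances if a > 0]  # ignore species with zero specimens
    s_obs = len(counts)
    f1 = sum(1 for a in counts if a == 1)  # singletons
    f2 = sum(1 for a in counts if a == 2)  # doubletons
    if f2 == 0:
        raise ValueError("classic Chao1 is undefined without doubletons; use a bias-corrected variant")
    return s_obs + f1 ** 2 / (2 * f2)

# Seven observed species: three singletons, two doubletons, two common species.
print(chao1([1, 1, 1, 2, 2, 5, 10]))
```

The correction depends only on singletons and doubletons, so it adds few species when most collected species are represented by many specimens.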

19.
Genotypic diversity: estimation and prediction in samples (total citations: 10; self-citations: 1; citations by others: 10)
Stoddart JA, Taylor JF. Genetics 1988;118(4):705-711
We show that a commonly used statistic of genotypic diversity can be used to reflect one form of deviation from panmixia, viz. clonal reproduction, by comparing observed and predicted sample statistics. The characteristics of the statistic, in particular its relationship with population genotypic diversity, are formalised, and a method is developed for predicting the genotypic diversity of a sample drawn from a panmictic population using allele frequencies and sample size. The sensitivity of some possible tests of significance of the deviation from panmictic expectations is examined using computer simulations. Goodness-of-fit tests are robust but produce an unacceptably high level of type II error. With means and variances calculated either from Monte Carlo simulations or from distributional and series approximations, t-tests perform better than goodness-of-fit tests. Under simulation, both forms of t-test exhibit acceptable rates of type I error. Rates of type II error are usually large when allele frequencies are severely skewed, although the latter test performs better in those conditions.
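A minimal sketch of the comparison described above. It assumes the statistic is the inverse of the sum of squared genotype frequencies, G = 1/Σgᵢ² (the form usually cited as Stoddart and Taylor's G), and that panmixia means Hardy–Weinberg proportions with independent loci. The expected mean and variance come from Monte Carlo simulation, one of the two routes the abstract mentions; the sample, the standardized deviation printed at the end, and all function names are hypothetical and are not the paper's t-test:

```python
import math
import random
from collections import Counter

def genotypic_diversity(genotypes):
    """G = 1 / sum(g_i^2), where g_i is the sample frequency of the i-th distinct multilocus genotype."""
    n = len(genotypes)
    counts = Counter(genotypes)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())

def allele_frequencies(genotypes):
    """Per-locus allele frequencies estimated from a sample of diploid multilocus genotypes."""
    freqs = []
    for locus in range(len(genotypes[0])):
        counts = Counter(a for g in genotypes for a in g[locus])
        total = sum(counts.values())
        freqs.append([counts[a] / total for a in sorted(counts)])
    return freqs

def simulate_panmictic_sample(allele_freqs, n, rng):
    """Draw n diploid genotypes under Hardy-Weinberg proportions with unlinked loci."""
    sample = []
    for _ in range(n):
        genotype = []
        for freqs in allele_freqs:
            pair = rng.choices(range(len(freqs)), weights=freqs, k=2)
            genotype.append(tuple(sorted(pair)))  # unordered allele pair at this locus
        sample.append(tuple(genotype))
    return sample

def expected_g(allele_freqs, n, replicates=1000, seed=1):
    """Monte Carlo mean and variance of G for panmictic samples of size n."""
    rng = random.Random(seed)
    values = [genotypic_diversity(simulate_panmictic_sample(allele_freqs, n, rng))
              for _ in range(replicates)]
    mean = sum(values) / replicates
    var = sum((g - mean) ** 2 for g in values) / (replicates - 1)
    return mean, var

# Hypothetical sample of 10 individuals at two loci with only three distinct
# genotypes (6, 3 and 1 copies), as clonal reproduction might produce.
observed = [((0, 1), (0, 0))] * 6 + [((0, 0), (0, 1))] * 3 + [((1, 1), (0, 0))]
g_obs = genotypic_diversity(observed)
mean, var = expected_g(allele_frequencies(observed), n=len(observed))
print(g_obs, mean, var, (g_obs - mean) / math.sqrt(var))
```

An observed G well below the panmictic expectation is the signature of clonal reproduction that the abstract describes.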

20.
Calculations of charge interactions complement analysis of a characterised active site, rationalising the pH dependence of activity and transition-state stabilisation. Prediction of active-site location through large ΔpKa values or electrostatic strain is relevant for structural genomics. We report a study of ionisable groups in a set of 20 enzymes, finding that false positives obscure predictive potential. In a larger set of 156 enzymes, peaks in solvent-space electrostatic properties are calculated. Both electric field and potential match active-site location well. The best correlation is found with electrostatic potential calculated from a uniform charge density over the enzyme volume, rather than from assignment of a standard atom-specific charge set. Studying a shell around each molecule, for 77% of enzymes the potential peak lies within the 5% of the shell closest to the active-site centre, and for 86% within 10%. Active-site identification by largest cleft, also with projection onto a shell, gives 58% of enzymes for which the centre of the largest cleft lies within 5% of the active site, and 70% within 10%. Dielectric boundary conditions emphasise clefts in the uniform charge density method, which is suited to recognition of binding pockets embedded within larger clefts. The variation of peak potential with distance from the active site, and comparison between enzyme and non-enzyme sets, gives an optimal threshold distinguishing enzymes from non-enzymes. We find that 87% of the enzyme set exceeds the threshold, compared with 29% of the non-enzyme set. Enzyme/non-enzyme homologues, proteins annotated by structural genomics, and catalytic/non-catalytic RNAs are studied in this context.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417