Similar Literature
20 similar articles found (search time: 31 ms)
1.
Neural networks are considered by many to be very promising tools for classification and prediction. The flexibility of neural network models, however, often results in over-fitting, and shrinking the parameters via a penalized likelihood is a common remedy. In this paper we extend the approach proposed by Faraggi and Simon (1995a) to modeling censored survival data using the input-output relationship of a single-hidden-layer feed-forward neural network. Instead of estimating the neural network parameters by maximum likelihood, we place normal prior distributions on the parameters and base inference on the derived posterior distributions. This Bayesian formulation shrinks the parameters of the neural network model and reduces over-fitting relative to maximum likelihood estimation. We illustrate the proposed method on a simulated and a real example.
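A minimal sketch of the shrinkage idea in Python, using a toy classification likelihood rather than the paper's censored-survival likelihood (the data, hidden-layer size H, and prior precision lam are all invented for illustration): a normal prior on each weight makes the posterior mode coincide with an L2-penalized maximum likelihood fit.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch only: normal prior N(0, 1/lam) on each weight of a single-hidden-layer
# network => the posterior mode is an L2-penalized ML estimate. A survival
# application would swap in a censored-data log-likelihood for loglik below.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # toy covariates (assumed)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # toy binary outcome (assumed)
H = 4                                             # hidden units (assumed)
lam = 1.0                                         # prior precision (assumed)

def unpack(w):
    W1 = w[:3 * H].reshape(3, H)                  # input-to-hidden weights
    W2 = w[3 * H:3 * H + H]                       # hidden-to-output weights
    return W1, W2

def neg_log_posterior(w):
    W1, W2 = unpack(w)
    eta = np.tanh(X @ W1) @ W2                    # network output on logit scale
    p = 1.0 / (1.0 + np.exp(-eta))
    loglik = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    log_prior = -0.5 * lam * np.sum(w ** 2)       # normal prior => L2 shrinkage
    return -(loglik + log_prior)

w0 = rng.normal(scale=0.1, size=3 * H + H)
fit = minimize(neg_log_posterior, w0, method="L-BFGS-B")
print("penalized (MAP) weights:", np.round(fit.x, 3))
```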

2.
Traditional Mediterranean trawls are generally made with non-selective netting, and the fishing boats are involved in multi-species fisheries. As a result, most near-shore stocks are over-exploited. Weather permitting, the demersal trawl fleet tends to fish in the relatively deeper, international waters of the Aegean Sea, where the catch is usually higher. Evaluating the codends used in this fishery and improving their selectivity are therefore of prime importance. In the present study, selectivity data were collected for hake (Merluccius merluccius), blue whiting (Micromesistius poutassou), greater forkbeard (Phycis blennoides), blackbelly rosefish (Helicolenus dactylopterus dactylopterus) and fourspotted megrim (Lepidorhombus boscii) in commercial (300 MC) and square mesh top panel (SMTPC) codends. Trawling was carried out at depths of 274-426 m onboard a commercial vessel chartered for a 15-day sea trial in August 2004. Selection parameters were obtained by fitting a logistic equation using a maximum likelihood method. The selectivity analysis indicated that the commercially used 40 mm nominal mesh size PE codend was rather unselective for the species investigated. In general, the square mesh top panel codend had relatively higher L50 values than the commercial codend. However, except for blue whiting, even this codend is rather unselective when 50% maturity lengths (LM50) are considered.

3.
The effects of different body shapes on size selectivity were analysed. Relationships between total length and fork length, height, width and girth were estimated, and the selectivity parameters of a 44 mm PE diamond mesh codend were determined for common pandora (Pagellus erythrinus) and axillary sea bream (Pagellus acarne). Two data sets were collected during demersal trawl trials carried out in Izmir Bay (Aegean Sea) between May 1996 and February 1997. Regression analyses were used to determine the relationships between total length and the other measured dimensions of the species, while selectivity parameters were estimated from pooled data by fitting the logistic equation with the maximum likelihood method. Fork length, height, width and girth were found to increase linearly with total length for both species (R2 > 0.90, except TL-W for axillary sea bream, 0.76). Ratios of average body thickness (W/H) were 0.45 (±0.002) for common pandora and 0.52 (±0.002) for axillary sea bream. L50 and SR values were 12.4 (±0.44) and 2.2 (±0.51) from the eight valid hauls for common pandora, and 13.6 (±0.13) and 1.9 (±0.26) from the three valid hauls for axillary sea bream, respectively. The difference in size selectivity between these two species of the same family can be explained by differences in body shape and fish behaviour.
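The logistic maximum likelihood fit used in this abstract (and in item 2 above) can be sketched in a few lines of Python. The length classes and catch counts below are invented for illustration, not the study's data; the parameterisation r(l) = 1/(1 + exp(-(a + b·l))) with L50 = -a/b and SR = 2·ln(3)/b is the standard one.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: binomial maximum likelihood fit of a logistic selection curve to
# codend vs. cover counts per length class, reporting L50 and SR.
lengths = np.arange(8, 20)                                  # length classes, cm (assumed)
n_codend = np.array([1, 2, 4, 8, 15, 22, 30, 34, 38, 40, 41, 42])  # retained (assumed)
n_cover  = np.array([40, 38, 35, 30, 22, 15, 9, 5, 3, 2, 1, 1])    # escaped (assumed)

def neg_loglik(theta):
    a, b = theta
    r = 1.0 / (1.0 + np.exp(-(a + b * lengths)))            # retention probability
    r = np.clip(r, 1e-9, 1 - 1e-9)
    return -np.sum(n_codend * np.log(r) + n_cover * np.log(1 - r))

fit = minimize(neg_loglik, x0=[-5.0, 0.5], method="Nelder-Mead")
a, b = fit.x
L50 = -a / b                    # length at 50% retention
SR = 2.0 * np.log(3.0) / b      # selection range = L75 - L25
print(f"L50 = {L50:.2f} cm, SR = {SR:.2f} cm")
```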

4.
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: it is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods that, in the present article, write the likelihood as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which data are collected. A very general framework for neutral coalescent models is presented and discussed; it comprises many of the most popular coalescent models currently used for the analysis of genetic data. Assuming data are collected from a series of consecutive regions of equal size, it is shown that the observed data form a stationary, ergodic process. General conditions are given under which the maximum composite likelihood estimator of the parameters describing the model (e.g. mutation rates, demographic parameters and the recombination rate) is consistent as the number of regions tends to infinity.
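The core construction — replacing one intractable likelihood with a product of per-region likelihoods — is easy to illustrate. The sketch below is not the paper's coalescent machinery; each "region" is summarised by a Poisson count with mean theta (a stand-in for a mutation-rate parameter), and all data are simulated.

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize_scalar

# Sketch: the composite log-likelihood is the sum of per-region
# log-likelihoods, i.e. the log of the product over regions.
rng = np.random.default_rng(1)
true_theta = 4.0
counts = rng.poisson(true_theta, size=200)      # one summary count per region (simulated)

def neg_composite_loglik(theta):
    return -np.sum(poisson.logpmf(counts, theta))   # product -> sum of logs

fit = minimize_scalar(neg_composite_loglik, bounds=(0.1, 20.0), method="bounded")
print("composite-likelihood estimate of theta:", round(fit.x, 3))
```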

5.
Background: Linkage analysis is a useful tool for detecting genetic variants that regulate a trait of interest, especially genes associated with a given disease. Although penetrance parameters play an important role in determining gene location, they are assigned arbitrary values according to the researcher's intuition or as estimated by the maximum likelihood principle. Several methods exist for evaluating maximum likelihood estimates of penetrance, although not all of these are supported by software packages and some are biased by marker genotype information, even when disease development is due solely to the genotype of a single allele. Findings: Programs for exploring the maximum likelihood estimates of penetrance parameters were developed using the R statistical programming language supplemented by external C functions. The software returns a vector of polynomial coefficients of the penetrance parameters, representing the likelihood of pedigree data. From the likelihood polynomial supplied by the proposed method, the likelihood value and its gradient can be computed precisely. To reduce the effect of the supplied dataset on the likelihood function, feasible parameter constraints can be introduced into the maximum likelihood estimates, enabling flexible exploration of the penetrance estimates. An auxiliary program generates a perspective plot allowing visual validation of the model's convergence. The functions are collectively available as the MLEP R package. Conclusions: Linkage analysis using penetrance parameters estimated by the MLEP package enables feasible localization of a disease locus. This is shown through a simulation study and by demonstrating how the package is used to explore maximum likelihood estimates. Although the input dataset tends to bias the likelihood estimates, the method yields accurate results superior to analyses using intuitive penetrance values for diseases with low allele frequencies. MLEP is part of the Comprehensive R Archive Network and is freely available at http://cran.r-project.org/web/packages/MLEP/index.html

6.
In quantitative biology, observed data are fitted to a model that captures the essence of the system under investigation in order to obtain estimates of the model's parameters, together with their standard errors and interactions. The fitting is best done by the method of maximum likelihood, though least-squares fits are often used as an approximation because the calculations are perceived to be simpler. Here Brian Williams and Chris Dye argue that the method of maximum likelihood is generally preferable to least squares, giving the best estimates of the parameters for data with any given error distribution, and that the calculations are no more difficult than for least-squares fitting. They offer a relatively simple explanation of the method and describe its implementation using examples from leishmaniasis epidemiology.
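The contrast the authors draw is easy to demonstrate. Below is a minimal Python sketch (toy data, not from the paper) fitting an exponential decay mu(t) = A·exp(-k·t) to Poisson-distributed counts both ways: maximum likelihood uses the correct Poisson error model, while ordinary least squares implicitly assumes constant-variance Gaussian errors, and the optimisation is no harder in either case.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

rng = np.random.default_rng(2)
t = np.arange(0, 10)
A_true, k_true = 50.0, 0.3
counts = rng.poisson(A_true * np.exp(-k_true * t))   # simulated counts (assumed)

def mu(theta):
    A, k = theta
    return A * np.exp(-k * t)

def neg_loglik(theta):                               # Poisson maximum likelihood
    return -np.sum(poisson.logpmf(counts, np.clip(mu(theta), 1e-9, None)))

def sum_sq(theta):                                   # ordinary least squares
    return np.sum((counts - mu(theta)) ** 2)

ml = minimize(neg_loglik, x0=[40.0, 0.2], method="Nelder-Mead")
ls = minimize(sum_sq, x0=[40.0, 0.2], method="Nelder-Mead")
print("ML estimate (A, k):", np.round(ml.x, 3))
print("LS estimate (A, k):", np.round(ls.x, 3))
```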

7.
The Poisson regression model for the analysis of life table and follow-up data with covariates is presented. An example shows how this technique can be used to construct a parsimonious model that describes a set of survival data. All parameters in the model, as well as the hazard and survival functions, are estimated by maximum likelihood.
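A minimal Poisson regression sketch in Python (the strata, event counts, person-years and exposure values are all invented for illustration): log person-time enters as an offset, and the rate parameters are estimated by maximum likelihood.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: deaths d_i per stratum ~ Poisson(person_years_i * exp(b0 + b1*x_i)).
deaths = np.array([4, 9, 20, 37])                   # events per stratum (assumed)
pyears = np.array([1200.0, 950.0, 800.0, 600.0])    # person-years at risk (assumed)
x = np.array([0.0, 1.0, 2.0, 3.0])                  # e.g. exposure level (assumed)

def neg_loglik(beta):
    log_mu = np.log(pyears) + beta[0] + beta[1] * x  # offset + linear predictor
    mu = np.exp(log_mu)
    return -np.sum(deaths * log_mu - mu)             # Poisson kernel (d! term dropped)

fit = minimize(neg_loglik, x0=[-5.0, 0.0], method="Nelder-Mead")
print("rate ratio per unit of x:", round(np.exp(fit.x[1]), 3))
```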

8.
Algorithmic details for obtaining maximum likelihood estimates of parameters on a large phylogeny are discussed. On a large tree, an efficient approach is to optimize branch lengths one at a time while simultaneously updating the parameters of the substitution model. Codon substitution models that allow for variable nonsynonymous/synonymous rate ratios (ω = dN/dS) among sites are used to analyze a data set of human influenza virus type A hemagglutinin (HA) genes comprising 349 sequences. Methods for obtaining approximate estimates of branch lengths for codon models are explored, and the estimates are used to test for positive selection and to identify sites under selection. Compared with results from the exact method of estimating all parameters by maximum likelihood, the approximate methods produced reliable results. The analysis identified a number of sites in the viral gene under diversifying Darwinian selection and demonstrated the importance of including many sequences in the data when detecting positive selection at individual sites.

9.
10.
Xue Liugen & Zhu Lixing. Biometrika 2007, 94(4): 921-937
A semiparametric regression model for longitudinal data is considered. The empirical likelihood method is used to estimate the regression coefficients and the baseline function, and to construct confidence regions and intervals. It is proved that the maximum empirical likelihood estimator of the regression coefficients achieves asymptotic efficiency and the estimator of the baseline function attains asymptotic normality when a bias correction is made. Two calibrated empirical likelihood approaches to inference for the baseline function are developed. We propose a groupwise empirical likelihood procedure to handle the inter-series dependence for the longitudinal semiparametric regression model, and employ bias correction to construct the empirical likelihood ratio functions for the parameters of interest. This leads us to prove a nonparametric version of Wilks' theorem. Compared with methods based on normal approximations, the empirical likelihood does not require consistent estimators for the asymptotic variance and bias. A simulation compares the empirical likelihood and normal-based methods in terms of coverage accuracies and average areas/lengths of confidence regions/intervals.
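The core empirical likelihood construction is far simpler to show for a scalar mean than in the paper's longitudinal setting. A minimal Python sketch (toy normal data, all values assumed): the empirical likelihood ratio at a candidate mean mu is obtained by solving for a Lagrange multiplier, and -2 times its log is approximately chi-squared with 1 degree of freedom by the nonparametric Wilks' theorem mentioned above.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(6)
x = rng.normal(1.0, 2.0, size=80)           # toy data (assumed)

def log_el_ratio(mu):
    # maximises prod(n * w_i) s.t. sum(w_i) = 1, sum(w_i * (x_i - mu)) = 0;
    # requires mu strictly inside the range of the data.
    z = x - mu
    lo = -1.0 / z.max() + 1e-8               # feasible interval for lambda
    hi = -1.0 / z.min() - 1e-8
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return -np.sum(np.log(1.0 + lam * z))    # log EL ratio, since n*w_i = 1/(1+lam*z_i)

# Wilks: -2 * log EL ratio at the true mean is approximately chi^2 with 1 df.
print("-2 log EL ratio at true mean:", round(-2 * log_el_ratio(1.0), 3))
```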

11.
We estimate the parameters of a stochastic process model for a macroparasite population within a host using approximate Bayesian computation (ABC). The immunity of the host is an unobserved model variable and only mature macroparasites at sacrifice of the host are counted. With very limited data, process rates are inferred reasonably precisely. Modeling involves a three variable Markov process for which the observed data likelihood is computationally intractable. ABC methods are particularly useful when the likelihood is analytically or computationally intractable. The ABC algorithm we present is based on sequential Monte Carlo, is adaptive in nature, and overcomes some drawbacks of previous approaches to ABC. The algorithm is validated on a test example involving simulated data from an autologistic model before being used to infer parameters of the Markov process model for experimental data. The fitted model explains the observed extra-binomial variation in terms of a zero-one immunity variable, which has a short-lived presence in the host.
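A minimal ABC rejection sketch in Python (the paper's algorithm is a more elaborate adaptive sequential Monte Carlo scheme; the model, prior, summaries and tolerance below are all assumptions for illustration): parameters are accepted when summaries of data simulated from the model fall within a tolerance of the observed summaries, and the accepted draws approximate the posterior.

```python
import numpy as np

rng = np.random.default_rng(3)
observed = rng.poisson(5.0, size=30)                  # stand-in "observed" counts
obs_summary = np.array([observed.mean(), observed.var()])

accepted = []
for _ in range(20000):
    theta = rng.uniform(0.1, 15.0)                    # draw from the prior (assumed)
    sim = rng.poisson(theta, size=30)                 # simulate from the model
    sim_summary = np.array([sim.mean(), sim.var()])
    if np.linalg.norm(sim_summary - obs_summary) < 1.0:   # tolerance epsilon (assumed)
        accepted.append(theta)

accepted = np.array(accepted)
print(f"ABC posterior mean {accepted.mean():.2f} from {accepted.size} accepted draws")
```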

12.
Distribution-free regression analysis of grouped survival data
Methods based on regression models for logarithmic hazard functions (Cox models) are given for the analysis of grouped and censored survival data. By making an approximation, it is possible to obtain an explicit maximum likelihood function involving only the regression parameters. This likelihood function is a convenient analogue of Cox's partial likelihood for ungrouped data. The method is applied to data from a toxicological experiment.

13.
Gene regulatory, signal transduction and metabolic networks are major areas of interest in the newly emerging field of systems biology. In living cells, stochastic dynamics play an important role; however, the kinetic parameters of biochemical reactions necessary for modelling these processes are often not accessible directly through experiments. The problem of estimating stochastic reaction constants from molecule count data measured, with error, at discrete time points is considered. For modelling the system, a hidden Markov process is used, where the hidden states are the true molecule counts, and the transitions between those states correspond to reaction events following collisions of molecules. Two different algorithms are proposed for estimating the unknown model parameters. The first is an approximate maximum likelihood method that gives good estimates of the reaction parameters in systems with few possible reactions in each sampling interval. The second algorithm, treating the data as exact measurements, approximates the number of reactions in each sampling interval by solving a simple linear equation. Maximising the likelihood based on these approximations can provide good results, even in complex reaction systems.
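The second, linear-equation idea can be sketched for a toy system (a two-species chain with assumed rates and step size; the paper handles general systems and measurement error): for A -> B -> 0 with mass-action propensities k1·A and k2·B, the count changes over a sampling interval satisfy dA = -n1 and dB = n1 - n2 in the numbers of reaction events n1, n2, and solving these per interval and dividing by the integrated propensity recovers the rate constants.

```python
import numpy as np

rng = np.random.default_rng(5)
k1, k2, dt, steps = 0.3, 0.2, 0.5, 200          # true rates, step size (assumed)
A, B = 200, 0
As, Bs = [A], [B]
for _ in range(steps):                          # crude tau-leaping simulation
    n1 = min(rng.poisson(k1 * A * dt), A)       # A -> B events this interval
    n2 = min(rng.poisson(k2 * B * dt), B)       # B -> 0 events this interval
    A, B = A - n1, B + n1 - n2
    As.append(A)
    Bs.append(B)
As, Bs = np.array(As), np.array(Bs)

n1_hat = -np.diff(As)                           # solve dA = -n1
n2_hat = n1_hat - np.diff(Bs)                   # solve dB = n1 - n2
k1_hat = n1_hat.sum() / (As[:-1] * dt).sum()
k2_hat = n2_hat.sum() / max((Bs[:-1] * dt).sum(), 1e-9)
print(f"k1 estimate {k1_hat:.3f} (true {k1}); k2 estimate {k2_hat:.3f} (true {k2})")
```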

14.
Zero-truncated data arise in various disciplines where counts are observed but the zero-count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, owing to the nonstandard form of the likelihood, it cannot easily be implemented with well-known software packages, and additional programming is often required. Motivated by the Rao-Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero-truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models, which allows readily available software to be applied. We evaluate the efficiency of this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.
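For reference, the "nonstandard" maximum likelihood the abstract compares against is short to write down for the zero-truncated Poisson case. A minimal Python sketch with simulated data (not the paper's weighted partial likelihood method): each observed count y >= 1 has probability pois(y; lam) / (1 - exp(-lam)), so the log-likelihood subtracts the log of the zero-truncation normaliser.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(4)
raw = rng.poisson(2.5, size=500)                 # simulated counts (assumed)
y = raw[raw > 0]                                 # zeros are unobservable

def neg_loglik(lam):
    # P(Y = y | Y > 0) = pois(y; lam) / (1 - exp(-lam))
    return -np.sum(poisson.logpmf(y, lam) - np.log1p(-np.exp(-lam)))

fit = minimize_scalar(neg_loglik, bounds=(0.01, 20.0), method="bounded")
print("zero-truncated Poisson MLE of lambda:", round(fit.x, 3))
```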

15.
Tai JJ & Hsiao CK. Human Heredity 2001, 51(4): 192-198
In human genetic analysis, data are collected through the so-called 'ascertainment procedure'. Statistically, this sampling scheme can be thought of as a multistage sampling method: at the first stage, one or several probands are ascertained; at subsequent stages, a sequential sampling scheme is applied. Sampling in such a way is virtually a nonrandom procedure, which, in most cases, causes biased estimation that may be intractable. This paper focuses on the underlying causes of the intractability problem of ascertained genetic data. Three types of parameters, i.e. target, design and nuisance parameters, are defined as the essential components for formulating the true likelihood of a set of data. These parameters are also classified as explicit or implicit depending on whether they can be expressed explicitly in the likelihood function. For ascertained genetic data, a sequential scheme is regarded as an implicit design parameter, and a true pedigree structure as an implicit nuisance parameter. The intractability problem is attributed to the loss of information on any implicit parameter in likelihood formulation. Several approaches to building a likelihood for estimating the segregation ratio when only an observed pedigree structure is available are proposed.

16.
Estimation of parameters in a genetic model can be very difficult using likelihood theory when the likelihood function has no concise functional form. An alternative method based on fitting the characteristic function is suggested; this method may be used on data with consistent familial composition.

17.
Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with the parametric likelihood estimation commonly employed in population genetics for assigning individuals to their population of origin ("assignment tests"). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high FST), numbers of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush), exhibiting levels of population differentiation comparable to those used in simulations, were examined to further evaluate and compare the classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower than those of the k-nearest neighbor and decision tree classifiers for simulated data sets over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0-2.8% lower error rates). The relative performance of each machine learning classifier improved relative to the likelihood estimators for empirical data sets, suggesting an ability to "learn" and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods nevertheless provide a more accessible option for reliable assignment of individuals to their population of origin, given the intricacies involved in developing and evaluating artificial neural networks.
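A minimal sketch of the likelihood-based assignment test in Python (toy allele frequencies and genotype, not the study's data; Hardy-Weinberg proportions and independent loci are assumed): an individual's multilocus genotype likelihood in each candidate population is the product of its genotype probabilities there, and the individual is assigned to the population with the highest log-likelihood.

```python
import numpy as np

# allele frequencies[pop][locus][allele]: two loci, two alleles each (assumed)
freqs = {
    "popA": [np.array([0.8, 0.2]), np.array([0.6, 0.4])],
    "popB": [np.array([0.3, 0.7]), np.array([0.1, 0.9])],
}
# one individual's genotype: a pair of allele indices per locus (assumed)
genotype = [(0, 0), (0, 1)]

def log_lik(pop):
    ll = 0.0
    for (a1, a2), p in zip(genotype, freqs[pop]):
        g = p[a1] * p[a2] * (1 if a1 == a2 else 2)   # HW genotype probability
        ll += np.log(g)
    return ll

scores = {pop: log_lik(pop) for pop in freqs}
print(scores, "-> assign to", max(scores, key=scores.get))
```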

18.
Motivated by the spatial modeling of aberrant crypt foci (ACF) in colon carcinogenesis, we consider binary data with probabilities modeled as the sum of a nonparametric mean plus a latent Gaussian spatial process that accounts for short-range dependencies. The mean is modeled in a general way using regression splines. The mean function can be viewed as a fixed effect and is estimated with a penalty for regularization. With the latent process viewed as another random effect, the model becomes a generalized linear mixed model. In our motivating data set and other applications, the sample size is too large to easily accommodate maximum likelihood or restricted maximum likelihood (REML) estimation, so pairwise likelihood, a special case of composite likelihood, is used instead. We develop an asymptotic theory for models that are sufficiently general to be used in a wide variety of applications, including, but not limited to, the problem that motivated this work. The splines have penalty parameters that must converge to zero asymptotically; we derive theory for this along with a data-driven method for selecting the penalty parameter, a method shown in simulations to improve greatly upon standard devices such as likelihood cross-validation. Finally, we apply the methods to the ACF data from our experiment and discover an unexpected location for peak formation of ACF.

19.
We consider the statistical modeling and analysis of replicated multi-type point process data with covariates. Such data arise when heterogeneous subjects experience repeated events or failures which may be of several distinct types. The underlying processes are modeled as nonhomogeneous mixed Poisson processes with random (subject) and fixed (covariate) effects. The method of maximum likelihood is used to obtain estimates and standard errors of the failure rate parameters and regression coefficients. Score tests and likelihood ratio statistics are used for covariate selection. A graphical test of goodness of fit of the selected model is based on generalized residuals. Measures for determining the influence of an individual observation on the estimated regression coefficients and on the score test statistic are developed. An application is described to a large ongoing randomized controlled clinical trial for the efficacy of nutritional supplements of selenium for the prevention of two types of skin cancer.

20.
A. Forcina. Biometrics 1992, 48(3): 743-750
For linear models, assuming a within-experimental-unit covariance structure that incorporates errors of measurement, serial correlation, and variation between units, results on explicit estimation of the regression parameters are used to simplify maximum likelihood estimation of the covariance parameters. The use of an analysis of variance table as a simpler alternative to likelihood inference is illustrated with two examples.
