Similar Articles
20 similar articles found
1.
Cook RJ, Wei W, Yi GY. Biometrics. 2005;61(3):692-701.
We derive semiparametric methods for estimating and testing treatment effects when censored recurrent event data are available over multiple periods. These methods are based on estimating functions motivated by a working "mixed-Poisson" assumption under which conditioning can eliminate subject-specific random effects. Robust pseudoscore test statistics are obtained via "sandwich" variance estimation. The relative efficiency of conditional versus marginal analyses is assessed analytically under a mixed time-homogeneous Poisson model. The robustness and empirical power of the semiparametric approach are assessed through simulation. Adaptations to handle recurrent events arising in crossover trials are described and these methods are applied to data from a two-period crossover trial of patients with bronchial asthma.
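
As a minimal numerical sketch of the conditioning idea (our own toy setup: a gamma-distributed random effect and two equal-length periods, neither taken from the paper), note that conditioning on a subject's total event count leaves a binomial split that is free of the subject-specific effect:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
u = rng.gamma(2.0, 1.0, n)      # subject-specific random effects
rho = 0.6                       # treatment multiplies the event rate by rho
n_ctrl = rng.poisson(u)         # event counts in the control period
n_trt = rng.poisson(rho * u)    # event counts in the treatment period

# Conditional on each subject's total count, the treatment-period count
# is Binomial with success probability rho/(1 + rho): the random effect
# u cancels, which is the conditioning idea described above.
total = n_ctrl + n_trt
p_hat = n_trt[total > 0].sum() / total[total > 0].sum()
rho_hat = p_hat / (1.0 - p_hat)
print(f"true rho = {rho}, conditional estimate = {rho_hat:.3f}")
```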

2.
Although many algorithms exist for estimating haplotypes from genotype data, none of them take full account of both the decay of linkage disequilibrium (LD) with distance and the order and spacing of genotyped markers. Here, we describe an algorithm that does take these factors into account, using a flexible model for the decay of LD with distance that can handle both "blocklike" and "nonblocklike" patterns of LD. We compare the accuracy of this approach with a range of other available algorithms in three ways: for reconstruction of randomly paired, molecularly determined male X chromosome haplotypes; for reconstruction of haplotypes obtained from trios in an autosomal region; and for estimation of missing genotypes in 50 autosomal genes that have been completely resequenced in 24 African Americans and 23 individuals of European descent. For the autosomal data sets, our new approach clearly outperforms the best available methods, whereas its accuracy in inferring the X chromosome haplotypes is only slightly superior. For estimation of missing genotypes, our method performed slightly better when the two subsamples were combined than when they were analyzed separately, which illustrates its robustness to population stratification. Our method is implemented in the software package PHASE (v2.1.1), available from the Stephens Lab Web site.

3.

Background

Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates.

Results

We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.

Conclusions

Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
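
A minimal sketch of the integration-over-uncertainty idea, assuming hypothetical per-individual genotype likelihoods and Hardy-Weinberg genotype priors; a short EM loop estimates the allele frequency without ever calling genotypes:

```python
import numpy as np

def estimate_allele_freq(geno_liks, n_iter=100, tol=1e-8):
    """EM estimate of the alternate-allele frequency f from genotype
    likelihoods, integrating over genotype uncertainty (no hard calls).

    geno_liks: (n_individuals, 3) array of P(read data | g) for g = 0, 1, 2
               copies of the alternate allele.
    """
    f = 0.2  # arbitrary starting value in (0, 1)
    for _ in range(n_iter):
        # Hardy-Weinberg genotype prior for the current f
        prior = np.array([(1 - f) ** 2, 2 * f * (1 - f), f ** 2])
        # Posterior genotype probabilities per individual (E-step)
        post = geno_liks * prior
        post /= post.sum(axis=1, keepdims=True)
        # Update f as expected alternate-allele dosage / 2 (M-step)
        f_new = (post @ np.array([0.0, 1.0, 2.0])).mean() / 2.0
        if abs(f_new - f) < tol:
            break
        f = f_new
    return f

# Toy data: 5 individuals with noisy likelihoods
liks = np.array([[0.9, 0.09, 0.01],
                 [0.1, 0.8, 0.1],
                 [0.01, 0.09, 0.9],
                 [0.7, 0.25, 0.05],
                 [0.2, 0.6, 0.2]])
print(estimate_allele_freq(liks))
```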

4.
The "neighbor-joining algorithm" is a recursive procedure for reconstructing trees that is based on a transformation of pairwise distances between leaves. We present a generalization of the neighbor-joining transformation, which uses estimates of phylogenetic diversity rather than pairwise distances in the tree. This leads to an improved neighbor-joining algorithm whose total running time is still polynomial in the number of taxa. On simulated data, the method outperforms other distance-based methods. We have implemented neighbor-joining for subtree weights in a program called MJOIN which is freely available under the Gnu Public License at http://bio.math.berkeley.edu/mjoin/.  相似文献   

5.

Background and method

Successfully automated sigmoidal curve fitting is highly challenging when applied to large data sets. In this paper, we describe a robust algorithm for fitting sigmoid dose-response curves by estimating four parameters (floor, window, shift, and slope), together with the detection of outliers. We propose two improvements over current curve-fitting methods. The first is the detection of outliers, performed during the initialization step with corresponding adjustments of the derivative and error estimation functions. The second is an enhancement of the weighting quality of data points using a mean calculation in Tukey's biweight function.

Results and conclusion

Automatic curve fitting of 19,236 dose-response experiments shows that our proposed method outperforms the current fitting methods provided by MATLAB's nlinfit function and GraphPad's Prism software.
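
A minimal sketch in the spirit of the method above, assuming the (floor, window, shift, slope) parameterization; it uses generic iteratively reweighted least squares with Tukey biweight weights, not the paper's specific initialization and outlier-detection steps:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, floor, window, shift, slope):
    """Four-parameter logistic: floor + window / (1 + exp(-slope*(x - shift)))."""
    return floor + window / (1.0 + np.exp(-slope * (x - shift)))

def fit_sigmoid_robust(x, y, n_irls=5, c=4.685):
    """Iteratively reweighted least squares with Tukey biweight weights:
    points with large residuals are down-weighted, eventually to zero."""
    p = [y.min(), y.max() - y.min(), np.median(x), 1.0]  # crude init
    w = np.ones_like(y)
    for _ in range(n_irls):
        p, _ = curve_fit(sigmoid, x, y, p0=p,
                         sigma=1.0 / np.sqrt(w + 1e-12), maxfev=10000)
        r = y - sigmoid(x, *p)
        s = np.median(np.abs(r)) / 0.6745 + 1e-12   # robust scale (MAD)
        u = r / (c * s)
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)  # Tukey biweight
    return p

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 40)
y = sigmoid(x, 1.0, 4.0, 0.5, 2.0) + rng.normal(0, 0.1, x.size)
y[5] += 3.0  # inject an outlier
print(fit_sigmoid_robust(x, y))
```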

6.
In this article we introduce partial retraining, an algorithm to determine the relevance of the input variables of a trained neural network. We place this algorithm in the context of other approaches to relevance determination. Numerical experiments on both artificial and real-world problems show that partial retraining outperforms its competitors, which include methods based on constant substitution, analysis of weight magnitudes, and "optimal brain surgeon".
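
For context, a minimal sketch of the simplest competitor named above, constant substitution, with an ordinary least-squares model standing in for the trained network; partial retraining would additionally retrain the remaining parameters after clamping each input:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)  # input 2 is irrelevant

# "Trained model": ordinary least squares standing in for a neural net
w, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(500)]), y, rcond=None)
predict = lambda A: np.column_stack([A, np.ones(len(A))]) @ w
base_mse = np.mean((y - predict(X)) ** 2)

# Constant substitution: clamp one input to its mean and measure the
# increase in error; larger increases indicate more relevant inputs.
for j in range(3):
    Xs = X.copy()
    Xs[:, j] = X[:, j].mean()
    mse = np.mean((y - predict(Xs)) ** 2)
    print(f"input {j}: relevance = {mse - base_mse:.3f}")
```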

7.

Background

Metagenomics has great potential to discover previously unattainable information about microbial communities. An important prerequisite for such discoveries is to accurately estimate the composition of microbial communities. Most prevalent homology-based approaches utilize solely the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree.

Results

We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes genomic sequence similarity in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets of diverse microbial-community complexity. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in the quantification of genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn's disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimation of taxonomic composition at the species/strain level, narrowing down which species/strains need more attention in studies of the oral cavity and Crohn's disease.

Conclusions

By taking genomic sequence similarity into account, TAEC outperforms other available tools in estimating taxonomic composition at very low taxonomic ranks, especially when closely related species/strains are present in a metagenomic sample.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-242) contains supplementary material, which is available to authorized users.

8.

Background

When unaccounted-for group-level characteristics affect an outcome variable, traditional linear regression is inefficient and can be biased. The random- and fixed-effects estimators (RE and FE, respectively) are two competing methods that address these problems. While each estimator controls for otherwise unaccounted-for effects, the two estimators require different assumptions. Health researchers tend to favor RE estimation, while researchers from some other disciplines tend to favor FE estimation. In addition to RE and FE, an alternative method called within-between (WB) was suggested by Mundlak in 1978, although it is utilized infrequently.

Methods

We conduct a simulation study to compare RE, FE, and WB estimation across 16,200 scenarios. The scenarios vary in the number of groups, the size of the groups, within-group variation, goodness-of-fit of the model, and the degree to which the model is correctly specified. Estimator preference is determined by lowest mean squared error of the estimated marginal effect and root mean squared error of fitted values.

Results

Although there are scenarios when each estimator is most appropriate, the cases in which traditional RE estimation is preferred are less common. In finite samples, the WB approach outperforms both traditional estimators. The Hausman test guides the practitioner to the estimator with the smallest absolute error only 61% of the time, and in many sample sizes simply applying the WB approach produces smaller absolute errors than following the suggestion of the test.

Conclusions

Specification and estimation should be carefully considered and ultimately guided by the objective of the analysis and characteristics of the data. The WB approach has been underutilized, particularly for inference on marginal effects in small samples. Blindly applying any estimator can lead to bias, inefficiency, and flawed inference.
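
A minimal sketch of the within-between (Mundlak) decomposition, using plain pooled OLS for illustration; the coefficient on the group-demeaned regressor reproduces the fixed-effects (within) estimate, while the group-mean term absorbs group-level confounding:

```python
import numpy as np

def within_between(y, x, groups):
    """Pooled OLS on the Mundlak within-between decomposition:
    y ~ 1 + (x - group_mean(x)) + group_mean(x).
    The coefficient on the demeaned term reproduces the fixed-effects
    (within) estimate; the group-mean term captures the between effect."""
    gm = {g: x[groups == g].mean() for g in np.unique(groups)}
    x_bar = np.array([gm[g] for g in groups])
    X = np.column_stack([np.ones_like(x), x - x_bar, x_bar])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {"intercept": beta[0], "within": beta[1], "between": beta[2]}

# Toy panel: 50 groups x 10 observations with a group-level confounder
rng = np.random.default_rng(1)
g = np.repeat(np.arange(50), 10)
u = rng.normal(0, 1, 50)[g]          # group effect correlated with x
x = u + rng.normal(0, 1, g.size)
y = 2.0 * x + 3.0 * u + rng.normal(0, 1, g.size)
print(within_between(y, x, g))       # 'within' should be near 2.0
```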

9.
We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.
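
EXCAVATOR's heterogeneous HMM is more involved than this, but a generic Viterbi pass over five copy-number states, driven by hypothetical per-window log2-ratio emissions, illustrates the segmentation idea:

```python
import numpy as np

def viterbi(obs, means, sd, p_stay=0.99):
    """Most likely copy-number state path for a sequence of log2-ratio
    observations under a simple Gaussian HMM with sticky transitions.
    States: 0=2-copy del, 1=1-copy del, 2=normal, 3=dup, 4=multi-copy amp."""
    S = len(means)
    logA = np.full((S, S), np.log((1 - p_stay) / (S - 1)))
    np.fill_diagonal(logA, np.log(p_stay))
    def loglik(x):  # Gaussian emission log-density per state
        return -0.5 * ((x - means) / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))
    V = loglik(obs[0]) + np.log(1.0 / S)
    back = []
    for x in obs[1:]:
        scores = V[:, None] + logA        # scores[i, j]: best path i -> j
        back.append(scores.argmax(axis=0))
        V = scores.max(axis=0) + loglik(x)
    path = [int(V.argmax())]
    for ptr in reversed(back):            # trace back through pointers
        path.append(int(ptr[path[-1]]))
    return path[::-1]

# Hypothetical expected log2 ratios for the five states
means = np.array([-3.0, -1.0, 0.0, 0.58, 1.0])
rng = np.random.default_rng(2)
obs = np.concatenate([np.zeros(20), 0.58 + 0.1 * rng.normal(size=10), np.zeros(20)])
print(viterbi(obs, means, sd=0.15))
```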

10.
Starting from the distribution pioneered by Johnson (1949), we propose a broad class of distributions with bounded support on the basis of the symmetric family of distributions. The new class provides a rich source of alternative distributions for analyzing univariate bounded data. A comprehensive account of the mathematical properties of the new family is provided. We briefly discuss estimation of the model parameters using two estimation methods. Additionally, a new regression model is introduced based on the proposed distribution, which is useful for situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters. The regression model allows both location and dispersion effects to be modeled. We define two residuals for the proposed regression model to assess departures from model assumptions and to detect outlying observations, and discuss influence methods such as local influence and generalized leverage. Finally, an application to real data is presented to show the usefulness of the new regression model.
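
For reference, a minimal sketch of the original Johnson S_B density on (0, 1) from which the new class starts, assuming the standard (gamma, delta) parameterization; the logit transform is what produces the bounded support:

```python
import numpy as np

def johnson_sb_pdf(x, gamma, delta):
    """Johnson S_B density on (0, 1): if Z = gamma + delta*log(x/(1-x))
    is standard normal, then X has this bounded-support density."""
    z = gamma + delta * np.log(x / (1.0 - x))
    jacobian = delta / (x * (1.0 - x))
    return jacobian * np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

x = np.linspace(0.01, 0.99, 5)
print(johnson_sb_pdf(x, gamma=0.5, delta=1.2))
```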

11.
We report on a new de novo peptide sequencing algorithm that uses spectral graph partitioning. In this approach, relationships between m/z peaks are represented by attractive and repulsive springs, and the vibrational modes of the spring system are used to infer information about the peaks (such as "likely b-ion" or "likely y-ion"). We demonstrate the effectiveness of this approach by comparison with other de novo sequencers on test sets of ion-trap and QTOF spectra, including spectra of mixtures of peptides. On all datasets, we outperform the other sequencers. Along with spectral graph theory techniques, the new de novo sequencer EigenMS incorporates another improvement of independent interest: robust statistical methods for recalibration of time-of-flight mass measurements. Robust recalibration greatly outperforms simple least-squares recalibration, achieving about three times the accuracy for one QTOF dataset.
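
A minimal, generic sketch of spectral graph bipartition (not EigenMS itself): the Fiedler vector of the graph Laplacian plays the role of the lowest non-trivial vibrational mode of the spring system described above:

```python
import numpy as np

def spectral_bipartition(W):
    """Split graph nodes into two groups using the Fiedler vector (the
    eigenvector of the graph Laplacian with the second-smallest
    eigenvalue). Positive weights act as attractive springs pulling
    nodes into the same group; negative weights would act repulsively."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                      # graph Laplacian L = D - W
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                    # second-smallest eigenvalue
    return fiedler >= 0                     # boolean group assignment

# Two 3-node cliques joined by one weak edge
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                     # weak bridge
print(spectral_bipartition(W))              # first three vs last three nodes
```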

12.
In genetic studies, the haplotype structure of the population under consideration is expected to carry important information. Experimental methods to derive haplotypes, however, are expensive and none of them has yet become standard methodology. On the other hand, maximum likelihood haplotype estimation from unphased individual genotypes may incur inaccuracies. We therefore investigated the relative efficiency of haplotype frequency estimation when nuclear family information is included, compared to estimation from experimentally derived haplotypes. Efficiency was measured in terms of variance ratios of the estimates. The variances were derived from the binomial distribution for experimentally derived haplotypes, and from the Fisher information matrix corresponding to the general likelihood function of the haplotype frequency parameters, including family information. We subsequently compared these variance ratios to those for estimation from individual genotypes. We found that the information gained from a single child compensates for missing phase information to a high degree, resulting in estimates almost as reliable as those derived from observed haplotypes. Thus, if children have already been genotyped for other reasons, it is highly recommendable to include them in the estimation. If child information is not already present, whether it is useful to genotype a single child just to reduce phase ambiguity depends on the number of loci and the haplotype diversity. In general, if the number of loci is at most three, or if the number of haplotypes with a frequency >5% is at most four, haplotype estimation from individuals is already quite good and the improvement gained from a single child cannot compensate for the genotyping effort. On the other hand, under scenarios with many loci and high haplotype diversity, haplotype frequency estimation from trios can be more efficient than estimation from individuals even on a per-genotype basis.

13.
Genome-wide case-control association studies aim at identifying significant differential markers between sick and healthy populations. With the development of large-scale technologies allowing the genotyping of thousands of single nucleotide polymorphisms (SNPs) come the multiple testing problem and the practical issue of selecting the most probable set of associated markers. Several False Discovery Rate (FDR) estimation methods have been developed and tuned mainly for differential gene expression studies. However, they are based on hypotheses and designs that are not necessarily relevant in genetic association studies. In this article we present a universal methodology to estimate the FDR of genome-wide association results. It uses a single global probability value per SNP and is applicable in practice for any study design, using any statistic. We have benchmarked this algorithm on simulated data and shown that it outperforms previous methods in cases requiring non-parametric estimation. We exemplified the usefulness of the method by applying it to the analysis of experimental genotyping data of three Multiple Sclerosis case-control association studies.
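
The article's non-parametric FDR estimator is not reproduced here; as a baseline for the same setting (one global p-value per SNP), a minimal sketch of the standard Benjamini-Hochberg step-up procedure:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure: returns a boolean
    mask of p-values declared significant at FDR level alpha."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = p.size
    thresholds = alpha * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])   # largest i with p_(i) <= i*alpha/m
        reject[order[: k + 1]] = True
    return reject

# 990 null p-values plus 10 strong signals
rng = np.random.default_rng(3)
p = np.concatenate([rng.uniform(size=990), rng.uniform(0, 1e-4, size=10)])
print(benjamini_hochberg(p).sum(), "SNPs declared significant")
```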

14.
Fuglsang A. Genetics. 2006;172(2):1301-1307.
In 1990, Frank Wright introduced a method for measuring synonymous codon usage bias in a gene by estimation of the "effective number of codons," N(c). Several attempts have been made recently to improve Wright's estimate of N(c), but the methods that work in cases where a gene encodes a protein not containing all amino acids with degenerate codons have not been tested against each other. In this article I derive five new estimators of N(c) and test them together with the two published estimators, using resampling under rigorous testing conditions. Estimation of codon homozygosity, F, turns out to be a key to the estimation of N(c). F can be estimated in two closely related ways, corresponding to sampling with or without replacement, the latter being what Wright used. The N(c) methods that are based on sampling without replacement showed much better accuracy at short gene lengths than those based on sampling with replacement, indicating that Wright's homozygosity method is superior. Surprisingly, the methods based on sampling with replacement displayed a superior correlation with mRNA levels in Escherichia coli.
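
A minimal sketch of the two codon-homozygosity estimators contrasted above, for a single amino acid's synonymous-codon counts; Wright's N(c) then combines 1/F across degeneracy classes:

```python
import numpy as np

def homozygosity(counts, with_replacement=False):
    """Codon homozygosity F for one amino acid's synonymous-codon counts.
    Sampling with replacement:    F = sum(p_i^2)
    Sampling without replacement: F = (n * sum(p_i^2) - 1) / (n - 1)
    (the latter is Wright's estimator, which behaves better for short
    genes, consistent with the findings above)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p2 = ((counts / n) ** 2).sum()
    if with_replacement:
        return p2
    return (n * p2 - 1.0) / (n - 1.0)

# Example: 20 codons for a 4-fold degenerate amino acid, used unevenly
counts = [10, 6, 3, 1]
print(homozygosity(counts, with_replacement=True))   # biased upward for small n
print(homozygosity(counts, with_replacement=False))  # Wright-style estimate
# Wright's Nc aggregates 1/F within each degeneracy class:
# Nc = 2 + 9/F2_bar + 1/F3_bar + 5/F4_bar + 3/F6_bar
```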

15.
Motivation: High-throughput experimental and computational methods are generating a wealth of protein–protein interaction data for a variety of organisms. However, data produced by current state-of-the-art methods include many false positives, which can hinder the analyses needed to derive biological insights. One way to address this problem is to assign confidence scores that reflect the reliability and biological significance of each interaction. Most previously described scoring methods use a set of likely true positives to train a model to score all interactions in a dataset. A single positive training set, however, may be biased and not representative of true interaction space. Results: We demonstrate a method to score protein interactions by utilizing multiple independent sets of training positives to reduce the potential bias inherent in using a single training set. We used a set of benchmark yeast protein interactions to show that our approach outperforms other scoring methods. Our approach can also score interactions across data types, which makes it more widely applicable than many previously proposed methods. We applied the method to protein interaction data from both Drosophila melanogaster and Homo sapiens. Independent evaluations show that the resulting confidence scores accurately reflect the biological significance of the interactions.

16.
Cadigan NG. Biometrics. 2006;62(3):713-720.
We present local influence diagnostics to measure the sensitivity of a biological limit reference point (LRP) estimated from fitting a model to stock and recruitment data. LRPs are low levels of stock size that the management of commercial fisheries should avoid with high probability. The LRP we examine is the stock size at which recruitment is 50% of the maximum (S(50%)). We derive analytic equations to describe the effects on S(50%) of changing the weight that observations are given in estimation. We derive equations for the Ricker, Beverton-Holt, and hockey-stick stock-recruit models, and four estimation methods including the error sums of squares method on log responses and three quasi-likelihood methods. We conclude from case studies that the hockey-stick model produces the most robust estimates.
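
A minimal sketch for the Beverton-Holt case, under the R = aS/(b + S) parameterization (so the LRP S(50%) is simply b), fit on the log scale in the spirit of the error-sums-of-squares-on-logs method mentioned above:

```python
import numpy as np
from scipy.optimize import curve_fit

def beverton_holt(S, a, b):
    """Beverton-Holt stock-recruit curve: R = a*S / (b + S).
    Maximum recruitment is a (as S -> infinity), so recruitment equals
    50% of its maximum exactly at S = b; i.e. the LRP S(50%) is b."""
    return a * S / (b + S)

# Hypothetical stock (S) and recruitment (R) observations
rng = np.random.default_rng(4)
S = rng.uniform(10, 500, 30)
R = beverton_holt(S, a=100.0, b=80.0) * rng.lognormal(0, 0.2, 30)

# Fit on the log scale (multiplicative recruitment errors)
(a_hat, b_hat), _ = curve_fit(lambda s, a, b: np.log(beverton_holt(s, a, b)),
                              S, np.log(R), p0=[R.max(), np.median(S)])
print("S(50%) estimate:", b_hat)
```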

17.
Roychoudhury A, Stephens M. Genetics. 2007;176(2):1363-1366.
We present a new approach for estimation of the population-scaled mutation rate, θ, from microsatellite genotype data, using the recently introduced "product of approximate conditionals" framework. Comparisons with other methods on simulated data demonstrate that this new approach is attractive in terms of both accuracy and speed of computation. Our simulation experiments also demonstrate that, despite the theoretical advantages of full-likelihood-based methods, methods based on certain summary statistics (specifically, the sample homozygosity) can perform very competitively in practice.
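
The product-of-approximate-conditionals method is not reproduced here; as a minimal sketch of the competitive summary-statistic route mentioned above, the classical stepwise-mutation-model moment estimator inverts the expected homozygosity H = 1/sqrt(1 + 2θ):

```python
import numpy as np

def theta_from_homozygosity(alleles):
    """Moment estimator of theta = 4*N*mu for microsatellites under the
    stepwise mutation model (Ohta & Kimura): the equilibrium expected
    homozygosity is H = 1/sqrt(1 + 2*theta), so theta = (1/H^2 - 1)/2."""
    alleles = np.asarray(alleles)
    n = alleles.size
    _, counts = np.unique(alleles, return_counts=True)
    # unbiased sample homozygosity
    H = np.sum(counts * (counts - 1)) / (n * (n - 1))
    return (1.0 / H**2 - 1.0) / 2.0

# Hypothetical microsatellite allele sizes sampled from a population
sample = [12, 12, 13, 12, 14, 13, 12, 15, 13, 12, 12, 14]
print(theta_from_homozygosity(sample))
```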

18.
Maximum likelihood methods for cure rate models with missing covariates
Chen MH, Ibrahim JG. Biometrics. 2001;57(1):43-52.
We propose maximum likelihood methods for parameter estimation for a novel class of semiparametric survival models with a cure fraction, in which the covariates are allowed to be missing. We allow the covariates to be either categorical or continuous and specify a parametric distribution for the covariates that is written as a sequence of one-dimensional conditional distributions. We propose a novel EM algorithm for maximum likelihood estimation and derive standard errors by using Louis's formula (Louis, 1982, Journal of the Royal Statistical Society, Series B 44, 226-233). Computational techniques using the Monte Carlo EM algorithm are discussed and implemented. A real data set involving a melanoma cancer clinical trial is examined in detail to demonstrate the methodology.

19.
Indirect estimation methodologies for the total fertility rate (TFR) have a long history within demography and have provided important techniques that applied demographers can use when data are sparse or lacking. However, new methodologies for approximating the total fertility rate have not been proposed in nearly 30 years. This study presents a novel method for indirectly approximating the total fertility rate using an algebraic rearrangement of the general fertility rate (GFR), through the known relationship between GFR and TFR. It then compares the proposed method to the well-known Bogue-Palmore method. The methods are compared across 196 countries, considering overall errors as well as characteristics of the countries that contribute to fertility behavior. Additionally, the methods were compared geographically to identify any geographical patterns. We find that this novel method is not only simpler than the Bogue-Palmore method, requiring fewer data inputs, but also has reduced algebraic and absolute errors compared with the Bogue-Palmore method, and that it specifically outperforms the Bogue-Palmore method in developing countries. We conclude that our novel method may be a useful estimation procedure for demographers.
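
The paper's exact rearrangement is not reproduced here; a toy computation with hypothetical age-specific fertility rates illustrates the TFR-GFR relationship such methods exploit (if fertility were uniform over the 35-year reproductive span, TFR = 35 × GFR/1000 would hold exactly):

```python
import numpy as np

# Hypothetical age-specific fertility rates (births per woman per year)
# for 5-year age groups 15-19 ... 45-49, and women per group.
asfr = np.array([0.030, 0.110, 0.120, 0.080, 0.040, 0.015, 0.003])
women = np.array([510, 480, 450, 430, 420, 410, 400]) * 1000.0

# Exact TFR: sum of rates times the 5-year width of each age group
tfr = 5.0 * asfr.sum()

# GFR: total births per 1,000 women aged 15-49
births = (asfr * women).sum()
gfr = 1000.0 * births / women.sum()

# Crude indirect approximation: the age pattern of fertility makes the
# effective multiplier somewhat smaller than the full 35-year span.
print(f"TFR = {tfr:.2f}, GFR = {gfr:.1f}, 35*GFR/1000 = {35 * gfr / 1000:.2f}")
```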

20.
We study the use of simultaneous confidence bands for low-dose risk estimation with quantal response data, and derive methods for estimating simultaneous upper confidence limits on predicted extra risk under a multistage model. By inverting the upper bands on extra risk, we obtain simultaneous lower bounds on the benchmark dose (BMD). Monte Carlo evaluations explore characteristics of the simultaneous limits under this setting, and a suite of actual data sets are used to compare existing methods for placing lower limits on the BMD.
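
A minimal sketch of the multistage extra-risk machinery, with hypothetical fitted coefficients; the BMD is the dose at which extra risk reaches the benchmark response, and the lower bound (BMDL) would follow by inverting the upper confidence band on extra risk as described above:

```python
import numpy as np
from scipy.optimize import brentq

def extra_risk(d, beta):
    """Multistage model: P(d) = 1 - exp(-(b0 + b1*d + ... + bk*d^k)).
    The background term b0 cancels in the extra-risk definition:
    ER(d) = (P(d) - P(0)) / (1 - P(0)) = 1 - exp(-(b1*d + ... + bk*d^k))."""
    poly = sum(b * d**i for i, b in enumerate(beta) if i > 0)
    return 1.0 - np.exp(-poly)

def benchmark_dose(beta, bmr=0.10, d_max=1e6):
    """Dose at which extra risk equals the benchmark response (BMR)."""
    return brentq(lambda d: extra_risk(d, beta) - bmr, 0.0, d_max)

# Hypothetical fitted multistage coefficients (b0, b1, b2)
beta = [0.02, 0.015, 0.004]
print(f"BMD at 10% extra risk: {benchmark_dose(beta, bmr=0.10):.3f}")
```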
