Similar articles
1.
We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as "noise" or "error") within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non-amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
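The idea of a positional error estimate from duplicate reads can be illustrated with a small sketch. This is not the DRISEE implementation: the function name `positional_error_rates` and the majority-vote consensus are assumptions made for illustration, and the reads are assumed to be an already-identified set of artifactual duplicates of equal origin.

```python
from collections import Counter


def positional_error_rates(duplicate_reads):
    """Return, for each position, the fraction of reads that disagree
    with the majority (consensus) base at that position.

    Hypothetical helper: assumes all reads are duplicates of one template.
    """
    if not duplicate_reads:
        raise ValueError("need at least one read")
    # Only compare positions covered by every read in the set.
    length = min(len(read) for read in duplicate_reads)
    rates = []
    for pos in range(length):
        column = [read[pos] for read in duplicate_reads]
        _, consensus_count = Counter(column).most_common(1)[0]
        rates.append(1.0 - consensus_count / len(column))
    return rates


# Four duplicates; one read carries a mismatch at the last position.
reads = ["ACGT", "ACGT", "ACGA", "ACGT"]
rates = positional_error_rates(reads)
```

A per-position profile of this kind is what would inform a trimming decision; averaging over many duplicate sets would give a sample-wide figure.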

2.
Nonlinear mixed effects models allow investigating individual differences in drug concentration profiles (pharmacokinetics) and responses. Pharmacogenetics focuses on the genetic component of this variability. Two tests often used to detect a gene effect on a pharmacokinetic parameter are (1) the Wald test, assessing whether estimates for the gene effect are significantly different from 0, and (2) the likelihood ratio test, comparing models with and without the genetic effect. Because those asymptotic tests show inflated type I error with small sample sizes and/or unevenly distributed genotypes, we develop two alternatives and evaluate them by means of a simulation study. First, we assess the performance of the permutation test using the Wald and the likelihood ratio statistics. Second, for the Wald test we propose the use of the F-distribution with four different values for the denominator degrees of freedom. We also explore the influence of the estimation algorithm using both the first-order conditional estimation with interaction linearization-based algorithm and the stochastic approximation expectation maximization algorithm. We apply these methods to the analysis of the pharmacogenetics of indinavir in HIV patients recruited in the COPHAR2-ANRS 111 trial. Results of the simulation study show that the permutation test seems appropriate but at the cost of an additional computational burden. One of the four F-distribution-based approaches provides a correct type I error estimate for the Wald test and should be further investigated.
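The logic of a permutation test can be sketched in a few lines. The example below is a simplified stand-in, not the procedure of the abstract: it permutes genotype labels and uses an absolute difference in group means as the statistic, where the study would refit a nonlinear mixed effects model and compute a Wald or likelihood ratio statistic for each permutation. The function name and the data are invented for illustration.

```python
import random


def permutation_p_value(values, genotypes, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    `genotypes` holds 0/1 labels. Shuffling the labels breaks any link
    between genotype and value while keeping the group sizes fixed.
    """
    def statistic(labels):
        g0 = [v for v, g in zip(values, labels) if g == 0]
        g1 = [v for v, g in zip(values, labels) if g == 1]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    rng = random.Random(seed)
    observed = statistic(genotypes)
    labels = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if statistic(labels) >= observed:
            hits += 1
    # Adding 1 counts the observed labelling itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)


# Made-up parameter values for 12 subjects with an uneven 8/4 genotype split.
values = [10.1, 9.8, 10.4, 9.9, 10.2, 10.0, 9.7, 10.3, 13.0, 12.6, 13.4, 12.9]
genotypes = [0] * 8 + [1] * 4
p_value = permutation_p_value(values, genotypes)
```

The cost noted in the abstract is visible here: every permutation requires recomputing the statistic, which for a mixed effects model means a full refit.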

3.
The thermodynamics of biological interactions is frequently studied by van't Hoff analysis, whereby data on the variation of the binding constant KD with temperature are used to obtain estimates of the standard enthalpy (ΔH°), entropy (ΔS°), and heat capacity (ΔC°P) of complex formation. A Monte Carlo simulation demonstrates that the absolute error of the above parameters is proportional to the relative error of KD and independent of the actual values of KD and of the way they vary with temperature. The error of ΔH° is approximately the same as that of TΔS° (within 14% in the temperature range 5-45 °C). The error depends both on the number of temperature points within the experimental temperature range and on the size of the range, but it is more sensitive to the latter. Using the linear form of the van't Hoff equation to fit data with non-zero ΔC°P gives erroneous ΔH° and ΔS° estimates at standard temperature except for the case when the T points are placed symmetrically with respect to the standard temperature. With the range of ΔC°P values usual for protein-protein interactions, the KD error must be very low to confidently infer that ΔC°P is non-zero or to claim that two interactions have different ΔC°P.
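The linear form of the van't Hoff equation mentioned in this abstract can be sketched as a straight-line fit. The sketch assumes ΔC°P = 0 and treats KD as a dissociation constant, so that ln KD = ΔH°/(RT) − ΔS°/R with ΔH° and ΔS° referring to complex formation; the function names and the synthetic data are illustrative only.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def vant_hoff_linear(temps_kelvin, kd_values):
    """Fit ln KD against 1/T; slope = dH/R, intercept = -dS/R."""
    xs = [1.0 / t for t in temps_kelvin]
    ys = [math.log(k) for k in kd_values]
    slope, intercept = fit_line(xs, ys)
    return slope * R, -intercept * R


# Noise-free synthetic data over 5-45 degrees C (assumed true values).
true_dh, true_ds = -50_000.0, -50.0  # J/mol and J/(mol K)
temps = [278.15, 288.15, 298.15, 308.15, 318.15]
kds = [math.exp(true_dh / (R * t) - true_ds / R) for t in temps]
dh, ds = vant_hoff_linear(temps, kds)
```

Adding a fixed relative error to each KD value and repeating the fit many times would reproduce, in miniature, the kind of Monte Carlo experiment the abstract describes.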

4.
The data used in studies of bivariate interspecific allometry usually violate the assumption of statistical independence. Although the traits of each species are commonly treated as independent, the expression of a trait among species within a genus may covary because of shared common ancestry. The same effect exists for genera within a family and so on up the phylogenetic hierarchy. Determining sample size by counting data points overestimates the effective sample size, which then leads to overestimating the degrees of freedom that should be used in calculating probabilities and confidence intervals. This results in an inflated Type 1 error rate. Although some workers (e.g., Felsenstein [1985] Am. Nat. 125:1–15) have suggested that this issue may invalidate interspecific allometry as a comparative method, a correction for the problem can be approximated with variance components from a nested analysis of variance. Variance components partition the total variation in the data set among the levels of the nested hierarchy. If the variance component for each nested level is weighted by the number of groups at that level, the sum of these values is an estimate of an effective sample size for the data set which reflects the effects of phylogenetic constraint. Analysis of two data sets, using taxonomy to define levels of the nested hierarchy, suggests that it has been common for published studies of interspecific allometry to severely overestimate the number of degrees of freedom. Interspecific allometry remains an important comparative method for evaluating questions concerning individual species that are not similarly addressed by the format of most of the newer comparative methods. With the correction proposed here for estimating degrees of freedom, the major statistical weakness of the procedure is substantially reduced. © 1994 Wiley-Liss, Inc.
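The weighting described in this abstract can be written out as a short calculation. The sketch reflects one reading of the text: each variance component is expressed as a fraction of the total variance and multiplied by the number of groups at its level. The function name and the numbers are invented for illustration.

```python
def effective_sample_size(variance_components, group_counts):
    """Sum of (share of total variance at a level) x (groups at that level).

    Both sequences are ordered by nested level, e.g. species, genus, family.
    """
    total = sum(variance_components)
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return sum(vc / total * n for vc, n in zip(variance_components, group_counts))


# Hypothetical data set: 100 species in 40 genera in 10 families, with
# 10%, 30% and 60% of the variance at those levels respectively.
n_eff = effective_sample_size([10.0, 30.0, 60.0], [100, 40, 10])
# 0.1 * 100 + 0.3 * 40 + 0.6 * 10 = 28, far fewer than the 100 data points.
```

When most of the variance sits at the species level the estimate approaches the raw count of species; when it sits at higher levels the effective sample size shrinks accordingly.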

5.
Binding constant data K°(T) are commonly subjected to van't Hoff analysis to extract estimates of ΔH°, ΔS°, and ΔCP° for the process in question. When such analyses employ unweighted least-squares fitting of ln K° to an appropriate function of the temperature T, they are tacitly assuming constant relative error in K°. When this assumption is correct, the statistical errors in ΔG°, ΔH°, ΔS°, ΔCP°, and the T-derivative of ΔCP° (if determined) are all independent of the actual values of K° and can be computed from knowledge of just the T values at which K° is known and the percent error in K°. All of these statistical errors except that for the highest-order constant are functions of T, so they must normally be calculated using a form of the error propagation equation that is not widely known. However, this computation can be bypassed by defining ΔH° as a polynomial in (T−T0), the coefficients of which thus become ΔH°, ΔCP°, and (1/2) dΔCP°/dT at T=T0. The errors in the key quantities can then be computed by just repeating the fit for different T0. Procedures for doing this are described for a representative data analysis program. Results of such calculations show that expanding the T range from 10-40 to 5-45 °C gives significant improvement in the precision of all quantities. ΔG° is typically determined with standard error a factor of approximately 30 smaller than that for ΔH°. Accordingly, the error in TΔS° is nearly identical to that in ΔH°. For 4% error in K°, the T-derivative in ΔCP° cannot be determined unless it is approximately 10 cal mol⁻¹ K⁻² or greater; and ΔCP° must be approximately 50 cal mol⁻¹ K⁻¹. Since all errors scale with the data error and inversely with the square root of the number of data points, the present results for 4% error cover any other relative error and number of points, for the same approximate T structure of the data.

6.
A comparison has been made between the estimates obtained from maximum likelihood estimation of gamma, inverse normal, and normal distribution models for stage-frequency data. Results have been compared for six sets of test data, and for many sets of simulated data. It is concluded that (1) some estimates may differ substantially between the models, (2) estimates from the correct model have little bias, and estimated standard errors are generally close to theoretical values, (3) there are problems in determining degrees of freedom for chi-squared goodness of fit tests, so that it is best to compare test statistics with simulated distributions, and (4) goodness of fit tests may not discriminate well between the three models.

7.
Hydrolytic reactions of oligopeptide 4-nitroanilides catalyzed by human α-thrombin, human activated protein C and human factor Xa were studied at pH 8.0-8.4 and 25.0 ± 0.1 °C by the progress curve method, and individual rate constants were calculated mostly within 10% internal error using DYNAFITV. A systematic strategy has been developed for fitting a three-step consecutive mechanism to eighteen hundred to six thousand time-course data points pooled from two to four independent kinetic experiments. Enzyme and substrate concentrations were also calculated. Individual rate constants reproduce well published values obtained under comparable conditions, and the Michaelis-Menten kinetic parameters calculated from these elementary rate constants are also within reasonable limits of published values. For comparison, the integrated Michaelis-Menten equation was also fitted to data from twelve sets. Both the kcat and kcat/Km values are within 15% agreement with those calculated using the elementary rate constants obtained with DYNAFITV. Rate constants for the second and third consecutive steps are within 3- to 4-fold of each other, indicating that both determine the overall rate. The factor Xa-catalyzed hydrolysis of N-α-Z-D-Arg-Gly-Arg-pNA·2HCl at pH 8.4 in a series of buffers containing increasing fractions of deuterium at 25.0 ± 0.1 °C shows a very strong dependence of k3 and a moderate dependence of k2 on D content in the buffer: the fractionation factors are 0.49 ± 0.03 for K1, 0.70 ± 0.05 for k2, and (0.32 ± 0.03)² for k3.
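The integrated Michaelis-Menten equation mentioned in this abstract has a standard closed form, t = (S0 − S)/Vmax + (Km/Vmax) ln(S0/S), which gives the time at which substrate has fallen from S0 to S. The sketch below only evaluates that textbook expression; the function name and parameter values are illustrative and are not taken from the study.

```python
import math


def time_to_reach(s, s0, vmax, km):
    """Time for substrate to fall from s0 to s under Michaelis-Menten kinetics.

    Assumes an irreversible reaction with no product inhibition and a
    stable enzyme, which is what the simple integrated form requires.
    """
    if not 0 < s <= s0:
        raise ValueError("require 0 < s <= s0")
    return (s0 - s) / vmax + (km / vmax) * math.log(s0 / s)


# With Vmax = 1, Km = 1 (arbitrary units), halving S0 = 2 takes 1 + ln 2.
t_half = time_to_reach(s=1.0, s0=2.0, vmax=1.0, km=1.0)
```

Fitting a progress curve amounts to finding the Vmax and Km for which these predicted times match the observed time course, which is why a single curve can in principle yield both parameters.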

8.
A new molecular dynamics method for calculating free energy profiles for rare events is presented. The new method is based on the creation of an adiabatic separation between the reaction coordinate subspace and the remaining degrees of freedom within a molecular dynamics run. This is achieved by associating with the reaction coordinate(s) a high temperature and large mass, thereby allowing the activated process to occur while permitting the remaining degrees of freedom to respond adiabatically. In this limit, by applying a formal multiple time scale Liouville operator factorization, it can be rigorously shown that the free energy profiles are obtained directly from the probability distribution of the reaction coordinate subspace and, therefore, require no postprocessing of the output data. The new method is applied to a variety of model problems and its performance tested against free energy calculations using the "bluemoon ensemble" approach. The comparison shows that free energy profiles can be calculated with greater ease and efficiency using the new method.

9.
The main functional parameters of blood stored at +4 °C in ACD, according to common transfusion practice, have been carefully followed over the course of 40 days. The expected depletion of DPG takes place within 10 days, but apparently no increase in the affinity of Hb for oxygen is observed in this period (or later), because the lowering of pH acts in the opposite direction during the same time. However, the intrinsic increased affinity of Hb is promptly revealed if the "actual" pH values are corrected to the standard value of 7.4, and/or are extrapolated to this pH from the Bohr effect.

10.
By secreting granulocyte/macrophage colony-stimulating factor (GM-CSF), metastatic Lewis lung carcinoma (LLC-LN7) tumors induce the appearance of myelopoiesis-associated immune-suppressor cells that resemble granulocytic-macrophage (GM) progenitor cells. The presence of these GM-suppressor cells in mice bearing LLC-LN7 tumors was associated with a reduced capacity of splenic T cells to proliferate in response to interleukin-2 (IL-2). Administration of low doses of 100 U interferon (IFN) plus 10 U tumor necrosis factor (TNF) to the tumor bearers, a combination treatment that we previously showed to diminish the presence of GM-suppressor cells synergistically, restored proliferative responsiveness of the splenic T cells to IL-2. These LLC-LN7-bearing mice were also examined for whether cells that phenotypically resemble GM-progenitor cells (ER-MP12+ cells) infiltrate the tumor mass. ER-MP12+ cells composed approximately 10% of the cells isolated from dissociated tumors of mice that had been treated with placebo or with either IFN or TNF alone, but IFN/TNF therapy markedly reduced the number of tumor-infiltrating ER-MP12+ suppressor cells. The IFN/TNF treatment to eliminate GM-suppressor cells and restore T cell responsiveness to IL-2 was next coupled with low-dose IL-2 therapy (100 U twice daily). Addition of IL-2 to the treatment regimen did not significantly influence the effectiveness of the IFN/TNF treatment in eliminating GM-suppressor cells from the LLC-LN7 tumor mass. However, inclusion of IL-2 with the IFN/TNF treatment regimen enhanced the CD8+, but not the CD4+, cell content within the tumor, and diminished the number of metastatic lung nodules within the mice. When these tumors were excised, dissociated, and bulk-cultured with a low dose of IL-2, an increased level of cytotoxic T lymphocyte (CTL) activity was generated in the TIL cultures from mice that had received IFN/TNF plus IL-2 treatments. A lesser but detectable level of CTL activity was generated in TIL cultures from mice that were treated with only IFN/TNF, while no CTL activity was generated in tumor cultures from mice receiving only placebo or low-dose IL-2. These results suggest the effectiveness of IFN plus TNF therapy in restoring IL-2 responsiveness in mice bearing GM-suppressor cell-inducing tumors and in enhancing both the intratumoral CD8+ cell content and the generation of CTL activity in bulk cultures of these tumors. This study was supported by the Medical Research Service of the Department of Veterans Affairs, by grants CA-45080 and CA-48080 from the National Institutes of Health, and by the American Cancer Society, Illinois

11.
Isolates of R. leguminosarum bv. viciae from pea and lentil nodules taken at one field site in France were tested in the laboratory for their ability to donate and receive plasmids by conjugation. Five isolates of 20 tested as donors were found to be capable of donating a plasmid which restored the ability to nodulate V. sativa to an isolate which had spontaneously lost this ability. Of 16 isolates tested as recipients, all were found to be competent to receive one or more Tn5-labelled test plasmids at a frequency that varied widely (10⁻⁹ to 10⁻³ per recipient) dependent upon both the recipient and the plasmid transferred. Three distinct plasmids carrying genes essential for symbiotic functions (pSym) were consistently shown to be transferred at a lower frequency than a cryptic plasmid. Collectively, these results indicate a significant potential for plasmid transfer within the natural soil population. During this work, several independent derivatives were obtained which contained two bv. viciae pSym. These plasmids usually appeared to be compatible together in cells ex planta, but the one acquired in matings was apparently frequently lost (10⁻² per cell) in nodules of V. sativa. Hybrid derivatives containing bv. viciae and bv. phaseoli pSym apparently retained both plasmids in nodules when P. vulgaris was the host plant but lost the bv. phaseoli pSym at high frequency (4 × 10⁻¹ per cell) in nodules of V. sativa. Structural rearrangements among the plasmids of these transconjugants were also detected in cells recovered from nodules.

12.

Background

The distribution of residual effects in linear mixed models in animal breeding applications is typically assumed normal, which makes inferences vulnerable to outlier observations. In order to mute the impact of outliers, one option is to fit models with residuals having a heavy-tailed distribution. Here, a Student's-t model was considered for the distribution of the residuals with the degrees of freedom treated as unknown. Bayesian inference using Markov chain Monte Carlo methods was used to investigate a bivariate Student's-t (BSt) model in a simulation study, and analysis of field data for gestation length and birth weight made it possible to study the practical implications of fitting heavy-tailed distributions for residuals in linear mixed models.

Methods

In the simulation study, bivariate residuals were generated using a Student's-t distribution with 4 or 12 degrees of freedom, or a normal distribution. Sire models with bivariate Student's-t or normal residuals were fitted to each simulated dataset using a hierarchical Bayesian approach. For the field data, consisting of gestation length and birth weight records on 7,883 Italian Piemontese cattle, a sire-maternal grandsire model including fixed effects of sex-age of dam and uncorrelated random herd-year-season effects was fitted using a hierarchical Bayesian approach. Residuals were defined to follow bivariate normal or Student's-t distributions with unknown degrees of freedom.

Results

Posterior mean estimates of degrees of freedom parameters seemed to be accurate and unbiased in the simulation study. Estimates of sire and herd variances were similar, if not identical, across fitted models. In the field data, there was strong support based on predictive log-likelihood values for the Student's-t error model. Most of the posterior density for degrees of freedom was below 4. Posterior means of direct and maternal heritabilities for birth weight were smaller in the Student's-t model than those in the normal model. Re-rankings of sires were observed between heavy-tailed and normal models.

Conclusions

Reliable estimates of degrees of freedom were obtained in all simulated heavy-tailed and normal datasets. The predictive log-likelihood was able to distinguish the correct model among the models fitted to heavy-tailed datasets. There was no disadvantage of fitting a heavy-tailed model when the true model was normal. Predictive log-likelihood values indicated that heavy-tailed models with low degrees of freedom values fitted gestation length and birth weight data better than a model with normally distributed residuals. Heavy-tailed and normal models resulted in different estimates of direct and maternal heritabilities, and different sire rankings. Heavy-tailed models may be more appropriate for reliable estimation of genetic parameters from field data.

13.
Reiter, Jerome P. Biometrika 2007, 94(2): 502-508
When performing multi-component significance tests with multiply-imputed datasets, analysts can use a Wald-like test statistic and a reference F-distribution. The currently employed degrees of freedom in the denominator of this F-distribution are derived assuming an infinite sample size. For modest complete-data sample sizes, this degrees of freedom can be unrealistic; for example, it may exceed the complete-data degrees of freedom. This paper presents an alternative denominator degrees of freedom that is always less than or equal to the complete-data denominator degrees of freedom, and equals the currently employed denominator degrees of freedom for infinite sample sizes. Its advantages over the currently employed degrees of freedom are illustrated with a simulation.

14.
15.
Strong, integrin-mediated adhesion of neutrophils to endothelium during inflammation is a dynamic process, requiring a conformational change in the integrin molecule to increase its affinity for its endothelial counterreceptors. To avoid general activation of the cell, Mg²⁺ was used to induce the high-affinity integrin conformation, and micromechanical methods were used to determine adhesion probability to beads coated with the endothelial ligand ICAM-1. Neutrophils in Mg²⁺ bind to the beads with much greater frequency and strength than in the presence of Ca²⁺. An increase in adhesion strength and frequency was observed with both increasing temperature and contact duration (from 2 s to 1 min, 21 or 37 °C). The dependence of adhesion probability on contact time or receptor density yielded estimates of the effective reverse rate constant, kr, and the equilibrium association constant, Ka, for binding of neutrophils to ICAM-1-coated surfaces in Mg²⁺: kr ≈ 0.7 s⁻¹ and the product Ka·ρc ≈ 2.4 × 10⁻⁴, where ρc is the density of integrin on the cell surface.

16.
Analysis of variance for gene expression microarray data.
Spotted cDNA microarrays are emerging as a powerful and cost-effective tool for large-scale analysis of gene expression. Microarrays can be used to measure the relative quantities of specific mRNAs in two or more tissue samples for thousands of genes simultaneously. While the power of this technology has been recognized, many open questions remain about appropriate analysis of microarray data. One question is how to make valid estimates of the relative expression for genes that are not biased by ancillary sources of variation. Recognizing that there is inherent "noise" in microarray data, how does one estimate the error variation associated with an estimated change in expression, i.e., how does one construct the error bars? We demonstrate that ANOVA methods can be used to normalize microarray data and provide estimates of changes in gene expression that are corrected for potential confounding effects. This approach establishes a framework for the general analysis and interpretation of microarray data.

17.
Ten types of mariner transposable elements (232 individual sequences) are present in the completed genomic DNA sequence of Caenorhabditis elegans and the partial sequence of Caenorhabditis briggsae. We analyze these replicated instances of mariner evolution and find that elements of a type have evolved within their genomes under no selection on their transposase genes. Seven of the ten reconstructed ancestral mariners carry defective transposase genes. Selection has acted during the divergence of some ancestral elements. The neutrally evolving mariners are used to analyze the pattern of molecular evolution in Caenorhabditis. There is a significant mutational bias against transversions and significant variation in rates of change across sites. Deletions accumulate at a rate of 0.034 events/bp per substitution/site, with an average size of 166 bp (173 gaps observed). Deletions appear to obliterate preexisting deletions over time, creating larger gaps. Insertions accumulate at a rate of 0.019 events/bp per substitution/site, with an average size of 151 bp (61 events). Although the rate of deletion is lower than most estimates in other species, the large size of deletions causes rapid elimination of neutral DNA: a mariner's half-life (the time by which half an element's sequence should have been deleted) is ~0.1 substitutions/site. This high rate of DNA deletion may explain the compact nature of the nematode genome. When this work was done, both authors were affiliated with the University of Illinois at Urbana-Champaign. Dr. Witherspoon is now working in the private sector; Dr. Robertson remains affiliated with the University of Illinois.
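The half-life quoted in this abstract can be checked with a back-of-the-envelope calculation. The sketch assumes simple exponential decay, in which sequence is lost at a constant rate equal to the deletion rate times the mean deletion size; this is an illustrative model and not necessarily the authors' calculation, which may account for overlapping deletions and insertions.

```python
import math

# Figures taken from the abstract.
deletion_rate = 0.034      # deletion events per bp per substitution/site
mean_deletion_size = 166   # bp removed per deletion event, on average

# Assumed model: fraction of sequence lost per unit of substitutions/site.
loss_rate = deletion_rate * mean_deletion_size

# Exponential decay: remaining fraction = exp(-loss_rate * d), so half is
# left when d = ln(2) / loss_rate.
half_life = math.log(2) / loss_rate
```

Under these assumptions the half-life comes out near 0.12 substitutions/site, consistent with the abstract's figure of roughly 0.1.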

18.
A c-type monohaem, cytochrome c6, was isolated from a soluble extract of the green alga Chlorella fusca. The isolated protein shows an apparent molecular mass of 10 kDa by SDS-PAGE, but behaves as a dimer of 20.3 kDa in gel filtration; the isoelectric point is 3.6. The N-terminal sequence shows high identity with other green algal cytochromes c6. The mid-point redox potential is about +350 mV between pH 5 and 9. The ferric and ferrous forms, and their pH equilibria, have been studied using visible, CD and EPR spectroscopies. The visible spectrum of the reduced cytochrome c6 is typical of a c-type haem protein, with maxima at 274 nm, 318 nm (δ-peak), 416 nm (γ-peak), 522 nm (β-peak), and 552–553 nm (α-peak). A 690 nm band, characteristic of Met-His axial coordination of the haem group, is present in the oxidized form. At high pH values (> 8), cytochrome c6 undergoes an alkaline transition, with a pKa of 8.7. Between pH 3 and 9 the EPR spectrum is dominated by two rhombic species, with g-values at 3.32, 2.05, 1.05 and 2.96, 2.30, 1.43, which interconvert with a pKa of 4. The CD spectrum of Chlorella fusca cytochrome c6 shows that the protein must be mainly built up of α-helices. Even though there are similarities between Chlorella fusca cytochrome c6 and that isolated from Monoraphidium braunii, no cross-reactivity with the antibodies raised against the Chlorella fusca cytochrome has been detected for the protein from Monoraphidium braunii.

19.
Fluorescence change is convenient for monitoring enzyme kinetics. Unfortunately, it loses linearity as the absorbance of the fluorescent substrate increases with concentration. When the sum of absorbance at excitation and emission wavelengths exceeds 0.08, this inner filtering effect (IFE) alters apparent initial velocities, K(m), and k(cat). The IFE distortion of apparent initial velocities can be corrected without doing fluorophore dilution assays. Using the substrate's extinction coefficients at excitation and emission wavelengths, the inner filter effect can be modeled during curve fitting for more accurate Michaelis-Menten parameters. A faster and simpler approach is to derive k(cat) and K(m) from progress curves. Strategies to obtain reliable and reproducible estimates of k(cat) and K(m) from only two or three progress curves are illustrated using matrix metalloproteinase 12 and alkaline phosphatase. Accurate estimates of concentration of enzyme-active sites and specificity constant k(cat)/K(m) (from one progress curve with [S]
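The size of the inner filter effect at the 0.08 threshold mentioned in this abstract can be illustrated with a widely used approximation, F_corrected = F_observed × 10^((A_ex + A_em)/2), with absorbances computed from extinction coefficients by the Beer-Lambert law. This is a sketch of that general approximation, not the model fitted in the study; the function names and the numerical inputs are invented for illustration.

```python
def absorbance(extinction_coeff, concentration, path_cm=1.0):
    """Beer-Lambert absorbance: A = epsilon * c * l."""
    return extinction_coeff * concentration * path_cm


def correct_inner_filter(f_observed, a_excitation, a_emission):
    """Approximate correction, assuming detection from the cuvette centre."""
    return f_observed * 10 ** ((a_excitation + a_emission) / 2)


# Hypothetical substrate at 10 uM with extinction coefficients of
# 5000 and 3000 M^-1 cm^-1 at the excitation and emission wavelengths.
a_ex = absorbance(5000.0, 10e-6)
a_em = absorbance(3000.0, 10e-6)
corrected = correct_inner_filter(100.0, a_ex, a_em)
```

With summed absorbance of 0.08 the correction factor is 10^0.04, about 1.10, so an uncorrected signal already understates fluorescence by close to 10% at that threshold.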

20.
Geographic variation patterns of biological characters and environmental variables are compared by using a procedure employing multivariate analyses, production of contour maps by the kriging method with enclosed validation of estimates, and Mantel tests to assess the significance of comparisons. As biological material we chose a sample of populations of Dolichopoda cave crickets from Central-Southern Italy. The kriging technique provides estimates of the interpolation error for each true and estimated point. This profitable feature offers the opportunity to use, with ascertained levels of confidence, the estimated z-scores for further analysis and to compare data collected within the same area, but not exactly coincident in location or number. In this way, we were able to use for subsequent comparisons by means of Mantel tests the maximum number of data points for all data sets, which originally differed in sampling sites. The interpretation of the contour maps and their statistical comparison suggested that the allozyme and epiphallus shape data sets follow the phylogenetic pathways within the Dolichopoda populations, whereas variation in leg elongation is almost entirely under the control of an environmental gradient, synthetically described by the cave temperature.
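The Mantel test used in this abstract compares two distance matrices measured on the same sites. The sketch below shows the basic permutation version with a Pearson correlation and a one-sided (upper-tail) p-value; the function names and the toy data are illustrative and not drawn from the study.

```python
import math
import random


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Correlation between two distance matrices and its permutation p-value.

    Rows and columns of dist_b are permuted together, which preserves the
    dependence among entries that share a site.
    """
    identity = list(range(len(dist_a)))
    a = upper_triangle(dist_a, identity)
    r_obs = pearson(a, upper_triangle(dist_b, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        if pearson(a, upper_triangle(dist_b, order)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: five sites on a line; the second matrix is a scaled copy.
coords = [0.0, 1.0, 3.0, 7.0, 15.0]
geo = [[abs(x - y) for y in coords] for x in coords]
trait = [[2.0 * d for d in row] for row in geo]
r, p = mantel_test(geo, trait)
```

Permuting whole sites rather than individual matrix entries is the design choice that distinguishes the Mantel test from an ordinary correlation test on the pooled distances.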
