Similar articles
1.
Majerová et al. (Plant Mol Biol, 2011) have recently reported that a considerable fraction of cytosines at tobacco telomeres is methylated. Although the data presented in this report indicate that tobacco telomeric sequences undergo certain levels of DNA methylation, it is not clear whether the methylated sequences are at telomeres, at internal chromosomal loci or at both.

2.
After the publication of this work [1], we became aware that the frequency of the ultrasound transmitter that we used for determining the elastic moduli of the trabecular bone specimens was not correctly specified. The oscillation frequency of the ultrasound transmitter was 2 MHz (and not 100 MHz as stated in our work), while we used a sampling rate of 100 MHz. In our publication, the oscillation frequency and the sampling rate were confused. Therefore, the statement in the discussion that we might have determined elastic moduli of trabecular bone tissue rather than the elastic properties of whole specimens because we used an ultrasound frequency > 2 MHz is also wrong and has to be omitted.

3.
Due to the growth of interest in single-cell genomics, computational methods for distinguishing true variants from artifacts are highly desirable. While special attention has been paid to false positives in variant or mutation calling from single-cell sequencing data, an equally important but often neglected issue is that of false negatives derived from allele dropout during the amplification of single cell genomes. In this paper, we propose a simple strategy to reduce the false negatives in single-cell sequencing data analysis. Simulation results show that this method is highly reliable, with an error rate of 4.94×10⁻⁵, which is orders of magnitude lower than the expected false negative rate (~34%) estimated from a single-cell exome dataset, though the method is limited by the low SNP density in the human genome. We applied this method to analyze the exome data of a few dozen single tumor cells generated in previous studies, and extracted cell specific mutation information for a small set of sites. Interestingly, we found that there are difficulties in using the classical clonal model of tumor cell growth to explain the mutation patterns observed in some tumor cells.

4.

Background

Publication records and citation indices are often used to evaluate academic performance. For this reason, obtaining or computing them accurately is important. This can be difficult, largely due to a lack of complete knowledge of an individual's publication list and/or lack of time available to manually obtain or construct the publication-citation record. While online publication search engines have somewhat addressed these problems, using raw search results can yield inaccurate estimates of publication-citation records and citation indices.

Methodology

In this paper, we present a new, automated method that produces estimates of an individual's publication-citation record from an individual's name and a set of domain-specific vocabulary that may occur in the individual's publication titles. Because this vocabulary can be harvested directly from a research web page or online (partial) publication list, our method delivers an easy way to obtain estimates of a publication-citation record and the relevant citation indices. Our method works by applying a series of stringent name and content filters to the raw publication search results returned by an online publication search engine. In this paper, our method is run using Google Scholar, but the underlying filters can be easily applied to any existing publication search engine. When compared against a manually constructed data set of individuals and their publication-citation records, our method provides significant improvements over raw search results. The estimated publication-citation records returned by our method have an average sensitivity of and specificity of (in contrast to raw search result specificity of less than 10%). When citation indices are computed using these records, the estimated indices are within of the true value, compared to raw search results which have overestimates of, on average, .

Conclusions

These results confirm that our method provides significantly improved estimates over raw search results, and these can either be used directly for large-scale (departmental or university) analysis or further refined manually to quickly give accurate publication-citation records.
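
As a rough illustration of the quantities this abstract evaluates, the sketch below computes one common citation index and the sensitivity and specificity of a filtered publication list against a manually curated one. The abstract does not say which citation indices were used or how the filters work; the h-index, the function names, and the argument layout are assumptions made only for illustration.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def sensitivity_specificity(estimated, truth, raw_results):
    """Compare an estimated publication set with a curated one.

    estimated   -- titles kept after filtering (hypothetical filter output)
    truth       -- manually curated titles for the individual
    raw_results -- every title returned by the search engine
    """
    estimated, truth, raw_results = set(estimated), set(truth), set(raw_results)
    true_pos = len(estimated & truth)
    false_neg = len(truth - estimated)
    # Returned papers that do not belong to the individual.
    negatives = raw_results - truth
    true_neg = len(negatives - estimated)
    sens = true_pos / (true_pos + false_neg) if truth else float("nan")
    spec = true_neg / len(negatives) if negatives else float("nan")
    return sens, spec
```

For example, `h_index([10, 8, 5, 4, 3])` gives 4, since four papers have at least four citations each but there are not five papers with at least five.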

5.

Background  

High-throughput screening (HTS) is a key part of the drug discovery process during which thousands of chemical compounds are screened and their activity levels measured in order to identify potential drug candidates (i.e., hits). Many technical, procedural or environmental factors can cause systematic measurement error or inequalities in the conditions in which the measurements are taken. Such systematic error has the potential to critically affect the hit selection process. Several error correction methods and software have been developed to address this issue in the context of experimental HTS [17]. Despite their power to reduce the impact of systematic error when applied to error-perturbed datasets, those methods also have one disadvantage: they introduce a bias when applied to data not containing any systematic error [6]. Hence, we first need to assess the presence of systematic error in a given HTS assay and then apply a systematic error correction method if and only if the presence of systematic error has been confirmed by statistical tests.

6.
More and more noninvasive genetic data are being produced but a general methodology to quantify genotyping error rates from non-pilot data remains lacking. Here we propose a mathematical approach to estimate genotyping error rates by exploring the relationship between errors and PCR replicates. This method can be used to quantify the error rates for either the multi-tubes approach designed by Taberlet et al. (Nucleic Acids Res 24: 3189–3194, 1996) or the pilot method by Prugh et al. (Mol Ecol 14: 1585–1596, 2005).

7.
We use a technique from engineering (Xia and Moog, in IEEE Trans. Autom. Contr. 48(2):330–336, 2003; Jeffrey and Xia, in Tan, W.Y., Wu, H. (Eds.), Deterministic and Stochastic Models of AIDS Epidemics and HIV Infections with Intervention, 2005) to investigate the algebraic identifiability of a popular three-dimensional HIV/AIDS dynamic model containing six unknown parameters. We find that not all six parameters in the model can be identified if only the viral load is measured, instead only four parameters and the product of two parameters (N and λ) are identifiable. We introduce the concepts of an identification function and an identification equation and propose the multiple time point (MTP) method to form the identification function which is an alternative to the previously developed higher-order derivative (HOD) method (Xia and Moog, in IEEE Trans. Autom. Contr. 48(2):330–336, 2003; Jeffrey and Xia, in Tan, W.Y., Wu, H. (Eds.), Deterministic and Stochastic Models of AIDS Epidemics and HIV Infections with Intervention, 2005). We show that the newly proposed MTP method has advantages over the HOD method in the practical implementation. We also discuss the effect of the initial values of state variables on the identifiability of unknown parameters. We conclude that the initial values of output (observable) variables are part of the data that can be used to estimate the unknown parameters, but the identifiability of unknown parameters is not affected by these initial values if the exact initial values are measured with error. These noisy initial values only increase the estimation error of the unknown parameters. However, having the initial values of the latent (unobservable) state variables exactly known may help to identify more parameters. In order to validate the identifiability results, simulation studies are performed to estimate the unknown parameters and initial values from simulated noisy data. 
We also apply the proposed methods to a clinical data set to estimate HIV dynamic parameters. Although we have developed the identifiability methods based on an HIV dynamic model, the proposed methodologies are generally applicable to any ordinary differential equation systems.

8.
In the Gulf of Mexico (GOM), fish biomass estimates are necessary for the evaluation of habitat use and function following the mandate for ecosystem-based fisheries management in the recently reauthorized Sustainable Fisheries Act of 2007. Acoustic surveys have emerged as a potential tool to estimate fish biomass in shallow-water estuaries; however, the transformation of acoustic data into an index of fish biomass is not straightforward. In this article, we examine the consequences of equation selection for target strength (TS) to fish length relationships on potential error generation in hydroacoustic fish biomass estimates. We applied structural equation models (SEMs) to evaluate how our choice of an acoustic TS–fish length equation affected our biomass estimates, and how error occurred and propagated during this process. To demonstrate the magnitude of the error when applied to field data, we used SEMs on normally distributed simulated data to better understand the sources of error involved with converting acoustic data to fish biomass. As such, we describe where, and to what magnitude, error propagates when estimating fish biomass. Estimates of fish lengths were affected by measurement errors of TS, and from inexact relationships between fish length and TS. Differences in parameter estimates resulted in significant differences in fish biomass estimates and led to the conclusion that in the absence of known TS–fish length relationships, Love’s (J Acoust Soc Am 46:746–752, 1969) lateral-aspect equation may be an acceptable substitute for an ecosystem-specific TS–fish length relationship. Based upon SEMs applied to simulated data, perhaps the most important, yet most variable, component is the mean volume backscattering strength, which significantly inflated biomass errors in approximately 10% of the cases. Handling editor: M. Power

9.
There has been remarkably little attention to using the high resolution provided by genotyping‐by‐sequencing (i.e., RADseq and similar methods) for assessing relatedness in wildlife populations. A major hurdle is the genotyping error, especially allelic dropout, often found in this type of data that could lead to downward‐biased, yet precise, estimates of relatedness. Here, we assess the applicability of genotyping‐by‐sequencing for relatedness inferences given its relatively high genotyping error rate. Individuals of known relatedness were simulated under genotyping error, allelic dropout and missing data scenarios based on an empirical ddRAD data set, and their true relatedness was compared to that estimated by seven relatedness estimators. We found that an estimator chosen through such analyses can circumvent the influence of genotyping error, with the estimator of Ritland (Genetics Research, 67, 175) shown to be unaffected by allelic dropout and to be the most accurate when there is genotyping error. We also found that the choice of estimator should not rely solely on the strength of correlation between estimated and true relatedness as a strong correlation does not necessarily mean estimates are close to true relatedness. We also demonstrated how even a large SNP data set with genotyping error (allelic dropout or otherwise) or missing data still performs better than a perfectly genotyped microsatellite data set of tens of markers. The simulation‐based approach used here can be easily implemented by others on their own genotyping‐by‐sequencing data sets to confirm the most appropriate and powerful estimator for their data.

10.
Post DM. Oecologia 2007, 153(4): 973-984
Understanding and explaining the causes of variation in food-chain length is a fundamental challenge for community ecology. The productive-space hypothesis, which suggests food-chain length is determined by the combination of local resource availability and ecosystem size, is central to this challenge. Two different approaches currently exist for testing the productive-space hypothesis: (1) the dual gradient approach that tests for significant relationships between food-chain length and separate gradients of ecosystem size (e.g., lake volume) and per-unit-size resource availability (e.g., g C m−2 year−1), and (2) the single gradient approach that tests for a significant relationship between food-chain length and the productive space (product of ecosystem size and per-unit-size resource availability). Here I evaluate the efficacy of the two approaches for testing the productive-space hypothesis. Using simulated data sets, I estimate the Type 1 and Type 2 error rates for single and dual gradient models in recovering a known relationship between food-chain length and ecosystem size, resource availability, or the combination of ecosystem size and resource availability, as specified by the productive-space hypothesis. The single gradient model provided high power (low Type 2 error rates) but had a very high Type 1 error rate, often erroneously supporting the productive-space hypothesis. The dual gradient model had a very low Type 1 error rate but suffered from low power to detect an effect of per-unit-size resource availability because the range of variation in resource availability is limited. Finally, I performed a retrospective power analysis for the Post et al. (Nature 405:1047–1049, 2000) data set, which tested and rejected the productive-space hypothesis using the dual gradient approach. I found that Post et al. (Nature 405:1047–1049, 2000) had sufficient power to reject the productive-space hypothesis in north temperate lakes; however, the productive-space hypothesis must be tested in other ecosystems before its generality can be fully addressed.

11.
Growth competition assays have been developed to quantify the relative fitness of HIV-1 mutants. In this article, we develop mathematical models to describe viral/cellular dynamic interactions in the assay system from which the competitive fitness indices or parameters are defined. In our previous HIV-viral fitness experiments, the concentration of uninfected target cells was assumed to be constant (Wu et al. 2006). But this may not be true in some experiments. In addition, dual infection may frequently occur in viral fitness experiments and may not be ignorable. Here, we relax these two assumptions and extend our earlier viral fitness model (Wu et al. 2006). The resulting models then become nonlinear ODE systems for which closed-form solutions are not achievable. In the new model, the viral relative fitness is a function of time since it depends on the target cell concentration. First, we studied the structural identifiability of the nonlinear ODE models. The identifiability analysis showed that all parameters in the proposed models are identifiable from the flow-cytometry-based experimental data that we collected. We then employed a global optimization approach (the differential evolution algorithm) to directly estimate the kinetic parameters as well as the relative fitness index in the nonlinear ODE models using nonlinear least squares regression based on the experimental data. Practical identifiability was investigated via Monte Carlo simulations.

12.

Background  

For gene expression data obtained from a time-course microarray experiment, Liu et al. [1] developed a new algorithm for clustering genes with similar expression profiles over time. Performance of their proposal was compared with three other methods including the order-restricted inference based methodology of Peddada et al. [2, 3]. In this note we point out several inaccuracies in Liu et al. [1] and conclude that the order-restricted inference based methodology of Peddada et al. (programmed in the software ORIOGEN) indeed operates at the desired nominal Type 1 error level, an important feature of a statistical decision rule, while being computationally substantially faster than indicated by Liu et al. [1].

13.
Schoolnik GK, Yildiz FH. Genome Biology 2000, 1(3): reviews101-3
Vibrio cholerae O1 has figured prominently in the history of infectious diseases as a cause of periodic global epidemics, an affliction of refugees in areas of social strife and as the disease first subjected to modern epidemiological analysis during the classic investigations of John Snow in mid-19th century London [1]. Thus, publication of the entire genome sequence of V. cholerae O1 (biotype El Tor) in Nature [2] by a consortium of investigators from The Institute for Genomic Research, the University of Maryland and Harvard Medical School is properly regarded as an historic event that will trigger a paradigm shift in the study of this organism.

14.
More than 25 years have passed since publication of the first comprehensive multi-authored landmark volume on the population biology and evolution of clonal organisms (Jackson et al. 1985). Since then, no less than eight symposium volumes or special issues have appeared in scientific journals reporting on advances in the field of clonal plant research, indicating that the study of clonal organisms has remained an important topic in ecological research. The three most recent overviews were published in special issues of this journal (Stuefer et al. 2000; Tolvanen et al. 2004; Sammul et al. 2008), and these are now supplemented with a fourth special issue of Evolutionary Ecology. The articles published here reflect some of the most important contributions to a workshop on clonal plant biology held in Leuven (Belgium) in July 2009 and they illustrate some major advances that have been made over the last few years. In the following paragraphs, we first summarize some representative contributions to the current issue, and second, we put forward some personal ideas about promising and underexplored research lines in clonal plant research.

15.

Background  

In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed.

16.
17.
Biological assays often suffer from large systematic variation between sets of experiments. This variation is sometimes countered by normalizing the results of an "exposed" (E) experiment to that of a simultaneously performed "control" (C). We demonstrate that the arithmetic mean of such ratios overestimates the "true" E/C ratio. Fortunately, the overestimation may be calculated from experimentally accessible information, and it is generally possible to correct for this factor using formulas presented in this paper. We have studied the impact of this effect on a set of studies in the bioelectromagnetics literature and find that, although most results are weakened by the correction, few are significantly altered. Some of the papers used for our literature study are controversial; we believe that the present study may strengthen the quoted results by removing doubts about the statistical treatment of E/C ratios. Both false positives and negatives are possible if the proper correction is not made to the arithmetic mean of a set of E/C data. Realistic examples of erroneous statistical conclusions demonstrate that this is a real concern for E/C data which are marginal in both magnitude (mean < 2) and variance (standard deviation > 0.5).
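
The overestimation described in this abstract can be illustrated with a small sketch. The correction formulas of the paper are not reproduced in the abstract, so the factor below is the generic second-order (delta-method) approximation E[E/C] ≈ (mean E / mean C) × (1 + var(C) / mean(C)²), which assumes E and C fluctuate independently; it is not necessarily the authors' formula, and all function names are hypothetical.

```python
from statistics import mean, pvariance


def mean_of_ratios(exposed, control):
    """Arithmetic mean of the paired E/C ratios."""
    return mean(e / c for e, c in zip(exposed, control))


def ratio_of_means(exposed, control):
    """Ratio of the mean exposed value to the mean control value."""
    return mean(exposed) / mean(control)


def delta_method_bias_factor(control):
    """Approximate inflation of the mean ratio caused by control variability.

    Second-order Taylor expansion of 1/C around its mean (an assumption of
    this sketch, valid for independent E and C and modest variability).
    """
    mu = mean(control)
    return 1.0 + pvariance(control) / mu ** 2
```

With `exposed = [1.0, 1.0]` and `control = [0.5, 1.5]`, the ratio of means is exactly 1, yet the mean of the two ratios (2 and 2/3) is 4/3: the noisy denominator alone produces an apparent effect.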

18.
Data analysis and interpretation remain major logistical challenges when attempting to identify large numbers of protein phosphorylation sites by nanoscale reverse-phase liquid chromatography/tandem mass spectrometry (LC-MS/MS) (Supplementary Figure 1 online). In this report we address challenges that are often only addressable by laborious manual validation, including data set error, data set sensitivity and phosphorylation site localization. We provide a large-scale phosphorylation data set with a measured error rate as determined by the target-decoy approach, we demonstrate an approach to maximize data set sensitivity by efficiently distracting incorrect peptide spectral matches (PSMs), and we present a probability-based score, the Ascore, that measures the probability of correct phosphorylation site localization based on the presence and intensity of site-determining ions in MS/MS spectra. We applied our methods in a fully automated fashion to nocodazole-arrested HeLa cell lysate where we identified 1,761 nonredundant phosphorylation sites from 491 proteins with a peptide false-positive rate of 1.3%.
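
The target-decoy approach mentioned in this abstract estimates the error rate from matches to a decoy (for example, reversed) sequence database. The sketch below uses one common estimator for a concatenated target-decoy search, in which every accepted decoy match is assumed to stand for one undetected false target match; the abstract does not give the exact formula the authors used, and the function name and data layout are hypothetical.

```python
def target_decoy_error_rate(psms, threshold):
    """Estimate the false-positive rate among PSMs scoring >= threshold.

    psms -- iterable of (score, is_decoy) pairs from a search against a
            concatenated target + decoy database (layout assumed here).
    Estimator: 2 * decoy hits / all accepted hits, capped at 1.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= threshold]
    if not accepted:
        return 0.0
    decoy_hits = sum(accepted)
    return min(1.0, 2 * decoy_hits / len(accepted))
```

Raising the score threshold typically lowers the estimated error rate at the cost of fewer accepted identifications, which is the trade-off between data set error and data set sensitivity that the abstract addresses.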

19.
We propose a general working strategy to deal with incomplete reference libraries in the DNA barcoding identification of species. Considering that (1) queries with a large genetic distance with their best DNA barcode match are more likely to be misidentified and (2) imposing a distance threshold profitably reduces identification errors, we modelled relationships between identification performances and distance thresholds in four DNA barcode libraries of Diptera (n = 4270), Lepidoptera (n = 7577), Hymenoptera (n = 2067) and Tephritidae (n = 602 DNA barcodes). In all cases, more restrictive distance thresholds produced a gradual increase in the proportion of true negatives, a gradual decrease of false positives and more abrupt variations in the proportions of true positives and false negatives. More restrictive distance thresholds improved precision, yet negatively affected accuracy due to the higher proportions of queries discarded (viz. having a distance query-best match above the threshold). Using a simple linear regression we calculated an ad hoc distance threshold for the tephritid library producing an estimated relative identification error <0.05. According to the expectations, when we used this threshold for the identification of 188 independently collected tephritids, less than 5% of queries with a distance query-best match below the threshold were misidentified. Ad hoc thresholds can be calculated for each particular reference library of DNA barcodes and should be used as cut-off mark defining whether we can proceed identifying the query with a known estimated error probability (e.g. 5%) or whether we should discard the query and consider alternative/complementary identification methods.
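
The effect of a distance threshold on identification outcomes can be sketched as follows. The four outcome classes are defined here in the way the abstract implies (a query is kept when its distance to the best match is at or below the threshold and discarded otherwise); these definitions, the function names, and the data layout are assumptions of this sketch rather than the authors' code.

```python
def outcomes_at_threshold(queries, threshold):
    """Count identification outcomes for a given distance threshold.

    queries -- iterable of (distance_to_best_match, best_match_is_correct).
    TP: kept and correct       FP: kept but misidentified
    FN: discarded but correct  TN: discarded and misidentified
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for distance, is_correct in queries:
        kept = distance <= threshold
        if kept:
            counts["TP" if is_correct else "FP"] += 1
        else:
            counts["FN" if is_correct else "TN"] += 1
    return counts


def precision(counts):
    """Share of kept queries that were identified correctly."""
    kept = counts["TP"] + counts["FP"]
    return counts["TP"] / kept if kept else float("nan")


def relative_error(counts):
    """Share of kept queries that were misidentified."""
    kept = counts["TP"] + counts["FP"]
    return counts["FP"] / kept if kept else float("nan")
```

Tightening the threshold moves misidentified queries from FP to TN and correct queries from TP to FN, which is why precision can rise while more queries are discarded.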

20.
The objective of this study was to estimate: (i) the sensitivity of cytologists in recognizing abnormal smears; (ii) the sensitivity of cervical cytology as a method of detecting abnormal smears among those obtained in the presence of cervical intraepithelial neoplasia (CIN). Study subjects were 61 women with a histologically confirmed CIN identified through colpohistological and cytologic screening. For objective (i) new smears were taken from study subjects just before treatment, mixed with routine preparations, interpreted by unaware cytologists and then blindly reviewed by a group of three expert supervisors, who reached a consensus diagnosis. Cytologists classified as positive for squamous intraepithelial lesion (SIL) 30 of the 34 smears judged as positive by supervisors (100% of smears classified as high-grade and 67% of smears classified as low-grade SIL by the supervisors). Our approach, based on creating a set of smears with a high a priori probability of being positive, proved to be an efficient way of estimating errors of interpretation. For objective (ii), smears taken at the moment of diagnosis, just before biopsy, were also reviewed by the same supervisors. These CIN cases were identified among asymptomatic women independently of cytological findings and results are therefore not subject to verification bias. Among the 33 histological CINII/III, four (12%) smears had no atypical cells (three negatives and one unsatisfactory) at review. The same proportion was 26% (four negatives and one unsatisfactory) among the 19 histological CINI. No significant differences in smear content were found between the seven ‘false negatives’ and a sample of ‘true positives’ and ‘true negatives’ for a number of formal adequacy criteria (including presence of endocervical cells). 
Strong differences were found between positive smears taken just before biopsy and those taken just before treatment (in 11 women the first smear only was positive, while the opposite was never observed), suggesting an effect of punch biopsy in removing lesions.
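
The proportions quoted in this abstract follow directly from the reported counts. The short sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
def proportion(part, whole):
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


# Objective (i): cytologists flagged 30 of the 34 smears that the
# supervisors judged positive.
reader_sensitivity = proportion(30, 34)

# Objective (ii): smears without atypical cells at review.
missed_cin2_3 = proportion(4, 33)   # 3 negative + 1 unsatisfactory
missed_cin1 = proportion(5, 19)     # 4 negative + 1 unsatisfactory
```

Rounded to whole percentages these are 88%, 12% and 26%; the last two match the figures quoted in the abstract, and the first is the overall reader sensitivity implied by the 30 of 34 count.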
