Similar documents
20 similar documents found (query time: 46 ms)
1.
Error-tolerant pooling designs with inhibitors. (Cited by 2: 0 self-citations, 2 by others)
Pooling designs are used in clone library screening to efficiently distinguish positive clones from negative clones. Mathematically, a pooling design is just a nonadaptive group testing scheme, which has been extensively studied in the literature. In some applications, there is a third category of clones called "inhibitors" whose effect is to neutralize positives. Specifically, the presence of an inhibitor in a pool dictates a negative outcome even when positives are present. Sequential group testing schemes, which can be modified into three-stage schemes, have been proposed for the inhibitor model, but it was unknown whether a pooling design (a one-stage scheme) exists. Another open question raised in the literature is whether the inhibitor model can handle unreliable pool outcomes. In this paper, we answer both open problems by giving a pooling design, as well as a two-stage scheme, for the inhibitor model with unreliable outcomes. The number of pools required by our schemes is quite comparable to that of the three-stage scheme.
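The decoding step behind a one-stage (nonadaptive) design can be sketched with the standard naive decoder for disjunct matrices. This is an illustrative toy, not the paper's inhibitor-tolerant construction: the design matrix and clone states below are hypothetical.

```python
def decode_pools(design, outcomes):
    """Naive decoder for a nonadaptive pooling design: any clone that
    appears in a negative pool is ruled out; with a d-disjunct design
    and at most d positives, the survivors are exactly the positives."""
    n_pools, n_clones = len(design), len(design[0])
    candidates = set(range(n_clones))
    for i in range(n_pools):
        if not outcomes[i]:  # a negative pool clears all its members
            candidates -= {j for j in range(n_clones) if design[i][j]}
    return sorted(candidates)

# Toy 4-pool, 4-clone design (columns are distinct constant-weight-2
# codewords, so the matrix is 1-disjunct); clone 2 is the only positive.
design = [
    [1, 1, 0, 1],  # pool 0: clones 0, 1, 3
    [1, 0, 1, 0],  # pool 1: clones 0, 2
    [0, 1, 1, 0],  # pool 2: clones 1, 2
    [0, 0, 0, 1],  # pool 3: clone 3
]
truth = [False, False, True, False]
outcomes = [any(design[i][j] and truth[j] for j in range(4)) for i in range(4)]
print(decode_pools(design, outcomes))  # -> [2]
```

The inhibitor model breaks this decoder precisely because an inhibitor can force a pool containing positives to read negative, which is why the one-stage question above was open.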

2.
The study of gene functions requires high-quality DNA libraries. However, a large number of tests and screenings are necessary for compiling such libraries. We describe an algorithm for extracting as much information as possible from pooling experiments for library screening. Collections of clones are called pools, and a pooling experiment is a group test for detecting all positive clones. The probability of positiveness for each clone is estimated according to the outcomes of the pooling experiments. Clones with a high chance of positiveness are subjected to confirmatory testing. In this paper, we introduce a new positive clone detecting algorithm, called the Bayesian network pool result decoder (BNPD). The performance of BNPD is compared, by simulation, with that of the Markov chain pool result decoder (MCPD) proposed by Knill et al. in 1996. Moreover, the combinatorial properties of pooling designs suitable for the proposed algorithm are discussed in conjunction with combinatorial designs and d-disjunct matrices. We also show the advantage of utilizing packing designs or BIB designs for the BNPD algorithm.
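The per-clone "probability of positiveness" can be illustrated by brute-force Bayesian inference over all clone states, which is what the BNPD approximates tractably with a Bayesian network at library scale. This sketch assumes noiseless pools and i.i.d. Bernoulli clone states; the design and prior are hypothetical.

```python
from itertools import product

def clone_posteriors(design, outcomes, prior=0.1):
    """Brute-force P(clone j positive | pool outcomes) for a small
    noiseless pooling experiment with i.i.d. Bernoulli(prior) clones.
    Exponential in clone count -- illustrative only."""
    n_pools, n_clones = len(design), len(design[0])
    weight = [0.0] * n_clones
    total = 0.0
    for state in product([0, 1], repeat=n_clones):
        # Keep only clone configurations that reproduce the observations.
        pred = [any(design[i][j] and state[j] for j in range(n_clones))
                for i in range(n_pools)]
        if pred != list(outcomes):
            continue
        w = 1.0
        for s in state:
            w *= prior if s else 1 - prior
        total += w
        for j in range(n_clones):
            if state[j]:
                weight[j] += w
    return [w / total for w in weight]

design = [[1, 1, 0], [0, 1, 1]]  # 2 pools over 3 clones
print(clone_posteriors(design, [True, False]))  # -> [1.0, 0.0, 0.0]
```

Here the negative second pool clears clones 1 and 2, so the positive first pool pins clone 0 with certainty; with noisy outcomes the posteriors become fractional and a confirmation threshold is applied.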

3.
We consider nonadaptive pooling designs for unique-sequence screening of a 1530-clone map of Aspergillus nidulans. The map has the properties that the clones are, with possibly a few exceptions, ordered and that no more than 2 of them cover any point on the genome. We propose two subdesigns of the Steiner system S(3, 5, 65), one with 65 pools and approximately 118 clones per pool, the other with 54 pools and about 142 clones per pool. Each design allows 1 or 2 positive clones to be detected, even in the presence of substantial experimental error rates. More efficient designs are possible if the overlap information in the map is exploited, if there is no constraint on the number of clones in a pool, and if no error tolerance is required. An information theory lower bound requires at least 12 pools to satisfy these minimal criteria, and an “interleaved binary” design can be constructed on 20 pools, with about 380 clones per pool. However, the designs with more pools have important properties of robustness to various possible errors and general applicability to a wider class of pooling experiments.
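The 12-pool lower bound can be reproduced under one plausible reading of the setup: with an ordered map and coverage at most 2, a unique sequence lights up either a single clone or a pair of adjacent clones, so a noiseless design must distinguish that many outcomes.

```python
from math import ceil, log2

# 1530 single-clone outcomes plus 1529 adjacent-pair outcomes
# (assumed interpretation of the "minimal criteria" in the abstract).
distinct_outcomes = 1530 + 1529
min_pools = ceil(log2(distinct_outcomes))
print(min_pools)  # -> 12
```

Each pool contributes one binary bit per experiment, hence the base-2 logarithm; error tolerance and pool-size constraints push the practical designs to 20-65 pools.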

4.
INTRODUCTION: Microarray experiments often have complex designs that include sample pooling, biological and technical replication, sample pairing and dye-swapping. This article demonstrates how statistical modelling can illuminate issues in the design and analysis of microarray experiments, and this information can then be used to plan effective studies. METHODS: A very detailed statistical model for microarray data is introduced, to show the possible sources of variation that are present in even the simplest microarray experiments. Based on this model, the efficacy of common experimental designs, normalisation methodologies and analyses is determined. RESULTS: When the cost of the arrays is high compared with the cost of samples, sample pooling and spot replication are shown to be efficient variance reduction methods, whereas technical replication of whole arrays is demonstrated to be very inefficient. Dye-swap designs can use biological replicates rather than technical replicates to improve efficiency and simplify analysis. When the cost of samples is high and technical variation is a major portion of the error, technical replication can be cost effective. Normalisation by centring on a small number of spots may reduce array effects, but can introduce considerable variation in the results. Centring using the bulk of spots on the array is less variable. Similarly, normalisation methods based on regression methods can introduce variability. Except for normalisation methods based on spiking controls, all normalisation requires that most genes do not differentially express. Methods based on spatial location and/or intensity also require that the nondifferentially expressing genes are at random with respect to location and intensity. Spotting designs should be carefully done so that spot replicates are widely spaced on the array, and genes with similar expression patterns are not clustered.
DISCUSSION: The tools for statistical design of experiments can be applied to microarray experiments to improve both efficiency and validity of the studies. Given the high cost of microarray experiments, the benefits of statistical input prior to running the experiment cannot be over-emphasised.
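The variance-reduction claim about pooling and spot replication can be sketched with a simplified additive error model (my own two-component parameterization, far coarser than the detailed model the article introduces):

```python
def per_gene_variance(bio_var, tech_var, n_pooled, n_spots):
    """Variance of one array's expression estimate for a gene when the
    hybridized sample pools n_pooled biological replicates and the gene
    is spotted n_spots times. Simplified additive model: biological
    noise averages over the pool, spot-level noise over the replicates."""
    return bio_var / n_pooled + tech_var / n_spots

# Hypothetical variance components: pooling 4 samples and duplicating
# spots cuts the per-array variance from 5.0 to 1.5 without extra arrays.
print(per_gene_variance(4.0, 1.0, 1, 1))  # -> 5.0
print(per_gene_variance(4.0, 1.0, 4, 2))  # -> 1.5
```

Replicating whole arrays, by contrast, shrinks only the technical term while paying full array cost per replicate, which is why it is the inefficient option when arrays dominate the budget.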

5.
Next Generation Sequencing Technology has revolutionized our ability to study the contribution of rare genetic variation to heritable traits. However, existing single-marker association tests are underpowered for detecting rare risk variants. A more powerful approach involves pooling methods that combine multiple rare variants from the same gene into a single test statistic. Proposed pooling methods can be limited because they generally assume high-quality genotypes derived from deep-coverage sequencing, which may not be available. In this paper, we consider an intuitive and computationally efficient pooling statistic, the cumulative minor-allele test (CMAT). We assess the performance of the CMAT and other pooling methods on datasets simulated with population genetic models to contain realistic levels of neutral variation. We consider study designs ranging from exon-only to whole-gene analyses that contain noncoding variants. For all study designs, the CMAT achieves power comparable to that of previously proposed methods. We then extend the CMAT to probabilistic genotypes and describe application to low-coverage sequencing and imputation data. We show that augmenting sequence data with imputed samples is a practical method for increasing the power of rare-variant studies. We also provide a method of controlling for confounding variables such as population stratification. Finally, we demonstrate that our method makes it possible to use external imputation templates to analyze rare variants imputed into existing GWAS datasets. As proof of principle, we performed a CMAT analysis of more than 8 million SNPs that we imputed into the GAIN psoriasis dataset by using haplotypes from the 1000 Genomes Project.
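The collapsing step behind a cumulative minor-allele statistic can be sketched as follows. This is the burden-style aggregation idea only, not the exact CMAT test statistic, and the genotype data are hypothetical:

```python
def gene_burden(genotypes):
    """Collapse the rare variants of a gene into one cumulative
    minor-allele count per subject (dosages 0/1/2 per variant)."""
    return [sum(g) for g in genotypes]

# Hypothetical data: rows are subjects, columns are rare variants.
cases    = [[1, 0, 0], [0, 1, 1], [2, 0, 0]]
controls = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(sum(gene_burden(cases)), sum(gene_burden(controls)))  # -> 5 1
```

A single test then compares the aggregate case and control counts, and the extension to probabilistic genotypes mentioned above amounts to summing expected dosages from imputation rather than hard calls.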

6.
Zhao Y, Wang S. Human Heredity. 2009;67(1):46-56
Study cost remains the major limiting factor for genome-wide association studies due to the necessity of genotyping a large number of SNPs for a large number of subjects. Both DNA pooling strategies and two-stage designs have been proposed to reduce genotyping costs. In this study, we propose a cost-effective, two-stage approach with a DNA pooling strategy. During stage I, all markers are evaluated on a subset of individuals using DNA pooling. The most promising set of markers is then evaluated with individual genotyping for all individuals during stage II. The goal is to determine the optimal parameters (π_p^sample, the proportion of samples used during stage I with DNA pooling, and π_p^marker, the proportion of markers evaluated during stage II with individual genotyping) that minimize the cost of a two-stage DNA pooling design while maintaining a desired overall significance level and achieving a level of power similar to that of a one-stage individual genotyping design. We considered the effects of three factors on optimal two-stage DNA pooling designs. Our results suggest that, under most scenarios considered, the optimal two-stage DNA pooling design may be much more cost-effective than the optimal two-stage individual genotyping design, which uses individual genotyping during both stages.
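The cost being minimized can be written down in a rough form. This is an illustrative parameterization of the two-stage structure described above, not the paper's actual cost function, and all unit costs are hypothetical:

```python
def two_stage_pooling_cost(n_subjects, n_markers, pi_sample, pi_marker,
                           cost_prep, cost_pool_per_marker, cost_geno):
    """Stage I: a fraction pi_sample of subjects is prepared into DNA
    pools and every marker is assayed on the pools. Stage II: the top
    pi_marker fraction of markers is individually genotyped on all
    subjects. Returns total study cost under this simple model."""
    stage1 = pi_sample * n_subjects * cost_prep + n_markers * cost_pool_per_marker
    stage2 = pi_marker * n_markers * n_subjects * cost_geno
    return stage1 + stage2

# 1000 subjects, 100k markers, 30% of subjects pooled, top 1% of
# markers followed up (all figures hypothetical).
print(two_stage_pooling_cost(1000, 100_000, 0.3, 0.01, 1.0, 0.5, 0.1))
# -> 150300.0
```

The optimization then searches over (π_p^sample, π_p^marker) subject to the power and significance constraints; the cost surface makes clear why small π_p^marker values drive the savings.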

7.
The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in the individual strains as a reference. We also estimated allele frequencies of the SNPs using the pooled data and compared them with the "true" frequencies estimated in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency, with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling, because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time-consuming, or expensive.
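The "error well approximated by binomial sampling" claim translates directly into a standard deviation for the pooled frequency estimate. This sketch covers the read-sampling term only and deliberately ignores the unequal-DNA-contribution noise the abstract warns about for small pools:

```python
from math import sqrt

def read_sampling_sd(p, depth):
    """SD of a pooled allele-frequency estimate under binomial read
    sampling at a site with true frequency p covered by `depth` reads."""
    return sqrt(p * (1 - p) / depth)

# At 100x coverage a 50% allele is estimated to within about +/- 5%.
print(round(read_sampling_sd(0.5, 100), 3))  # -> 0.05
```

Quadrupling depth halves the SD, which is the usual square-root trade-off when budgeting pooled sequencing runs.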

8.
The study of gene functions requires a DNA library of high quality; such a library is obtained through a large amount of testing and screening. Pooling design is a very helpful tool for reducing the number of tests in DNA library screening. In this paper, we present new one- and two-stage pooling designs, together with new probabilistic pooling designs. The approach in this paper works in both error-free and error-tolerant scenarios.

9.
Pilot studies are often used to help design ecological studies. Ideally the pilot data are incorporated into the full-scale study data, but if the pilot study's results indicate a need for major changes to experimental design, then pooling pilot and full-scale study data is difficult. The default position is to disregard the preliminary data. But ignoring pilot study data after a more comprehensive study has been completed forgoes statistical power or costs more by sampling additional data equivalent to the pilot study's sample size. With Bayesian methods, pilot study data can be used as an informative prior for a model built from the full-scale study dataset. We demonstrate a Bayesian method for recovering information from otherwise unusable pilot study data with a case study on eucalypt seedling mortality. A pilot study of eucalypt tree seedling mortality was conducted in southeastern Australia in 2005. A larger study with a modified design was conducted the following year. The two datasets differed substantially, so they could not easily be combined. Posterior estimates from pilot dataset model parameters were used to inform a model for the second larger dataset. Model checking indicated that incorporating prior information maintained the predictive capacity of the model with respect to the training data. Importantly, adding prior information improved model accuracy in predicting a validation dataset. Adding prior information increased the precision and the effective sample size for estimating the average mortality rate. We recommend that practitioners move away from the default position of discarding pilot study data when they are incompatible with the form of their full-scale studies. More generally, we recommend that ecologists should use informative priors more frequently to reap the benefits of the additional data.
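The mechanism of "pilot data as informative prior" is easiest to see in the conjugate case. The case study fits a richer model, so this beta-binomial update is only a sketch of the principle, and all counts are hypothetical:

```python
def posterior_mortality(prior_a, prior_b, deaths, survivors):
    """Beta-binomial update: a Beta(prior_a, prior_b) prior on the
    mortality rate (e.g. encoding pilot-study deaths and survivors)
    updated with the full-scale study counts. Returns the posterior
    parameters and posterior mean."""
    a, b = prior_a + deaths, prior_b + survivors
    return a, b, a / (a + b)

# Hypothetical pilot: 12 deaths / 38 survivors -> Beta(12, 38) prior,
# then updated with a full-scale study of 60 deaths / 190 survivors.
a, b, mean = posterior_mortality(12, 38, deaths=60, survivors=190)
print(a, b, round(mean, 3))  # -> 72 228 0.24
```

The effective sample size grows from 250 to 300 observations, which is exactly the precision gain the abstract reports from not discarding the pilot data.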

10.
Selective DNA pooling is an advanced methodology for linkage mapping of quantitative trait loci (QTL) in farm animals. The principle is based on densitometric estimates of marker allele frequency in pooled DNA samples of phenotypically extreme individuals from half-sib, backcross and F(2) experimental designs in farm animals. This methodology provides a rapid and efficient analysis of a large number of individuals with short tandem repeat markers, which are essential for detecting QTL through the genome-wide searching approach. Several strategies involving whole genome scanning with high statistical power have been developed for the systematic detection of quantitative trait loci and linked loci of complex traits. In recent studies, greater success has been achieved in mapping several QTLs in Israel-Holstein cattle using selective DNA pooling. This paper outlines the currently emerging strategies of linkage mapping to identify QTL based on selective DNA pooling, with emphasis on the theoretical prerequisites for detecting linked QTLs, applications, a general theory for experimental half-sib designs, statistical power, and the feasibility of identifying genetic markers linked to QTL in dairy cattle. The study reveals that selective DNA pooling in dairy cattle can be best exploited in the genome-wide detection of linked loci with small and large QTL effects and applied to a moderately sized half-sib family of about 500 animals.

11.

Key message

The paper shows that unreplicated designs in multi-environmental trials are most efficient. If replication per environment is needed then augmented p-rep designs outperform augmented and replicated designs in triticale and maize.

Abstract

In plant breeding, augmented designs with unreplicated entries are frequently used for early generation testing. With a limited amount of seed, this design allows a maximum number of environments to be used in multi-environmental trials (METs). Check plots enable the estimation of block effects, error variances and a connection of otherwise unconnected trials in METs. Cullis et al. (J Agri Biol Environ Stat 11:381–393, 2006) propose to replace check plots from a grid-plot design by plots of replicated entries, leading to partially replicated (p-rep) designs. Williams et al. (Biom J 53:19–27, 2011) apply this idea to augmented designs (augmented p-rep designs). While p-rep designs are increasingly used in METs, a comparison of the efficiency of augmented p-rep designs and augmented designs in the range between replicated and unreplicated designs in METs is lacking. We simulated genetic effects and allocated them according to these four designs to plot yields of a triticale and a maize uniformity trial. The designs varied in the number of environments but had a fixed number of entries and total plots. The error model and the assumption of fixed or random entry effects were varied in the simulations. We extended our simulation for the triticale data by including correlated entry effects, which are common in genomic selection. Results show an advantage of unreplicated and augmented p-rep designs and a preference for using random entry effects, especially in the case of correlated effects reflecting relationships among entries. Spatial error models had minor advantages compared to purely randomization-based models.

12.
Detecting genetic markers with biologically relevant effects remains a challenge due to multiple testing. Standard analysis methods focus on evidence against the null and protect primarily the type I error. On the other hand, the worthwhile alternative is specified for power calculations at the design stage. The balanced test as proposed by Moerkerke and others (2006) and Moerkerke and Goetghebeur (2006) incorporates this alternative directly in the decision criterion to achieve better power. Genetic markers are selected and ranked in order of the balance of evidence they contain against the null and the target alternative. In this paper, we build on this guiding principle to develop 2-stage designs for screening genetic markers when the cost of measurements is high. For a given marker, a first sample may already provide sufficient evidence for or against the alternative. If not, more data are gathered at the second stage which is then followed by a binary decision based on all available data. By optimizing parameters which determine the decision process over the 2 stages (such as the area of the "gray" zone which leads to the gathering of extra data), the expected cost per marker can be reduced substantially. We also demonstrate that, compared to 1-stage designs, 2-stage designs achieve a better balance between true negatives and positives for the same cost.
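The gray-zone decision rule described above can be sketched as a three-way classifier. The thresholds here are hypothetical placeholders; in the paper they are the quantities optimized against expected cost:

```python
def marker_decision(stat, gray_low, gray_high):
    """Stage-1 rule for a two-stage screen: clear evidence against the
    alternative drops the marker, clear evidence for it selects the
    marker, and a statistic inside the gray zone triggers stage-2
    data collection followed by a final binary decision."""
    if stat <= gray_low:
        return "drop"
    if stat >= gray_high:
        return "select"
    return "stage2"

print([marker_decision(s, 1.0, 2.5) for s in (0.4, 1.8, 3.1)])
# -> ['drop', 'stage2', 'select']
```

Widening the gray zone sends more markers to stage 2 (higher cost, fewer errors); shrinking it does the opposite, which is the trade-off the optimization balances.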

13.
A. Darvasi, M. Soller. Genetics. 1994;138(4):1365-1373
Selective genotyping is a method to reduce costs in marker-quantitative trait locus (QTL) linkage determination by genotyping only those individuals with extreme, and hence most informative, quantitative trait values. The DNA pooling strategy (termed "selective DNA pooling") takes this one step further by pooling DNA from the selected individuals at each of the two phenotypic extremes, and basing the test for linkage on marker allele frequencies as estimated from the pooled samples only. This can reduce genotyping costs of marker-QTL linkage determination by up to two orders of magnitude. Theoretical analysis of selective DNA pooling shows that for experiments involving backcross, F(2) and half-sib designs, the power of selective DNA pooling for detecting genes with large effect can be the same as that obtained by individual selective genotyping. Power for detecting genes with small effect, however, was found to decrease strongly with increase in the technical error of estimating allele frequencies in the pooled samples. The effect of technical error, however, can be markedly reduced by replication of technical procedures. It is also shown that a proportion selected of 0.1 at each tail will be appropriate for a wide range of experimental conditions.
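The selection step that precedes the pooling is simple to sketch: rank by phenotype and take a fixed proportion from each tail (0.1 per tail being the proportion recommended above). The phenotype values here are hypothetical.

```python
def select_tails(phenotypes, proportion=0.1):
    """Return the indices of the individuals in each phenotypic tail;
    these are the two groups whose DNA is pooled in selective DNA
    pooling, with linkage tested on pool allele frequencies."""
    ranked = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    k = max(1, int(proportion * len(phenotypes)))
    return ranked[:k], ranked[-k:]

# 100 hypothetical trait values 0..99: tails are the bottom and top 10.
low, high = select_tails(list(range(100)), 0.1)
print(len(low), low[0], high[-1])  # -> 10 0 99
```

Only two genotyping reactions per marker (one per pool) are then needed, which is where the two-orders-of-magnitude cost reduction comes from.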

14.
Population genomic approaches, which take advantage of high-throughput genotyping, are powerful yet costly methods to scan for selective sweeps. DNA-pooling strategies have been widely used for association studies because they are a cost-effective alternative to large-scale individual genotyping. Here, we performed an SNP-MaP (single nucleotide polymorphism microarrays and pooling) analysis using samples from Eurasia to evaluate the efficiency of the pooling strategy in genome-wide scans for selection. By conducting simulations of allelotype data, we first demonstrated that the boxplot with average heterozygosity (HET) is a promising method to detect strong selective sweeps with a moderate level of pooling error. Based on this, we used a sliding-window analysis of HET to detect the large contiguous regions (LCRs) putatively under selective sweeps in the Eurasian datasets. This survey identified 63 LCRs in a European population. These signals were further supported by the integrated haplotype score (iHS) test using HapMap II data. We also confirmed the European-specific signatures of positive selection in several previously identified genes (KEL, TRPV5, TRPV6, EPHB6). In summary, our results not only revealed the high credibility of the SNP-MaP strategy in scanning for selective sweeps, but also provided insight into population differentiation.
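The HET statistic scanned in sliding windows is the standard expected heterozygosity. A minimal sketch of one window's value (window size and frequencies hypothetical):

```python
def window_het(freqs):
    """Average expected heterozygosity 2p(1-p) over the SNP allele
    frequencies in one sliding window; unusually low windows are the
    candidate selective-sweep regions in a HET scan."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)

print(window_het([0.5, 0.5]))         # -> 0.5  (maximally diverse window)
print(window_het([0.02, 0.01, 0.0]))  # near zero: sweep-like signal
```

Because HET depends only on allele frequencies, it can be computed directly from pooled (SNP-MaP) estimates, which is what makes the pooling strategy viable for this scan.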

15.
Outbreaks of infectious viruses resulting from spillover events from bats have brought much attention to bat-borne zoonoses, which has motivated increased ecological and epidemiological studies on bat populations. Field sampling methods often collect pooled samples of bat excreta from plastic sheets placed under roosts. However, positive bias is introduced because multiple individuals may contribute to pooled samples, making studies of viral dynamics difficult. Here, we explore the general issue of bias in spatial sample pooling using Hendra virus in Australian bats as a case study. We assessed the accuracy of different under-roost sampling designs using generalized additive models and field data from individually captured bats and pooled urine samples. We then used theoretical simulation models of bat density and under-roost sampling to understand the mechanistic drivers of bias. The most commonly used sampling design estimated viral prevalence 3.2 times higher than individual-level data, with positive bias 5–7 times higher than other designs due to spatial autocorrelation among sampling sheets and clustering of bats in roosts. Simulation results indicate that using a stratified random design to collect 30–40 pooled urine samples from 80–100 sheets, each with an area of 0.75–1 m², would allow estimation of true prevalence with minimum sampling bias and false negatives. These results show that widely used under-roost sampling techniques are highly sensitive to viral presence but lack specificity, providing limited information regarding viral dynamics. Improved estimation of true prevalence can be attained with minor changes to existing designs, such as reducing sheet size, increasing sheet number, and spreading sheets out within the roost area. Our findings provide insight into how spatial sample pooling is vulnerable to bias for a wide range of systems in disease ecology, where optimal sampling design is influenced by pathogen prevalence, host population density, and patterns of aggregation.
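The core mechanism of the positive bias is that a pooled sheet tests positive if any contributing bat is infected. A minimal sketch under idealized assumptions (independent infections, perfect assay; the bats-per-sheet figure is hypothetical):

```python
def sheet_positive_prob(prev, bats_per_sheet):
    """Probability that a pooled urine sheet tests positive when each
    of bats_per_sheet contributing bats is independently infected with
    probability prev."""
    return 1 - (1 - prev) ** bats_per_sheet

# At 5% individual prevalence, sheets pooling ~10 bats read positive
# about 40% of the time: an eight-fold apparent inflation.
true_prev = 0.05
print(round(sheet_positive_prob(true_prev, 10) / true_prev, 1))  # -> 8.0
```

Smaller sheets reduce the number of contributing bats per sample, which is exactly why the recommended design changes (smaller, more numerous, more dispersed sheets) shrink the bias.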

16.
Due to the rising cost of laboratory assays, it has become increasingly common in epidemiological studies to pool biospecimens. This is particularly true in longitudinal studies, where the cost of performing multiple assays over time can be prohibitive. In this article, we consider the problem of estimating the parameters of a Gaussian random effects model when the repeated outcome is subject to pooling. We consider different pooling designs for the efficient maximum likelihood estimation of variance components, with particular attention to estimating the intraclass correlation coefficient. We evaluate the efficiencies of different pooling design strategies using analytic and simulation study results. We examine the robustness of the designs to skewed distributions and consider unbalanced designs. The design methodology is illustrated with a longitudinal study of premenopausal women focusing on assessing the reproducibility of F2-isoprostane, a biomarker of oxidative stress, over the menstrual cycle.
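The target quantity, the intraclass correlation coefficient, is a simple function of the variance components being estimated (the component values below are hypothetical):

```python
def icc(between_var, within_var):
    """Intraclass correlation coefficient for a Gaussian random effects
    model: the share of total variance attributable to differences
    between subjects rather than within-subject fluctuation."""
    return between_var / (between_var + within_var)

# A biomarker whose between-subject variance dominates is reproducible.
print(icc(3.0, 1.0))  # -> 0.75
```

Pooling repeated measures blurs the within-subject term, which is why the choice of pooling design directly affects how efficiently the ICC can be recovered.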

17.
Pooled Genomic Indexing (PGI) is a novel method for physical mapping of clones onto known sequences. PGI is carried out by pooling arrayed clones and generating shotgun sequence reads from the pools. The shotgun sequences are compared to a reference sequence. In the simplest case, clones are placed on an array and are pooled by rows and columns. If a shotgun sequence from a row pool and another shotgun sequence from a column pool match the reference sequence at a close distance, they are both assigned to the clone at the intersection of the two pools. Accordingly, the clone is mapped onto the region of the reference sequence between the two matches. A probabilistic model for PGI is developed, and several pooling designs are described and analyzed, including transversal designs and designs from linear codes. The probabilistic model and the pooling schemes are validated in simulated experiments where 625 rat bacterial artificial chromosome (BAC) clones and 207 mouse BAC clones are mapped onto homologous human sequence.
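The row/column intersection step described above can be sketched directly. The distance threshold and hit coordinates are hypothetical, and this ignores the probabilistic model the paper develops for ambiguous intersections:

```python
def assign_reads(row_hits, col_hits, max_gap):
    """Simplified PGI intersection: a read from row pool r and a read
    from column pool c that match the reference within max_gap bases
    are jointly assigned to the clone at array position (r, c)."""
    assignments = []
    for r, rpos in row_hits:      # (row pool index, match coordinate)
        for c, cpos in col_hits:  # (column pool index, match coordinate)
            if abs(rpos - cpos) <= max_gap:
                assignments.append(((r, c), min(rpos, cpos)))
    return assignments

row_hits = [(0, 10_500), (2, 840_000)]
col_hits = [(3, 11_200), (1, 2_000_000)]
print(assign_reads(row_hits, col_hits, max_gap=150_000))
# -> [((0, 3), 10500)]
```

With multiple positives per pool, spurious intersections arise; the transversal and linear-code designs mentioned above add extra pooling dimensions to disambiguate them.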

18.
The paper deals with the problem of estimating the individual weights of objects under a biased spring balance weighing design with equal correlations of errors in the model. A lower bound for the variance of each of the estimated weights resulting from this biased spring balance weighing design is obtained and a necessary and sufficient condition for this lower bound to be attained is given. The incidence matrix of a BIB design has been used to construct optimum biased spring balance weighing designs.

19.
Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, functional annotation between orthologs serves as a performance proxy. However, this violates the fundamental principle of orthology as an evolutionary definition, while it is often not applicable due to limited experimental evidence for most species. Therefore, we constructed high quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments, misjudging the true performance of orthology inference methods. We also examined how our dataset can instruct the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.

20.
Process analytical technology (PAT) has been gaining a lot of momentum in the biopharmaceutical community due to the potential for continuous real-time quality assurance resulting in improved operational control and compliance. Two of the key goals that have been outlined for PAT are "variability is managed by the process" and "product quality attributes can be accurately and reliably predicted over the design space established for materials used, process parameters, manufacturing, environmental, and other conditions". Recently, we have been examining the feasibility of applying different analytical tools for designing PAT applications for bioprocessing. We have previously shown that a commercially available online high performance liquid chromatography (HPLC) system can be used for analysis that can facilitate real-time decisions for column pooling based on product quality attributes (Rathore et al., 2008). In this article we test the feasibility of using a commercially available ultra-performance liquid chromatography (UPLC) system for real-time pooling of process chromatography columns. It is demonstrated that the UPLC system offers a feasible approach and meets the requirements of a PAT application. While the application presented here is a reversed-phase assay, the approach and the hardware can be easily applied to other modes of liquid chromatography.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号