Similar Articles
20 similar articles found.
1.
Skaug HJ. Biometrics 2001, 57(3): 750-756.
Genetic data are becoming increasingly important in ecology and conservation biology. This article presents a novel method for estimating population size from DNA profiles obtained from a random sample of individuals. The underlying idea is that the degree of biological relationship between individuals in the sample reflects the size of the population and that DNA profiles provide information about relatedness. A pseudolikelihood approach is taken, involving pairwise comparison of individuals. The main field of application is seen to be catch data, and as an example, the method is applied to DNA profiles (10 microsatellite loci) from 334 North Atlantic minke whales. It is concluded that this sample size is too small for the method to give useful results. The question of the required sample size is investigated by simulation.
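As a rough illustration of the pairwise pseudolikelihood idea, the sketch below assumes a toy model in which any two sampled individuals are "closely related" with probability c/N; the constant c, the relatedness count, and the search grid are invented for illustration, and the paper's actual likelihood is built from DNA-profile relatedness probabilities rather than this binary stand-in.

```python
import numpy as np

def pseudo_loglik(N, n_related, n_pairs, c=2.0):
    """Toy pseudolikelihood for population size N, assuming each pair of
    sampled individuals is 'closely related' with probability c / N."""
    p = min(c / N, 1.0)
    return n_related * np.log(p) + (n_pairs - n_related) * np.log(1.0 - p)

# invented data: 334 sampled whales -> 334*333/2 pairs, 12 flagged as related
n_pairs = 334 * 333 // 2
n_related = 12
grid = np.arange(1_000, 200_000, 500)
ll = [pseudo_loglik(N, n_related, n_pairs) for N in grid]
print("pseudo-ML estimate of N:", grid[int(np.argmax(ll))])
```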

2.
High‐throughput sequencing of amplicons from environmental DNA samples permits rapid, standardized and comprehensive biodiversity assessments. However, retrieving and interpreting the structure of such data sets requires efficient methods for dimensionality reduction. Latent Dirichlet Allocation (LDA) can be used to decompose environmental DNA samples into overlapping assemblages of co‐occurring taxa. It is a flexible model‐based method adapted to uneven sample sizes and to large and sparse data sets. Here, we compare LDA performance on abundance and occurrence data, and we quantify the robustness of the LDA decomposition by measuring its stability with respect to the algorithm's initialization. We then apply LDA to a survey of 1,131 soil DNA samples that were collected in a 12‐ha plot of primary tropical forest and amplified using standard primers for bacteria, protists, fungi and metazoans. The analysis reveals that bacteria, protists and fungi exhibit a strong spatial structure, which matches the topographical features of the plot, while metazoans do not, confirming that microbial diversity is primarily controlled by environmental variation at the studied scale. We conclude that LDA is a sensitive, robust and computationally efficient method to detect and interpret the structure of large DNA‐based biodiversity data sets. We finally discuss the possible future applications of this approach for the study of biodiversity.
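A minimal sketch of this kind of decomposition, using scikit-learn's LatentDirichletAllocation on a synthetic samples-by-taxa count matrix; the matrix, topic number and random seeds are placeholders, not the study's data.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# toy samples-by-taxa count matrix standing in for the 1,131-sample
# metabarcoding table; rows are soil samples, columns are taxa (OTUs)
rng = np.random.default_rng(0)
counts = rng.poisson(3, size=(60, 200))

lda = LatentDirichletAllocation(n_components=5, random_state=0)
theta = lda.fit_transform(counts)              # per-sample assemblage mixtures
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
print(theta.shape, phi.shape)                  # (60, 5) and (5, 200)

# crude stability check in the spirit of the paper: refit from a different
# initialization and compare the recovered per-sample mixtures
theta2 = LatentDirichletAllocation(n_components=5, random_state=1).fit_transform(counts)
```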

3.
Optical mapping by direct visualization of individual DNA molecules, stretched in nanochannels with sequence-specific fluorescent labeling, represents a promising tool for disease diagnostics and genomics. An important challenge for this technique is thermal motion of the DNA as it undergoes imaging; this blurs fluorescent patterns along the DNA and results in information loss. Correcting for this effect (a process referred to as kymograph alignment) is a common preprocessing step in nanochannel-based optical mapping workflows, and we present here a highly efficient algorithm to accomplish this via pattern recognition. We compare our method with the only previously published approach and find that our method is orders of magnitude faster while producing data of similar quality. We demonstrate proof of principle of our approach on experimental data consisting of melt-mapped bacteriophage DNA.
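The essence of kymograph alignment can be sketched as per-frame cross-correlation against a reference frame; this toy version assumes rigid integer shifts only, which is far simpler than the pattern-recognition algorithm the paper describes.

```python
import numpy as np

def align_kymograph(kymo):
    """Shift each time frame (row) so it best matches the first frame under
    cross-correlation; assumes rigid integer shifts, a toy simplification."""
    ref = kymo[0] - kymo[0].mean()
    aligned = np.empty_like(kymo)
    for i, row in enumerate(kymo):
        xc = np.correlate(row - row.mean(), ref, mode="full")
        shift = int(np.argmax(xc)) - (len(ref) - 1)   # row's displacement
        aligned[i] = np.roll(row, -shift)
    return aligned

# toy kymograph: one fluorescent label drifting along the channel over time
kymo = np.zeros((50, 200))
for t in range(50):
    kymo[t, 100 + int(6 * np.sin(t / 5.0))] = 1.0
aligned = align_kymograph(kymo)
print(set(aligned.argmax(axis=1)))   # ideally a single column index: {100}
```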

4.
The type and frequency of simple sequence repeats (SSRs) in plant genomes were investigated using the expanding quantity of DNA sequence data deposited in public databases. In Arabidopsis, 306 genomic DNA sequences longer than 10 kb and 36,199 EST sequences were searched for all possible mono- to pentanucleotide repeats. The average frequency of SSRs was one every 6.04 kb in genomic DNA, decreasing to one every 14 kb in ESTs. SSR frequency and type differed between coding, intronic, and intergenic DNA. Similar frequencies were found in other plant species. On the basis of these findings, an approach is proposed and demonstrated for the targeted isolation of single or multiple, physically clustered SSRs linked to any gene that has been mapped using low-copy DNA-based markers. The approach involves sample sequencing a small number of subclones of selected randomly sheared large-insert DNA clones (e.g., BACs). It is shown to be both feasible and practicable, given the probability of fortuitously sequencing through an SSR. The approach is demonstrated in barley, where sample sequencing 34 subclones of a single BAC selected by hybridization to the Big1 gene revealed three SSRs. These allowed Big1 to be located at the top of barley linkage group 6HS.
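A regex-based scan like the following captures the basic search for mono- to pentanucleotide repeats; the minimum-repeat threshold is an arbitrary choice here, not the criterion used in the study.

```python
import re

def find_ssrs(seq, min_repeats=5, max_motif=5):
    """Scan a DNA sequence for mono- to pentanucleotide repeats; the
    minimum-repeat threshold here is an illustrative assumption."""
    hits = []
    for k in range(1, max_motif + 1):
        pattern = re.compile(rf"(([ACGT]{{{k}}})\2{{{min_repeats - 1},}})")
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group(2), len(m.group(1)) // k))
    return hits   # (position, motif, repeat count)

print(find_ssrs("GGATCACACACACACACAGGTTTTTTTTCCAG"))
# [(20, 'T', 8), (4, 'CA', 7)]
```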

5.
Digital PCR (dPCR) is a highly accurate molecular approach, capable of precise measurements and offering a number of unique opportunities. However, in its current format dPCR can be limited by the amount of sample that can be analysed, and consequently additional strategies such as multiplex reactions or pre-amplification may be needed. This study investigated the impact of duplexing and pre-amplification on dPCR analysis by using three different assays targeting a model template (a portion of the Arabidopsis thaliana alcohol dehydrogenase gene). We also investigated the impact of different template types (a linearised plasmid clone and more complex genomic DNA) on measurement precision using dPCR. We were able to demonstrate that duplex dPCR can provide a more precise measurement than uniplex dPCR, while applying pre-amplification or varying template type can significantly decrease the precision of dPCR. Furthermore, we demonstrate that the pre-amplification step can introduce measurement bias that is not consistent between experiments for a sample or assay and so could not be compensated for during the analysis of this data set. We also describe a model for estimating the prevalence of molecular dropout and identify this as a source of dPCR imprecision. Our data demonstrate that the precision afforded by dPCR at low sample concentration can exceed that of the same template post pre-amplification, thereby negating the need for this additional step. Our findings also highlight the technical differences between template types containing the same sequence that must be considered if plasmid DNA is to be used to assess or control for more complex templates like genomic DNA.
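For context, dPCR quantification rests on Poisson statistics: if a fraction p of partitions is positive, the mean number of template copies per partition is λ = -ln(1 - p). A small worked example with invented droplet counts:

```python
import math

def copies_per_partition(n_positive, n_total):
    """Mean template copies per partition from the positive fraction,
    assuming Poisson loading of partitions (standard dPCR statistics)."""
    p = n_positive / n_total
    return -math.log(1.0 - p)

# invented example: 7,000 of 20,000 droplets fluoresce
lam = copies_per_partition(7000, 20000)
print(f"{lam:.4f} copies/partition -> {lam * 20000:.0f} copies loaded")
```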

6.
In this paper, we present a method that enables both homology-based and composition-based approaches to further study the functional core (i.e., the microbial core and the gene core, respectively). In the proposed method, the identification of major functionality groups is achieved by generative topic modeling, which is able to extract useful information from unlabeled data. We first show that a generative topic model can be used to model the taxon abundance information obtained by the homology-based approach and to study the microbial core. The model considers each sample as a "document" that has a mixture of functional groups, while each functional group (also known as a "latent topic") is a weighted mixture of species. Therefore, estimating the generative topic model for taxon abundance data will uncover the distribution over latent functions (latent topics) in each sample. Second, we show that a generative topic model can also be used to study the genome-level composition of "N-mer" features (DNA subreads obtained by composition-based approaches). The model considers each genome as a mixture of latent genetic patterns (latent topics), while each pattern is a weighted mixture of the "N-mer" features; thus, the existence of core genomes can be indicated by a set of common N-mer features. After studying the mutual information between latent topics and gene regions, we provide an explanation of the functional roles of the uncovered latent genetic patterns. The experimental results demonstrate the effectiveness of the proposed method.
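The composition-based features referred to above are simply counts of overlapping N-mers per genome; a minimal sketch of building such a feature vector (which could then be fed to a topic model as a "document") follows, with N = 2 chosen only to keep the output small.

```python
from collections import Counter
from itertools import product

def nmer_vector(seq, n=2):
    """Count overlapping N-mers in a genome sequence and return them in a
    fixed vocabulary order -- the composition features a topic model can
    treat as a 'document' over an N-mer vocabulary."""
    vocab = ["".join(p) for p in product("ACGT", repeat=n)]
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return [counts.get(w, 0) for w in vocab]

vec = nmer_vector("ATGCGATACGCTTGAGATC")
print(len(vec), sum(vec))   # 16 dinucleotide features, 18 overlapping 2-mers
```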

7.
We develop a Markov chain Monte Carlo approach for estimating the distribution of the age of a mutation that is assumed to have arisen just once in the history of the population of interest. We assume that in addition to the presence or absence of this mutation in a sample of chromosomes, we have DNA sequence data from a region completely linked to the mutant site. We apply our method to a mitochondrial data set in which the DNA sequence data come from hypervariable region I and the mutation of interest is the 9-bp region V deletion.
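The MCMC machinery can be illustrated with a generic random-walk Metropolis sampler over the mutation age t; note that the paper's coalescent-based likelihood is replaced here by an arbitrary toy log-posterior, so only the sampling skeleton is faithful.

```python
import math, random

def metropolis(log_post, t0=1000.0, n_iter=20000, step=100.0, seed=1):
    """Random-walk Metropolis sampler over the mutation age t; the paper's
    coalescent likelihood is replaced by whatever log_post is supplied."""
    rng = random.Random(seed)
    t, lp, draws = t0, log_post(t0), []
    for _ in range(n_iter):
        prop = t + rng.gauss(0.0, step)
        if prop > 0.0:
            lp_prop = log_post(prop)
            if math.log(rng.random()) < lp_prop - lp:   # accept/reject
                t, lp = prop, lp_prop
        draws.append(t)
    return draws

# toy log-posterior, lognormal-shaped around ~2,000 generations
toy = lambda t: -(math.log(t) - math.log(2000.0)) ** 2 / 0.5 - math.log(t)
draws = metropolis(toy)[5000:]                 # drop burn-in
print(sum(draws) / len(draws))                 # posterior mean of the age
```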

8.
Genetic heterogeneity in a mixed sample of tumor and normal DNA can confound characterization of the tumor genome. Numerous computational methods have been proposed to detect aberrations in DNA samples from tumor and normal tissue mixtures. Most of these require tumor purities to be at least 10-15%. Here, we present a statistical model to capture information, contained in the individual's germline haplotypes, about expected patterns in the B allele frequencies from SNP microarrays while fully modeling their magnitude, the first such model for SNP microarray data. Our model consists of a pair of hidden Markov models, one for the germline and one for the tumor genome, which, conditional on the observed array data and patterns of population haplotype variation, have a dependence structure induced by the relative imbalance of an individual's inherited haplotypes. Together, these hidden Markov models offer a powerful approach for dealing with mixtures of DNA where the main component represents the germline, thus suggesting natural applications for the characterization of primary clones when stromal contamination is extremely high, and for identifying lesions in rare subclones of a tumor when tumor purity is sufficient to characterize the primary lesions. Our joint model for germline haplotypes and acquired DNA aberration is flexible, allowing a large number of chromosomal alterations, including balanced and imbalanced losses and gains, copy-neutral loss of heterozygosity (LOH) and tetraploidy. We found our model (which we term J-LOH) to be superior for localizing rare aberrations in a simulated 3% mixture sample. More generally, our model provides a framework for full integration of the germline and tumor genomes to deal more effectively with missing or uncertain features, and thus extract maximal information from difficult scenarios where existing methods fail.
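To give a flavor of the HMM machinery involved, here is a scaled forward-algorithm pass for a toy two-state HMM over B allele frequencies (balanced vs. imbalanced); the actual model couples germline and tumor chains through haplotype phase, which this sketch omits entirely, and all parameters below are invented.

```python
import numpy as np
from scipy.stats import norm

def forward_loglik(baf, means, sd=0.05, p_stay=0.99):
    """Scaled forward algorithm for a toy 2-state HMM over B allele
    frequencies (state 0: balanced ~0.5, state 1: imbalanced)."""
    k = len(means)
    trans = np.full((k, k), (1.0 - p_stay) / (k - 1))
    np.fill_diagonal(trans, p_stay)
    alpha = np.full(k, 1.0 / k) * norm.pdf(baf[0], means, sd)
    s = alpha.sum(); alpha /= s; loglik = np.log(s)
    for x in baf[1:]:
        alpha = (alpha @ trans) * norm.pdf(x, means, sd)   # predict + update
        s = alpha.sum(); alpha /= s; loglik += np.log(s)
    return loglik

rng = np.random.default_rng(0)
baf = np.r_[rng.normal(0.5, 0.05, 300), rng.normal(0.6, 0.05, 100)]
print(forward_loglik(baf, means=np.array([0.5, 0.6])))
```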

9.
This study introduces a DNA microarray-based genotyping system for accessing single nucleotide polymorphisms (SNPs) directly from a genomic DNA sample. The described one-step approach combines multiplex amplification and allele-specific solid-phase PCR into an on-chip reaction platform. The multiplex amplification of genomic DNA and the genotyping reaction are both performed directly on the microarray in a single reaction. Oligonucleotides that interrogate single nucleotide positions within multiple genomic regions of interest are covalently tethered to a glass chip, allowing quick analysis of reaction products by fluorescence scanning. Due to a fourfold SNP detection approach employing simultaneous probing of sense and antisense strand information, genotypes can be automatically assigned and validated using a simple computer algorithm. We used the described procedure for parallel genotyping of 10 different polymorphisms in a single reaction and successfully analyzed more than 100 human DNA samples. More than 99% of genotype data were in agreement with data obtained in control experiments with allele-specific oligonucleotide hybridization and capillary sequencing. Our results suggest that this approach might constitute a powerful tool for the analysis of genetic variation.
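The "simple computer algorithm" idea can be sketched as a threshold call per allele-specific probe pair plus a consensus across the fourfold-redundant probes; the 4:1 intensity threshold and signals below are invented placeholders, not the paper's actual rules.

```python
def call_probe_pair(sig_a, sig_b, ratio=4.0):
    """Toy allele call from one allele-specific probe pair's fluorescence
    intensities; the 4:1 threshold is an invented placeholder."""
    if sig_a > ratio * sig_b:
        return "AA"
    if sig_b > ratio * sig_a:
        return "BB"
    return "AB"

def consensus_genotype(probe_pairs):
    """Fourfold redundancy (sense/antisense probing): accept the genotype
    only if all redundant probe pairs agree, otherwise report no-call."""
    calls = [call_probe_pair(a, b) for a, b in probe_pairs]
    return calls[0] if len(set(calls)) == 1 else "no-call"

# four redundant probe pairs for one SNP in one sample (invented signals)
print(consensus_genotype([(950, 40), (880, 60), (990, 55), (910, 48)]))  # AA
```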

10.
Massively parallel sequencing of DNA molecules in the plasma of pregnant women has been shown to allow accurate and noninvasive prenatal detection of fetal trisomy 21. However, whether the sequencing approach is as accurate for the noninvasive prenatal diagnosis of trisomy 13 and 18 is unclear due to the lack of data from a large sample set. We studied 392 pregnancies, among which 25 involved a trisomy 13 fetus and 37 involved a trisomy 18 fetus, by massively parallel sequencing. By using our previously reported standard z-score approach, we demonstrated that this approach could identify 36.0% and 73.0% of trisomy 13 and 18 at specificities of 92.4% and 97.2%, respectively. We aimed to improve the detection of trisomy 13 and 18 by using a non-repeat-masked reference human genome instead of a repeat-masked one to increase the number of aligned sequence reads for each sample. We then applied a bioinformatics approach to correct GC content bias in the sequencing data. With these measures, we detected all (25 out of 25) trisomy 13 fetuses at a specificity of 98.9% (261 out of 264 non-trisomy 13 cases), and 91.9% (34 out of 37) of the trisomy 18 fetuses at 98.0% specificity (247 out of 252 non-trisomy 18 cases). These data indicate that with appropriate bioinformatics analysis, noninvasive prenatal diagnosis of trisomy 13 and trisomy 18 by maternal plasma DNA sequencing is achievable.
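The standard z-score approach mentioned above compares a case's per-chromosome read fraction against a euploid reference panel; a minimal sketch with invented numbers (GC-bias correction, the paper's key refinement, is omitted):

```python
import statistics

def chr_zscore(case_fraction, euploid_fractions):
    """z-score of a case's chromosome read fraction against a euploid
    reference panel; all numbers below are invented for illustration."""
    mu = statistics.mean(euploid_fractions)
    sd = statistics.stdev(euploid_fractions)
    return (case_fraction - mu) / sd

euploid_chr13 = [0.0380, 0.0378, 0.0382, 0.0379, 0.0381, 0.0377]
print(round(chr_zscore(0.0391, euploid_chr13), 1))   # z > 3 flags trisomy 13
```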

11.
The detection of DNA polymorphisms by RFLP analysis is having a major impact on identity testing in forensic science. At present, this approach is the best effort a forensic scientist can make to exclude an individual who has been falsely associated with an evidentiary sample found at a crime scene. When an analysis fails to exclude a suspect as a potential contributor of an evidentiary sample, a means should be provided to assign suitable weight to the putative match. Most importantly, the statistical analysis should not place undue weight on a genetic profile derived from an unknown sample that is attributed to an accused individual. The method must allow for limitations in the conventional agarose submarine-gel electrophoresis and Southern blotting procedure, limited sample population data, possible subpopulation differences, and potential sampling error. A conservative statistical method was developed based on arbitrarily defined fixed bins. This approach permits classification of continuous allelic data, provides for a simple and portable database system, and is unlikely to underestimate the frequency of occurrence of a set of alleles. This will help ensure that undue weight is not placed on a sample attributed to an accused individual.
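A fixed-bin frequency lookup can be sketched as follows; the bin edges, database counts and the minimum-count floor (used to keep the estimate conservative) are illustrative assumptions, not the actual forensic binning rules.

```python
def fixed_bin_frequency(fragment_bp, bin_edges, bin_counts, floor=5):
    """Look up a conservative allele frequency from arbitrarily defined
    fixed bins; edges, counts and the floor are invented for illustration."""
    total = sum(bin_counts)
    for i in range(len(bin_edges) - 1):
        if bin_edges[i] <= fragment_bp < bin_edges[i + 1]:
            return max(bin_counts[i], floor) / total   # never report too rare
    raise ValueError("fragment size outside the binned range")

bin_edges = [500, 1000, 2000, 3500, 5000, 10000]   # fragment lengths (bp)
bin_counts = [3, 40, 75, 60, 22]                   # database alleles per bin
print(fixed_bin_frequency(1500, bin_edges, bin_counts))  # 40/200 = 0.2
print(fixed_bin_frequency(700, bin_edges, bin_counts))   # floored: 5/200
```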

12.
Chromatin immunoprecipitation (ChIP) is an analytical method used to investigate the interactions between proteins and DNA in vivo. ChIP is often used as a quantitative tool, and proper quantification relies on the use of adequate references for data normalization. However, many ChIP experiments involve analyses of samples that have been submitted to experimental treatments with unknown effects, and this precludes the choice of suitable internal references. We have developed a normalization method based on the use of a synthetic DNA-antibody complex that can be used as an external reference instead. A fixed amount of this synthetic DNA-antibody complex is spiked into the chromatin extract at the beginning of the ChIP experiment. The DNA-antibody complex is isolated together with the sample of interest, and the amounts of synthetic DNA recovered in each tube are measured at the end of the process. The yield of synthetic DNA recovery in each sample is then used to normalize the results obtained with the antibodies of interest. Using this approach, we could compensate for losses of material, reduce the variability between ChIP replicates, and increase the accuracy and statistical resolution of the data.
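The normalization itself is simple arithmetic: divide each sample's target signal by the fraction of the spiked-in synthetic complex recovered from the same tube. A toy sketch (invented numbers) showing how this compensates for unequal losses between replicates:

```python
def spikein_normalize(target_signal, spike_recovered, spike_input):
    """Divide a ChIP sample's target signal by the recovery fraction of the
    spiked-in synthetic DNA-antibody complex measured in the same tube."""
    return target_signal / (spike_recovered / spike_input)

# two replicates that lost different amounts of material during the ChIP
print(spikein_normalize(1200.0, spike_recovered=80.0, spike_input=100.0))  # 1500.0
print(spikein_normalize(900.0,  spike_recovered=60.0, spike_input=100.0))  # 1500.0
```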

13.
For some groups of organisms, DNA barcoding can provide a useful tool in taxonomy, evolutionary biology, and biodiversity assessment. However, the efficacy of DNA barcoding depends on the degree of sampling per species, because a large enough sample size is needed to provide a reliable estimate of genetic polymorphism and to delimit species. We used a simulation approach to examine the effects of sample size on four estimators of genetic polymorphism related to DNA barcoding: mismatch distribution, nucleotide diversity, the number of haplotypes, and maximum pairwise distance. Our results showed that mismatch distributions derived from subsamples of ≥20 individuals usually bore a close resemblance to that of the full dataset. Estimates of nucleotide diversity from subsamples of ≥20 individuals tended to be bell‐shaped around that of the full dataset, whereas estimates from smaller subsamples were not. As expected, greater sampling generally led to an increase in the number of haplotypes. We also found that subsamples of ≥20 individuals allowed a good estimate of the maximum pairwise distance of the full dataset, while smaller ones were associated with a high probability of underestimation. Overall, our study confirms the expectation that larger samples are beneficial for the efficacy of DNA barcoding and suggests that a minimum sample size of 20 individuals per population is needed in practice.
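A subsampling experiment of this kind is easy to reproduce in outline; the sketch below resamples nucleotide diversity (π) from a toy set of haplotypes (the sequences and replicate counts are placeholders for the paper's real datasets).

```python
import itertools, random

def nucleotide_diversity(seqs):
    """Average pairwise proportion of differing sites (pi) across sequences."""
    pairs = list(itertools.combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s, t)) / len(s)
               for s, t in pairs) / len(pairs)

def subsampled_pi(seqs, k, n_rep=100, seed=0):
    """Distribution of pi over random subsamples of k individuals."""
    rng = random.Random(seed)
    return [nucleotide_diversity(rng.sample(seqs, k)) for _ in range(n_rep)]

# toy haplotypes standing in for real barcode sequences from one population
pop = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT", "ACCTACGA"] * 10
full = nucleotide_diversity(pop)
sub = subsampled_pi(pop, k=20)
print(full, sum(sub) / len(sub))   # full-dataset pi vs. mean subsampled pi
```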

14.
15.
Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).
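The comparison metric used here, Lin's concordance correlation coefficient, is straightforward to compute; a minimal sketch with invented frequency pairs:

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.cov(x, y, bias=True)[0, 1]
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

pool = [0.10, 0.32, 0.55, 0.81, 0.07, 0.48]   # Pool-seq SNP frequencies
gbs  = [0.12, 0.30, 0.50, 0.85, 0.05, 0.52]   # matched GBS estimates
print(round(concordance_ccc(pool, gbs), 3))
```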

16.
Ion-pair reverse-phase high-performance liquid chromatography is presented as a versatile platform for the rapid analysis of nucleic acid modification reactions in a high-throughput manner. This system allows both sensitive and nonradioactive assays to be developed for a variety of nucleic acid modification reactions. Examples presented here include assays for telomerase, uracil DNA glycosylase, polynucleotide kinase, T4 DNA ligase, C5-DNA methyltransferases, and the mismatch endonuclease CEL I. However, this approach is not confined to these reactions. Indeed the ability to perform a variety of nonradioactive assays with throughput times of 10 min per sample in conjunction with automated data analysis software represents a significant improvement in analytical and preparative nucleic acid enzymology.

17.
The use of model‐based methods to infer a phylogenetic tree from a given data set is frequently motivated by the truism that under certain circumstances the parsimony approach (MP) may produce incorrect topologies, while explicit model‐based approaches are believed to avoid this problem. In the realm of empirical data from actual taxa, it is not known (or knowable) how commonly MP, maximum‐likelihood or Bayesian inference are inaccurate. To test the perceived need for "sophisticated" model‐based approaches, we assessed the degree of congruence between empirical phylogenetic hypotheses generated by alternative methods applied to DNA sequence data in a sample of 1000 recently published articles. Of 504 articles that employed multiple methods, only two exhibited strongly supported incongruence among alternative methods. This result suggests that the MP approach does not produce deviant hypotheses of relationship due to convergent evolution in long branches. Our finding therefore indicates that the use of multiple analytical methods is largely superfluous. We encourage the use of analytical approaches unencumbered by ad hoc assumptions that sap the explanatory power of the evidence.

18.
Zou N, Ditty S, Li B, Lo SC. BioTechniques 2003, 35(4): 758-60, 762-5.
Here we report a new methodology to study trace amounts of DNA of unknown sequence using a two-step PCR strategy to amplify and clone target DNA. The first PCR is carried out with a partial random primer composed of a specific 21-nucleotide 5' sequence, a random heptamer, and a 3' TGGC clamp. The second PCR is carried out with a single 19-nucleotide primer that matches the specific 5' sequence of the partial random primer. Using human and Mycoplasma genitalium DNA as examples, we demonstrated the efficiency of this approach by effectively cloning target DNA fragments from a 1 pg DNA sample. The cloning sensitivity could reach 100 fg of target DNA template. Compared to the strategy of first adding adapter sequences to facilitate PCR amplification of unknown sequences, this approach has the advantage of allowing for the amplification of DNA samples in both natural and denatured forms, which provides greater flexibility in sample preparation. This is an efficient strategy for retrieving sequences from trace DNA samples from various sources.
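The described primer structure is easy to illustrate as string assembly; the fixed 21-nt 5' sequence below is a placeholder (the abstract does not give the published sequence), while the heptamer and TGGC clamp follow the stated design.

```python
import random

def partial_random_primer(specific_5p="ACGTACGTACGTACGTACGTA", seed=None):
    """Assemble a primer of the described structure: a fixed 21-nt 5'
    sequence (placeholder, not the published sequence), a random heptamer,
    and a 3' TGGC clamp."""
    rng = random.Random(seed)
    heptamer = "".join(rng.choice("ACGT") for _ in range(7))
    return specific_5p + heptamer + "TGGC"

print(partial_random_primer(seed=0))   # 21 + 7 + 4 = 32 nt
```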

19.
20.
We previously developed a cladistic approach to identify subsets of haplotypes defined by restriction endonuclease mapping or DNA sequencing that are associated with significant phenotypic deviations. Our approach was limited to segments of DNA in which little recombination occurs. In such cases, a cladogram can be constructed from the restriction site or sequence data that represents the evolutionary steps that interrelate the observed haplotypes. The cladogram is used to define a nested statistical design to identify mutational steps associated with significant phenotypic deviations. The central assumption behind this strategy is that any undetected mutation causing a phenotypic effect is embedded within the same evolutionary history that is represented by the cladogram. The power of this approach depends upon the confidence one has in the particular cladogram used to draw inferences. In this paper, we present a strategy for estimating the set of cladograms that are consistent with a particular sample of either restriction site or nucleotide sequence data and that includes the possibility of recombination. We first evaluate the limits of parsimony in constructing cladograms. Once these limits have been determined, we construct the set of parsimonious and nonparsimonious cladograms that is consistent with these limits. Our estimation procedure also identifies haplotypes that are candidates for being products of recombination. If recombination is extensive, our algorithm subdivides the DNA region into two or more subsections, each having little or no internal recombination. We apply this estimation procedure to three data sets to illustrate varying degrees of cladogram ambiguity and recombination.
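As a crude stand-in for cladogram construction, the sketch below connects haplotypes by a minimum spanning tree over mutational (Hamming) distances; this is emphatically not the paper's statistical parsimony procedure, just a simple proxy for the idea of linking haplotypes by mutational steps.

```python
import itertools

def hamming(a, b):
    """Number of differing sites between two equal-length haplotypes."""
    return sum(x != y for x, y in zip(a, b))

def mst_network(haplotypes):
    """Kruskal's algorithm over pairwise Hamming distances -- a toy proxy
    for linking observed haplotypes by minimal mutational steps."""
    edges = sorted((hamming(a, b), a, b)
                   for a, b in itertools.combinations(haplotypes, 2))
    parent = {h: h for h in haplotypes}
    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]; h = parent[h]
        return h
    tree = []
    for d, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:                    # joining two components: keep edge
            parent[ra] = rb
            tree.append((a, b, d))
    return tree

haps = ["AACGT", "AACGA", "AATGA", "CACGT", "CACGA"]
print(mst_network(haps))
```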
