Found 20 similar articles (search time: 0 ms)
1.
2.
Will we catch fish today? Our grandfathers’ responses were usually something along the lines of, ‘Probably. I've caught them here before’. One of the foundations of ecology is identifying which species are present, and where. This informs our understanding of species richness patterns, spread of invasive species, and loss of threatened and endangered species due to environmental change. However, our understanding is often lacking, particularly in aquatic environments where biodiversity remains hidden below the water's surface. The emerging field of metagenetic species surveillance is aiding our ability to rapidly determine which aquatic species are present, and where. In this issue of Molecular Ecology Resources, Ficetola et al. (2015) provide a framework for metagenetic environmental DNA surveillance to foster the confidence of our grandfathers’ fishing prowess by more rigorously evaluating the replication levels necessary to quantify detection errors and ultimately improving our confidence in aquatic species presence.
3.
Melanie Bahlo Rick Tankard Vesna Lukic Karen L. Oliver Katherine R. Smith 《Human genetics》2014,133(11):1331-1341
High-throughput sequencing (HTS) studies have been highly successful in identifying the genetic causes of human disease, particularly those following Mendelian inheritance. Many HTS studies to date have been performed without utilizing available family relationships between samples. Here, we discuss the many merits and occasional pitfalls of using identity by descent information in conjunction with HTS studies. These methods are not only applicable to family studies but are also useful in cohorts of apparently unrelated, ‘sporadic’ cases and small families underpowered for linkage, and they allow inference of relationships between individuals. Incorporating familial/pedigree information not only provides powerful filtering options for the extensive variant lists that are usually produced by HTS but also allows valuable quality control checks and insights into the genetic model and the genotypic status of individuals of interest. In particular, these methods are valuable for challenging discovery scenarios in HTS analysis, such as in the study of populations poorly represented in variant databases typically used for filtering, and in the case of poor-quality HTS data.
4.
Detecting genomic structural variants from high-throughput sequencing data is a complex and unresolved challenge. We have developed a statistical learning approach, based on Random Forests, that integrates prior knowledge about the characteristics of structural variants and leads to improved discovery in high-throughput sequencing data. The implementation of this technique, forestSV, offers high sensitivity and specificity coupled with the flexibility of a data-driven approach.
5.
When humans detect and discriminate visual motion, some neural mechanism extracts the motion information that is embedded in the noisy spatio-temporal stimulus. We show that an ideal mechanism in a motion discrimination experiment cross-correlates the received waveform with the signals to be discriminated. If the human visual system uses such a cross-correlator mechanism, discrimination performance should depend on the cross-correlation between the two signals. Manipulations of the signals' cross-correlation using differences in the speed and phase of moving gratings produced the predicted changes in the performance of human observers. The cross-correlator's motion performance improves linearly as contrast increases and human performance is similar. The ideal cross-correlator can be implemented by passing the stimulus through linear spatio-temporal filters matched to the signals. We propose that directionally selective simple cells in the striate cortex serve as matched filters during motion detection and discrimination.
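The cross-correlator idea in this abstract can be illustrated with a minimal sketch. It assumes two equal-energy candidate signals and a one-dimensional waveform; the `grating` helper, the signal parameters, and the deterministic "noise" are hypothetical stand-ins, not the stimuli used in the study.

```python
import math

def correlate(received, template):
    """Zero-lag cross-correlation (dot product) of two equal-length waveforms."""
    return sum(r * t for r, t in zip(received, template))

def discriminate(received, signal_a, signal_b):
    """Pick the template that correlates best with the received waveform.

    Comparing raw correlations like this assumes both templates carry
    the same energy.
    """
    if correlate(received, signal_a) >= correlate(received, signal_b):
        return "A"
    return "B"

def grating(n, cycles, phase=0.0):
    """Hypothetical 1-D stand-in for a moving grating: a sampled sinusoid."""
    return [math.sin(2 * math.pi * cycles * i / n + phase) for i in range(n)]

signal_a = grating(64, cycles=2)
signal_b = grating(64, cycles=3)

# Deterministic perturbation so the example is reproducible.
noise = [0.3 * math.cos(1.7 * i) for i in range(64)]
received = [s + e for s, e in zip(signal_b, noise)]

choice = discriminate(received, signal_a, signal_b)
```

Each call to `correlate` plays the role of one matched filter: the template is the filter, and the decision compares the two filter outputs.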
6.
Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germline and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license.
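To make the idea of a 32-bit variant encoding concrete, here is a small sketch of one way to pack a short variant into a single integer. The bit layout is an assumption chosen for illustration (20 bits for an offset within a chunk, 2 bits each for the reference and alternate allele lengths, 8 bits for up to four bases); it is not taken from the abstract and may differ from what echtvar actually does. Chromosome and chunk index are assumed to be stored elsewhere.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"

def encode_variant(offset, ref, alt):
    """Pack (offset, ref, alt) into 32 bits using a hypothetical layout:

    bits 31..12  offset within a 2**20-base chunk
    bits 11..10  len(ref)
    bits  9..8   len(alt)
    bits  7..0   up to four bases (ref then alt), 2 bits each, left-aligned
    """
    bases = ref + alt
    if not 0 <= offset < (1 << 20):
        raise ValueError("offset does not fit in 20 bits")
    if len(ref) > 3 or len(alt) > 3 or len(bases) > 4:
        raise ValueError("alleles too long for the 32-bit form")
    if any(b not in BASE_TO_BITS for b in bases):
        raise ValueError("only A, C, G, T are supported")
    packed = 0
    for b in bases:
        packed = (packed << 2) | BASE_TO_BITS[b]
    packed <<= 2 * (4 - len(bases))  # left-align inside the 8-bit field
    return (offset << 12) | (len(ref) << 10) | (len(alt) << 8) | packed

def decode_variant(value):
    """Inverse of encode_variant."""
    offset = value >> 12
    ref_len = (value >> 10) & 3
    alt_len = (value >> 8) & 3
    codes = [(value >> shift) & 3 for shift in (6, 4, 2, 0)]
    bases = "".join(BITS_TO_BASE[c] for c in codes[: ref_len + alt_len])
    return offset, bases[:ref_len], bases[ref_len:]

encoded = encode_variant(12345, "A", "G")
decoded = decode_variant(encoded)
```

Variants that do not fit (long indels, for example) raise `ValueError` here; a real tool would need a fallback representation for them, which is consistent with the abstract's wording that "most" variants take the compact form.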
7.
8.
Sailakshmi Subramanian Valentina Di Pierro Hardik Shah Anitha D. Jayaprakash Ian Weisberger Jaehee Shim Ajish George Bruce D. Gelb Ravi Sachidanandam 《Nucleic acids research》2013,41(16):e154
MiST is a novel approach to variant calling from deep sequencing data, using the inverted mapping approach developed for Geoseq. Reads that can map to a targeted exonic region are identified using exact matches to tiles from the region. The reads are then aligned to the targets to discover variants. MiST carefully handles paralogous reads that map ambiguously to the genome and clonal reads arising from PCR bias, which are the two major sources of errors in variant calling. The reduced computational complexity of mapping selected reads to targeted regions of the genome improves speed, specificity and sensitivity of variant detection. Compared with variant calls from the GATK platform, MiST showed better concordance with SNPs from dbSNP and genotypes determined by an exonic-SNP array. Variant calls made only by MiST confirm at a high rate (>90%) by Sanger sequencing. Thus, MiST is a valuable alternative tool to analyse variants in deep sequencing data.
9.
10.
It is widely believed that both common and rare variants contribute to the risks of common diseases or complex traits and that the cumulative effects of multiple rare variants can explain a significant proportion of trait variances. Advances in high-throughput DNA sequencing technologies allow us to genotype rare causal variants and investigate the effects of such rare variants on complex traits. We developed an adaptive ridge regression method to analyze the collective effects of multiple variants in the same gene or the same functional unit. Our model focuses on continuous traits and incorporates covariate factors to remove potential confounding effects. The proposed method estimates and tests multiple rare variants collectively but does not depend on the assumption that all rare variant effects have the same direction. Compared with the Bayesian hierarchical generalized linear model approach, the state-of-the-art method of rare variant detection, the proposed new method is easy to implement, yet it has higher statistical power. Application of the new method is demonstrated using the well-known data from the Dallas Heart Study.
11.
Robert Schall 《Biometrical journal. Biometrische Zeitschrift》2012,54(4):537-551
Many confidence intervals calculated in practice are potentially not exact, either because the requirements for the interval estimator to be exact are known to be violated, or because the (exact) distribution of the data is unknown. If a confidence interval is approximate, the crucial question is how well its true coverage probability approximates its intended coverage probability. In this paper we propose to use the bootstrap to calculate an empirical estimate for the (true) coverage probability of a confidence interval. In the first instance, the empirical coverage can be used to assess whether a given type of confidence interval is adequate for the data at hand. More generally, when planning the statistical analysis of future trials based on existing data pools, the empirical coverage can be used to study the coverage properties of confidence intervals as a function of type of data, sample size, and analysis scale, and thus inform the statistical analysis plan for the future trial. In this sense, the paper proposes an alternative to the problematic pretest of the data for normality, followed by selection of the analysis method based on the results of the pretest. We apply the methodology to a data pool of bioequivalence studies, and in the selection of covariance patterns for repeated measures data.
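One plausible reading of the bootstrap coverage estimate is sketched below for the simplest case, a normal-approximation interval for a mean. The function names, the choice of interval, and the sample data are assumptions for illustration; the paper's own procedure and intervals may differ. The observed sample is treated as the population, so its mean serves as the known "true" value, and the coverage is the fraction of resamples whose interval contains it.

```python
import random
import statistics

# Two-sided 95% normal quantile (about 1.96).
Z95 = statistics.NormalDist().inv_cdf(0.975)

def mean_ci(sample, z=Z95):
    """Normal-approximation confidence interval for the mean."""
    centre = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / len(sample) ** 0.5
    return centre - half_width, centre + half_width

def bootstrap_coverage(data, interval=mean_ci, n_boot=2000, seed=0):
    """Estimate how often `interval` covers the mean of `data`
    when applied to bootstrap resamples of `data`."""
    rng = random.Random(seed)
    target = statistics.fmean(data)  # the "truth" in the bootstrap world
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))
        low, high = interval(resample)
        hits += low <= target <= high
    return hits / n_boot

# Hypothetical small, skewed sample (log-normal draws).
source = random.Random(1)
skewed = [source.lognormvariate(0.0, 1.0) for _ in range(15)]
coverage = bootstrap_coverage(skewed)
```

Comparing `coverage` with the nominal 0.95 indicates whether this interval type is adequate for data like `skewed`; repeating the call with a different `interval` function (for example, one computed on the log scale) would support the kind of method comparison the abstract describes.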
12.
A critical step in detecting variants from next-generation sequencing data is post hoc filtering of putative variants called or predicted by computational tools. Here, we highlight four critical parameters that could enhance the accuracy of called single nucleotide variants and insertions/deletions: quality and deepness, refinement and improvement of initial mapping, allele/strand balance, and examination of spurious genes. Use of these sequence features appropriately in variant filtering could greatly improve validation rates, thereby saving time and costs in next-generation sequencing projects.
13.
14.
15.
Bacterial chromosomes are highly polarized in their nucleotide composition through mutational selection related to replication. Using compositional skews such as the GC skew, replication origin and terminus can be predicted in silico by observing the shift points. However, the genome sequence is affected by myriad functional requirements and selection on numerous subgenomic features, and elimination of this "noise" should lead to better predictions. Here, we present a noise-reduction approach that uses low-pass filtering through Fast Fourier transform coupled with cumulative skew graphs. It increases the prediction accuracy of the replication termini compared with previously documented methods based on genomic base composition.
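The combination of low-pass filtering and a cumulative skew graph can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a naive O(n²) discrete Fourier transform stands in for the FFT, the window size and cutoff are arbitrary, and the toy sequence is synthetic.

```python
import cmath

def window_gc_skew(seq, window):
    """GC skew, (G - C) / (G + C), in consecutive non-overlapping windows."""
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

def low_pass(values, keep):
    """Zero every Fourier component above frequency index `keep`.

    Uses a naive DFT as a stand-in for an FFT; fine for short inputs.
    """
    n = len(values)
    spectrum = [
        sum(v * cmath.exp(-2j * cmath.pi * k * j / n) for j, v in enumerate(values))
        for k in range(n)
    ]
    kept = [x if min(k, n - k) <= keep else 0 for k, x in enumerate(spectrum)]
    return [
        sum(x * cmath.exp(2j * cmath.pi * k * j / n) for k, x in enumerate(kept)).real / n
        for j in range(n)
    ]

def cumulative(values):
    """Running sum, i.e. the cumulative skew curve."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out

# Synthetic chromosome: G-rich first half, C-rich second half,
# with two windows flipped to mimic local "noise".
windows = ["GGGA"] * 8 + ["CCCA"] * 8
windows[2], windows[12] = "CCCA", "GGGA"
seq = "".join(windows)

skews = window_gc_skew(seq, window=4)
smooth = low_pass(skews, keep=1)
curve = cumulative(smooth)
peak = max(range(len(curve)), key=curve.__getitem__)
```

The extremes of `curve` mark the candidate shift points. Here `peak` is the index of the window where the smoothed skew changes sign, which in this toy layout is the boundary between the G-rich and C-rich halves.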
16.
Denisov G Walenz B Halpern AL Miller J Axelrod N Levy S Sutton G 《Bioinformatics (Oxford, England)》2008,24(8):1035-1040
Motivation: We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Results: Being applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the total number 2 033 311 detected regions of sequence variation. In 33 269 out of 460 373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with false positive rate 12%. Availability: The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/ Contact: gdenisov{at}jcvi.org
Associate Editor: John Quackenbush
17.
Li J Su Z Ma ZQ Slebos RJ Halvey P Tabb DL Liebler DC Pao W Zhang B 《Molecular & cellular proteomics : MCP》2011,10(5):M110.006536
Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics.
18.
CJ Davidson E Zeringer KJ Champion MP Gauthier F Wang J Boonyaratanakornkit JR Jones E Schreiber 《BioTechniques》2012,53(3):182-188
Fluorescent dye terminator Sanger sequencing (FTSS), with detection by automated capillary electrophoresis (CE), has long been regarded as the gold standard for variant detection. However, software analysis and base-calling algorithms used to detect mutations were largely optimized for resequencing applications in which different alleles were expected as heterozygous mixtures of 50%. Increasingly, the requirements for variant detection are an analytic sensitivity for minor alleles of <20%, in particular, when assessing the mutational status of heterogeneous tumor samples. Here, we describe a simple modification to the FTSS workflow that improves the limit of detection of cell-line gDNA mixtures from 50%-20% to 5% for G>A transitions and from 50%-5% to 5% for G>C and G>T transversions. In addition, we use two different sample types to compare the limit of detection of sequence variants in codons 12 and 13 of the KRAS gene between Sanger sequencing and other methodologies including shifted termination assay (STA) detection, single-base extension (SBE), pyrosequencing (PS), high-resolution melt (HRM), and real-time PCR (qPCR).
19.
20.
Assessing an unknown evolutionary process: effect of increasing site-specific knowledge through taxon addition (total citations: 2; self-citations: 0; citations by others: 2)
Assessment of the evolutionary process is crucial for understanding the effect of protein structure and function on sequence evolution and for many other analyses in molecular evolution. Here, we used simulations to study how taxon sampling affects accuracy of parameter estimation and topological inference in the absence of branch length asymmetry. With maximum-likelihood analysis, we find that adding taxa dramatically improves both support for the evolutionary model and accurate assessment of its parameters when compared with increasing the sequence length. Using a method we call "doppelgänger trees," we distinguish the contributions of two sources of improved topological inference: greater knowledge about internal nodes and greater knowledge of site-specific rate parameters. Surprisingly, highly significant support for the correct general model does not lead directly to improved topological inference. Instead, substantial improvement occurs only with accurate assessment of the evolutionary process at individual sites. Although these results are based on a simplified model of the evolutionary process, they indicate that in general, assuming processes are not independent and identically distributed among sites, more extensive sampling of taxonomic biodiversity will greatly improve analytical results in many current sequence data sets with moderate sequence lengths.