Draft versus finished sequence data for DNA and protein diagnostic signature development

Authors:	Shea N. Gardner, Marisa W. Lam, Jason R. Smith, Clinton L. Torres, Tom R. Slezak

Affiliation:	Pathogen Bio-Informatics, Lawrence Livermore National Laboratory, PO Box 808, L-174, Livermore, CA 94551, USA. gardner26@llnl.gov

Abstract:	Sequencing pathogen genomes is costly, so limited sequencing resources must be allocated carefully. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions about how much genomic sequencing is necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors, or NNs) to sequence. Using Marburg and variola virus sequences, we apply SAP to assess whether draft data are sufficient or finished sequencing is required. Simulations indicate that intermediate- to high-quality draft of target organisms, with error rates of 10⁻³–10⁻⁵ (~8× coverage), is suitable for DNA signature prediction. Low-quality draft of target isolates, with error rates of ~1% (3× to 6× coverage), is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target genomes combined with low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures.
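
To illustrate why the error rate of target genomes matters for DNA signature prediction, the following is a minimal sketch and not the SAP pipeline itself. It assumes a toy model in which sequencing errors are independent base substitutions, and it measures what fraction of a reference sequence's k-mers remain exactly conserved across several simulated draft genomes. The function names (`add_errors`, `kmers`, `conserved_fraction`), the k-mer length of 18, the random reference sequence, and the number of genomes are all hypothetical choices made for illustration.

```python
import random


def add_errors(seq, error_rate, rng):
    """Return a copy of seq in which each base is substituted
    with probability error_rate (toy substitution-only error model)."""
    bases = "ACGT"
    out = []
    for base in seq:
        if rng.random() < error_rate:
            # Replace with one of the three other bases.
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_fraction(reference, n_genomes, error_rate, k=18, seed=0):
    """Fraction of the reference k-mers found in every one of
    n_genomes simulated drafts with the given per-base error rate."""
    rng = random.Random(seed)
    ref_kmers = kmers(reference, k)
    shared = set(ref_kmers)
    for _ in range(n_genomes):
        draft = add_errors(reference, error_rate, rng)
        shared &= kmers(draft, k)
    return len(shared) / len(ref_kmers)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical random "genome"; real pathogen sequences would be used in practice.
    reference = "".join(rng.choice("ACGT") for _ in range(20000))
    for rate in (1e-5, 1e-3, 1e-2):
        frac = conserved_fraction(reference, n_genomes=5, error_rate=rate)
        print(f"error rate {rate:g}: {frac:.3f} of 18-mers conserved in all drafts")
```

In this toy model the conserved fraction falls as the error rate rises, because a single substitution destroys every k-mer that overlaps it. That is the same qualitative effect the abstract reports for low-quality draft of target genomes, though the sketch ignores coverage, indels, near neighbors, and protein translation.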

Keywords:
This article is indexed in PubMed and other databases.
