Similar articles
20 similar articles found
1.
Du  Nan  Chen  Jiao  Sun  Yanni 《BMC genomics》2019,20(2):49-62
Background

Single-molecule, real-time (SMRT) sequencing developed by Pacific Biosciences produces longer reads than second-generation sequencing technologies such as Illumina. The increased read length enables PacBio sequencing to close gaps in genome assemblies, reveal structural variations, and characterize intra-species variation. It also holds promise for deciphering the community structure of complex microbial communities, because long reads help metagenomic assembly. One key step in genome assembly using long reads is to quickly identify reads that overlap. Because PacBio data have a higher sequencing error rate and lower coverage than data from popular short-read sequencing technologies (such as Illumina), efficient detection of true overlaps requires specially designed algorithms. In particular, there is still a need to improve the sensitivity of detecting small overlaps, or overlaps with high error rates in both reads. Addressing this need will enable better assembly of metagenomic data produced by third-generation sequencing technologies.

Results

In this work, we designed and implemented an overlap detection program named GroupK for third-generation sequencing reads, based on grouped k-mer hits. While using k-mer hits to detect read overlaps has been adopted by several existing programs, our method uses a group of short k-mer hits satisfying statistically derived distance constraints to increase the sensitivity of small-overlap detection. Grouped k-mer hits were originally designed for homology search. We are the first to apply grouped hits to long-read overlap detection. The experimental results of applying our pipeline to both simulated and real third-generation sequencing data showed that GroupK enables more sensitive overlap detection, especially for datasets with low sequencing coverage.
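The idea of grouping k-mer hits can be illustrated with a minimal sketch. This is not GroupK's implementation: the function names, the greedy chaining rule, and the fixed thresholds `max_gap` and `max_diag_shift` are assumptions made for illustration, standing in for the statistically derived distance constraints that the abstract mentions but does not specify.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def kmer_hits(read_a, read_b, k):
    """Return sorted (pos_a, pos_b) pairs where both reads share a k-mer."""
    index_b = kmer_index(read_b, k)
    hits = []
    for i in range(len(read_a) - k + 1):
        for j in index_b.get(read_a[i:i + k], ()):
            hits.append((i, j))
    return sorted(hits)


def group_hits(hits, max_gap, max_diag_shift):
    """Greedily chain hits into groups (illustrative rule, not GroupK's).

    A hit joins a group when it lies downstream of the group's last hit by
    at most max_gap bases in both reads, and its diagonal (pos_a - pos_b)
    differs from that hit's diagonal by at most max_diag_shift.
    """
    groups = []
    for a, b in hits:
        for group in reversed(groups):
            last_a, last_b = group[-1]
            close = 0 < a - last_a <= max_gap and 0 < b - last_b <= max_gap
            same_diag = abs((a - b) - (last_a - last_b)) <= max_diag_shift
            if close and same_diag:
                group.append((a, b))
                break
        else:
            groups.append([(a, b)])
    return groups


def best_group(read_a, read_b, k=4, max_gap=20, max_diag_shift=2):
    """Return the largest group of hits, or [] when the reads share no k-mer."""
    groups = group_hits(kmer_hits(read_a, read_b, k), max_gap, max_diag_shift)
    return max(groups, key=len, default=[])


if __name__ == "__main__":
    # The suffix of the first read equals the prefix of the second.
    print(best_group("GGGGGACGTTCAGT", "ACGTTCAGTCCCCC"))
```

Allowing a small diagonal shift is one simple way to tolerate the insertions and deletions typical of long reads; a real tool would derive such bounds from the error model rather than fix them by hand.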

Conclusions

GroupK is best used for detecting small overlaps for third-generation sequencing data. It provides a useful supplementary tool to existing ones for more sensitive and accurate overlap detection. The source code is freely available at https://github.com/Strideradu/GroupK.


2.
Zhu  Fangfang  Li  Jiang  Liu  Juan  Min  Wenwen 《BMC genetics》2021,22(1):1-10
Background

Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency.

Results

Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 G base pair (Gbp) sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, whose p-value decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with the sequencing data of 15 Gbp or more was estimated to be between 5 and 10%.
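The LOD definition used here (the allele frequency at which the relative standard deviation reaches 30%, read from a moving-average curve of RSD against allele frequency) can be sketched as follows. The function names, the window size, and the linear interpolation between smoothed points are assumptions for illustration; the abstract does not state how the authors computed the crossing point.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def moving_average(points, window):
    """Average x and y over each full window of consecutive (x, y) points."""
    smoothed = []
    for i in range(len(points) - window + 1):
        chunk = points[i:i + window]
        xs = [x for x, _ in chunk]
        ys = [y for _, y in chunk]
        smoothed.append((mean(xs), mean(ys)))
    return smoothed


def estimate_lod(replicates, threshold=30.0, window=3):
    """Estimate the allele frequency at which the smoothed RSD hits threshold.

    replicates: one list per mutation, holding the allele frequencies (in %)
    measured for that mutation in repeated, independent sequencing runs.
    Returns None when the smoothed curve never falls through the threshold.
    """
    points = sorted((mean(v), rsd_percent(v)) for v in replicates)
    curve = moving_average(points, window)
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if y0 > threshold >= y1:
            # Linear interpolation between the two bracketing points.
            return x0 + (y0 - threshold) * (x1 - x0) / (y0 - y1)
    return None


if __name__ == "__main__":
    # Hypothetical measurements, not data from the study.
    runs = [[1, 2, 3], [3, 4, 5], [9, 10, 11]]
    print(estimate_lod(runs, window=1))
```

A larger window gives a smoother curve but needs more mutations spread across the allele-frequency range of interest.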

Conclusions

For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold of low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.


3.
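[Background] In recent years, potato common scab has become increasingly damaging and now occurs in all major potato-producing regions of China. Because effective control measures are lacking, it has caused serious economic losses. Biological control is an effective approach to managing soil-borne diseases and has become a research focus. [Objective] To screen for strains with strong antagonistic activity against the potato scab pathogen, providing strain resources for the development of biocontrol agents and a theoretical basis for the control of potato scab. [Methods] Isolates were subjected to primary and secondary screening by the dual-culture plate method and the Oxford cup assay, and were identified from morphological, physiological and biochemical characteristics together with 16S rRNA and gyrB gene sequence analysis. The antimicrobial spectrum of the strain was determined by the dual-culture plate method. [Results] A strain with marked antagonistic activity, BKS104, was obtained, with an inhibition zone diameter of 43 mm and a control efficacy of 85%. Its colonies are round, milky white and opaque, with a glossy surface and entire margins; the cells are rod-shaped and Gram-positive. Based on 16S rRNA and gyrB gene sequencing, it was identified as Bacillus velezensis, and it inhibited all eight plant-pathogenic fungi tested. [Conclusion] Strain BKS104 is Bacillus velezensis; it effectively inhibits the growth of the potato scab pathogen, is highly safe, and has good biocontrol potential.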

4.
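[Objective] A marine Vibrio strain (Vibrio X511), isolated from decaying Sargassum collected in the waters off Weizhou Island, Beihai, shows a strong ability to utilize alginate. This study used transcriptome sequencing to investigate the alginate metabolic pathway of Vibrio X511. [Methods] The transcriptomes of the strain grown on alginate and on glucose were sequenced on the Illumina HiSeq 2500 platform; differential transcripts were compared and analyzed, and the sequencing results were verified by real-time quantitative PCR; GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) were used for functional and pathway annotation of the differential transcripts. [Results] Comparison revealed 2024 differentially expressed genes under alginate culture relative to glucose culture, of which 1066 were up-regulated and 958 were down-regulated. Some genes that are ubiquitous in metabolic pathways were also differentially expressed between the culture conditions. All genes involved in alginate utilization in marine Vibrio X511, as well as the key genes for ethanol synthesis, showed some degree of up-regulation in transcript level. In addition, the analysis showed that the strain has a distinctive mode of alginate utilization, one metabolic step of which has not previously been reported in Vibrio. [Conclusion] The alginate metabolic pathway of marine Vibrio X511 was successfully elucidated, enriching research on the biological degradation of alginate and providing valuable data to support research on macroalgal biomass energy.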

5.
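[Background] Resistance genes can move among the environment, animals and humans through horizontal transfer, whereas long-distance dissemination occurs mainly through the migration of birds. Through horizontal transfer and bird migration, resistance genes can spread across regions to livestock, poultry and humans, causing public health problems. [Objective] To isolate Enterobacteriaceae from the feces of migratory birds in Guangzhou Nansha Wetland Park, identify the species, and study their resistance to common antibiotics and the main extended-spectrum β-lactamase (extended-spec...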

6.
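Analysis of bacterial composition and diversity in Guangxi silkworm excrement and discovery of VBNC bacterial populations   Total citations: 1, self-citations: 0, citations by others: 1
[Objective] To characterize the composition and diversity of the bacterial community in Guangxi silkworm excrement, providing a scientific basis for discovering strain resources in silkworm excrement and for its comprehensive utilization. [Methods] Bacterial community composition was studied by high-throughput sequencing. In parallel, the conventional dilution spread-plate method and an MPN culture system based on resuscitation-promoting factor (Rpf) were used to analyze and screen the dominant culturable and viable but non-culturable (VBNC) populations in silkworm excrement, and the isolated strains were preliminarily classified by 16S rRNA gene sequencing. [Results] High-throughput sequencing showed that bacteria in the Guangxi silkworm excrement samples belonged to 10 phyla, 18 classes, 27 orders, 57 families and 96 genera, of which 4 genera had abundances above 1%; the dominant group was the genus Enterobacter of the phylum Proteobacteria. The dilution spread-plate method yielded 33 culturable strains from 14 genera, of which 4 genera (Citrobacter, Weissella, Chitinophaga, Pseudoclavibacter) were not detected by high-throughput sequencing. In the MPN culture system, the maximum detected abundance of total bacteria in the Rpf-treated group increased 100-fold, and 21 Rpf-sensitive VBNC strains were detected, of which 6 genera (Paenibacillus, Caulobacter, Roseomonas, Pantoea, Erwinia, Acine...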

7.
Flexibility and low cost make genotyping‐by‐sequencing (GBS) an ideal tool for population genomic studies of nonmodel species. However, to utilize the potential of the method fully, many parameters affecting library quality and single nucleotide polymorphism (SNP) discovery require optimization, especially for conifer genomes with a high repetitive DNA content. In this study, we explored strategies for effective GBS analysis in pine species. We constructed GBS libraries using HpaII, PstI and EcoRI‐MseI digestions with different multiplexing levels and examined the effect of restriction enzymes on library complexity and the impact of sequencing depth and size selection of restriction fragments on sequence coverage bias. We tested and compared UNEAK, Stacks and GATK pipelines for the GBS data, and then developed a reference‐free SNP calling strategy for haploid pine genomes. Our GBS procedure proved to be effective in SNP discovery, producing 7000–11 000 and 14 751 SNPs within and among three pine species, respectively, from a PstI library. This investigation provides guidance for the design and analysis of GBS experiments, particularly for organisms for which genomic information is lacking.

8.
A simplified model is presented of the dynamics of excitatory and inhibitory neurons in the cerebral cortex. A key feature of the model is that neurons may cease to fire when strongly depolarized (spike inactivation). Computer simulations for different parameters reveal five classes of solutions: a) steady states in which neither excitatory nor inhibitory cells are active, b) steady states in which one or both types of cells fire repetitively, c) states in which one type of cell fluctuates rapidly between bursts of action potentials and inactivity due to strong depolarization, d) rhythmic activity in which both types of cells fire in unison followed by a period of spike inactivation and e) states similar to d but in which the inhibitory cells never produce action potentials. Solutions b, c, d, and e qualitatively resemble the different firing patterns observed during experimental seizures. It is shown that changes in those parameters that are functions of potassium concentration can induce changes in the type of solution. It is therefore proposed that the increase in extracellular potassium concentration during seizures may be responsible for the progressive changes observed in firing patterns and particularly for the transition from tonic to clonic patterns. A method is also outlined for testing the predictions of the model.

9.
Xiong  Ying  Chen  Shuai  Tang  Buzhou  Chen  Qingcai  Wang  Xiaolong  Yan  Jun  Zhou  Yi 《BMC bioinformatics》2021,22(1):1-18
Background

For differential abundance analysis, zero-inflated generalized linear models, typically zero-inflated negative binomial (NB) models, have been increasingly used to model microbiome and other sequencing count data. A common assumption in estimating the false discovery rate (FDR) is that the p values are uniformly distributed under the null hypothesis, which demands that the postulated model fit the count data adequately. Mis-specification of the distribution of the count data may lead to excess false discoveries. Therefore, model checking is critical to control the FDR at a nominal level in differential abundance analysis. A growing number of studies show that the method of randomized quantile residuals (RQRs) performs well in diagnosing count regression models. However, the performance of RQRs in diagnosing zero-inflated generalized linear mixed models (GLMMs) for sequencing count data has not been extensively investigated in the literature.

Results

We conduct large-scale simulation studies to investigate the performance of RQRs for zero-inflated GLMMs. The simulation studies show that the type I error rates of goodness-of-fit (GOF) tests based on RQRs are very close to the nominal level; in addition, scatter plots and Q–Q plots of RQRs are useful for telling good models from bad ones. We also apply RQRs to diagnose six GLMMs fitted to a real microbiome dataset. The results show that the OTU counts at the genus level of this dataset (after a truncation treatment) can be modelled well by zero-inflated and zero-modified NB models.
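For readers unfamiliar with randomized quantile residuals, the construction for a discrete response can be sketched in a few lines. The sketch assumes a zero-inflated Poisson model with known parameters rather than the zero-inflated NB GLMMs studied in the paper, and it is unrelated to the authors' R functions; all names below are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(y, mu):
    """P(Y <= y) for Y ~ Poisson(mu); 0 for y < 0."""
    if y < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, y + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)


def zip_cdf(y, mu, pi0):
    """CDF of a zero-inflated Poisson with extra-zero probability pi0."""
    if y < 0:
        return 0.0
    return pi0 + (1.0 - pi0) * poisson_cdf(y, mu)


def sample_zip(mu, pi0, rng):
    """Draw one zero-inflated Poisson count (Knuth's method for the Poisson)."""
    if rng.random() < pi0:
        return 0
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def randomized_quantile_residual(y, cdf, rng):
    """Draw u uniformly between F(y - 1) and F(y), then map it to a z-score."""
    u = rng.uniform(cdf(y - 1), cdf(y))
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the inverse normal CDF finite
    return NormalDist().inv_cdf(u)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [sample_zip(3.0, 0.3, rng) for _ in range(10)]
    cdf = lambda v: zip_cdf(v, 3.0, 0.3)
    print([round(randomized_quantile_residual(y, cdf, rng), 2) for y in counts])
```

When the assumed model is the true one, these residuals are standard normal, which is why Q–Q plots and normality tests on them can serve as model diagnostics.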

Conclusion

RQR is an excellent tool for diagnosing GLMMs for zero-inflated count data, particularly the sequencing count data arising in microbiome studies. In the supplementary materials, we provided two generic R functions, called rqr.glmmtmb and rqr.hurdle.glmmtmb, for calculating the RQRs given fitting outputs of the R package glmmTMB.


10.
11.
《Plant Ecology & Diversity》2013,6(2-3):189-200
Background: Many researchers have simply recorded first flowering dates, while others have recorded the full extent of flowering. Such flowering curves show the rate of increase and decrease in flowering, as well as the day on which flowering is a maximum.

Aim: To develop objective statistical methods for the estimation and comparison of flowering curves, with particular emphasis on the date of maximal flowering.

Methods: We considered data collected either as percentages or as actual counts of numbers of flowers. We developed appropriate techniques for fitting regression curves involving non-linear least squares and Poisson regression, including a new generalisation of the epsilon-skew-normal curve.

Results: Our generalised regression curve was found to be sufficiently flexible to provide good estimates of flowering in a wide variety of situations. The five parameters of this curve have a direct and straightforward interpretation, namely the date and magnitude of maximum flowering, along with the spread, skewness and kurtosis of flowering. The method of maximum likelihood was used to provide estimates and confidence limits for the parameters and to compare Crocosmia flowering curves over eight consecutive years.
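A flowering curve built on the epsilon-skew-normal shape can be sketched as below. This uses the standard epsilon-skew-normal form (a Gaussian with scale `spread * (1 + eps)` before the peak and `spread * (1 - eps)` after it), rescaled so that the peak height is a parameter. It has four parameters and is not the authors' five-parameter generalisation, whose kurtosis term the abstract does not define; the function and parameter names are assumptions.

```python
import math


def flowering_curve(t, t_max, peak, spread, eps):
    """Expected flower count on day t under an epsilon-skew-normal shape.

    t_max: day of maximum flowering; peak: expected count on that day;
    spread: overall width (> 0); eps in (-1, 1): skewness, where eps > 0
    gives a slower rise and a faster decline.
    """
    scale = spread * (1.0 + eps) if t < t_max else spread * (1.0 - eps)
    return peak * math.exp(-((t - t_max) ** 2) / (2.0 * scale ** 2))


def poisson_log_likelihood(days, counts, t_max, peak, spread, eps):
    """Poisson log-likelihood of observed counts given the curve parameters."""
    total = 0.0
    for t, y in zip(days, counts):
        mu = max(flowering_curve(t, t_max, peak, spread, eps), 1e-12)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


if __name__ == "__main__":
    # Hypothetical counts generated from the curve itself, for illustration.
    days = list(range(61))
    counts = [round(flowering_curve(t, 30, 100, 8, 0.3)) for t in days]
    print(poisson_log_likelihood(days, counts, 30, 100, 8, 0.3))
    print(poisson_log_likelihood(days, counts, 35, 100, 8, 0.3))
```

Maximising this log-likelihood over the four parameters with any numerical optimiser would give maximum likelihood estimates for count data; for percentage data, the same curve could instead be fitted by non-linear least squares.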

Conclusions: Regression curves, particularly those of the generalised skew-normal, give an effective, practical and objective procedure for estimating and comparing flowering curves.

12.
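程雯, 蒲桂洪, 牛国清, 邹祥 《微生物学报》 (Acta Microbiologica Sinica) 2021,61(12):3977-3990
[Objective] To analyze the metabolic pathway and key genes for A82846B biosynthesis in Kibdelosporangium aridum CCTCC M2020063. [Methods] The whole genome of K. aridum CCTCC M2020063 was sequenced using Illumina second-generation and PacBio third-generation sequencing; coding sequences were predicted with Glimmer; secondary metabolites were identified by HPLC and LC-MS; and secondary-metabolite biosynthetic gene clusters were predicted with antiSMASH 5.0. Genes related to A82846B synthesis were analyzed with Geneious, with particular attention to the mbtH-like genes. [Results] The experimental strain was identified as Kibdelosporangium aridum. Its genome consists of one linear chromosome and no plasmid, with a total length of 12475688 bp, a GC content of 66.27%, 11900 open reading frames and 47 gene clusters. The strain is able to synthesize A82846B, and the biosynthesis-related genes are located in Cluster32, which contains 33 genes. Overexpression of the mbtH-like gene gene07864 promoted A82846 synthesis, increasing it by 26.42%. The halogenase gene is gene07859, which is highly similar to the halogenases of vancomycin and balhimycin. [Conclusion] This study analyzed the genome sequence of K. aridum CCTCC M2020063 and obtained information on the functional genes involved in A82846B biosynthesis, providing a solid foundation for studying the metabolic pathway of A82846B and for its engineering.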

13.
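[Background] Paris polyphylla var. chinensis is a valuable and scarce Chinese medicinal herb with multiple pharmacological effects. Owing to over-harvesting and other causes, its wild resources are now extremely depleted. Artificial cultivation techniques are not yet mature, and slow growth and frequent disease are the main limiting factors. [Objective] Developing plant probiotics is an environmentally friendly and effective solution that meets the requirements of ecological cultivation. [Methods] Endophytes were isolated and identified by conventional methods. A strain belonging to a species previously reported to promote growth and disease resistance was selected for maize seed germination tests; nitrogen-fixing activity was tested qualitatively on nitrogen-free medium; antimicrobial activity was tested by plate antagonism assays; liquid chromatography-mass spectrometry (LC-MS) was used to profile the metabolites in its culture broth; malondialdehyde and chlorophyll contents of P. polyphylla var. chinensis seedlings were measured spectrophotometrically; leaf salicylic acid, jasmonic acid, abscisic acid, gibberellin, cytokinin and auxin contents were measured by LC-MS; and below-ground biomass indices were measured. [Results] A strain, jdqmzz-1, was isolated and identified as Aspergillus sydowii by rDNA ITS sequence amplification and alignment, morphological observation, and physiological and biochemical characterization. It promoted maize seed germination and the growth of P. polyphylla var. chinensis, fixed nitrogen, and antagonized the pathogenic fungus Fusarium oxysporum GSICC 60612 and the pathogenic bacterium Pectobacterium carotovorum subsp. carotovorum (Pcc ATCC15713), with a fungal antagonism index of 42.50%. Its culture broth contained a wide variety of growth-promoting, antimicrobial and insecticidal substances. In seedlings treated with the culture broth, endogenous leaf gibberellin and auxin contents increased 54.1-fold and 2.3-fold relative to the control, respectively; chlorophyll content reached 209.88 mg/g fresh weight, 48.80% higher than the control; malondialdehyde content was 15.20% lower than the control; and mean root number, mean root length and mean hundred-plant weight were also significantly (P<0.01) higher than the control. [Conclusion] The isolated endophytic fungus Aspergillus sydowii jdqmzz-1 effectively promotes the growth of P. polyphylla var. chinensis.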

14.
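[Background] Rodent-borne diseases are zoonoses that pose a major threat to humans. Globalization has steadily expanded the areas where rodent-borne diseases are endemic, with multiple newly emerging rodent-borne diseases and the re-emergence of old infectious diseases. [Objective] To investigate the prevalence of common rodent-borne pathogens in rodents in the Altay region of Xinjiang, providing a scientific basis for the local prevention and control of natural-focus diseases. [Methods] Rodents were captured by overnight snap-trapping; spleen and kidney tissues were collected aseptically, and genomic DNA was extracted. TaqMan probe-based quantitative polymerase chain reaction (qPCR) was used to detect six common rodent-borne pathogens: Bartonella spp., Leptospira interrogans, Orientia tsutsugamushi, Rickettsia mooseri, Anaplasma phagocytophilum and Francisella tularensis. After conventional PCR amplification with universal 16S rRNA gene primers, Illumina and Nanopore sequencing were used to further detect pathogens, and Bartonella was also isolated and cultured in vitro from spleen tissue. Comparing qPCR...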

15.
16.
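[Background] Plant endophytes are a very important component of microbial communities and an important reservoir of microbial resources, with significant research and application value in plant growth promotion, stress resistance and other fields. [Objective] To further understand the diversity, community structure and potential functional characteristics of endophytic fungi in halophytes of arid desert regions. [Methods] Amplicon sequencing was used to analyze the endophytic fungal communities of two halophytes, Anabasis eriopoda (abbreviated AE) and A. truncata (abbreviated AT), growing on the western shore of the Aral Sea in Uzbekistan. [Results] A total of 166 amplicon sequence variants (ASVs) were obtained, spanning 4 phyla and 49 genera, among which Neocamarosporium, Botryosphaeria and Alternaria, and the higher taxa to which they belong, were the dominant groups. Diversity and community composition analyses showed fairly clear differences between the endophytic fungi of the two halophytes, which also included some potentially novel taxa. Functional predictions based on PICRUSt2 and FUNGuild indicated that the potential functions and trophic modes of these endophytic fungi are diverse and differ between hosts. [Conclusion] The endophytic fungi of halophytes have high diversity and potential resource value, and merit further exploration and study.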

17.
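[Background] Petroleum is known as "liquid gold". While human industrial activity uses it to create enormous social value, it has also caused serious pollution of the natural environment. Microbial remediation is currently one of the effective means of treating petroleum pollution, with advantages including low cost, high efficiency and no secondary pollution. [Objective] To isolate, from petroleum-contaminated soil, a strain that efficiently degrades the long-chain alkane n-tetracosane, and to explore its degradation characteristics and its prospects for application in microbial remediation. [Methods] The strain was identified by morphology and 16S rRNA gene sequencing; its degradation of n-tetracosane was measured by gas chromatography; and degradation intermediates were analyzed by gas chromatography-mass spectrometry (GC-MS) to infer the potential metabolic pathway. [Results] A strain that efficiently degrades n-tetracosane, C24MT1, was isolated and identified as belonging to the genus Acinetobacter. Its optimal degradation conditions were 30 °C, pH 9.0 and a salinity of 2 g/L, under which it degraded 86.63% of 9 g/L n-tetracosane in 7 d of growth. The strain also grew well in a strongly alkaline environment (pH 11.0; OD600 of 0.39) while maintaining a high alkane degradation rate (75.38%), showing strong tolerance of extreme environments. Analysis of the degradation intermediates suggests that the strain's metabolism of n-tetracosane may involve both terminal and subterminal oxidation. [Conclusion] Acinetobacter C24MT1 has good environmental adaptability and alkane-degrading capacity, and has great potential for application in the development of microbial agents and the remediation of petroleum-contaminated soil. This study provides a superior strain for remediating soils with high concentrations of petroleum pollutants in saline-alkaline regions, and further enriches the strain resource library for petroleum hydrocarbon biodegradation.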

18.
Background

Tsetse flies (Diptera: Glossinidae) are solely responsible for the transmission of African trypanosomes, causative agents of sleeping sickness in humans and nagana in livestock. Due to the lack of efficient vaccines and the emergence of drug resistance, vector control approaches such as the sterile insect technique (SIT), remain the most effective way to control disease. SIT is a species-specific approach and therefore requires accurate identification of natural pest populations at the species level. However, the presence of morphologically similar species (species complexes and sub-species) in tsetse flies challenges the successful implementation of SIT-based population control.

Results

In this study, we evaluate different molecular tools that can be applied for the delimitation of different Glossina species using tsetse samples derived from laboratory colonies, natural populations and museum specimens. The use of mitochondrial markers, nuclear markers (including internal transcribed spacer 1 (ITS1) and different microsatellites), and bacterial symbiotic markers (Wolbachia infection status) in combination with relatively inexpensive techniques such as PCR, agarose gel electrophoresis, and to some extent sequencing provided a rapid, cost effective, and accurate identification of several tsetse species.

Conclusions

The effectiveness of SIT benefits from the fine resolution of species limits in nature. The present study supports the quick identification of large samples using simple and cost effective universalized protocols, which can be easily applied by countries/laboratories with limited resources and expertise.


19.
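Thyroid nodules are among the most common diseases, and their accurate diagnosis is important for the effective clinical management of patients. Molecular markers are a highly effective tool for diagnosis and prognostic assessment, especially for cytologically indeterminate thyroid nodules. In recent years, the clinical application of molecular markers has made significant progress. With the development of next-generation genetic testing, multiple genes can be tested simultaneously, which can both support the diagnosis of thyroid cancer and inform the prediction of patient prognosis. This article reviews molecular markers related to the diagnosis and prognosis of thyroid cancer.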

20.
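[Objective] To identify candidate non-coding RNAs (ncRNAs) in an extreme halophile at the omics level, using high-throughput transcriptome sequencing (RNA-seq) combined with bioinformatic analysis and molecular biology methods. [Methods] Haloferax mediterranei cultured to mid-log phase was treated at different salt concentrations for 30 min. RNA was extracted and subjected to strand-specific transcriptome sequencing and 5′-end-discriminating transcriptome sequencing; ncRNAs were identified genome-wide by bioinformatic analysis, and their transcript boundaries were predicted. A subset of the predicted ncRNAs was then validated experimentally by Northern blot and circularized RNA reverse-transcription polymerase chain reaction (CR-RT-PCR). [Results] By comparing the sequencing results of the two RNA-seq techniques under different culture conditions and analyzing transcription units in detail, 105 high-confidence ncRNAs were identified, including 4 ncRNAs whose expression differed markedly across salinities. The expression, transcripts, transcription start sites and termination sites of incRNA1436 and incRNA1903 were verified by Northern blot and CR-RT-PCR. [Conclusion] This is the first omics-level identification of ncRNAs in Haloferax mediterranei. The differential transcription of some ncRNAs under different salt concentrations suggests that they may participate in the adaptation of H. mediterranei to salt stress. The omics-level discovery of high-confidence ncRNAs provides baseline data and an important entry point for comprehensive future studies of the functional mechanisms of ncRNAs in haloarchaea.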
