Polyploidy is ubiquitous and its consequences are complex and variable. A change of ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms with different ploidy levels. To avoid cumbersome experiments and take advantage of the less biased information provided by the vast amounts of genome sequencing data, computational tools for ploidy estimation are urgently needed. Until now, although a few such tools have been developed, many aspects of this estimation, such as the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives from errors and repeats, need further improvement. We have developed ploidyfrost, a de Bruijn graph-based method, to estimate ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of allele frequency distribution generated using the ggplot2 package as well as quantitative results using the Gaussian mixture model. In addition, it takes advantage of colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost by analysing highly heterozygous or repetitive samples of Cyclocarya paliurus and a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of analysis results can be improved by constraining a threshold such as Cramér's V coefficient on variant features, which may significantly reduce the side effects of sequencing errors and annoying repeats on the graphical structure constructed.
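The abstract above mentions thresholding on Cramér's V but does not say which variant features are cross-tabulated. As a minimal sketch, the following computes Cramér's V for a generic contingency table of counts; the function name `cramers_v` and the example table are assumptions made for illustration and are not taken from ploidyfrost.

```python
import math


def cramers_v(table):
    """Return Cramér's V for an r x c contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("contingency table has no observations")

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    # Normalise by n * (min(r, c) - 1) so the result lies in [0, 1].
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0
    return math.sqrt(chi2 / (n * k))


# Hypothetical 2 x 2 table of counts for two categorical variant features.
example = [[30, 10], [10, 30]]
print(cramers_v(example))
```

A value near 0 indicates that the two features are nearly independent and a value near 1 indicates strong association; which side of a threshold is kept depends on the filter being applied.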

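Advances in whole-genome sequencing of woody plants   Cited by: 4 (self-citations: 0, citations by others: 4)
Shi Jisen (施季森), Wang Zhanjun (王占军), Chen Jinhui (陈金慧). 《遗传》 (Hereditas (Beijing)), 2012, 34(2): 145-156
In recent years, whole-genome sequencing results for plants have been appearing in rapid succession, and whole-genome sequencing of woody plants is also proceeding at an intense pace. However, because woody plants usually have large genomes with complex structure, sequencing, post-sequencing assembly, annotation and functional analysis all pose considerable difficulties. Budgets for genome sequencing and analysis are also under considerable pressure. It is therefore necessary to analyse and compare research progress and outstanding problems in this area in order to improve the efficiency of whole-genome research on forest trees. Building on a comparison of the three generations of sequencing technology developed so far (Sanger sequencing, sequencing by synthesis and single-molecule sequencing), this article selects four woody plants with published genomes (poplar, grape, papaya and apple) and reviews the research background of their whole-genome sequencing, the sequencing results, progress in their application, and remaining problems. It also discusses the preparatory work needed before future whole-genome sequencing of woody plants (choice of material, construction of genetic and linkage maps, and choice of sequencing technology), as well as the bioinformatic analysis and application of whole-genome sequencing results.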



Advances in human genomics have allowed unprecedented productivity in terms of algorithms, software, and literature available for translating raw next-generation sequence data into high-quality information. The challenges of variant identification in organisms with lower quality reference genomes are less well documented. We explored the consequences of commonly recommended preparatory steps and the effects of single and multi sample variant identification methods using four publicly available software applications (Platypus, HaplotypeCaller, Samtools and UnifiedGenotyper) on whole genome sequence data of 65 key ancestors of Swiss dairy cattle populations. Accuracy of calling next-generation sequence variants was assessed by comparison to the same loci from medium and high-density single nucleotide variant (SNV) arrays.


The total number of SNVs identified varied by software and method, with single (multi) sample results ranging from 17.7 to 22.0 (16.9 to 22.0) million variants. Computing time varied considerably between software. Preparatory realignment of insertions and deletions and subsequent base quality score recalibration had only minor effects on the number and quality of SNVs identified by different software, but increased computing time considerably. Average concordance for single (multi) sample results with high-density chip data was 58.3% (87.0%) and average genotype concordance in correctly identified SNVs was 99.2% (99.2%) across software. The average quality of SNVs identified, measured as the ratio of transitions to transversions, was higher using single sample methods than multi sample methods. A consensus approach using results of different software generally provided the highest variant quality in terms of transition/transversion ratio.


Our findings serve as a reference for variant identification pipeline development in non-human organisms and help assess the implication of preparatory steps in next-generation sequencing pipelines for organisms with incomplete reference genomes (pipeline code is included). Benchmarking this information should prove particularly useful in processing next-generation sequencing data for use in genome-wide association studies and genomic selection.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-948) contains supplementary material, which is available to authorized users.
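The cattle study above uses the ratio of transitions to transversions as its measure of SNV quality. As a minimal sketch of how that ratio is computed from a list of reference/alternate base pairs, the function names and example input below are assumptions for illustration and are not part of the published pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) bases."""
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a simple single-base substitution.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions


# Hypothetical calls: three transitions (A>G, C>T, G>A) and one transversion (A>C).
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ti_tv_ratio(calls))  # 3.0
```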

High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing programmes are generally focused on the gene-specific level, a restrictive feature when analysis in the non-coding regions, such as enhancers and intergenic microRNAs, is required. Here, we present Genome Bisulfite Sequencing Analyser (GBSA; http://ctrad-csi.nus.edu.sg/gbsa), a free open-source software capable of analysing whole-genome bisulfite sequencing data with either a gene-centric or gene-independent focus. Through analysis of the largest published data sets to date, we demonstrate GBSA’s features in providing sequencing quality assessment, methylation scoring, functional data management and visualization of genomic methylation at nucleotide resolution. Additionally, we show that GBSA’s output can be easily integrated with other high-throughput sequencing data, such as RNA-Seq or ChIP-seq, to elucidate the role of methylated intergenic regions in gene regulation. In essence, GBSA allows an investigator to explore not only known loci but also all the genomic regions, for which methylation studies could lead to the discovery of new regulatory mechanisms.
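The abstract above does not describe how GBSA scores methylation. As a rough illustration of methylation at nucleotide resolution, the sketch below uses the commonly reported per-cytosine level, the fraction of reads that still show C after bisulfite conversion. The function name and the minimum-coverage parameter are assumptions and should not be read as GBSA's own scoring.

```python
def methylation_level(c_reads, t_reads, min_coverage=1):
    """Fraction of reads supporting a methylated cytosine at one position.

    After bisulfite treatment, unmethylated C is read as T while methylated C
    remains C, so the level is C / (C + T). Returns None when coverage is
    below `min_coverage` (an assumed, user-chosen cutoff).
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    coverage = c_reads + t_reads
    if coverage < min_coverage:
        return None
    return c_reads / coverage


# Hypothetical position covered by 40 reads, 30 of which retain the C.
print(methylation_level(30, 10))  # 0.75
```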

Accounting for historical demographic features, such as the strength and timing of gene flow and divergence times between closely related lineages, is vital for many inferences in evolutionary biology. Approximate Bayesian computation (ABC) is one method commonly used to estimate demographic parameters. However, the DNA sequences used as input for this method, often microsatellites or RADseq loci, usually represent a small fraction of the genome. Whole genome sequencing (WGS) data, on the other hand, have been used less often with ABC, and questions remain about the potential benefit of, and how to best implement, this type of data; we used pseudo‐observed data sets to explore such questions. Specifically, we addressed the potential improvements in parameter estimation accuracy that could be associated with WGS data in multiple contexts; namely, we quantified the effects of (a) more data, (b) haplotype‐based summary statistics, and (c) locus length. Compared with a hypothetical RADseq data set with 2.5 Mbp of data, using a 1 Gbp data set consisting of 100 Kbp sequences led to substantial gains in the accuracy of parameter estimates, which was mostly due to haplotype statistics and increased data. We also quantified the effects of including (a) locus‐specific recombination rates, and (b) background selection information in ABC analyses. Importantly, assuming uniform recombination or ignoring background selection had a negative effect on accuracy in many cases. Software and results from this method validation study should be useful for future demographic history analyses.

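Xiao P, Li RH. 《遗传》 (Hereditas (Beijing)), 2011, 33(6): 654-660
Second-generation sequencing and whole-genome diversity comparison are active topics in modern biology and information science, and analysis of transposable elements has become an important component of comparative genome analysis. At present, mining and analysis of the types, numbers and composition of transposable elements are generally based on fully assembled whole-genome sequences. Post-processing and assembly of the massive short-read data that precede this stage remain a blind spot in genome research, and repetitive sequences, of which transposable elements are the main component, are inevitably misassembled or lost during assembly, which introduces uncertainty into the systematic analysis of transposable elements. This article aims to establish an analysis pipeline and applies it to the transposable element composition (mainly insertion sequences, IS) of a randomly simulated Roche 454 raw read data set constructed from the whole genome of Microcystis aeruginosa NIES 843. The results indicate that dividing the candidate sequences obtained after nucleic acid probe scanning into three groups, each with its own amino acid detection threshold, gives relatively reliable results. Under this scheme, transposable elements account for 10.38% of the M. aeruginosa NIES 843 genome and belong to 14 IS families and 66 IS subfamilies. Compared with results previously obtained from the fully assembled genome using two different analysis pipelines, there is no significant difference in abundance or in family/subfamily composition, and at the sequence level a high proportion of similar sequences overlap, which confirms that the pipeline is reliable and practical for analysing transposable elements from raw high-throughput sequencing data.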

Inference of population structure from genetic data plays an important role in population and medical genetics studies. With the advancement and decreasing cost of sequencing technology, the increasingly available whole genome sequencing data provide much richer information about the underlying population structure. The traditional method originally developed for array-based genotype data for computing and selecting top principal components (PCs) that capture population structure may not perform well on sequencing data for two reasons. First, the number of genetic variants p is much larger than the sample size n in sequencing data such that the sample-to-marker ratio n/p is nearly zero, violating the assumption of the Tracy-Widom test used in their method. Second, their method might not be able to handle the linkage disequilibrium well in sequencing data. To resolve those two practical issues, we propose a new method called ERStruct to determine the number of top informative PCs based on sequencing data. More specifically, we propose to use the ratio of consecutive eigenvalues as a more robust test statistic, and then we approximate its null distribution using modern random matrix theory. Both simulation studies and applications to two public data sets from the HapMap 3 and the 1000 Genomes Projects demonstrate the empirical performance of our ERStruct method.
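The test statistic named above, the ratio of consecutive eigenvalues, can be sketched in a few lines. The code below only illustrates the statistic: the fixed `cutoff` is a hypothetical stand-in for the critical value that ERStruct derives from random matrix theory, and the function names and example eigenvalues are likewise assumptions.

```python
def consecutive_ratios(eigenvalues):
    """Return lambda_(k+1) / lambda_k for eigenvalues sorted in decreasing order."""
    ordered = sorted(eigenvalues, reverse=True)
    if any(value <= 0 for value in ordered):
        raise ValueError("eigenvalues must be strictly positive")
    return [ordered[k + 1] / ordered[k] for k in range(len(ordered) - 1)]


def count_top_components(eigenvalues, cutoff):
    """Largest k whose ratio lambda_(k+1) / lambda_k falls below `cutoff`.

    A small ratio marks a sharp drop after the k-th eigenvalue. Returns 0 when
    no ratio falls below the cutoff. The fixed cutoff is an assumption; it is
    not the null-distribution threshold used by ERStruct.
    """
    top = 0
    for k, ratio in enumerate(consecutive_ratios(eigenvalues), start=1):
        if ratio < cutoff:
            top = k
    return top


# Hypothetical spectrum: two large eigenvalues followed by a flat bulk.
spectrum = [50.0, 30.0, 2.0, 1.9, 1.8, 1.7]
print(count_top_components(spectrum, cutoff=0.5))  # 2
```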

The most frequently used method to identify mutations induced by a commonly used mutagen, EMS (ethyl methane sulfonate), in Arabidopsis thaliana has been map-based cloning. The first step of this method is crossing a mutant with a plant of another accession as it requires polymorphisms between accessions for linkage analysis. Therefore, to perform the method routinely, it is greatly preferred to use accession combinations between which enough polymorphisms are already known. Further, it requires laborious examination of a large number of F2 recombinants using many markers to detect each polymorphism. After linkage analysis narrows down the chromosomal region containing the causal mutation, sequencing candidate genes one by one within the region is necessary until the mutation is finally identified. Overall, this method is generally time-consuming and labor intensive, and it becomes harder when multiple loci are involved in phenotypes. A few recent reports showed that causal mutations induced by EMS could be identified by deep-sequencing technologies with less labor compared with the conventional method when mutants were generated in the Arabidopsis reference Columbia background whose genome organization is well known. Here we report that we succeeded in rapid identification of EMS-induced causal mutations in a non-reference accession background, whose whole genome sequence is not publicly available, using one round of whole genome sequencing. Moreover, in our case, we could monitor the causal locus and the transgenic reporter locus simultaneously, implying that this methodology could theoretically be applicable to analyzing even complex traits. We describe the pipeline of this methodology and discuss its characteristics.

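Li Xin (李鑫), Li Kai (李凯), Li Yijia (李一佳), Ma Lei (马磊). 《生物信息学》 (Chinese Journal of Bioinformatics), 2016, 14(3): 188-194
SeqMule automatically adjusts its variables according to the human genome or exome data supplied, and analyses and annotates single nucleotide polymorphisms (SNPs) across all sequencing data. Objective: Using the experimental data of two gout patients, this article gives bioinformatics researchers a detailed introduction to the SeqMule software, with the aim of providing a one-stop analysis route for whole-genome and exome sequencing data. Methods: Based on the alignment and analysis tools built into SeqMule, namely BWA (Burrows-Wheeler Aligner), GATK (The Genome Analysis Toolkit), SAMtools and Freebayes, and taking the analysis of DNA sequencing data from two gout patients as an example, the article describes the features and operation of SeqMule in detail and performs automated alignment and SNP analysis of the two patients' exome sequencing data. The study found that SeqMule resolves a number of problems present in many analysis programs, enables comprehensive, flexible and efficient automated analysis of exome and whole-genome sequencing data, analyses high-throughput sequencing data more effectively, and ultimately improves the consistency and accuracy of data analysis.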

Recent advances in whole genome sequencing (WGS) have allowed identification of genes for disease susceptibility in humans. The objective of our research was to exploit whole genome sequences of 13 rice (Oryza sativa L.) inbred lines to identify non-synonymous SNPs (nsSNPs) and candidate genes for resistance to sheath blight, a disease of worldwide significance. WGS by the Illumina GA IIx platform produced an average 5× coverage with ~700 K variants detected per line when compared to the Nipponbare reference genome. Two filtering strategies were developed to identify nsSNPs between two groups of known resistant and susceptible lines. A total of 333 nsSNPs detected in the resistant lines were absent in the susceptible group. Selected variants associated with resistance were found in 11 of 12 chromosomes. More than 200 genes with selected nsSNPs were assigned to 42 categories based on gene family/gene ontology. Several candidate genes belonged to families reported in previous studies, and three new regions with novel candidates were also identified. A subset of 24 nsSNPs detected in 23 genes was selected for further study. Individual alleles of the 24 nsSNPs were evaluated by PCR, and their presence or absence corresponded to the known resistant or susceptible phenotypes of nine additional lines. Sanger sequencing confirmed the presence of 12 selected nsSNPs in two lines. “Resistant” nsSNP alleles were detected in two accessions of O. nivara, which suggests that sources of resistance occur in additional Oryza species. Results from this study provide a foundation for future basic research and marker-assisted breeding of rice for sheath blight resistance.

Bloodstream infections are a major cause of morbidity and mortality worldwide. Early administration of appropriate antimicrobial therapy can improve patient survival and prevent antimicrobial resistance (AMR). Whole genome sequencing (WGS) can provide information for pathogen identification, AMR prediction and sequence typing earlier than current phenotypic diagnostic methods. WGS was performed on 97 clinical blood specimens and matched culture isolate pairs. Specimen/isolate pairs were MLST sequence-typed and further characterization was performed on Streptococcus species. WGS correctly identified 91.7% of clinical specimens and 93.2% of matched isolates representing 35 different microbial species. MLST types were assigned for 89.9% of matched cultures and 21.7% of blood specimens, with higher success for blood culture specimens extracted within 3 days (52% characterized) than 7 days (9.3%). This study demonstrates the potential use of WGS for identification and characterization of pathogens directly from blood culture specimens to facilitate timely initiation of appropriate antimicrobial therapies.

The cultivated strawberry is one of the youngest domesticated plants, developed in France in the 1700s from chance hybridization between two western hemisphere octoploid species. However, little is known about the evolution of the species that gave rise to this important fruit crop. Phylogenetic analysis of chloroplast genome sequences of 21 Fragaria species and subspecies resolves the western North American diploid F. vesca subsp. bracteata as sister to the clade of octoploid/decaploid species. No extant tetraploids or hexaploids are directly involved in the maternal ancestry of the octoploids. There is strong geographic segregation of chloroplast haplotypes in subsp. bracteata, and the gynodioecious Pacific Coast populations are implicated as both the maternal lineage and the source of male-sterility in the octoploid strawberries. Analysis of sexual system evolution in Fragaria provides evidence that the loss of male and female function can follow polyploidization, but does not seem to be associated with loss of self-incompatibility following genome doubling. Character-state mapping provided insight into sexual system evolution and its association with loss of self-incompatibility and genome doubling/merger. Fragaria attained its circumboreal and amphitropical distribution within the past one to four million years and the rise of the octoploid clade is dated at 0.372–2.05 million years ago.

Genetic mitochondrial disorders are a heterogeneous group of multi-system disorders caused by an imbalance in mitochondrial function (Moggio et al., 2014; Wallace, 2018). In contrast to the nuclear genome, each cell contains hundreds, or even thousands, of mtDNA molecules (Veltri et al., 1990; Calvo et al., 2006). Thus, a mixture of different mtDNA sequences can co-exist within the same individual, a situation referred to as heteroplasmy. The level of heteroplasmy in an individual often affects the penetrance and phenotypic severity of the diseases. Consequently, detection of sequence heteroplasmy is essential for the proper clinical interpretation of mitochondrial diseases (Stewart and Chinnery, 2015).
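The heteroplasmy level described above is often summarised per mtDNA position as the fraction of reads that carry a non-reference allele. The sketch below shows that calculation under simple assumptions: the function name and the read counts are hypothetical, and real pipelines would also filter for base quality and strand bias.

```python
def heteroplasmy_level(allele_counts, reference_allele):
    """Fraction of reads at one mtDNA position that do not match the reference.

    `allele_counts` maps each observed base to its read count.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("position has no read coverage")
    non_reference = total - allele_counts.get(reference_allele, 0)
    return non_reference / total


# Hypothetical pileup: 90 reads support the reference A, 10 support G.
print(heteroplasmy_level({"A": 90, "G": 10}, reference_allele="A"))  # 0.1
```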
