Similar Articles
 20 similar articles found (search time: 15 ms)
1.
The condensed centromeric regions of higher eukaryotic chromosomes contain satellite sequences, transposons and retroelements, as well as transcribed genes that perform a variety of functions. These chromosomal domains nucleate kinetochores, mediate sister chromatid cohesion and inhibit recombination, yet their characterization has often lagged behind that of chromosome arms. Here, we describe a whole-genome fractionation technique that rapidly identifies bacterial artificial chromosome (BAC) clones derived from plant centromeric regions. This approach, which relies on hybridization of methylated genomic DNA, revealed BACs that correspond to the genetically mapped and sequenced Arabidopsis thaliana centromeric regions. Extending this method to other species in the Brassicaceae family identified centromere-linked clones and provided genome-wide estimates of methylated DNA abundance. Sequencing these clones will elucidate the changes that occur during plant centromere evolution. This genomic fractionation technique could identify centromeric DNA in genomes with similar methylation and repetitive DNA content, including those from crops and mammals.

2.

Background

Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichment of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and that the semantic distances between all parent–child terms are identical, which is not true in a biological sense. In addition, these tools output lists of often redundant or overly specific GO terms, which are difficult to interpret in the context of the biological question investigated by the user. There is therefore a demand for a robust and reliable method for gene categorization and enrichment analysis.

Results

We have developed Categorizer, a tool that classifies genes into user-defined groups (categories) and calculates p-values for the enrichment of the categories. Categorizer identifies the biologically best-fit category for each gene by taking advantage of a specialized semantic similarity measure for GO terms. We demonstrate that Categorizer provides improved categorization and enrichment results for genetic modifiers of Huntington’s disease compared to a classical GO Slim-based approach or to categorizations using other semantic similarity measures.

Conclusion

Categorizer enables more accurate categorization of genes than currently available methods. This new tool will help experimental and computational biologists analyze genomic and proteomic data according to their specific needs in a more reliable manner.
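The abstract does not say which statistical test Categorizer uses for its enrichment p-values. A common choice for category enrichment is a one-sided hypergeometric test. The sketch below assumes that test. The function name and parameters are illustrative, not Categorizer's API.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    Hypothetical helper (not Categorizer's API):
    N: genes in the background universe
    K: background genes assigned to the category
    n: genes in the user's list
    k: genes in the list assigned to the category
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probabilities of observing k or more category genes in the list.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Example: 20,000 background genes, 200 in a category, 100 genes in the list, 5 of them in the category.
p = hypergeom_enrichment_p(20000, 200, 100, 5)
```

In a full analysis, these per-category p-values would normally be corrected for multiple testing across all categories.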

3.
4.
Condensin complexes are essential for chromosome condensation and segregation in mitosis, and condensin dysfunction, among other pathways leading to chromosomal bridging in mitosis, may play a role in tumor genomic instability, including the recently discovered phenomenon of chromothripsis. To characterize potential double-strand breaks occurring specifically in late anaphase, human chromosomes depleted of condensin were analyzed by γ-H2AX ChIP followed by high-throughput sequencing (ChIP-seq). In condensin-depleted cells, the nonrepeated parts of the genome contained distinct γ-H2AX enrichment zones, 75% of which overlapped with known hemizygous deletions in cancers. Furthermore, some tandemly repeated DNA sequences, analyzed separately from the rest of the genome, showed significant γ-H2AX enrichment in condensin-depleted anaphases. The most common targets of such enrichment included simple repeats, centromeric satellites, and rDNA. The latter two categories indicate that acrocentric human chromosomes are especially susceptible to breaks upon condensin deficiency. The genomic regions that are specifically destabilized upon condensin dysfunction may constitute a condensin-specific chromosome destabilization pattern.

5.
High-throughput bisulfite sequencing is widely used to measure cytosine methylation at single-base resolution in eukaryotes. It permits systems-level analysis of genomic methylation patterns associated with gene expression and chromatin structure. However, methods for large-scale identification of methylation patterns from bisulfite sequencing are lacking. We developed a comprehensive tool, CpG_MPs, for identification and analysis of the methylation patterns of genomic regions from bisulfite sequencing data. CpG_MPs first normalizes bisulfite sequencing reads into methylation levels of CpGs. It then identifies unmethylated and methylated regions from the methylation status of neighboring CpGs using a hotspot extension algorithm, without requiring pre-defined regions. Furthermore, conservatively and differentially methylated regions across paired or multiple samples (cells or tissues) are identified by combining a combinatorial algorithm with Shannon entropy. CpG_MPs identified large numbers of genomic regions with different methylation patterns across five human bisulfite sequencing datasets during cellular differentiation. Distinct sequence features and significantly cell-specific methylation patterns were observed. These potentially functional regions are candidates for functional analysis of DNA methylation during cellular differentiation. CpG_MPs is the first user-friendly tool for identifying methylation patterns of genomic regions from bisulfite sequencing data, permitting further investigation of the biological functions of genome-scale methylation patterns.
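The three steps described above can be sketched roughly as follows: per-CpG methylation levels, extension of runs of neighboring CpGs with the same status, and an entropy score across samples. This is a simplified illustration, not CpG_MPs' implementation. The status thresholds (0.2 and 0.8), the minimum run length, and the use of entropy over per-sample levels are all hypothetical choices.

```python
from math import log2
from typing import List, Tuple


def methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads calling C (methylated) rather than T (unmethylated) at one CpG."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("CpG has no read coverage")
    return methylated_reads / total


def call_status(level: float, low: float = 0.2, high: float = 0.8) -> str:
    # Hypothetical cut-offs; the abstract does not give CpG_MPs' thresholds.
    if level >= high:
        return "M"  # methylated
    if level <= low:
        return "U"  # unmethylated
    return "I"  # intermediate


def extend_regions(statuses: List[str], min_cpgs: int = 3) -> List[Tuple[int, int, str]]:
    """Extend runs of adjacent CpGs sharing an M or U status.

    Returns (start_index, end_index_inclusive, status) for runs of at least min_cpgs CpGs.
    """
    regions = []
    start = 0
    for i in range(1, len(statuses) + 1):
        # Close the current run at the end of the list or when the status changes.
        if i == len(statuses) or statuses[i] != statuses[start]:
            if statuses[start] in ("M", "U") and i - start >= min_cpgs:
                regions.append((start, i - 1, statuses[start]))
            start = i
    return regions


def shannon_entropy(levels: List[float]) -> float:
    """Entropy (bits) of one region's methylation levels across samples.

    Low values mean methylation is concentrated in few samples (sample-specific).
    Values near log2(number of samples) mean uniform (conserved) methylation.
    """
    total = sum(levels)
    if total == 0:
        return 0.0
    probs = [x / total for x in levels if x > 0]
    return -sum(p * log2(p) for p in probs)
```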

6.
Biswas Bipasa, Lai Yinglei. BMC Genomics, 2019, 20(2): 35-47
Background

Next-generation sequencing technology allows us to obtain large numbers of short DNA sequence (DNA-seq) reads at a genome-wide level. DNA-seq data have been increasingly collected in recent years. Count-type data analysis is a widely used approach for DNA-seq data. However, the related data pre-processing is based on the moving-window method, in which a window size needs to be defined in order to obtain count-type data. Furthermore, useful information can be lost during pre-processing for count-type data.

Results

In this study, we propose to analyze DNA-seq data using a distance-type measure. Distances are measured in base pairs (bp) between two adjacent alignments of short reads mapped to a reference genome. Our simulation study, based on experimental data, confirms the advantages of the distance-type measure approach in both detection power and detection accuracy. Furthermore, we propose artificial censoring of the distance data, so that distances larger than a given value are treated as potential outliers. Our purpose is to simplify the pre-processing of DNA-seq data. Statistically, we model the distance data as a mixture of right-censored geometric distributions. Additionally, to reduce GC-content bias, we extend the mixture model to a mixture of generalized linear models (GLMs). The model can be estimated with the Newton-Raphson algorithm as well as the Expectation-Maximization (E-M) algorithm. We have conducted simulations to evaluate the performance of our approach. Applying a rank-based inverse normal transformation to the distance data yields z-values for follow-up analysis. As an illustration, we present an application to DNA-seq data from a pair of normal and tumor cell lines, using a change-point analysis of the z-values to detect DNA copy number alterations.

Conclusion

Our distance-type measure approach is novel. It requires neither a fixed nor a sliding window procedure to generate count-type data. Its advantages have been demonstrated by our simulation studies, and its practical usefulness has been illustrated by an application to experimental data.
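The pre-processing described in this abstract can be sketched in three steps: compute distances between adjacent read alignments, right-censor them at a cutoff, and map them to z-values with a rank-based inverse normal transformation. This is a minimal sketch under stated assumptions. Positions are alignment start coordinates on a single chromosome, and the Blom offset (c = 3/8) is an assumed choice, since the abstract does not name one. The mixture-model fitting (censored geometric mixtures, GLMs, E-M) is not shown.

```python
from statistics import NormalDist
from typing import List, Tuple


def adjacent_distances(positions: List[int]) -> List[int]:
    """Distances (bp) between adjacent read alignment start positions on one chromosome."""
    pos = sorted(positions)
    return [b - a for a, b in zip(pos, pos[1:])]


def censor(distances: List[int], cutoff: int) -> List[Tuple[int, bool]]:
    """Artificial right-censoring: distances above cutoff are recorded as (cutoff, True)."""
    return [(min(d, cutoff), d > cutoff) for d in distances]


def rank_inverse_normal(values: List[float], c: float = 3 / 8) -> List[float]:
    """Rank-based inverse normal transform: z = Phi^-1((r - c) / (n - 2c + 1)).

    Tied values receive their average 1-based rank. The default c = 3/8 is Blom's offset (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```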


7.
8.
We have used a combination of BsuE methyltransferase (M-BsuE) and NotI restriction enzyme to cut genomic DNA at a subset of NotI sites. The usefulness of this system is shown in a re-examination of the restriction map of the human MHC. Combinations of methylases and restriction enzymes can be used to generate cuts at different frequencies in genomic DNA, such that they generate ends complementary to NotI ends, and can be used in conjunction with NotI linking clones in chromosome jumping experiments. These enzyme combinations have the potential to produce cutting sites in genomic DNA spaced at intervals favorable for extensive mapping, fragment enrichment, and cloning efforts.

9.
With the advent of high-throughput sequencing, the availability of genomic sequence for comparative genomics is increasing exponentially. Numerous completed plant genome sequences enable characterization of patterns of the retention and evolution of genes within gene families due to multiple polyploidy events, gene loss and fractionation, and differential evolutionary pressures over time and across different gene families. In this report, we trace the changes that have occurred in 12 surviving homoeologous genomic regions from three rounds of polyploidy that contributed to the current Glycine max genome: a genome triplication before the origin of the rosids (~130 to 240 million years ago), a genome duplication early in the legumes (~58 million years ago), and a duplication in the Glycine lineage (~13 million years ago). Patterns of gene retention following the genome triplication event generally support predictions of the Gene Balance Hypothesis. Finally, we find that genes in networks with a high level of connectivity are more strongly conserved than those with low connectivity and that the enrichment of these highly connected genes in the 12 highly conserved homoeologous segments may in part explain their retention over more than 100 million years and repeated polyploidy events.

10.
Many computational methods have been developed to discern intratumor heterogeneity (ITH) using DNA sequence data from bulk tumor samples. These methods share the assumption that two mutations arise from the same subclone if they have similar mutant allele frequencies (MAFs), so it is difficult or impossible to distinguish two subclones with similar MAFs. Single-cell DNA sequencing (scDNA-seq) data can be very informative for ITH inference. However, due to the difficulty of DNA amplification, scDNA-seq data are often very noisy. A promising new study design is to collect both bulk and single-cell DNA-seq data and analyze them jointly to mitigate the limitations of each data type. To address the analytic challenges of this new study design, we propose a computational method named BaSiC (Bulk tumor and Single Cell) to discern ITH by jointly analyzing DNA-seq data from bulk tumor and single cells. We demonstrate that BaSiC has comparable or better performance than methods using either data type alone. We further evaluate BaSiC using bulk tumor and single-cell DNA-seq data from a breast cancer patient and several leukemia patients.
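To make the shared bulk-data assumption concrete, the sketch below computes MAFs and then naively groups mutations whose MAFs lie close together. It is a toy illustration of the limitation described above, not BaSiC. The tolerance (0.05) and the mutation names are hypothetical. Two mutations from different subclones that happen to have similar MAFs, such as m1 and m2 in the test, end up in the same group.

```python
from typing import Dict, List


def mutant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """MAF = reads supporting the mutant allele / all reads covering the site."""
    return alt_reads / total_reads


def group_by_maf(mafs: Dict[str, float], tol: float = 0.05) -> List[List[str]]:
    """Naive grouping: sort by MAF and start a new group when the gap to the previous MAF exceeds tol."""
    items = sorted(mafs.items(), key=lambda kv: kv[1])
    groups: List[List[tuple]] = []
    for name, maf in items:
        if groups and maf - groups[-1][-1][1] <= tol:
            groups[-1].append((name, maf))
        else:
            groups.append([(name, maf)])
    return [[name for name, _ in g] for g in groups]
```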

11.
12.
13.
Chen D, Zhang W, Zhu ZD, Huang Y, Wang P, Zhou BB, Yang XN, Xiao HS, Zhang QH. Yi Chuan (Hereditas), 2010, 32(12): 1296-1303
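This study aimed to establish a method for constructing genomic target-sequence capture libraries and, combined with second-generation sequencing technology, to achieve deep sequencing of candidate gene regions. Using Agilent's eArray online platform, we designed 120-nt capture probes (baits) for 2,414,977 bp of genomic sequence comprising 11,824 exons of 1,250 genes, and prepared them as a SureSelect solution-phase target-capture reagent. Genomic DNA from two human samples was fragmented by sonication, end-repaired and phosphorylated, and ligated to SOLiD adapters. DNA fragments of 150-200 bp were recovered and hybridized with the target probes to capture the target sequences. After water-in-oil emulsion PCR amplification and magnetic-bead enrichment, the libraries were loaded on the SOLiD sequencing system, either for library quality assessment by Work Flow Analysis (WFA) or for full sequencing runs. In total, 46,509 capture probes were designed and synthesized for the 11,147 exon fragments covered and prepared as a SureSelect kit. The probes efficiently captured and enriched the target fragments of genomic DNA, and quantitative PCR showed enrichment of up to 29-fold. WFA showed that the libraries were suitable for full sequencing on the SOLiD instrument. Reads from target regions accounted for 70% of all effective reads, and coverage depth exceeded 200× throughout. These results indicate that the SureSelect workflow established here for capturing and enriching genomic target sequences to construct sequencing libraries is feasible and can be used directly for sequencing on the SOLiD platform.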
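The abstract reports two capture-quality metrics: qPCR fold enrichment (up to 29-fold) and the fraction of reads on target (70%). It does not say how either was calculated. The sketch below assumes a common approach: a ΔΔCt calculation against a non-target control locus, plus simple read-count ratios. The function names and the efficiency parameter are illustrative assumptions.

```python
def capture_fold_enrichment(
    ct_target_pre: float,
    ct_target_post: float,
    ct_control_pre: float,
    ct_control_post: float,
    efficiency: float = 1.0,
) -> float:
    """Delta-delta-Ct fold enrichment of a target locus after capture.

    The Ct shift of the target is normalized to a non-target control locus.
    efficiency = 1.0 assumes perfect doubling per PCR cycle (an assumption).
    """
    base = 1.0 + efficiency
    delta_target = ct_target_pre - ct_target_post
    delta_control = ct_control_pre - ct_control_post
    return base ** (delta_target - delta_control)


def on_target_fraction(on_target_reads: int, total_effective_reads: int) -> float:
    """Share of usable reads that map inside the captured target regions."""
    return on_target_reads / total_effective_reads


def mean_target_depth(on_target_reads: int, read_length: int, target_bp: int) -> float:
    """Approximate mean coverage depth over the target (bases sequenced / target size)."""
    return on_target_reads * read_length / target_bp
```

For example, a target amplicon whose Ct drops by 5 cycles after capture, while the control locus is unchanged, gives 2^5 = 32-fold enrichment.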

14.
Next-generation sequencing (NGS) has caused a revolution in biology. NGS requires the preparation of libraries in which DNA or RNA molecules (or fragments of them) are fused with adapters, followed by PCR amplification and sequencing. Robust library preparation methods that produce a representative, non-biased source of nucleic acid material from the genome under investigation are therefore of crucial importance. Nevertheless, it has become clear that NGS libraries for all types of applications contain biases that compromise the quality of NGS datasets and can lead to their erroneous interpretation. Detailed knowledge of the nature of these biases is essential for careful interpretation of NGS data, and it will also help researchers find ways to improve library quality or develop bioinformatics tools that compensate for the bias. In this review we discuss the literature on bias in the most common NGS library preparation protocols, for both DNA sequencing (DNA-seq) and RNA sequencing (RNA-seq). Strikingly, almost all steps of the various protocols have been reported to introduce bias, especially in the case of RNA-seq, which is technically more challenging than DNA-seq. For each type of bias we discuss methods for improvement, with a view to providing useful advice to researchers who wish to convert any kind of raw nucleic acid into an NGS library.

15.
We have developed a method that enriches for methylated cytosines by capturing the fraction of bisulfite-treated DNA with unconverted cytosines. The method, called streptavidin bisulfite ligand methylation enrichment (SuBLiME), involves the specific labeling (using a biotin-labeled nucleotide ligand) of methylated cytosines in bisulfite-converted DNA. This step is followed by affinity capture using streptavidin-coupled magnetic beads. SuBLiME is highly adaptable and can be combined with deep sequencing library generation and/or genomic complexity reduction. In this pilot study, we enriched methylated DNA from Csp6I-cut, complexity-reduced genomes of colorectal cancer cell lines (HCT-116, HT-29 and SW-480) and normal blood leukocytes, with the aim of discovering colorectal cancer biomarkers. Enriched libraries were sequenced with SOLiD-3 technology. In pairwise comparisons, we scored a total of 1,769 gene loci and 33 miRNA loci as differentially methylated between the cell lines and leukocytes. Of these, 516 loci were differentially methylated at two or more promoter-proximal CpG sites across two discrete Csp6I fragments. The identified methylated gene loci were associated with anatomical development, differentiation and cell signaling. The data agreed well with a number of published colorectal cancer DNA methylation biomarkers and genomic data sets. SuBLiME is effective in enriching methylated nucleic acid and in detecting known and novel biomarkers.
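The principle SuBLiME relies on is that bisulfite converts unmethylated cytosine to uracil (read as T) while 5-methylcytosine stays C. Only fragments that still contain C after conversion are candidates for capture. The toy sketch below models that logic on the top strand only. The biotin-labeling and streptavidin chemistry is not modeled, and all function names are hypothetical.

```python
from typing import List, Set


def bisulfite_convert(seq: str, methylated_positions: Set[int]) -> str:
    """Top-strand bisulfite conversion: unmethylated C -> T, methylated C (5mC) stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def retained_cytosines(reference: str, converted: str) -> List[int]:
    """Positions where a reference C is still read as C, i.e. candidate methylated cytosines."""
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference.upper(), converted.upper()))
        if ref == "C" and obs == "C"
    ]


def would_be_captured(converted: str) -> bool:
    """A fragment is capturable only if at least one unconverted C remains."""
    return "C" in converted.upper()
```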

16.
It has recently been proposed that variation in DNA methylation at specific genomic locations may play an important role in the development of complex diseases such as cancer. Here, we develop 1- and 2-group multiple testing procedures for identifying and quantifying regions of DNA methylation variability. Our method is the first genome-wide statistical significance calculation for increased or differential variability, as opposed to the traditional approach of testing for mean changes. We apply these procedures to genome-wide methylation data obtained from biological and technical replicates and provide the first statistical proof that variably methylated regions exist and are due to interindividual variation. We also show that differentially variable regions in colon tumor and normal tissue show enrichment of genes regulating gene expression, cell morphogenesis, and development, supporting a biological role for DNA methylation variability in cancer.

17.
18.
Regulatory proteins can bind to different sets of genomic targets in various cell types or conditions. To reliably characterize such condition-specific regulatory binding, we introduce MultiGPS, an integrated machine learning approach for the analysis of multiple related ChIP-seq experiments. MultiGPS is based on a generalized Expectation Maximization framework that shares information across multiple experiments for binding event discovery. We demonstrate that our framework enables the simultaneous modeling of sparse condition-specific binding changes, sequence dependence, and replicate-specific noise sources. MultiGPS encourages consistency in reported binding event locations across multiple-condition ChIP-seq datasets and provides accurate estimation of ChIP enrichment levels at each event. MultiGPS's multi-experiment modeling approach thus provides a reliable platform for detecting differential binding enrichment across experimental conditions. We demonstrate the advantages of MultiGPS with an analysis of Cdx2 binding in three distinct developmental contexts. By accurately characterizing condition-specific Cdx2 binding, MultiGPS enables novel insight into the mechanistic basis of Cdx2 site selectivity. Specifically, the condition-specific Cdx2 sites characterized by MultiGPS are highly associated with pre-existing genomic context, suggesting that such sites are pre-determined by cell-specific regulatory architecture. In contrast, MultiGPS-defined condition-independent sites are not predicted by pre-existing regulatory signals, suggesting that Cdx2 can bind to a subset of locations regardless of genomic environment. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.

19.
Extensive sonication of formaldehyde-crosslinked chromatin can generate DNA fragments averaging 200 bp in length (range 75–300 bp). Fragmentation is largely random with respect to genomic region and nucleosome position. ChIP experiments employing such extensively fragmented samples show 2- to 4-fold increased enrichment of protein binding sites over control genomic regions, when compared to samples sonicated to a more conventional size range (300–500 bp). The basis of improved fold enrichments is that immunoprecipitation of protein-bound regions is unaffected by fragment size, whereas immunoprecipitation of control genomic regions decreases progressively along with reduced fragment size due to fewer nonspecific binding sites. The use of extensively sonicated samples improves mapping of protein binding sites, and it extends the dynamic range for quantitative measurements of histone density. We show that many yeast promoter regions are virtually devoid of histones.
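The reasoning above reduces to a ratio. Fold enrichment is site recovery divided by control-region recovery, so if shorter fragments lower only the control recovery, the ratio rises. The toy calculation below uses hypothetical recovery values chosen for illustration, not data from the study. Halving control recovery while site recovery stays fixed doubles the fold enrichment, which is within the 2- to 4-fold gain reported.

```python
def fold_enrichment(ip_site: float, input_site: float, ip_control: float, input_control: float) -> float:
    """(IP / input at the binding site) divided by (IP / input at a control region)."""
    return (ip_site / input_site) / (ip_control / input_control)


# Hypothetical recoveries (IP as a fraction of input); the site recovery is unchanged by fragment size.
conventional = fold_enrichment(0.20, 1.0, 0.02, 1.0)  # larger fragments: more nonspecific control recovery
extensive = fold_enrichment(0.20, 1.0, 0.01, 1.0)     # smaller fragments: control recovery halved
```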

20.
The recent FDA approval of the MiSeqDx platform provides a unique opportunity to develop targeted next-generation sequencing (NGS) panels for human disease, including cancer. We have developed a scalable, targeted panel-based assay termed UNCseq, which involves an NGS panel of over 200 cancer-associated genes and a standardized downstream bioinformatics pipeline for detection of single nucleotide variations (SNVs) as well as small insertions and deletions (indels). In addition, we developed a novel algorithm, NGScopy, designed for samples with sparse sequencing coverage, to detect large-scale copy number variations (CNVs), comparable to the Human SNP Array 6.0, as well as small-scale intragenic CNVs. Overall, we applied this assay to 100 snap-frozen lung cancer specimens lacking same-patient germline DNA (07–0120 tissue cohort) and validated our results against Sanger sequencing, SNP Array, and our recently published integrated DNA-seq/RNA-seq assay, UNCqeR, in which RNA-seq of same-patient tumor specimens confirmed SNVs detected by DNA-seq when RNA-seq coverage depth was adequate. In addition, we applied the UNCseq assay to an independent lung cancer tumor tissue collection with available same-patient germline DNA (11–1115 tissue cohort) and confirmed mutations using assays performed in a CLIA-certified laboratory. We conclude that UNCseq can identify SNVs, indels, and CNVs in tumor specimens lacking germline DNA in a cost-efficient fashion.
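The abstract does not describe how NGScopy segments the genome or scores copy number. A generic starting point for read-depth CNV detection is a per-bin log2 ratio of library-size-normalized tumor counts to reference counts. The sketch below shows only that generic step, not NGScopy's algorithm. The pooled-normal reference (for specimens lacking germline DNA) and the pseudocount for sparse bins are assumptions.

```python
from math import log2
from typing import List


def log2_copy_ratios(
    tumor_counts: List[int], reference_counts: List[int], pseudocount: float = 0.5
) -> List[float]:
    """Per-bin log2(tumor / reference) after normalizing each sample by its total reads.

    reference_counts could come from a pooled normal when no matched germline DNA exists (assumption).
    The pseudocount avoids log(0) in sparsely covered bins.
    """
    if len(tumor_counts) != len(reference_counts):
        raise ValueError("tumor and reference must use the same bins")
    t_total = sum(tumor_counts)
    r_total = sum(reference_counts)
    return [
        log2(((t + pseudocount) / t_total) / ((r + pseudocount) / r_total))
        for t, r in zip(tumor_counts, reference_counts)
    ]
```

Values near 0 suggest normal copy number, while sustained runs of positive or negative ratios suggest gains or losses. A segmentation or change-point step would then be applied to the ratios.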


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd. Beijing ICP filing No. 09084417