Similar literature
Found 20 similar articles (search time: 31 ms)
1.

Background

Recent studies used the contact data or three-dimensional (3D) genome reconstructions from Hi-C (chromosome conformation capture with next-generation sequencing) to assess the co-localization of functional genomic annotations in the nucleus. These analyses dichotomized data point pairs belonging to a functional annotation as “close” or “far” based on some threshold and then tested for enrichment of “close” pairs. We propose an alternative approach that avoids dichotomization of the data and instead directly estimates the significance of distances within the 3D reconstruction.

Results

We applied this approach to 3D genome reconstructions for Plasmodium falciparum, the causative agent of malaria, and Saccharomyces cerevisiae and compared the results to previous approaches. We found significant 3D co-localization of centromeres, telomeres, virulence genes, and several sets of genes with developmentally regulated expression in P. falciparum; and significant 3D co-localization of centromeres and long terminal repeats in S. cerevisiae. Additionally, we tested the experimental observation that telomeres form three to seven clusters in P. falciparum and S. cerevisiae. Applying affinity propagation clustering to telomere coordinates in the 3D reconstructions yielded six telomere clusters for both organisms.

Conclusions

Distance-based assessment replicated key findings, while avoiding dichotomization of the data (which previously yielded threshold-sensitive results).
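
The distance-based idea above can be illustrated with a small permutation test. This is only a sketch under stated assumptions, not the authors' procedure: the abstract does not specify the test statistic or the null model, so the mean pairwise Euclidean distance and the uniform random sampling of loci used here are illustrative choices, and the names `mean_pairwise_distance` and `colocalization_p_value` are hypothetical.

```python
import random
from itertools import combinations
from math import dist


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of 3D points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def colocalization_p_value(coords, annotated, n_perm=2000, seed=0):
    """Permutation p-value for 3D co-localization of an annotation.

    coords:    dict mapping locus id -> (x, y, z) in the 3D reconstruction
    annotated: set of locus ids carrying the functional annotation
    The null model (assumed here) draws equally sized random locus sets.
    """
    rng = random.Random(seed)
    loci = sorted(coords)
    observed = mean_pairwise_distance([coords[l] for l in sorted(annotated)])
    k = len(annotated)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(loci, k)
        # Count random sets that are at least as compact as the annotated set.
        if mean_pairwise_distance([coords[l] for l in sample]) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

No distance threshold appears anywhere in this sketch: the statistic uses the distances themselves, which is the property the abstract contrasts with "close"/"far" dichotomization.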

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-992) contains supplementary material, which is available to authorized users.

2.
During meiosis, DNA double-strand breaks (DSBs) are formed at high frequency at special chromosomal sites, called DSB hotspots, to generate crossovers that aid proper chromosome segregation. Multiple chromosomal features affect hotspot formation. In the fission yeast S. pombe the linear element proteins Rec25, Rec27 and Mug20 are hotspot determinants – they bind hotspots with high specificity and are necessary for nearly all DSBs at hotspots. To assess whether they are also sufficient for hotspot determination, we localized each linear element protein to a novel chromosomal site (ade6 with lacO substitutions) by fusion to the Escherichia coli LacI repressor. The Mug20-LacI plus lacO combination, but not the two separate lac elements, produced a strong ade6 DSB hotspot, comparable to strong endogenous DSB hotspots. This hotspot had unexpectedly low ade6 recombinant frequency and negligible DSB hotspot competition, although like endogenous hotspots it manifested DSB interference. We infer that linear element proteins must be properly placed by endogenous functions to impose hotspot competition and proper partner choice for DSB repair. Our results support and expand our previously proposed DSB hotspot-clustering model for local control of meiotic recombination.

3.
Meiotic recombination is required for the orderly segregation of chromosomes during meiosis and for providing genetic diversity among offspring. Among mammals, as well as yeast and higher plants, recombination preferentially occurs at highly delimited chromosomal sites 1–2 kb long known as hotspots. Although considerable progress has been made in understanding the roles various proteins play in carrying out the molecular events of the recombination process, relatively little is understood about the factors controlling the location and relative activity of mammalian recombination hotspots. To search for trans-acting factors controlling the positioning of recombination events, we compared the locations of crossovers arising in an 8-Mb segment of a 100-Mb region of mouse Chromosome 1 (Chr 1) when the longer region was heterozygous C57BL/6J (B6) × CAST/EiJ (CAST) and the remainder of the genome was either similarly heterozygous or entirely homozygous B6. The lack of CAST alleles in the remainder of the genome resulted in profound changes in hotspot activity in both females and males. Recombination activity was lost at several hotspots; new, previously undetected hotspots appeared; and still other hotspots remained unaffected, indicating the presence of distant trans-acting gene(s) whose CAST allele(s) activate or suppress the activity of specific hotspots. Testing the activity of three activated hotspots in sperm samples from individual male progeny of two genetic crosses, we identified a single trans-acting regulator of hotspot activity, designated Rcr1, that is located in a 5.30-Mb interval (11.74–17.04 Mb) on Chr 17. Using an Escherichia coli cloning assay to characterize the molecular products of recombination at two of these hotspots, we found that Rcr1 controls the appearance of both crossover and noncrossover gene conversion events, indicating that it likely controls the sites of the double-strand DNA breaks that initiate the recombination process.

4.
5.
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
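
To make the gapped k-mer idea concrete, the sketch below counts gapped features by brute-force enumeration. It is an illustration only, assuming a feature is a fixed-length word in which a chosen number of positions are informative and the rest are wildcards (written `-`); it does not reproduce the tree data structure or the kernel computation mentioned in the abstract, and the function name and the parameters `word_len` and `informative` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, word_len=4, informative=3):
    """Count gapped k-mer features in `seq` by naive enumeration.

    Every window of length `word_len` contributes one feature per way of
    keeping `informative` positions and masking the others with '-'.
    """
    counts = Counter()
    masks = [set(keep) for keep in combinations(range(word_len), informative)]
    for start in range(len(seq) - word_len + 1):
        window = seq[start:start + word_len]
        for keep in masks:
            pattern = "".join(
                base if i in keep else "-" for i, base in enumerate(window)
            )
            counts[pattern] += 1
    return counts
```

Two words that differ at a single position still share every feature in which that position is masked, which is why gapped counts are less sparse than exact k-mer counts as the word length grows.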

6.
Homologous recombination occurs especially frequently near special chromosomal sites called hotspots. In Escherichia coli, Chi hotspots control RecBCD enzyme, a protein machine essential for the major pathway of DNA break-repair and recombination. RecBCD generates recombinogenic single-stranded DNA ends by unwinding DNA and cutting it a few nucleotides to the 3′ side of 5′ GCTGGTGG 3′, the sequence historically equated with Chi. To test if sequence context affects Chi activity, we deep-sequenced the products of a DNA library containing 10 random base-pairs on each side of the Chi sequence and cut by purified RecBCD. We found strongly enhanced cutting at Chi with certain preferred sequences, such as A or G at nucleotides 4–7, on the 3′ flank of the Chi octamer. These sequences also strongly increased Chi hotspot activity in E. coli cells. Our combined enzymatic and genetic results redefine the Chi hotspot sequence, implicate the nuclease domain in Chi recognition, indicate that nicking of one strand at Chi is RecBCD’s biologically important reaction in living cells, and enable more precise analysis of Chi’s role in recombination and genome evolution.

7.
8.
Little is known about the factors determining the location and activity of the rapidly evolving meiotic crossover hotspots that shape genome diversity. Here, we show that several histone modifications are enriched at the active mouse Psmb9 hotspot, and we distinguish those marks that precede from those that follow hotspot recombinational activity. H3K4Me3, H3K4Me2 and H3K9Ac are specifically enriched in the chromatids that carry an active initiation site, and in the absence of DNA double-strand breaks (DSBs) in Spo11−/− mice. We thus propose that these marks are part of the substrate for recombination initiation at the Psmb9 hotspot. In contrast, hyperacetylation of H4 is increased as a consequence of DSB formation, as shown by its dependency on Spo11 and by the enrichment detected on both recombining chromatids. In addition, the comparison with another hotspot, Hlx1, strongly suggests that H3K4Me3 and H4 hyperacetylation are common features of DSB formation and repair, respectively. Altogether, the chromatin signatures of the Psmb9 and Hlx1 hotspots provide a basis for understanding the distribution of meiotic recombination.

9.
We aimed to compare the effect of three estradiol benzoate (EB) doses on follicular wave emergence (FWE) and dominant follicle growth of suckled Nelore cows submitted to timed artificial insemination (TAI; D0). On a random day of the estrous cycle (D−10), multiparous (MULT; n=36) and primiparous (PRIM; n=20) suckled Nelore cows received an intravaginal progesterone (P4) device and were assigned to three groups. Cows in the EB-1 (n=20), EB-1.5 (n=15) or EB-2 (n=21) groups received, respectively, an intramuscular treatment with 1, 1.5 or 2 mg EB. A subgroup (n=10–13 cows/group) was subjected to daily ovarian evaluations from D−10 to D0. On D−2, P4 devices were removed, and all cows received the same treatment: 1 mg estradiol cypionate, 0.53 mg sodium cloprostenol, and 300 IU eCG. Statistical analyses were performed considering only the main effects of treatment group and parity order. The proportion of cows with a synchronized FWE and the moment of the FWE did not differ (p>0.05) among the treatment groups (overall: 80% [28/35] and 4.1 ± 0.4 days); however, the FWE occurred earlier (p=0.007) in MULT (3.8 ± 0.2 days) than in PRIM (5.1 ± 0.4 days) cows. The proportion of animals detected in estrus was greater (86% [31/36] vs. 70% [14/20]; p=0.02) and the dominant follicle was larger on D−2 (9.7 ± 0.3 mm vs. 7.8 ± 0.7 mm; p=0.006) and D0 (11.9 ± 0.4 mm vs. 10 ± 0.5 mm; p=0.008) in MULT than in PRIM cows. In conclusion, the three EB doses were similarly efficient at synchronizing the FWE in suckled Nelore cows. Moreover, a delayed FWE and a smaller dominant follicle were observed in PRIM cows, contributing to the reduced reproductive performance of this parity category under TAI protocols similar to those used for MULT cows.

10.
The vast majority of meiotic recombination events (crossovers (COs) and non-crossovers (NCOs)) cluster in narrow hotspots surrounded by large regions devoid of recombinational activity. Here, using a new molecular approach in plants, called “pollen-typing”, we detected and characterized hundreds of CO and NCO molecules in two different hotspot regions in Arabidopsis thaliana. This analysis revealed that COs are concentrated in regions of a few kilobases where their rates reach up to 50 times the genome average. The hotspots themselves tend to cluster in regions less than 8 kilobases in size with overlapping CO distribution. Non-crossover (NCO) events also occurred in the two hotspots but at very different levels (local CO/NCO ratios of 1/1 and 30/1) and their track lengths were quite small (a few hundred base pairs). We also showed that the ZMM protein MSH4 plays a role in CO formation and somewhat unexpectedly we also found that it is involved in the generation of NCOs but with a different level of effect. Finally, factors acting in cis and in trans appear to shape the rate and distribution of COs at meiotic recombination hotspots.

11.
Sequencing DNA fragments associated with proteins following in vivo cross-linking with formaldehyde (known as ChIP-seq) has been used extensively to describe the distribution of proteins across genomes. It is not widely appreciated that this method merely estimates a protein’s distribution and cannot reveal changes in occupancy between samples. To enable such comparisons, we tagged with the same epitope orthologous proteins in Saccharomyces cerevisiae and Candida glabrata, whose sequences have diverged to a degree that most DNA fragments longer than 50 bp are unique to just one species. By mixing defined numbers of C. glabrata cells (the calibration genome) with S. cerevisiae samples (the experimental genomes) prior to chromatin fragmentation and immunoprecipitation, it is possible to derive a quantitative measure of occupancy (the occupancy ratio – OR) that enables a comparison of occupancies not only within but also between genomes. We demonstrate for the first time that this ‘internal standard’ calibration method satisfies the sine qua non for quantifying ChIP-seq profiles, namely linearity over a wide range. Crucially, by employing functional tagged proteins, our calibration process describes a method that distinguishes genuine association within ChIP-seq profiles from background noise. Our method is applicable to any protein, not merely highly conserved ones, and obviates the need for the time consuming, expensive, and technically demanding quantification of ChIP using qPCR, which can only be performed on individual loci. As we demonstrate for the first time in this paper, calibrated ChIP-seq represents a major step towards documenting the quantitative distributions of proteins along chromosomes in different cell states, which we term biological chromodynamics.

12.
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)—a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits—conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit–N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data.

Post-translational modifications (PTMs) are a rapidly expanding and important class of protein feature that broaden the functional diversity of proteins in a proteome. By definition, PTMs change protein structure and therefore have the potential to affect protein function by altering protein interactions, protein stability or catalytic activity (1, 2). As they have been found to occur on nearly every protein in the eukaryotic proteome, PTMs broadly impact nearly all known cellular processes. Over 300 different types of PTM are known, ranging from single atom modifications (e.g. oxide) to small protein modifiers (e.g. ubiquitin), which can occur on all but five amino acid residues resulting from enzymatic or nonenzymatic processes (3). Over 220,000 distinct PTM sites have been experimentally identified across ∼77,000 different proteins to date (dbPTM; http://dbptm.mbc.nctu.edu.tw/statistics.php) – numbers that continue to grow exponentially because of improved methods for high throughput detection by mass spectrometry (MS). By virtue of how they are detected, most PTM data are sequence-linked and lack structural context.

The function of most PTMs is unknown because the rate of PTM detection far surpasses the rate at which any one modification can be studied empirically. Moreover, the functional impact of every PTM is likely not equivalent (4). For example, computational analysis of phosphorylation sites in yeast and human proteomes indicates that well-conserved phosphosites are more likely to have a functional consequence compared with poorly conserved sites, yet only a fraction of phosphosites are well conserved (5, 6). Consequently, the development of tools that provide functional prioritization of PTMs could have a broad impact on our understanding of protein regulation, biological mechanism, and molecular evolution.

The emerging need for methods that predict the functional impact of a PTM has not yet been met. Longstanding methods capitalize predominantly on the sequence context of PTMs and have been used to predict sites of modification (expasy.org/proteomics/post-translational_modification) and to compare enzyme/substrate interactions (7–9). More recently, studies aimed at expanding the parameters associated with functional PTMs have emerged. In these cases, a set of common features correlated with functional importance are derived from the analysis of PTMs within and between organisms including: number of PTM observations at a multiple sequence alignment position (i.e. hotspots), measures of co-occurrence between different PTMs (e.g. distance between phosphorylation and ubiquitination sites), biological dynamics (up or down-regulation), and protein–protein interaction influence (7, 10–12). Recent efforts to provide structural context by linking individual PTMs to three-dimensional structures in the Protein Data Bank (PDB) have also been described (13, 14). However, these resources are extensions of existing PTM databases that allow visualization of single instances of modification onto individual proteins, but do not provide quantitative or analytical value.

In principle, combining PTM hotspot and structural analysis would offer multiple advantages over any one approach used in isolation. Sequence homology provides protein family membership—thereby clustering PTMs into hotspots for groups of proteins to provide information about: (1) the evolutionary conservation and (2) observation frequencies of PTMs within the family. A primary consequence of their sequence homology is that members of a protein family will exhibit similar structures and protein interactions—features that dictate the function of protein systems. A secondary consequence is that PTM hotspots generated by alignment can be projected onto family-representative protein structures, which places each PTM hotspot into a three-dimensional context that can be visualized for each family. The structural context enabled by this projection can also provide spatial information about the PTM site that can supplement the sequence characteristics of the hotspot, namely: (3) solvent accessibility, which provides an estimate of whether a modification could occur on the folded protein; and (4) protein interface residence, which indicates the potential of the PTM to disrupt protein–protein interactions. Despite the theoretical advantages, no single tool has been developed that exploits the quantitative output from both sequence and structural data to evaluate the function potential of PTMs.

Here we describe a new analytical method – Structural Analysis of PTM Hotspots (SAPH-ire), which ranks PTM hotspots by their potential to impact biological function for distinct protein families (Fig. 1). We demonstrate the application of SAPH-ire to the complete set of PTMs for eight distinct protein families including large heterotrimeric G proteins—revealing high-ranking hotspots for which a biological function has not yet been determined. In particular, SAPH-ire revealed the N-terminal tail (Nt) of G protein gamma (Gγ) subunits as one of the highest ranking PTM hotspots for heterotrimeric G proteins (Gα, Gβ, and Gγ). We tested this prediction by monitoring the phosphorylation state and mutation effects of phosphorylation sites in the N terminus of the yeast Gγ subunit (Ste18). Consistent with SAPH-ire predictions, we found that phosphorylation of Ste18-Nt is biologically responsive to a GPCR stimulus and that phospho-null or phospho-mimic mutation of these sites controls protein abundance in an opposite manner in vivo. Thus, SAPH-ire is a powerful new method for predicting the function potential of PTM hotspots, which can guide empirical research toward the discovery of new protein regulatory elements based on high-throughput proteomics.

Fig. 1. Schematic diagram of the SAPH-ire method. A, SAPH-ire integrates InterPro, the Protein Data Bank (PDB) and a customized database of experimentally validated PTMs. Uniprot entries with PTMs that belong to specific InterPro-classified protein families undergo multiple-sequence alignment (MSA) and PTM hotspot analysis (HSA), which layers all PTMs for a given alignment position in the MSA. The total PTMs observed in each hotspot and the conservation of a modifiable residue (e.g. conservation of lysine at a ubiquitination hotspot) at the hotspot are quantified. B, PTM hotspots within the protein family are then projected onto all known crystal structures for the family using the Structural Projection of PTMs (SPoP) tool. From the structural topology of PTM hotspots generated by SPoP, the solvent accessible surface area (SASA) and protein interface residence is quantified for each hotspot. C, PTM Function Potential Calculator (FPC) integrates the output from HSA and SPoP, resulting in PTM function potential scores for each hotspot. The function potential score can be used to rank PTM hotspots within or between protein families – prioritizing hotspots with the greatest potential to be biologically regulated and/or effect a biological function for the protein family of interest.

13.
The clustered regularly interspaced short palindromic repeat (CRISPR)-associated enzyme Cas9 is an RNA-guided nuclease that has been widely adapted for genome editing in eukaryotic cells. However, the in vivo target specificity of Cas9 is poorly understood and most studies rely on in silico predictions to define the potential off-target editing spectrum. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we delineate the genome-wide binding panorama of catalytically inactive Cas9 directed by two different single guide (sg) RNAs targeting the Trp53 locus. Cas9:sgRNA complexes are able to load onto multiple sites with short seed regions adjacent to 5′NGG3′ protospacer adjacent motifs (PAM). Yet among 43 ChIP-seq sites harboring seed regions analyzed for mutational status, we find editing only at the intended on-target locus and one off-target site. In vitro analysis of target site recognition revealed that interactions between the 5′ end of the guide and PAM-distal target sequences are necessary to efficiently engage Cas9 nucleolytic activity, providing an explanation for why off-target editing is significantly lower than expected from ChIP-seq data.
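
The phrase "short seed regions adjacent to 5′NGG3′ PAMs" can be pictured with a simple sequence scan. The sketch below is a hedged illustration, not the analysis used in the study: it searches only the forward strand of an uppercase DNA string, requires an exact seed match, and the function name `find_seed_pam_sites` and the example seed are hypothetical.

```python
import re


def find_seed_pam_sites(genome, seed):
    """Return 0-based start positions where `seed` is immediately
    followed by an NGG motif (N = any of A, C, G, T).

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile("(?=" + re.escape(seed) + "[ACGT]GG)")
    return [match.start() for match in pattern.finditer(genome)]


if __name__ == "__main__":
    # Toy sequence: the seed occurs twice, but only the first copy has a PAM.
    print(find_seed_pam_sites("TTGACCTAGGAAGACCTCCA", "GACCT"))
```

A scan of this kind only lists candidate binding sites; as the abstract stresses, most such sites are bound without being edited, so a hit here says nothing about cleavage.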

14.
In allostery, a binding event at one site in a protein modulates the behavior of a distant site. Identifying residues that relay the signal between sites remains a challenge. We have developed predictive models using support-vector machines, a widely used machine-learning method. The training data set consisted of residues classified as either hotspots or non-hotspots based on experimental characterization of point mutations from a diverse set of allosteric proteins. Each residue had an associated set of calculated features. Two sets of features were used, one consisting of dynamical, structural, network, and informatic measures, and another of structural measures defined by Daily and Gray [1]. The resulting models performed well on an independent data set consisting of hotspots and non-hotspots from five allosteric proteins. For the independent data set, our top 10 models using Feature Set 1 recalled 68–81% of known hotspots, and among total hotspot predictions, 58–67% were actual hotspots. Hence, these models have precision P = 58–67% and recall R = 68–81%. The corresponding models for Feature Set 2 had P = 55–59% and R = 81–92%. We combined the features from each set that produced models with optimal predictive performance. The top 10 models using this hybrid feature set had R = 73–81% and P = 64–71%, the best overall performance of any of the sets of models. Our methods identified hotspots in structural regions of known allosteric significance. Moreover, our predicted hotspots form a network of contiguous residues in the interior of the structures, in agreement with previous work. In conclusion, we have developed models that discriminate between known allosteric hotspots and non-hotspots with high accuracy and sensitivity. Moreover, the pattern of predicted hotspots corresponds to known functional motifs implicated in allostery, and is consistent with previous work describing sparse networks of allosterically important residues.
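
The precision and recall figures quoted above follow the standard definitions: recall is the fraction of known hotspots that a model predicts, and precision is the fraction of predicted hotspots that are known hotspots. A minimal sketch of that arithmetic is given below; the residue numbers are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
def precision_recall(predicted, actual):
    """Precision and recall for a set of predicted hotspot residues.

    predicted: residues the model labels as hotspots
    actual:    residues known experimentally to be hotspots
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical residue indices: 3 of 4 predictions are correct,
    # and 3 of 5 known hotspots are recovered.
    p, r = precision_recall({12, 45, 78, 101}, {45, 78, 101, 150, 203})
    print(f"P = {p:.0%}, R = {r:.0%}")
```

The two numbers trade off against each other, which is visible in the abstract: Feature Set 2 recovers more known hotspots (higher R) at the cost of more false predictions (lower P).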

15.
Recombination hotspots are regions of the genome, typically 1 to 2 kb in size, where the rate and frequency of recombination are highest. The recombination event is mediated by double-strand break formation, guided by the combined enzymatic action of DNA topoisomerase and the Spo11 endonuclease. These regions are distributed non-uniformly throughout the human genome and cause distortions in the genetic map. Numerous lines of evidence suggest that the number of hotspots known in humans has increased manifold in recent years. A few facts about hotspot evolution have also been put forward, indicating differences in hotspot position between chimpanzees and humans. In mice, recombination hotspots were found to be clustered within the major histocompatibility complex (MHC) region. Several models that help explain meiotic recombination have been proposed. Moreover, computational tools have been developed to locate hotspots and estimate their recombination rates in humans, which is of great interest to population and medical geneticists. Here we review the molecular mechanisms, models and in silico prediction techniques of hotspot residues.

16.
17.
18.
19.
In most eukaryotes, the prophase of the first meiotic division is characterized by a high level of homologous recombination between homologous chromosomes. Recombination events are not distributed evenly within the genome, but vary both locally and at large scale. Locally, most recombination events are clustered in short intervals (a few kilobases) called hotspots, separated by large intervening regions with no or very little recombination. Despite the importance of regulating both the frequency and the distribution of recombination events, the genetic factors controlling the activity of the recombination hotspots in mammals are still poorly understood. We previously characterized a recombination hotspot located close to the Psmb9 gene in the mouse major histocompatibility complex by sperm typing, demonstrating that it is a site of recombination initiation. With the goal of uncovering some of the genetic factors controlling the activity of this initiation site, we analyzed this hotspot in both male and female germ lines and compared the level of recombination in different hybrid mice. We show that a haplotype-specific element acts at a distance and in trans to activate about 2,000-fold the recombination activity at Psmb9. Another haplotype-specific element acts in cis to repress initiation of recombination, and we propose this control to be due to polymorphisms located within the initiation zone. In addition, we describe subtle variations in the frequency and distribution of recombination events related to strain and sex differences. These findings show that most regulations observed act at the level of initiation and provide the first analysis of the control of the activity of a meiotic recombination hotspot in the mouse genome that reveals the interactions of elements located both in and outside the hotspot.

20.
Methods for the analysis of chromatin immunoprecipitation sequencing (ChIP-seq) data start by aligning the short reads to a reference genome. While often successful, they are not appropriate for cases where a reference genome is not available. Here we develop methods for de novo analysis of ChIP-seq data. Our methods combine de novo assembly with statistical tests enabling motif discovery without the use of a reference genome. We validate the performance of our method using human and mouse data. Analysis of fly data indicates that our method outperforms alignment based methods that utilize closely related species.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0756-4) contains supplementary material, which is available to authorized users.
