Similar literature
 20 similar articles found (search time: 31 ms)
1.
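The non-random usage of k-mers in genome sequences, and its biological significance, has long attracted attention, but no fundamental progress has yet been made. Taking all gene sequences of seven species as samples, this study obtained the 8-mer frequency spectrum of each species' genome sequence. The spectra of dog and cow have three peaks, whereas those of zebrafish, medaka, Caenorhabditis elegans and Saccharomyces cerevisiae have a single peak; the shape of the chicken spectrum lies between the two. When the set of 8-mers is classified by XY dinucleotide content, only the CG dinucleotide classification yields three subsets (0CG, 1CG and 2CG) whose spectra each form an independent unimodal distribution. Comparison with random sequences shows that 0CG motifs evolve randomly, whereas 1CG and 2CG motifs evolve directionally: their usage frequencies are far below the random frequencies, and this pattern of independent evolutionary separation holds across species. The distances between the spectra of the three CG subsets are the root cause of the unimodal or multimodal appearance. After the genome sequences of the seven species were normalized to 10^9 bp, comparison showed that the 1CG and 2CG subset spectra correlate significantly with species evolution, whereas the 0CG subset spectrum does not. The three kinds of CG motifs can therefore be regarded as each performing a different biological function. The independent separation pattern of 8-mers in genome sequences offers a new way of thinking about genome structure, genome evolution and the biological function of motifs.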
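The CG-based classification in this abstract can be illustrated with a minimal Python sketch. It counts overlapping k-mers, assigns each to a class by the number of CG dinucleotides it contains, and builds a per-class frequency spectrum (how many distinct k-mers occur a given number of times). The function names are hypothetical, k-mers that never occur are not represented, and this is not the authors' pipeline.

```python
from collections import Counter

def kmer_counts(seq, k=8):
    """Count every overlapping k-mer that consists only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def cg_class(kmer):
    """Number of CG dinucleotides inside the k-mer (0, 1, 2, ...)."""
    return sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == "CG")

def spectra_by_cg(counts):
    """Group k-mers by CG class.

    Returns {cg_class: Counter mapping occurrence count -> number of k-mers}.
    """
    spectra = {}
    for kmer, n in counts.items():
        spectra.setdefault(cg_class(kmer), Counter())[n] += 1
    return spectra
```

Plotting `spectra_by_cg(kmer_counts(genome))[0]`, `[1]` and `[2]` on a common axis would give one curve per CG subset, which is the kind of comparison the abstract describes.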

2.
3.
Xing Y  Zhao X  Cai L 《Genomics》2011,98(5):359-366
Knowledge of the detailed organization of nucleosomes across genomes and of the mechanisms of nucleosome positioning is critical for understanding gene regulation and expression. In the present work, the bias of 4-mer frequency in nucleosome and linker sequences of the S. cerevisiae genome was analyzed statistically. A novel position-correlation scoring function algorithm, based on the bias of 4-mer frequency in linker sequences, is presented to distinguish nucleosome from linker sequences. Five-fold cross-validation demonstrated that the algorithm performed well, with a mean area under the receiver operating characteristic curve of 0.981. Next, the algorithm was used to predict nucleosome occupancy throughout the S. cerevisiae genome, and relatively high correlation coefficients with experimental maps of nucleosome positioning were obtained. In addition, the distinct nucleosome-depleted regions in the vicinity of regulatory sites were confirmed. The results suggest that intrinsic DNA sequence preferences in linker regions have a significant impact on nucleosome occupancy.
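The abstract does not give the form of the position-correlation scoring function, so the sketch below substitutes a plain log-odds score over 4-mer frequencies, which is a simpler and different technique. It only illustrates how a 4-mer frequency bias between two sequence classes can be turned into a classifier score; the function names and the pseudocount are assumptions.

```python
import math
from collections import Counter

def count_kmers(seqs, k=4):
    """Pool overlapping k-mer counts over a list of training sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def log_odds_score(seq, nuc_counts, linker_counts, k=4, pseudocount=1.0):
    """Sum of log(P(k-mer | nucleosome) / P(k-mer | linker)) over the sequence.

    Positive scores favor the nucleosome class, negative scores the linker class.
    The pseudocount (an assumption) avoids division by zero for unseen k-mers.
    """
    vocab = 4 ** k
    nuc_total = sum(nuc_counts.values()) + pseudocount * vocab
    linker_total = sum(linker_counts.values()) + pseudocount * vocab
    seq = seq.upper()
    score = 0.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        p_nuc = (nuc_counts[kmer] + pseudocount) / nuc_total
        p_linker = (linker_counts[kmer] + pseudocount) / linker_total
        score += math.log(p_nuc / p_linker)
    return score
```

Sliding such a score along a genome in fixed-size windows would give a crude occupancy profile; the published algorithm additionally uses positional correlation, which this sketch ignores.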

4.
In eukaryotic cells, DNA has to bend significantly to pack inside the nucleus. Physical properties of DNA such as bending flexibility and curvature are expected to affect DNA packaging and partially determine the nucleosome positioning patterns inside a cell. DNA CpG methylation, the most common epigenetic modification found in DNA, is known to affect the physical properties of DNA. However, its detailed role in nucleosome formation is less well established. In this study, we evaluated the effect of defined CpG patterns (unmethylated and methylated) on DNA structure and on the respective nucleosome-forming ability. Our results suggest that the addition of CpG dinucleotides, either as a (CG)n stretch or as (CGX8)n repeats at 10 bp intervals, leads to a reduced hydrodynamic radius and decreased nucleosome-forming ability of DNA. This effect is more pronounced for a DNA stretch ((CG)5) located in the middle of a DNA fragment. Methylation of CpG sites, surprisingly, seems to reduce the differences in DNA structure and nucleosome-forming ability among DNA constructs with different CpG patterns. Our results suggest that unmethylated and methylated CpG patterns can play very different roles in regulating the physical properties of DNA. CpG methylation seems to reduce the DNA conformational variations associated with defined CpG patterns. Our results may bear significantly on understanding how DNA sequence and epigenetic features modulate the nucleosome positioning pattern in living organisms. © 2013 Wiley Periodicals, Inc. Biopolymers 101: 517–524, 2014.

5.

Background  

The relative preference of nucleosomes to form on individual DNA sequences plays a major role in genome packaging. A wide variety of DNA sequence features are believed to influence nucleosome formation, including periodic dinucleotide signals, poly-A stretches and other short motifs, and sequence properties that influence DNA structure, including base content. It was recently shown by Kaplan et al. that a probabilistic model using composition of all 5-mers within a nucleosome-sized tiling window accurately predicts intrinsic nucleosome occupancy across an entire genome in vitro. However, the model is complicated, and it is not clear which specific DNA sequence properties are most important for intrinsic nucleosome-forming preferences.

6.
Alu sequences carry a periodic pattern with CG dinucleotides (CpG) repeating every 31-32 bases. Similar distances are observed in the distribution of DNA curvature in crystallized nucleosomes, at positions +/-1.5 and +/-4.5 periods of DNA from the nucleosome DNA dyad. Since CG elements are also found to impart higher stability to nucleosomes when positioned at +/-1.5 sites, this suggests that CG dinucleotides may play a role in modulating nucleosome strength when the CG elements are methylated. Thus, Alu sequences may harbor special epigenetic nucleosomes with methylation-dependent regulatory functions. A nucleosome DNA sequence probe is suggested for detecting the locations of such regulatory nucleosomes in the sequences.

7.
Akan P  Deloukas P 《Gene》2008,410(1):165-176

8.

Background

The periodic occurrence of dinucleotides with a period of 10.4 bases is now undeniably a hallmark of nucleosome positioning. Whereas many eukaryotic genomes contain visible and even strong signals of periodic dinucleotide distribution, the human genome is rather featureless in this respect. The exact sequence features in the human genome that govern nucleosome positioning remain largely unknown.

Results

When analyzing the human genome sequence with the positional autocorrelation method, we found that only the dinucleotide CG shows the 10.4 base periodicity, which is indicative of the presence of nucleosomes. There is a high occurrence of CG dinucleotides that are either 31 (10.4 × 3) or 62 (10.4 × 6) base pairs apart from one another - a sequence bias known to be characteristic of Alu-sequences. In a similar analysis with repetitive sequences removed, peaks of repeating CG motifs can be seen at positions 10, 21 and 31, the nearest integers of multiples of 10.4.
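A positional autocorrelation of the kind described in these Results can be approximated by a histogram of distances between occurrences of a dinucleotide. The sketch below is a minimal Python illustration under that assumption; the function name and the `max_dist` cutoff are hypothetical, and the authors' exact normalization is not given in the abstract.

```python
from collections import Counter

def dinucleotide_distance_histogram(seq, dinuc="CG", max_dist=70):
    """Histogram of distances (in bases) between all pairs of `dinuc`
    occurrences that lie at most `max_dist` apart."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    hist = Counter()
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            d = positions[b] - positions[a]
            if d > max_dist:
                break  # positions are sorted, so later ones are farther away
            hist[d] += 1
    return hist
```

Peaks of this histogram near 10, 21, 31 and 62 would correspond to the periodicities reported above.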

Conclusions

Although the CG dinucleotides are dominant, other elements of the standard nucleosome positioning pattern are present in the human genome as well. The positional autocorrelation analysis of the human genome demonstrates that the CG dinucleotide is, indeed, one visible element of the human nucleosome positioning pattern, appearing both in Alu sequences and in sequences without repeats. The dominant role that CG dinucleotides play in organizing human chromatin indicates the involvement of human nucleosomes in tuning the regulation of gene expression and chromatin structure, very likely through cytosine methylation/demethylation of the CG dinucleotides contained in the human nucleosomes. This is further confirmed by the positions of CG-periodic nucleosomes on Alu sequences. Alu repeats appear as monomers, dimers and trimers, harboring two to six nucleosomes in a run. Considering the exceptional role CG dinucleotides play in nucleosome positioning, we hypothesize that Alu nucleosomes, especially those that form tightly positioned runs, could serve as "anchors" in organizing the chromatin in human cells.

9.

Background

Studies of the functions, evolution and genomic distribution of CpG islands (CGIs) and Alu elements began with their discovery in the 1980s (1981 and 1986, respectively). Their highly skewed genome-wide distribution implies a non-random retrotransposition pattern. Besides CGIs in gene promoters, CGI clusters have been observed in the homeobox gene regions and in macrosatellites, but the overall picture of their distribution has not been grasped. Attempts have been made to identify features that determine their genome-wide distribution, such as DNA-context-mediated preferred insertion sites of Alu repeats, in order to explain the locations of their clusters.

Methods

The recent emergence of a high-resolution 3D map of the human genome allowed the genome to be segregated into large-scale chromatin domains corresponding to naturally observable nuclear subcompartments, or Topologically Associated Domains (TADs), defined by spatial chromatin distribution. We used the chromatin map to elucidate the relations between large-scale chromatin state and the landscape of CpG-rich elements. The analysis confirmed that genes, Alu clusters and CGI clusters maintain an obvious preference for open chromatin, albeit of differing strength. It was clearly shown for the first time that the cluster density of Alu elements and CGIs depends monotonically on the chromatin accessibility rate. In particular, the highest density of these elements is found in A1 euchromatin regions, characterized by a high density of short genes replicating in early S-phase. This implies that these elements either mediate (CGIs) or are a side element (Alus) of chromatin accessibility.

Results

We found that both methylated and non-methylated CGIs display an affinity for chromatin accessibility. In the comparative genomics part of the study, we found that the non-canonical structure of the dog genome, which stands out among mammals for its high CGI abundance relative to gene number, is explained by the presence of dense, extended tandem CGI hotspots (500 kb on average) in subtelomeric and pericentromeric regions with highly skewed CG content, and not by a shift in the global CGI distribution pattern.

Conclusions

The study underlines the close association between the distribution of CG-rich elements and the newly introduced large-scale chromatin state map, and proposes a refined view of the interrelation between these genome elements and chromatin state. To our knowledge, the TAD-associated partition model employed in the study is likely the most substantial one for the distribution of CpG-rich clusters among the whole-genome chromatin/isochore maps available.

10.
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training-set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome-wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that, compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue-specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
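The idea of gapped k-mers can be illustrated with a naive Python sketch: every window of length `l` contributes one feature for each way of keeping `k` informative positions and masking the rest. The parameter names `l`, `k` and `gap` are choices made for this illustration, and the brute-force enumeration below is not the tree-based kernel computation the abstract refers to.

```python
from collections import Counter
from itertools import combinations

def gapped_kmer_counts(seq, l=4, k=2, gap="-"):
    """Count gapped k-mers: words of length l with k informative positions
    and l - k gap positions, enumerated naively over all windows."""
    seq = seq.upper()
    masks = list(combinations(range(l), k))  # which positions stay informative
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        for keep in masks:
            feature = "".join(window[j] if j in keep else gap for j in range(l))
            counts[feature] += 1
    return counts
```

Because each gapped feature pools many full-length words, its counts stay well above zero even when `l` is large, which is the robustness argument made in the abstract.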

11.
In human recurrent cutaneous herpes simplex, there is a sequential infiltrate of CD4 and then CD8 lymphocytes into lesions. CD4 lymphocytes are the major producers of the key cytokine IFN-gamma in lesions. They recognize mainly structural proteins, especially glycoproteins D and B (gD and gB), when restimulated in vitro. Recent human vaccine trials using recombinant gD showed partial protection of HSV-seronegative women against genital herpes disease and also, in placebo recipients, showed protection by prior HSV1 infection. In this study, we have defined immunodominant peptide epitopes recognized by 8 HSV1(+) and/or 16 HSV2(+) patients using (51)Cr-release cytotoxicity and IFN-gamma ELISPOT assays. Using a set of 39 overlapping 20-mer peptides, more than six immunodominant epitopes were defined in gD2 (two to six peptide epitopes were recognized by each subject). Further fine mapping of these responses for 4 of the 20-mers, using a panel of 9 internal 12-mers for each 20-mer, combined with MHC II typing and a direct in vitro binding assay of these peptides to individual DR molecules, showed more than one epitope per 20-mer and promiscuous binding of individual 20-mers and 12-mers to multiple DR types. All four 20-mer peptides were cross-recognized by both HSV1(+)/HSV2(-) and HSV1(-)/HSV2(+) subjects, but the sites of recognition differed within the 20-mers where their sequences were divergent. This work provides a basis for CD4 lymphocyte cross-recognition of gD2 and possibly for the cross-protection observed in previous clinical studies and in vaccine trials.

12.
We propose a tetrahedral Gray code that facilitates visualization of genome information on the surfaces of a tetrahedron, where the relative abundance of each k-mer in the genomic sequence is represented by the color of the corresponding cell of a triangular lattice. For biological significance, the code is designed such that the k-mers corresponding to any adjacent pair of cells differ from each other by only one nucleotide. We present a simple procedure to draw such a pattern on the development surfaces of a tetrahedron. The tetrahedral Gray code thus constructed can show the evolutionary conservation and variation of the genome information of many organisms at a glance. We also apply the tetrahedral Gray code to the honey bee (Apis mellifera) genome to analyze its methylation structure. The results indicate that the honey bee genome exhibits CpG overrepresentation in spite of its methylation ability and that two conserved motifs, CTCGAG and CGCGCG, in the unmethylated regions are responsible for the overrepresentation of CpG.

13.
14.
15.
Yu Ning  Guo Xuan  Zelikovsky Alexander  Pan Yi 《BMC genomics》2017,18(4):392-9

Background

As crucial markers for identifying biological elements and processes in mammalian genomes, CpG islands (CGIs) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for a CGI are: (a) a G+C content of ≥ 50%, (b) a ratio of observed to expected CpG content of ≥ 0.6, and (c) a length greater than 200 nucleotides. Most existing computational methods for predicting CpG islands are built on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases the G+C content is < 50%, the observed/expected CpG ratio varies, and the length of a CGI ranges from eight nucleotides to a few thousand nucleotides. This implies that CGI detection is not just a straightforward statistical task and that some as yet unrevealed rules probably remain hidden.
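The three classic criteria listed above can be checked with a few lines of Python. This is a minimal sketch of the rule-based test that the abstract argues is insufficient, not of the GaussianCpG model. The expected CpG count is taken here as (number of C × number of G) / length, a commonly used definition that the abstract itself does not spell out, and the function names are hypothetical.

```python
def cpg_island_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio).

    Expected CpG count is assumed to be count(C) * count(G) / length.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_fraction, obs_exp

def meets_classic_criteria(seq, min_gc=0.5, min_obs_exp=0.6, min_len=200):
    """True if the sequence satisfies criteria (a), (b) and (c)."""
    n, gc_fraction, obs_exp = cpg_island_stats(seq)
    return n > min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

In practice such a test is applied to sliding windows along a chromosome; experimentally verified islands that fail it are the cases motivating the model-based approach described in the Results.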

Results

A novel Gaussian model, GaussianCpG, is developed for the detection of CpG islands in the human genome. We analyze the energy distribution over the genomic primary structure for each CpG site and derive the parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG achieves better performance in CGI detection.

Conclusions

Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by a linear statistical method but by Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. Using pseudopotential analysis of CpG islands, the novel model is validated on both real and artificial data sets.

16.
To gain deeper insights into principles of cell biology, it is essential to understand how cells reorganize their genomes by chromatin remodeling. We analyzed chromatin remodeling in next-generation sequencing data from resting and activated T cells to determine a whole-genome chromatin remodeling landscape. We consider chromatin remodeling in terms of nucleosome repositioning, which can be observed most robustly in long nucleosome-free regions (LNFRs) that are occupied by nucleosomes in another cell state. We found that LNFR sequences are either AT-rich or GC-rich, and that nucleosome repositioning was much more prominent in GC-rich LNFRs (a considerable proportion of them outside promoter regions). Using support vector machines with string kernels, we identified a GC-rich DNA sequence pattern indicating loci of nucleosome repositioning in resting T cells. This pattern also appears to be typical of CpG islands. We found that nucleosome repositioning in GC-rich LNFRs is indeed associated with CpG islands and with binding sites of the CpG-island-binding ZF-CXXC proteins KDM2A and CFP1. That this association occurs prominently both inside and outside promoter regions hints at a mechanism governing nucleosome repositioning that acts on a whole-genome scale.

17.
18.
The present work describes three novel nonpolar host peptide sequences that provide a ready assessment of the 3₁₀- and α-helix compatibilities of natural and unnatural amino acids at different positions of small- to medium-size peptides. The nonpolar peptides, containing Ala, Aib, and a C-terminal p-iodoanilide group, were designed so that they could be rapidly assembled in a modular fashion, were highly soluble in mixtures of trifluoroethanol and H₂O for CD and two-dimensional (2D) NMR spectroscopic analyses, and showed excellent crystallinity suited to x-ray structure analysis. To validate our approach we synthesized 9-mer peptides 79a–96 (Table IV), 12-mer peptides 99–110c (Table V), and 10-mer peptides 120a–125d and 129–133 (Table VI and Scheme 8) incorporating a series of optically pure cyclic and open-chain (R)- and (S)-α,α-disubstituted glycines 1–10 (Figure 2). These amino acids are known to significantly modulate the conformations of small peptides. Several interesting conformational observations were made on the basis of x-ray structures of 9-mers 79a, 80, and 87 (Figures 4–7), 10-mers 124c, 131, and 132 (Figures 9–12), and 12-mer peptide 102b (Figure 13); CD spectra of all peptides recorded in acidic, neutral, and basic media; and detailed 2D-NMR analyses of 9-mer peptide 86 and 12-mer 102b. Especially interesting results were obtained by applying the convex constraint CD analysis proposed by Fasman to 9-mer peptides 79a–d, 80, 81, 86, and 87, which allowed us to determine the relative content of 3₁₀- and α-helical conformations. These results were fully supported by the corresponding x-ray and 2D-NMR analyses. As a striking example, we found that the β-tetralin-derived amino acids (R)- and (S)-1 show excellent α-helix stabilization, more pronounced than that of Aib and Ala.
These novel reference peptide sequences should help establish a scale for natural and unnatural amino acids concerning their intrinsic 3₁₀- and α-helix compatibilities at different positions of medium-sized peptides and thus improve our understanding of the folding processes of peptides. © 1997 John Wiley & Sons, Inc. Biopoly 42: 575–626, 1997

19.
20.
When mitoxantrone is activated by formaldehyde it can form adducts with DNA. These occur preferentially at CpG and CpA sequences and are enhanced 2-3-fold at methylated CpG sequences compared with non-methylated sites. We sought to understand the molecular factors involved in enhanced adduct formation at these methylated sites. This required, first, clarification of the factors that contribute to the formation of adducts at CpG sites. For this purpose, mass spectrometry of an oligonucleotide duplex (containing a single CpG adduct site) was used to confirm the presence of an additional carbon atom (derived from formaldehyde) on the drug-DNA complex. The effect of 3'-flanking sequences was revealed by electrophoretic analysis of oligonucleotide-drug adducts, and the preferred adduct-forming site was identified as 5'-CGG-3'. Radiolabeling studies of drug-DNA adducts confirmed that the site of attachment involved the exocyclic amino group of guanine. Molecular modeling analysis of the relative stability of the intercalated form of mitoxantrone was consistent with the observed adduct-forming potential of CG sites with varying flanking sequences. The known preference for adduct formation at methylated CG sites was confirmed by energetics calculations and shown to be due to a shift in the equilibrium of the intercalated form of the drug from the major groove (at CG sites) to the minor groove (at methylated CG sites). This increases the relative amount of drug located adjacent to the N-2 exocyclic amino group of guanine in the minor groove, where covalent linkage is facilitated. These results account for the enhanced covalent binding of mitoxantrone to methylated CG sequences and provide a molecular model of the interactions.
