Similar articles
 20 similar articles found
1.

Background

The periodic occurrence of dinucleotides with a period of 10.4 bases is now undeniably a hallmark of nucleosome positioning. Whereas many eukaryotic genomes contain visible and even strong signals for periodic distribution of dinucleotides, the human genome is rather featureless in this respect. The exact sequence features in the human genome that govern nucleosome positioning remain largely unknown.

Results

When analyzing the human genome sequence with the positional autocorrelation method, we found that only the dinucleotide CG shows the 10.4 base periodicity, which is indicative of the presence of nucleosomes. There is a high occurrence of CG dinucleotides that are either 31 (10.4 × 3) or 62 (10.4 × 6) base pairs apart from one another - a sequence bias known to be characteristic of Alu-sequences. In a similar analysis with repetitive sequences removed, peaks of repeating CG motifs can be seen at positions 10, 21 and 31, the nearest integers of multiples of 10.4.
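As a rough illustration of the kind of analysis described above, the following Python sketch counts, for each distance d, how many pairs of CG occurrences lie exactly d bases apart. The abstract does not give the authors' implementation, so the function name, the distance cutoff and the toy sequence are all assumptions made for this example.

```python
from collections import Counter

def positional_autocorrelation(seq, dinuc="CG", max_dist=70):
    """Return a list whose entry d-1 is the number of `dinuc` pairs d bases apart."""
    seq = seq.upper()
    # Start positions of every occurrence of the dinucleotide (already sorted).
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    counts = Counter()
    for idx, a in enumerate(positions):
        for b in positions[idx + 1:]:
            d = b - a
            if d > max_dist:
                break  # positions are sorted, so later ones are farther still
            counts[d] += 1
    return [counts[d] for d in range(1, max_dist + 1)]

# Hypothetical toy sequence with a CG placed every 10 bases.
toy = ("CG" + "A" * 8) * 5
profile = positional_autocorrelation(toy, max_dist=40)
```

On real genomic data one would look for peaks in such a profile near multiples of 10.4 (for example 10, 21 and 31), as the abstract reports; normalisation against the expected background is omitted here.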

Conclusions

Although the CG dinucleotides are dominant, other elements of the standard nucleosome positioning pattern are present in the human genome as well. The positional autocorrelation analysis of the human genome demonstrates that the CG dinucleotide is, indeed, one visible element of the human nucleosome positioning pattern, which appears both in Alu sequences and in sequences without repeats. The dominant role that CG dinucleotides play in organizing human chromatin indicates the involvement of human nucleosomes in tuning the regulation of gene expression and chromatin structure, which is very likely due to cytosine methylation/demethylation in the CG dinucleotides contained in human nucleosomes. This is further confirmed by the positions of CG-periodical nucleosomes on Alu sequences. Alu repeats appear as monomers, dimers and trimers, harboring two to six nucleosomes in a run. Considering the exceptional role CG dinucleotides play in nucleosome positioning, we hypothesize that Alu-nucleosomes, especially those that form tightly positioned runs, could serve as "anchors" in organizing the chromatin in human cells.

2.

Background

Periodic spacing of A-tracts (short runs of A or T) with the DNA helical period of ~10–11 bp is characteristic of intrinsically bent DNA. In eukaryotes, the DNA bending is related to chromatin structure and nucleosome positioning. However, the physiological role of strong sequence periodicity detected in many prokaryotic genomes is not clear.

Results

We developed measures of intensity and persistency of DNA curvature-related sequence periodicity and applied them to prokaryotic chromosomes and phages. The results indicate that strong periodic signals present in chromosomes are generally absent in phage genomes. Moreover, chromosomes containing prophages are less likely to possess a persistent periodic signal than chromosomes with no prophages.

Conclusions

Absence of DNA curvature-related sequence periodicity in phages could arise from constraints associated with DNA packaging in the viral capsid. Lack of prophages in chromosomes with persistent periodic signal suggests that the sequence periodicity and concomitant DNA curvature could play a role in protecting the chromosomes from integration of phage DNA.

3.

Background

The design of oligonucleotides and PCR primers for studying large genomes is complicated by the redundancy of sequences. The eukaryotic genomes are particularly difficult to study due to abundant repeats. The speed of most existing primer evaluation programs is not sufficient for large-scale experiments.

Results

In order to improve the efficiency and success rate of automatic primer/oligo design, we created a novel method which allows rapid masking of repeats in large sequence files, for example in eukaryotic genomes. It also allows the detection of all alternative binding sites of PCR primers and the prediction of PCR products. The new method was implemented in a collection of efficient programs, the GENOMEMASKER package. The performance of the programs was compared to other similar programs. We also modified the PRIMER3 program, to be able to design primers from lowercase-masked sequences.
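The idea of lowercase masking mentioned above can be sketched in a few lines of Python. This is not the GENOMEMASKER algorithm, whose details the abstract does not give; it is a simplified stand-in that lowercases every k-mer occurring more often than a threshold. The function name, the parameters `k` and `max_count`, and the toy sequence are assumptions, and reverse-complement matches are ignored.

```python
from collections import Counter

def mask_repeated_kmers(seq, k=4, max_count=1):
    """Lowercase every base covered by a k-mer that occurs more than max_count times."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    masked = list(seq)
    for i, kmer in enumerate(kmers):
        if counts[kmer] > max_count:
            # Mask all k positions covered by this non-unique k-mer.
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)

# Hypothetical toy sequence in which ACGT occurs twice.
masked = mask_repeated_kmers("ACGTTTGCAACGTGGA", k=4)
```

A primer design tool that honours lowercase masking, as the modified PRIMER3 in the abstract is said to do, would then avoid placing primers on the lowercase stretches.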

Conclusion

The GENOMEMASKER package is able to mask the entire human genome for non-unique primers within 6 hours and find locations of all binding sites for 10 000 designed primer pairs within 10 minutes. Additionally, it predicts all alternative PCR products from large genomes for given primer pairs.

4.

Background

DNA replication initiates at distinct origins in eukaryotic genomes, but the genomic features that define these sites are not well understood.

Results

We have taken a combined experimental and bioinformatic approach to identify and characterize origins of replication in three distantly related fission yeasts: Schizosaccharomyces pombe, Schizosaccharomyces octosporus and Schizosaccharomyces japonicus. Using single-molecule deep sequencing to construct amplification-free high-resolution replication profiles, we located origins and identified sequence motifs that predict origin function. We then mapped nucleosome occupancy by deep sequencing of mononucleosomal DNA from the corresponding species, finding that origins tend to occupy nucleosome-depleted regions.

Conclusions

The sequences that specify origins are evolutionarily plastic, with low complexity nucleosome-excluding sequences functioning in S. pombe and S. octosporus, and binding sites for trans-acting nucleosome-excluding proteins functioning in S. japonicus. Furthermore, chromosome-scale variation in replication timing is conserved independently of origin location and via a mechanism distinct from known heterochromatic effects on origin function. These results are consistent with a model in which origins are simply the nucleosome-depleted regions of the genome with the highest affinity for the origin recognition complex. This approach provides a general strategy for understanding the mechanisms that define DNA replication origins in eukaryotes.

5.

Background

On porcine chromosome 7, the region surrounding the Major Histocompatibility Complex (MHC) contains several Quantitative Trait Loci (QTL) influencing many traits including growth, back fat thickness and carcass composition. Previous studies highlighted that a fragment of ~3.7 Mb is located within the Swine Leucocyte Antigen (SLA) complex. Internal rearrangements of this fragment were suggested, and partial contigs had been built, but further characterization of this region and identification of all human chromosomal fragments orthologous to this porcine fragment had to be carried out.

Results

A whole physical map of the region was constructed by integrating Radiation Hybrid (RH) mapping, BAC fingerprinting data of the INRA BAC library and anchoring BAC end sequences on the human genome. Seventeen genes and two reference microsatellites were ordered on the high-resolution IMNpRH2 12000-rad Radiation Hybrid panel. A 1000:1 framework map covering 550 cR12000 was established and a complete contig of the region was developed. New micro-rearrangements were highlighted between the porcine and human genomes. A bovine RH map was also developed in this region by mapping 16 genes. Comparison of the organization of this region in pig, cattle, human, mouse, dog and chicken genomes revealed that 1) the translocation of the fragment described previously is observed only in the bovine and porcine genomes and 2) the new internal micro-rearrangements are specific to the porcine genome.

Conclusion

We estimate that the region contains several rearrangements and covers 5.2 Mb of the porcine genome. The study of this complete BAC contig showed that the human chromosomal fragments homologous to this heavily rearranged QTL region are all located in the region of HSA6 that surrounds the centromere. This work allows us to define a list of all candidate genes that could explain these QTL effects.

6.
7.
A clustering method for repeat analysis in DNA sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
Volfovsky N, Haas BJ, Salzberg SL. Genome Biology, 2001, 2(8): research0027.1-research0027.11

Background

A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.

Results

The resulting software tool collects all repeat classes and outputs summary statistics as well as a file containing multiple sequences (multi fasta), that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences.

Conclusions

We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences, and that can organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software (RepeatFinder) should prove helpful in the analysis of repeat structure for both complete and partial genome sequences.

8.
9.
The COG database: an updated version includes eukaryotes   Cited by: 4 (self-citations: 0, citations by others: 4)

Background

The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies.

Results

We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.

Conclusion

The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.

10.

Background

An organism’s DNA sequence is one of the key factors guiding the positioning of nucleosomes within a cell’s nucleus. Sequence-dependent bending anisotropy dictates how DNA is wrapped around a histone octamer. One of the best established sequence patterns consistent with this anisotropy is the periodic occurrence of AT-containing dinucleotides (WW) and GC-containing dinucleotides (SS) in the nucleosomal locations where DNA is bent in the minor and major grooves, respectively. Although this simple pattern has been observed in nucleosomes across eukaryotic genomes, its use for prediction of nucleosome positioning was not systematically tested.

Results

We present a simple computational model, termed the W/S scheme, implementing this pattern, without using any training data. This model accurately predicts the rotational positioning of nucleosomes both in vitro and in vivo, in yeast and human genomes. About 65 – 75% of the experimentally observed nucleosome positions are predicted with the precision of one to two base pairs. The program is freely available at http://people.rit.edu/fxcsbi/WS_scheme/. We also introduce a simple and efficient way to compare the performance of different models predicting the rotational positioning of nucleosomes.
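The WW/SS pattern described in this abstract can be illustrated with a small scoring function. The sketch below is not the published W/S scheme, whose exact site positions and weights are not given here. It only shows the general idea: reward WW dinucleotides at sites assumed to face the minor groove and SS dinucleotides at sites assumed to face the major groove, and penalise the opposite. The site lists, the function names and the toy fragment are hypothetical.

```python
W = set("AT")  # weak (A/T) bases
S = set("GC")  # strong (G/C) bases

def is_all(dinuc, alphabet):
    """True if dinuc has two bases and both belong to the given alphabet."""
    return len(dinuc) == 2 and all(b in alphabet for b in dinuc)

def ws_score(fragment, minor_sites, major_sites):
    """Score a fragment: +1 for WW at minor sites or SS at major sites, -1 for the reverse."""
    fragment = fragment.upper()
    score = 0
    for i in minor_sites:
        dn = fragment[i:i + 2]
        score += is_all(dn, W) - is_all(dn, S)
    for i in major_sites:
        dn = fragment[i:i + 2]
        score += is_all(dn, S) - is_all(dn, W)
    return score

# Hypothetical 20-bp fragment and site offsets (not the published positions).
toy_fragment = "AACACGCACATTCACGGACA"
in_phase = ws_score(toy_fragment, minor_sites=[0, 10], major_sites=[5, 15])
out_of_phase = ws_score(toy_fragment, minor_sites=[5, 15], major_sites=[0, 10])
```

Sliding such a score along a sequence and picking the offset with the highest value is one plausible way to predict a rotational setting, though the abstract does not say whether the authors do exactly this.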

Conclusions

This paper presents the W/S scheme to achieve accurate prediction of rotational positioning of nucleosomes, solely based on the sequence-dependent anisotropic bending of nucleosomal DNA. This method successfully captures DNA features critical for the rotational positioning of nucleosomes, and can be further improved by incorporating additional terms related to the translational positioning of nucleosomes in a species-specific manner.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-313) contains supplementary material, which is available to authorized users.

11.

Background

Halibuts are commercially important flatfish species confined to the North Pacific and North Atlantic Oceans. We have determined the complete mitochondrial genome sequences of four specimens each of Atlantic halibut (Hippoglossus hippoglossus), Pacific halibut (Hippoglossus stenolepis) and Greenland halibut (Reinhardtius hippoglossoides), and assessed the nucleotide variability within and between species.

Results

About 100 variable positions were identified within the four specimens in each halibut species, with the control regions as the most variable parts of the genomes (10 times that of the mitochondrial ribosomal DNA). Due to tandem repeat arrays, the control regions have unusually large sizes compared to most vertebrate mtDNAs. The arrays are highly heteroplasmic in size and consist mainly of different variants of a 61-bp motif. Halibut mitochondrial genomes lacking arrays were also detected.

Conclusion

The complexity, distribution, and biological role of the heteroplasmic tandem repeat arrays in halibut mitochondrial control regions are discussed. We conclude that the most plausible explanation for array maintenance includes both the slipped-strand mispairing and DNA recombination mechanisms.

12.
Evidence is provided that the nucleotide triplet consensus non-T(A/T)G (abbreviated to VWG) influences nucleosome positioning and nucleosome alignment into regular arrays. This triplet consensus has been recently found to exhibit a fairly strong 10 bp periodicity in human DNA, implicating it in anisotropic DNA bendability. It is demonstrated that the experimentally determined preferences for nucleosome positioning in native SV40 chromatin can, to a large extent, be predicted simply by counting the occurrences of the period-10 VWG consensus. Nucleosomes tend to form in regions of the SV40 genome that contain high counts of period-10 VWG and/or avoid regions with low counts. In contrast, periodic occurrences of the dinucleotides AA/TT, implicated in the rotational positioning of DNA in nucleosomes, did not correlate with the preferred nucleosome locations in SV40 chromatin. Periodic occurrences of AA did correlate with preferred nucleosome locations in a region of SV40 DNA where VWG occurrences are low. Regular oscillations in period-10 VWG counts with a dinucleosome period were found in vertebrate DNA regions that aligned nucleosomes into regular arrays in vitro in the presence of linker histone. Escherichia coli and plasmid DNA, which fail to align nucleosomes in vitro, lacked these regular VWG oscillations.
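Counting period-10 VWG occurrences, as described in this abstract, might look like the Python sketch below. The abstract does not spell out the counting rule, so this example makes a simple assumption: an occurrence counts as "period-10" when another VWG triplet starts exactly 10 bases downstream. The function name and the toy sequence are illustrative only.

```python
import re

# V = A, C or G (non-T); W = A or T; then G. The lookahead lets overlapping triplets match.
VWG = re.compile(r"(?=[ACG][AT]G)")

def period10_vwg_count(seq, period=10):
    """Count VWG triplets that have another VWG starting `period` bases downstream."""
    starts = {m.start() for m in VWG.finditer(seq.upper())}
    return sum(1 for s in starts if s + period in starts)

# Hypothetical toy sequence with CAG repeated every 10 bases.
toy = ("CAG" + "T" * 7) * 3
count = period10_vwg_count(toy)
```

Applied in a sliding window along a genome, counts of this kind could be compared with mapped nucleosome locations, which is the sort of correlation the abstract reports for SV40 chromatin.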

13.
14.

Background

An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino-acid signatures that make species-specific protein adaptation possible across 100 complete genomes.

Results

Environmental niche was determined to be a significant factor in variability from correspondence analysis using the amino acid composition of over 360,000 predicted open reading frames (ORFs) from 17 archaeal, 76 bacterial and 7 eukaryotic complete genomes. Additionally, we found clusters of phylogenetically unrelated archaea and bacteria that share similar environments by amino acid composition clustering. Composition analyses of conservative, domain-based homology modeling suggested an enrichment of small hydrophobic residues Ala, Gly, Val and charged residues Asp, Glu, His and Arg across all genomes. However, larger aromatic residues Phe, Trp and Tyr are reduced in folds, and these results were not affected by low complexity biases. We derived two simple log-odds scoring functions from ORFs (CG) and folds (CF) for each of the complete genomes. CF achieved an average cross-validation success rate of 85 ± 8% whereas the CG detected 73 ± 9% species-specific sequences when competing against all other non-redundant CG. Continuously updated results are available at http://genome.mshri.on.ca.
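A composition-based log-odds score of the general kind mentioned above can be sketched as follows. The abstract does not define the CG and CF functions, so this is a generic illustration rather than the authors' formula: each residue contributes the log of its frequency in one genome divided by its frequency in a pooled background. The pseudocount, function names and toy sequences are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seqs, pseudocount=1.0):
    """Amino acid frequencies over a set of sequences, smoothed with a pseudocount."""
    counts = Counter(a for s in seqs for a in s if a in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}

def log_odds_table(genome_freq, background_freq):
    """Per-residue log-odds of a genome's composition against a background."""
    return {a: math.log(genome_freq[a] / background_freq[a]) for a in AMINO_ACIDS}

def score(seq, table):
    """Sum of per-residue log-odds; unknown residues contribute zero."""
    return sum(table.get(a, 0.0) for a in seq)

# Hypothetical toy "genomes" with very different compositions.
genome_a = ["AAAAGGGV"]
genome_b = ["WWWFFYYY"]
background = composition(genome_a + genome_b)
table_a = log_odds_table(composition(genome_a), background)
```

A query sequence would be assigned to whichever genome's table gives it the highest score, which is one simple reading of the "competing against all other" comparison in the abstract.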

Conclusion

Our analysis of amino acid compositions from the complete genomes provides stronger evidence for species-specific and environmental residue preferences in genomic sequences as well as in folds. Scoring functions derived from this work will be useful in future protein engineering experiments and possibly in identifying horizontal transfer events.

15.
16.

Background

Mammalian genomes are repositories of repetitive DNA sequences derived from transposable elements (TEs). Typically, TEs generate multiple, mostly inactive copies of themselves, commonly known as repetitive families or families of repeats. Recently, we proposed that families of TEs originate in small populations by genetic drift and that the origin of small subpopulations from larger populations can be fueled by biological innovations.

Results

We report three distinct groups of repetitive families preserved in the human genome that expanded and declined during the three previously described periods of regulatory innovations in vertebrate genomes. The first group originated prior to the evolutionary separation of the mammalian and bird lineages and the second one during subsequent diversification of the mammalian lineages prior to the origin of eutherian lineages. The third group of families is primate-specific.

Conclusions

The observed correlation implies a relationship between regulatory innovations and the origin of repetitive families. Consistent with our previous hypothesis, it is proposed that regulatory innovations fueled the origin of new subpopulations in which new repetitive families became fixed by genetic drift.

Reviewers

Eugene Koonin, I. King Jordan, Jürgen Brosius.

17.
18.
19.

Background

Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings.

Results

Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition.
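The interpolation step described above can be illustrated with a short Python sketch. It assumes a small precomputed table of normal-distribution parameters (mean and standard deviation of the MFE of random sequences) and interpolates linearly between entries before computing a lower-tail P-value. The table values, the use of GC fraction as the only composition variable, and the function names are all hypothetical; the abstract refers to full sequence composition and does not describe the interpolation scheme.

```python
import math
from bisect import bisect_left

# Hypothetical table: (GC fraction, mean MFE, SD of MFE) in kcal/mol for random sequences.
TABLE = [(0.3, -15.0, 4.0), (0.5, -25.0, 5.0), (0.7, -35.0, 6.0)]

def interpolate(gc):
    """Linearly interpolate mean and SD for a GC fraction, clamped to the table range."""
    xs = [row[0] for row in TABLE]
    gc = min(max(gc, xs[0]), xs[-1])
    j = max(1, bisect_left(xs, gc))  # index of the right-hand neighbour
    (x0, m0, s0), (x1, m1, s1) = TABLE[j - 1], TABLE[j]
    t = (gc - x0) / (x1 - x0)
    return m0 + t * (m1 - m0), s0 + t * (s1 - s0)

def mfe_p_value(mfe, gc):
    """Probability that a random sequence of this composition has an MFE <= mfe."""
    mean, sd = interpolate(gc)
    z = (mfe - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_typical = mfe_p_value(-25.0, 0.5)  # MFE equal to the interpolated mean
p_stable = mfe_p_value(-40.0, 0.5)   # MFE three standard deviations below the mean
```

Because only a table lookup and one error-function call are needed per candidate, no random sequences have to be folded at screening time, which is the source of the speedup the abstract describes.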

Conclusion

The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters used in selecting potential miRNA candidates for experimental verification.

20.