Similar literature
20 similar articles found (search time: 15 ms)
1.
It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen–Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and by testing the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, DJS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas DJS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remainder is a mixture of many short compositionally homogeneous domains and relatively few long ones.
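The Jensen–Shannon divergence that such recursive segmentation methods maximize can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IsoPlotter or DJS: the function names are hypothetical, the cut search is a naive quadratic scan, and no halting criterion or homogeneity test is applied.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def js_divergence(seq, i):
    """Length-weighted Jensen-Shannon divergence between seq[:i] and seq[i:]."""
    n = len(seq)
    left, right = seq[:i], seq[i:]
    return (shannon_entropy(seq)
            - (len(left) / n) * shannon_entropy(left)
            - (len(right) / n) * shannon_entropy(right))

def best_split(seq):
    """Return (position, divergence) for the cut with the largest divergence."""
    return max(((i, js_divergence(seq, i)) for i in range(1, len(seq))),
               key=lambda pair: pair[1])
```

A recursive segmenter would call a function like `best_split` on each resulting segment in turn; deciding when to stop is exactly the halting problem the abstract describes.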

2.
Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (>300 kb on average) relatively homogeneous in G+C. Important genome features are dependent on this isochore structure, e.g. genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of chromosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G+C plots. In contrast, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains having an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level, the boundary between them being the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of human large contigs. The relatively homogeneous regions we find show many of the features (G+C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available.

3.
Combined with the Z curve method, the technique of wavelet multiresolution (also known as multiscale) analysis has been proposed to identify the boundaries of isochores in the human genome. The human MHC sequence and the longest contigs of human chromosomes 21 and 22 are used as examples. The boundary between the isochores of Class III and Class II in the MHC sequence has been detected and found to be situated at position 2,490,368 bp. This result is in good agreement with the experimental evidence. An isochore with a length of about 7 Mb in chromosome 21 has been identified and found to be gene- and Alu-poor. We have also found that the G+C content of chromosome 21 is more homogeneous than that of chromosome 22. Compared with the window-based methods, the present method has the highest resolution for identifying the boundaries of isochores, even at the scale of a single base. Compared with the entropic segmentation method, the present method has the merits of being more intuitive and requiring fewer calculations. The important conclusion drawn in this study is that segmentation points, at which the G+C content undergoes relatively dramatic changes, do exist in the human genome. These 'singularity' points may be considered candidates for isochore boundaries in the human genome. The method presented is a general one and can be used to analyze any other genome.

4.
The human Y chromosome contains a group of repeated DNA elements, identified as 3.4-kilobase pair (kb) fragments in Hae III digests of male genomic DNA, which contain both Y-specific and non-Y-specific sequences. We have used these 3.4-kb Hae III Y fragments to explore the organizational properties and chromosomal distribution of the autosomal homologs of the non-Y-specific (NYS) 3.4-kb Hae III Y elements. Three distinct organizations, termed domains, have been identified and shown to have major concentrations on separate chromosomes. We have established that domain K is located on chromosome 15 and domain D on chromosome 16 and suggested that domain R is on chromosome 1. Our findings suggest that each domain is composed of a tandemly arrayed cluster of a regularly repeating unit containing two sets of repeated sequences: one that is homologous to the NYS 3.4-kb Hae III Y sequences and one that does not cross-react with the 3.4-kb Hae III Y repeats. Thus, these autosomal repeated DNA domains, like their Y chromosome counterparts, consist of a complex mixture of repeated DNA elements interspersed among each other in ways that lead to defined periodicities. Although each of the three identified autosomal domains cross-reacts with 3.4-kb Hae III Y fragments purified from genomic DNA, the length periodicities and sequence content of the autosomal domains are chromosome specific. The organizational properties and chromosomal distribution of these NYS 3.4-kb Hae III homologs seem inconsistent with stochastic mechanisms of sequence diffusion between chromosomes.

5.
Segmentation aims to separate homogeneous areas in sequential data and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requiring user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most appropriate result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method to yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.

6.
SEGMENT: identifying compositional domains in DNA sequences   Total citations: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: DNA sequences are formed by patches or domains of different nucleotide composition. In a few simple sequences, domains can simply be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure), which cannot be properly detected by current methods. Recently, a computationally efficient segmentation method to analyse such nonstationary sequence structures, based on the Jensen-Shannon entropic divergence, has been described. Specific algorithms implementing this method are now needed. RESULTS: Here we describe a heuristic segmentation algorithm for DNA sequences, which was implemented in a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and compositional biases of all the domains in the sequence, is derived. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity.

7.
Alphoid and satellite III sequences are arranged as large tandem arrays in the centromeric regions of human chromosomes. Several recent studies using in situ hybridisation to investigate the relative positions of these sequences have shown that they occupy adjacent but non-overlapping domains in metaphase chromosomes. We have analysed the DNA sequence at the junction between alphoid and satellite III sequences in a cosmid previously mapped to chromosome 10. The alphoid sequence consists of tandemly arranged dimers which are distinct from the known chromosome 10-specific alphoid family. Polymerase chain reaction experiments confirm the integrity of the sequence data. These results, together with pulsed field gel electrophoresis data, place the boundary between alphoid and satellite III sequences in the mapping interval 10 centromere-10q11.2. The sequence data show that these repetitive sequences are separated by a partial L1 interspersed repeat sequence less than 500 bp in length. The arrangement of the junction suggests that a recombination event has brought these sequences into close proximity.

8.
The construction of a yeast artificial chromosome containing a human DNA insert is reported. This molecule of about 200 kb behaves as a native yeast chromosome since it has a very high mitotic stability and is present in the yeast transformant clone at a copy number similar to that of the resident chromosomes. Hybridization with the TTAGGG sequence demonstrates that this chromosome contains human telomeric sequences. In situ hybridization of the biotin-labelled artificial chromosome to metaphase human chromosomes shows that the insert occupies a telomeric position on the long arm of chromosome 9. Since the fragment was cloned as an EcoRI insert and not as a telomere, it is situated medially to the telomeric sequences and harbours telomere-associated sequences that have been shown to contain the TTAGGG sequence. The fragment represents the end of the genetic map of chromosome 9 and thus can be used to characterize the sequence and the structure of the chromosomal region that runs from the end of the chromosome to the first gene.

9.
Human Satellite III DNA is a major tandem repeat in the human genome and presents a TaqI-specific hypervariable restriction fragment length polymorphism when a Satellite III related sequence (228S) is used as a probe. In situ examination shows this sequence to be nearly specific for the region 9qh on chromosome 9 when it is used at low probe concentrations. However, the region 9qh does not appear to be the only or even the primary source of the TaqI-deficient polymorphic sequences (TDPS). Rather, such sequences appear to be mostly present in chromosomes 20, 21, and 22, and these represent the largest regions of homogeneous Satellite III in the genome; they are also resistant to digestion with a range of other restriction endonucleases. The TDPS do not arise from either of the two currently recognized Satellite III-enriched genomic regions, namely autosomal K-domains, which form part of 15p in chromosome 15, or the heterochromatin of chromosome Y.

10.
Segmentation of yeast DNA using hidden Markov models   Total citations: 2 (self-citations: 0, citations by others: 2)

11.
An isochore map of the human genome based on the Z curve method   Total citations: 4 (self-citations: 0, citations by others: 4)
Zhang CT, Zhang R. Gene, 2003, 317(1-2): 127-135
The distribution of the G+C content in the human genome has been studied by using a windowless technique derived from the Z curve method. The most important findings presented in this paper are twofold. First, abrupt variations of the G+C content along human chromosome sequences are the main variation patterns of G+C content. It is found that at some sites the G+C content changes abruptly, alternating between G+C-rich and G+C-poor regions. Second, it is shown that long domains with relatively homogeneous G+C content along each chromosome do exist. These domains are thought to be isochores, which usually have sharp boundaries. Consequently, 56 isochores longer than 3 Mb have been identified in chromosomes 1-22, X and Y. Boundaries, size and G+C content of each isochore identified are listed in detail. As an example to demonstrate the power of the method, the boundary between the Class III and Class II isochores of the MHC sequence has been determined and found to be at position 2,477,936, which is in good agreement with the experimental evidence. A homogeneity index is introduced to measure the homogeneity of G+C content in isochores. We emphasize that the homogeneity of G+C content is relative. Isochores in which the G+C content remains absolutely constant do not exist. Isochore structures appear to be a basic organization of the human genome. Due to their relevance to many important biological functions, the clarification of isochore structures will provide much insight into the understanding of the human genome.
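The idea of reading G+C content off a cumulative curve without a sliding window can be sketched as follows. This is a minimal illustration, assuming the usual definition of the Z curve's third component as the running count of (A+T) minus (G+C); the function names are hypothetical, the sequence is assumed to contain only A, C, G and T, and this is not the authors' implementation.

```python
from itertools import accumulate

def z_component(seq):
    """Running (A+T) - (G+C) count; z[n] covers the first n bases, z[0] = 0."""
    steps = (1 if base in "AT" else -1 for base in seq.upper())
    return [0] + list(accumulate(steps))

def gc_fraction(z, start, end):
    """G+C fraction of seq[start:end], computed from two points of the curve."""
    length = end - start
    delta = z[end] - z[start]  # (A+T) minus (G+C) inside the region
    return (1 - delta / length) / 2
```

Because the G+C fraction of any region depends only on the curve's values at its two endpoints, region boundaries can be placed at single-base resolution instead of being tied to a window size.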

12.
MOSAIC is a set of tools for the segmentation of multiple aligned DNA sequences into homogeneous zones. The segmentation is based on the distribution of mutational events along the alignment. As an example, the analysis of one repeated sequence belonging to the subtelomeric regions of the yeast genome is presented. AVAILABILITY: Free access from ftp://ftp.biomath.jussieu.fr/pub/papers/MOSAIC

13.
Evolution of alpha-satellite DNA on human acrocentric chromosomes   Total citations: 10 (self-citations: 0, citations by others: 10)
K H Choo, B Vissel, E Earle. Genomics, 1989, 5(2): 332-344
In situ hybridization of five new and one previously described alpha-satellite sequences isolated from chromosome 21 libraries gave the following chromosomal distribution patterns: (a) two sequences (pTRA-1 and -4) hybridizing to chromosomes 13, 14, 15, 21, and 22 (also 19 and 20); (b) one sequence (pTRA-7) hybridizing to chromosome 14; and (c) three sequences (pTRA-2, -11 and -15) hybridizing to chromosomes 13, 14, and 21, with significant but weaker signals on 15 and 22. These results suggested the sharing of alphoid domains between different acrocentric chromosomes and the coexistence of multiple domains on each chromosome. Analysis of somatic cell hybrids carrying a single human acrocentric chromosome using pTRA-2 demonstrated a higher-order repeating structure common to chromosomes 13, 14, and 21, but not to 15 and 22, providing direct evidence for sequence homogenization in this domain among the former three chromosomes. We present a model of evolution and genetic exchange of alpha sequences on the acrocentric chromosomes which can satisfactorily explain these and previous observations of (a) two different alphoid subfamilies, one common to chromosomes 13 and 21 and the other common to chromosomes 14 and 22, (b) a different alphoid subfamily on chromosome 22, and (c) nonrandom participation of chromosomes 13 and 14, and 14 and 21 in Robertsonian translocations.

14.
ARS replication during the yeast S phase   Total citations: 43 (self-citations: 0, citations by others: 43)
A 1.45 kb circular plasmid derived from yeast chromosome IV contains the autonomous replication element called ARS1. Isotope density transfer experiments show that each plasmid molecule replicates once each S phase, with initiation depending on two genetically defined steps required for nuclear DNA replication. A density transfer experiment with synchronized cells demonstrates that the ARS1 plasmid population replicates early in the S phase. The sequences adjacent to ARS1 on chromosome IV also initiate replication early, suggesting that the ARS1 plasmid contains information which determines its time of replication. The times of replication for two other yeast chromosome sequences, ARS2 and a sequence referred to as 1OZ, indicate that the temporal order of replication is ARS1 → ARS2 → 1OZ. These experiments show directly that specific chromosome regions replicate at specific times during the yeast S phase. If ARS elements are origins of chromosome replication, then the experiment reveals times of activation for two origins.

15.
We have isolated and sequenced a yeast gene encoding a protein (Mr 24,875) very rich in serine (SRP) and alanine residues that accounted for 25% and 20% of the total amino acids, respectively. The SRP1 gene is highly expressed in culture conditions leading to glucose repression (Marguet & Lauquin, 1986), the amount of SRP1 mRNA representing about 1 to 2% of total poly(A)+ RNA. A repetitive structure of eight direct tandem repeats 36-base long, also reflected in the amino acid sequence, was found in the second half of the open reading frame. The consensus amino acid sequence of the repeat was Ser-Ser-Ser-Ala-Ala-Pro-Ser-Ser-Ser-Glu-Ala-Lys. Replacing the genomic copy of the cloned gene with a disrupted SRP1 gene indicated that the SRP1 gene was not essential for viability in yeast, but several SRP1-homologous sequences were found within the yeast genome, raising the possibility that the disrupted SRP1 gene is rescued by one of the other SRP-homologous sequences. Complete separation of yeast chromosomes by contour-clamped homogeneous field electrophoresis indicated that, apart from chromosome V, which carries the SRP1 gene, 12 chromosomes have SRP-related sequences with various degrees of homology. These sequences were located on chromosomes XV, VII and XI under stringent conditions of hybridization (Tm -20 °C), and observed on chromosomes I, II, III, IV, VI, VIII, X, XI and XII, only under low-stringency conditions (Tm -40 °C). Northern blot analysis of both the wild type and SRP1-disrupted strains indicated that along with SRP1 at least one more member of the SRP family was transcribed to a 0.7 kb (1 kb = 10^3 bases) polyadenylated RNA species clearly distinct from the SRP1-specific mRNA (1 kb long). Analyses of the SRP1 repeat domain suggested a model for the divergent evolution of the repeats in the SRP1 sequence.

16.
Proteolipid protein (PLP) was isolated from white matter of human brain by chloroform/methanol extraction and further purified by chromatography. Performic acid oxidation yielded a product homogeneous in NaDodSO4-polyacrylamide electrophoresis with a molecular mass of 30 kDa. The carboxymethylated PLP was chemically cleaved with cyanogen bromide into four fragments: CNBr I 22-24 kDa, CNBr II 5 kDa, CNBr III 1.4 kDa and CNBr IV 0.7 kDa. HBr/dimethylsulfoxide cleavage at tryptophan residues released four fragments: Trp I 14-16 kDa, Trp II 2.0 kDa, Trp III 5 kDa and Trp IV 7 kDa. Hydrophilic fragments were enriched in 50% formic acid (CNBr II, III, IV and Trp II and III), whereas hydrophobic peptides precipitated from this solvent were CNBr I, Trp I and IV. The fragments were separated by gel filtration with 90% formic acid as solvent and finally purified by gel permeation HPLC (Si 60 and Si 100) for automated liquid and solid-phase Edman degradation. Large fragments were further cleaved with different proteinases (trypsin, V8-proteinase, endoproteinase Lys-C and thermolysin). We used an improved strategy in the sequencing of the human proteolipid protein compared with our approach to the structural elucidation of bovine brain PLP. The amino-acid sequence of human PLP contains 276 residues, the same as found in bovine proteolipid protein. The two sequences proved to be identical. The possible importance of the conservative structure of this integral membrane protein is discussed.

17.
Gao F, Zhang CT. The FEBS Journal, 2006, 273(8): 1637-1648
The availability of the complete chicken genome sequence provides an unprecedented opportunity to study the global genome organization at the sequence level. Delineating compositionally homogeneous G + C domains in DNA sequences can provide much insight into the understanding of the organization and biological functions of the chicken genome. A new segmentation algorithm, which is simple and fast, has been proposed to partition a given genome or DNA sequence into compositionally distinct domains. By applying the new segmentation algorithm to the draft chicken genome sequence, the mosaic organization of the chicken genome can be confirmed at the sequence level. It is shown herein that the chicken genome is also characterized by a mosaic structure of isochores, long DNA segments that are fairly homogeneous in the G + C content. Consequently, 25 isochores longer than 2 Mb (megabases) have been identified in the chicken genome. These isochores have a fairly homogeneous G + C content and often correspond to meaningful biological units. With the aid of the technique of cumulative GC profile, we propose an intuitive picture to display the distribution of segmentation points. The relationships between G + C content and the distributions of genes (CpG islands, and other genomic elements) were analyzed in a perceivable manner. The cumulative GC profile, equipped with the new segmentation algorithm, would be an appropriate starting point for analyzing the isochore structures of higher eukaryotic genomes.
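A cumulative GC profile of the kind mentioned above can be sketched as a detrended cumulative curve. The sketch assumes the profile is the running (A+T) minus (G+C) count with a least-squares line through the origin subtracted; that formulation, the function name, and the restriction to non-empty sequences of A, C, G and T are assumptions made for illustration, and the abstract's segmentation algorithm itself is not reproduced here.

```python
def cumulative_gc_profile(seq):
    """Return the detrended profile z'_n = z_n - k*n for n = 0..len(seq)."""
    z = [0]
    for base in seq.upper():
        # +1 for A or T, -1 for G or C (sequence assumed to be ACGT only).
        z.append(z[-1] + (1 if base in "AT" else -1))
    # Slope of the least-squares line through the origin fitted to (n, z_n).
    k = sum(n * zn for n, zn in enumerate(z)) / sum(n * n for n in range(len(z)))
    return [zn - k * n for n, zn in enumerate(z)]
```

In such a profile, a rising stretch is poorer in G+C than the sequence average and a falling stretch is richer, so turning points are natural candidates for segmentation points.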

18.
Tyrosinase is the major enzyme responsible for the formation of melanin pigment and is found throughout the animal kingdom. In humans, the tyrosinase gene (TYR) maps to the long arm of chromosome 11 at band q14→q21, while a tyrosinase related gene (TYRL) maps to the short arm of chromosome 11 at p11.2→cen. We and others have found that the TYRL locus contains sequences that are similar to exons IV and V of the authentic tyrosinase gene but lacks sequences of exons I, II, and III. In an attempt to understand the evolution of the human tyrosinase gene, we have analyzed TYR and TYRL in primates and have found that exons IV and V of the chimpanzee and gorilla TYR are very similar to the human, with the gorilla sequence being more similar than the chimpanzee. We have also found that the gorilla but not the chimpanzee contains a TYRL locus similar to the human TYRL locus.

19.
The region of Saccharomyces cerevisiae chromosome III centromere-distal to the PGK gene is the site of frequent chromosome polymorphisms. We have sequenced this region from fragments of chromosome III isolated from three different yeast strains, GRF88, CN31C and CF4-16B. The sequence analysis demonstrates that these polymorphisms are associated with the presence of Ty and delta elements and defines a region of the chromosome which is a hot-spot for transposition events (the RAHS). The three strains can be arranged into a logical evolutionary series in which successive transposition and recombination events insert Ty elements and fuse them with consequent deletions of chromosome and of transposon sequences. The influence of such events on yeast genome evolution is discussed.

20.
The proteins from the ZIP and the CDF families of zinc transporters contain a histidine-rich sequence in a loop domain located between transmembrane domains III and IV for the ZIP family and transmembrane domains IV and V for the CDF family. Topological predictions suggest that these loops are located in the cytoplasm. The loops contain a histidine-rich sequence with a variable number of histidine residues depending on the transporter. The histidine-rich sequence was postulated to serve as an extra-membrane metal binding site in these proteins. hZip1 is a ubiquitously expressed human zinc transporter. The histidine-rich motif located in the large loop of this transporter is composed of the following sequence, H(158)WHD(161). To determine if this motif is involved in the zinc transport activity of the protein, we performed site-directed mutagenesis to replace the loop histidines with alanines. Results suggest that both histidines are necessary for the zinc transport function and are not involved in the plasma membrane localization of the transporter as has been reported for the Zrt1 transporter in yeast. In addition, two histidine residues in transmembrane domains IV and V are also important in the zinc transport function. The results support an intermolecular exchange mechanism of zinc transport.

