Similar literature
20 similar documents found (search time: 15 ms)
1.
The key string algorithm (KSA) can be viewed as a robust computational generalization of the restriction enzyme method. KSA enables robust and effective identification and structural analysis of any given genomic sequence, as in the case of the NCBI assembly of the human genome. We have developed a method that uses the total frequency distribution of all r-bp key strings as a function of the fragment length l to determine the exact size of all repeats within a given genomic sequence, of both monomeric and HOR type. Subsequently, for fragment lengths equal to each of these repeat sizes, we compute the partial frequency distribution of r-bp key strings; the key string with the highest frequency is the dominant key string, optimal for segmenting the given genomic sequence into repeat units. We illustrate how a wide class of 3-bp key strings leads to a key-string-dependent periodic cell that enables simple identification and consensus length determination of HORs, or of any other highly convergent repeat of monomeric or HOR type, whether tandem or dispersed. We illustrate the application of KSA to HORs in the human genome and determine consensus HORs in the Build 35.1 assembly. In the next step we compute the suprachromosomal family classification and CENP-B box / pJalpha distributions for HORs. For less convergent repeats, such as monomeric alpha satellite (20-40% divergence), we searched for an optimal compact key string using the frequency method and developed the concept of a composite key string (GAAAC--CTTTG) or flexible relaxation (28-bp key string), which provides both monomeric alpha satellites and alpha monomer segmentation of internal HOR structure. This method is also convenient for studying R-strand (direct) / S-strand (reverse complement) alpha monomer alternations. Using KSA we identified 16 alternating regions of R-strand and S-strand monomers in one contig in chromosome 7. Use of the CENP-B box and/or pJalpha motif as a key string is suitable both for identifying HORs and monomeric patterns and for studying the CENP-B box / pJalpha distribution. As an example of applying KSA to sequences outside HOR regions, we present our finding of a tandem with a highly convergent 3434-bp Long monomer in chromosome 5 (divergence less than 0.3%).
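The segmentation idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the exact-match search, and the toy sequence are all assumptions made for the example. The sequence is cut at every occurrence of a key string, the fragment lengths are tallied, and the most frequent length is taken as a candidate repeat size; for that length, r-bp key strings are then ranked by how many fragments of that length they produce.

```python
from collections import Counter


def key_string_fragments(sequence: str, key: str) -> list[int]:
    """Cut `sequence` at each occurrence of `key` and return the
    distances between consecutive occurrences (fragment lengths)."""
    positions = []
    start = sequence.find(key)
    while start != -1:
        positions.append(start)
        start = sequence.find(key, start + 1)  # allow overlapping hits
    return [b - a for a, b in zip(positions, positions[1:])]


def dominant_fragment_length(sequence: str, key: str):
    """Return the most frequent fragment length for `key`, or None
    if the key occurs fewer than two times."""
    counts = Counter(key_string_fragments(sequence, key))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def key_string_counts_for_length(sequence: str, r: int, length: int) -> Counter:
    """For every r-bp string present in `sequence`, count how many of its
    fragments have exactly the given length (a partial frequency distribution)."""
    counts = Counter()
    keys = {sequence[i:i + r] for i in range(len(sequence) - r + 1)}
    for key in keys:
        n = sum(1 for f in key_string_fragments(sequence, key) if f == length)
        if n:
            counts[key] = n
    return counts


# Toy example (hypothetical 13-bp unit repeated five times).
toy = "GAAACTTTGACGT" * 5
fragments = key_string_fragments(toy, "GAAAC")
repeat_size = dominant_fragment_length(toy, "GAAAC")
ranking = key_string_counts_for_length(toy, 3, 13)
```

Real repeats diverge from their consensus, so a practical tool would need tolerance for mismatches; this sketch only handles exact matches.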

2.
MOTIVATION: GenBank data are at present lacking alpha satellite higher-order repeat (HOR) annotation. Furthermore, exact HOR consensus lengths have not been reported so far. Given the fast growth of sequence databases in the centromeric region, it is of increasing interest to have efficient tools for computational identification and analysis of HORs from known sequences. RESULTS: We develop a graphical user interface method, ColorHOR, for fast computational identification of HORs in a given genomic sequence, without requiring a priori information on the composition of the genomic sequence. ColorHOR is based on an extension of the key-string algorithm and provides a color representation of the order and orientation of HORs. For the key string, we use a robust 6-bp string from a consensus alpha satellite, and its representative nature is tested. The ColorHOR algorithm provides a direct visual identification of HORs (direct and/or reverse complement). In more detail, we first illustrate the ColorHOR results for human chromosome 1. Using ColorHOR we determine for the first time the HOR annotation of the GenBank sequence of the whole human genome. In addition to some HORs corresponding to those determined previously biochemically, we find new HORs in chromosomes 4, 8, 9, 10, 11 and 19. For the first time, we determine exact consensus lengths of HORs in 10 chromosomes. We propose that the HOR assignment obtained by using ColorHOR be included in the GenBank database.
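Orientation detection with a key string and its reverse complement, as this abstract describes in outline, might look like the sketch below. It is only an assumption-laden illustration: the 6-bp key used here is hypothetical (the abstract does not give the actual string), the function names are invented for the example, and ColorHOR's color rendering is not reproduced.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def key_string_orientations(sequence: str, key: str) -> list[tuple[int, str]]:
    """Scan `sequence` and report each position where `key` occurs
    ("direct") or where its reverse complement occurs ("reverse").
    A palindromic key equals its own reverse complement and is
    reported as "direct" only."""
    rc = reverse_complement(key)
    k = len(key)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window == key:
            hits.append((i, "direct"))
        elif window == rc:
            hits.append((i, "reverse"))
    return hits


# Hypothetical key string and toy sequence, for illustration only.
hits = key_string_orientations("GAAACTCCAGTTTC", "GAAACT")
```

Runs of "direct" hits followed by runs of "reverse" hits would then mark a change of strand orientation along the sequence.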

3.
We have investigated the organization and complexity of alpha satellite DNA on chromosomes 10 and 12 by restriction endonuclease mapping, in situ hybridization (ISH), and DNA-sequencing methods. Alpha satellite DNA on both chromosomes displays a basic dimeric organization, revealed as a 6- and an 8-mer higher-order repeat (HOR) unit on chromosome 10 and as an 8-mer HOR on chromosome 12. While these HORs show complete chromosome specificity under high-stringency ISH conditions, they recognize an identical set of chromosomes under lower stringencies. At the nucleotide sequence level, both chromosome 10 HORs are 50% identical to the HOR on chromosome 12 and to all other alpha satellite DNA sequences from the in situ cross-hybridizing chromosomes, with the exception of chromosome 6. An 80% identity between chromosome 6- and chromosome 10-derived alphoid sequences was observed. These data suggest that the alphoid DNA on chromosomes 6 and 10 may represent a distinct subclass of the dimeric subfamily. These sequences are proposed to be present, along with the more typical dimeric alpha satellite sequences, on a number of different human chromosomes.

4.
The structure of the alpha satellite DNA higher-order repeat (HOR) unit from a subset shared by human chromosomes 13 and 21 (D13Z1 and D21Z1) has been examined in detail. By using a panel of hybrids possessing either a chromosome 13 or a chromosome 21, different HOR unit genotypes on chromosomes 13 and 21 have been distinguished. We have also determined the basis for a variant HOR unit structure found on 8% of chromosomes 13 but not at all on chromosomes 21. Genomic restriction maps of the HOR units found on the two chromosome 13 genotypes and on the chromosome 21 genotype are constructed and compared. The nucleotide sequence of a predominant 1.9-kilobasepair HOR unit from the D13Z1/D21Z1 subset has been determined. The DNA sequences of different alpha satellite monomers comprising the HOR are compared, and the data are used to develop a model, based on unequal crossing-over, for the evolution of the current HOR unit found at the centromeres of both these chromosomes. Correspondence to: H.F. Willard

5.
The centromeric regions of all human chromosomes are characterized by distinct subsets of a diverse tandemly repeated DNA family, alpha satellite. On human chromosome 17, the predominant form of alpha satellite is a 2.7-kilobase-pair higher-order repeat unit consisting of 16 alphoid monomers. We present the complete nucleotide sequence of the 16-monomer repeat, which is present in 500 to 1,000 copies per chromosome 17, as well as that of a less abundant 15-monomer repeat, also from chromosome 17. These repeat units were approximately 98% identical in sequence, differing by the exclusion of precisely 1 monomer from the 15-monomer repeat. Homologous unequal crossing-over is suggested as a probable mechanism by which the different repeat lengths on chromosome 17 were generated, and the putative site of such a recombination event is identified. The monomer organization of the chromosome 17 higher-order repeat unit is based, in part, on tandemly repeated pentamers. A similar pentameric suborganization has been previously demonstrated for alpha satellite of the human X chromosome. Despite the organizational similarities, substantial sequence divergence distinguishes these subsets. Hybridization experiments indicate that the chromosome 17 and X subsets are more similar to each other than to the subsets found on several other human chromosomes. We suggest that the chromosome 17 and X alpha satellite subsets may be related components of a larger alphoid subfamily which have evolved from a common ancestral repeat into the contemporary chromosome-specific subsets.

6.
Human centromeres are mainly composed of alpha satellite DNA hierarchically organized as higher-order repeats (HORs). Alpha satellite dynamics is shown by sequence homogenization in centromeric arrays and by its transfer to other centromeric locations, for example, during the maturation of new centromeres. We identified during prenatal aneuploidy diagnosis by fluorescent in situ hybridization a de novo insertion of alpha satellite DNA from the centromere of chromosome 18 (D18Z1) into cytoband 15q26. Although bound by CENP-B, this locus did not acquire centromeric functionality as demonstrated by the lack of constriction and the absence of CENP-A binding. The insertion was associated with a 2.8-kbp deletion and likely occurred in the paternal germline. The site was enriched in long terminal repeats and located ∼10 Mbp from the location where a centromere was ancestrally seeded and became inactive in the common ancestor of humans and apes 20–25 million years ago. Long-read mapping to the T2T-CHM13 human genome assembly revealed that the insertion derives from a specific region of chromosome 18 centromeric 12-mer HOR array in which the monomer size follows a regular pattern. The rearrangement did not directly disrupt any gene or predicted regulatory element and did not alter the methylation status of the surrounding region, consistent with the absence of phenotypic consequences in the carrier. This case demonstrates a likely rare but new class of structural variation that we name “alpha satellite insertion.” It also expands our knowledge on alphoid DNA dynamics and conveys the possibility that alphoid arrays can relocate near vestigial centromeric sites.

7.
Tandemly arrayed non-coding sequences or satellite DNAs (satDNAs) are rapidly evolving segments of eukaryotic genomes, including the centromere, and may raise a genetic barrier that leads to speciation. However, determinants and mechanisms of satDNA sequence dynamics are only partially understood. Sequence analyses of a library of five satDNAs common to the root-knot nematodes Meloidogyne chitwoodi and M. fallax, together with a satDNA that is specific to M. chitwoodi, revealed low sequence identity (32–64%) among them. However, despite sequence differences, two conserved motifs were recovered. One of them turned out to be highly similar to the CENP-B box of human alpha satDNA, identical in 10–12 out of 17 nucleotides. In addition, organization of nematode satDNAs was comparable to that found in alpha satDNA of human and primates, characterized by monomers concurrently arranged in simple and higher-order repeat (HOR) arrays. In contrast to alpha satDNA, phylogenetic clustering of nematode satDNA monomers extracted either from simple or from HOR arrays indicated frequent shuffling between these two organizational forms. Comparison of homogeneous simple arrays and complex HORs composed of different satDNAs enabled, for the first time, the identification of conserved motifs as obligatory components of monomer junctions. This observation highlights the role of short motifs in rearrangements, even among highly divergent sequences. Two mechanisms are proposed to be involved in this process, i.e., putative transposition-related cut-and-paste insertions and/or illegitimate recombination. The possible involvement of the nematode CENP-B box-like sequence in the transposition-related mechanism, together with the previously established similarity of the human CENP-B protein and pogo-like transposases, implicates a novel role of the CENP-B box and related sequence motifs in addition to the known function in centromere protein binding.

8.
Gene, 1996, 169(2): 157-164
A highly repetitive sequence in the genomic DNA of the bivalve mollusc Donax trunculus (Dt) has been identified upon restriction with EcoRV. During the time-course of DNA digestion, genomic fragments resolved electrophoretically into a ladder-like banding pattern revealing a tandem arrangement of the repeated elements, thus representing satellite DNA sequences. Cloning and sequence analysis unraveled the presence of two groups of monomer units which can be considered distinctive satellite subfamilies. Each subclass is distinguishable by the presence of 17 evenly spread diagnostic nucleotides (nt). The respective consensus sequences are 155 bp in length and differ by 11%, while relevant internal substructures were not observed. The two satellite subfamilies constitute 0.23 and 0.09% of the Dt genome, corresponding to 20 000 and 7600 copies per haploid complement, respectively. Sequence mutations often appear to be shared between two or more monomer variants, indicating a high degree of homogenization as opposed to that of random mutational events. Shared mutations among variants appear either as single changes or in long stretches. This pattern may arise from gene conversion mechanisms acting at different levels, such as the spread of nt sequences of a similar length to the monomer repeat itself, and the diffusion of short tracts a few bp long. Subfamilies might have evolved from the occasional amplification and spreading of a monomer variant effected by gene conversion events.

9.
Titin is a giant protein of striated muscle with important roles in the assembly, intracellular signalling and passive mechanical properties of sarcomeres. The molecule consists principally of ∼300 immunoglobulin and fibronectin domains arranged in a chain more than 1 μm long. The isoform-dependent N-terminal part of the molecule forms an elastic connection between the end of the thick filament and the Z-line. The larger, constitutively expressed C-terminal part is bound to the thick filament. Through most of the thick filament part, the immunoglobulin and fibronectin domains are arranged in a repeating pattern of 11 domains termed the ‘large super-repeat’. There are 11 contiguous copies of the large super-repeat making up a segment of the molecule nearly 0.5 μm long. We have studied a set of two-domain and three-domain recombinant fragments from the large super-repeat region by electron microscopy, synchrotron X-ray solution scattering and analytical ultracentrifugation, with the goal of reconstructing the overall structure of this part of titin. The data illustrate different average conformations in different domain pairs, which correlate with differences in interdomain linker lengths. They also illustrate interdomain bending and flexibility around average conformations. Overall, the data favour a helical conformation in the super-repeat. They also suggest that this region of titin is dimerised when bound to the thick filament.

10.
Origin and evolution of a major feline satellite DNA
A major satellite DNA has been cloned from the domestic cat (Felis catus) and characterized. The satellite monomer, termed FA-SAT, is 483 base-pairs in size, 64% G + C, and represents about 1 to 2% of the cat genome. A consensus sequence based upon partial sequence data from 21 independently isolated clones demonstrates: (1) FA-SAT is not composed of a series of shorter repeats, although about 25 copies, primarily imperfect, of the hexanucleotide TAACCC appear in the sequence; (2) there are many more CpG dinucleotides present in FA-SAT than expected for a random sequence of its size; and (3) 61% of all base substitutions in FA-SAT involve the replacement of G and C residues by A and T residues, indicating that FA-SAT is rapidly becoming A + T-rich. FA-SAT-related sequences are found in many mammals, where they appear to be scattered throughout the genome and not tandemly arranged as in the cat. An FA-SAT-related sequence was cloned from the domestic dog genome and sequenced, and shown to contain multiple copies of the same TAACCC hexanucleotide found in the cat satellite.

11.
To understand evolutionary events in the formation of higher-order repeat units in alpha satellite DNA, we have examined gorilla sequences homologous to human X chromosome alpha satellite. In humans, alpha satellite on the X chromosome is organized as a tandemly repeated, 2.0 × 10^3 base-pairs (bp) higher-order repeat unit, operationally defined by the restriction enzyme BamHI. Each higher-order repeat unit is composed of 12 tandem approximately 171 base-pair monomer units that have been classified into five distinct sequence homology groups. BamHI-digested gorilla genomic DNA hybridized with the cloned human 2 × 10^3 bp X alpha satellite repeat reveals three bands of sizes approximately 3.2 × 10^3, 2.7 × 10^3 and 2 × 10^3 bp. Multiple copies of all three repeat lengths have been isolated and mapped to the centromeric region of the gorilla X chromosome by fluorescence in situ hybridization. Long-range restriction mapping using pulsed-field gel electrophoresis shows that the 2.7 × 10^3 and 3.2 × 10^3 bp repeat arrays exist as separate but likely neighboring arrays on the gorilla X, each ranging in size from approximately 200 × 10^3 to 500 × 10^3 bp, considerably smaller than the approximately 2000 × 10^3 to 4000 × 10^3 bp array found on human X chromosomes. Nucleotide sequence analysis has revealed that monomers within all three gorilla repeat units can be classified into the same five sequence homology groups as monomers located within the higher-order repeat unit on the human X chromosome, suggesting that the formation of the five distinct monomer types predates the divergence of the lineages of contemporary humans and gorillas. The order of 12 monomers within the 2 × 10^3 and 2.7 × 10^3 bp repeat units from the gorilla X chromosome is identical with that of the 2 × 10^3 bp repeat unit from the human X chromosome, suggesting an ancestral linear arrangement and supporting hypotheses about events largely restricted to single chromosome types in the formation of alpha satellite higher-order repeat units.

12.
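Yan Shouqing, Zhu Wanju, Zhang Xuemei, Li Bing, Sun Jinhai. 《遗传》 (Hereditas), 2007, 29(12): 1504-1508
The blue fox genome was digested with restriction endonucleases and, after agarose gel electrophoresis, the distinct bright bands were cloned, sequenced, and analyzed. Forty-two satellite DNA sequences were obtained. The satellite DNA monomer is 737 bp long, with a G+C content of 51.9%, and the monomers share 91%-97% homology with one another. Each monomer consists of three tandemly arranged subrepeats of about 245 bp, with 49%-55% homology among the subrepeats. Over the course of species evolution, this satellite DNA shows a trend of gradually decreasing G+C content and gradually increasing A+T content. The satellite DNA is specific to canid species and belongs to the same class as the dog centromere-associated satellite DNA, with which it shares 74% homology; it was named α-satellite DNA.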

13.
The complete sequencing of human centromeres, which are filled with highly repetitive elements, has long been challenging. In human centromeres, α-satellite monomers of about 171 bp in length are the basic repeating units; these monomers in turn constitute the higher-order repeat (HOR) units, and thousands of copies of highly homologous HOR units form large arrays, which have hampered sequence assembly of human centromeres. Because most HOR unit occurrences are covered by long reads of about 10 kb, the recent availability of much longer reads is expected to enable observation of individual HOR occurrences in terms of their single-nucleotide or structural variants. The time has come to examine the complete sequence of human centromeres.

14.
The centromeric regions of human chromosomes contain long tracts of tandemly repeated DNA, of which the most extensively characterized is alpha satellite. In a screen for additional centromeric DNA sequences, four phage clones were obtained which contain alpha satellite as well as other sequences not usually found associated with tandemly repeated alpha satellite DNA, including L1 repetitive elements, an Alu element, and a novel AT-rich repeated sequence. The alpha satellite DNA contained within these clones does not demonstrate the higher-order repeat structure typical of tandemly repeated alpha satellite. Two of the clones contain inversions; instead of the usual head-to-tail arrangement of alpha satellite monomers, the direction of the monomers changes partway through each clone. The presence of both inversions was confirmed in human genomic DNA by polymerase chain reaction amplification of the inverted regions. One phage clone contains a junction between alpha satellite DNA and a novel low-copy repeated sequence. The junction between the two types of DNA is abrupt and the junction sequence is characterized by the presence of runs of A's and T's, yielding an overall base composition of 65% AT with local areas > 80% AT. The AT-rich sequence is found in multiple copies on chromosome 7 and homologous sequences are found in (peri)centromeric locations on other human chromosomes, including chromosomes 1, 2, and 16. As such, the AT-rich sequence adjacent to alpha satellite DNA provides a tool for the further study of the DNA from this region of the chromosome. The phage clones examined are located within the same 3.3-Mb SstII restriction fragment on chromosome 7 as the two previously described alpha satellite arrays, D7Z1 and D7Z2. These new clones demonstrate that centromeric repetitive DNA, at least on chromosome 7, may be more heterogeneous in composition and organization than had previously been thought.

15.
A highly abundant satellite DNA comprising 20% of the Meloidogyne fallax (Nematoda, Tylenchida) genome was cloned and sequenced. The satellite monomer is 173 bp long and has a high A + T content of 72.3%, with frequent runs of A's and T's. The sequence variability of the monomers is 2.7%, mainly due to random distribution of single-point mutations. A search for evidence of internal repeated subunits in the monomer sequence revealed a 6-bp motif (AAATTT) for which five degenerate repeats, differing by just a single base pair, could be identified. Pairwise comparison of the M. fallax satellite with those from the sympatric species Meloidogyne chitwoodi and Meloidogyne hapla revealed a high sequence similarity (68.39%) with one satellite DNA subfamily in M. chitwoodi, which indicated an unexpected close relationship between them. Given the high copy number and the extreme sequence homogeneity among monomeric units, it may be assumed that the satellite DNA of M. fallax could have evolved through some recent and extensive amplification burst in the nematode genome. In this case, its relatively short life would not yet have allowed the accumulation of random mutations in independent amplified repeats. Considering the morphological resemblance between the two species and their ability to produce interspecific fertile hybrids under controlled conditions, these results indicate that M. fallax may share a common ancestor with M. chitwoodi, from which it could have diverged recently. All these data suggest that M. fallax could be the result of a recent speciation process and show that Meloidogyne satellite DNAs may be of interest to resolve phylogenetic relationships among closely related species from this genus.

16.
M Ekker, A Fritz, M Westerfield. Genomics, 1992, 13(4): 1169-1173
To further our understanding of the structure and organization of the zebrafish genome, we have undertaken the analysis of highly and middle-repetitive DNA sequences. We have cloned and sequenced two families of tandemly repeated DNA fragments. The monomer units of the Type I satellite-like sequence are 186 bp long, A+T-rich (65%), and exhibit a high degree of sequence conservation. The Type I satellite-like sequence constitutes 8% of the zebrafish genome, or approximately 8 × 10^5 copies per haploid genome. Southern analysis of genomic DNA, digested with several restriction endonucleases, shows a ladder of hybridizing bands, consistent with a tandem array, and suggests longer range periodic variations in the sequence of the tandem repeats. The Type II satellite has a monomer length of 165 bp, is also A+T-rich (68%), and constitutes 0.2% of the zebrafish genome (22,000 copies per haploid genome). Southern analysis reveals a complex pattern rather than a ladder of regularly spaced hybridizing bands.
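The genome fractions and copy numbers quoted in this abstract can be cross-checked with simple arithmetic. If a satellite with monomer length m (bp) is present in N copies and occupies a fraction f of the haploid genome, the implied haploid genome size is m × N / f. The abstract does not state a genome size, so the sketch below derives one from each satellite and compares the two; the function name is illustrative only.

```python
def implied_genome_size(monomer_bp: int, copies: float, fraction: float) -> float:
    """Haploid genome size (bp) implied by a satellite's monomer length,
    copy number, and the fraction of the genome it occupies."""
    return monomer_bp * copies / fraction


# Figures taken from the abstract.
type_i = implied_genome_size(186, 8e5, 0.08)       # Type I: 8% of genome
type_ii = implied_genome_size(165, 22_000, 0.002)  # Type II: 0.2% of genome

# Relative disagreement between the two independent estimates.
relative_gap = abs(type_i - type_ii) / type_i
```

Both estimates come out near 1.8 × 10^9 bp (about 1.86 × 10^9 and 1.8 × 10^9), so the two sets of figures in the abstract are mutually consistent to within a few percent.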

17.
Alpha satellite DNA, a diverse family of tandemly repeated DNA sequences located at the centromeric region of each human chromosome, is organized in a highly chromosome-specific manner and is characterized by a high frequency of restriction-fragment-length polymorphism. To examine events underlying the formation and spread of these polymorphisms within a tandem array, we have cloned and sequenced a representative copy of a polymorphic array from the X chromosome and compared this polymorphic copy with the predominant higher-order repeat form of X-linked alpha satellite. Sequence data indicate that the polymorphism arose by a single base mutation that created a new restriction site (for HindIII) in the sequence of the predominant repeat unit. This variant repeat unit, marked by the new HindIII site, was subsequently amplified in copy number to create a polymorphic domain consisting of approximately 500 copies of the variant repeat unit within the X-linked array of alpha satellite. We propose that a series of intrachromosomal recombination events between misaligned tandem arrays, involving multiple rounds of either unequal crossing-over or sequence conversion, facilitated the spread and fixation of this variant HindIII repeat unit.

18.
A novel highly abundant satellite DNA comprising 20% of the genome has been characterized in Palorus subdepressus (Insecta, Coleoptera). The 72-bp-long monomer sequence is composed of two copies of T2A5T octanucleotide alternating with 22-nucleotide-long elements of an inverted repeat. Phylogenetic analysis revealed clustering of monomer sequence variants into two clades. Two types of variants are prevalently organized in an alternating pattern, thus showing a tendency to generate a new complex repeating unit 144 bp in length. Fluorescent in situ hybridization revealed even distribution of the satellite in the region of pericentric heterochromatin of all 20 chromosomes. P. subdepressus satellite sequence is clearly species specific, lacking similarity even with the satellite from congeneric species P. ratzeburgii. However, on the basis of similarity in predicted tertiary structure induced by intrinsic DNA curvature and in repeat length, P. subdepressus satellite can be classified into the same group with satellites from related tenebrionid species P. ratzeburgii, Tenebrio molitor, and T. obscurus. It can be reasonably inferred that repetitive sequences of different origin evolve under constraints to adopt and conserve particular features. Obtained results suggest that the higher-order structure and repeat length, but not the nucleotide sequence itself, are maintained through evolution of these species. Received: 23 April 1997 / Accepted: 11 July 1997

19.
To further our understanding of the structure and organization of the zebrafish genome, we have undertaken the analysis of highly and middle-repetitive DNA sequences. We have cloned and sequenced two families of tandemly repeated DNA fragments. The monomer units of the Type I satellite-like sequence are 186 bp long, A+T-rich (65%), and exhibit a high degree of sequence conservation. The Type I satellite-like sequence constitutes 8% of the zebrafish genome, or approximately 8 × 10^5 copies per haploid genome. Southern analysis of genomic DNA, digested with several restriction endonucleases, shows a ladder of hybridizing bands, consistent with a tandem array, and suggests longer range periodic variations in the sequence of the tandem repeats. The Type II satellite has a monomer length of 165 bp, is also A+T-rich (68%), and constitutes 0.2% of the zebrafish genome (22,000 copies per haploid genome). Southern analysis reveals a complex pattern rather than a ladder of regularly spaced hybridizing bands.

20.
The genomic organization of two satellite DNA sequences, pHvMWG2314 and pHvMWG2315, of barley (Hordeum vulgare, 2n=14, HH) was studied by comparative in situ hybridization (ISH) and PCR analysis. Both sequences are members of different RsaI families. The sequence pHvMWG2314 is a new satellite element with a monomer unit of 73 bp which is moderately amplified in different grasses and occurs in interstitial clusters on D-genome chromosomes of hexaploid wheat (Triticum aestivum, 2n=42, AABBDD). The 331-bp monomer pHvMWG2315 belongs to a tandemly amplified repetitive sequence family that is present in the Poaceae and preferentially amplified in Aegilops squarrosa (2n=14, DD), H. vulgare and Agropyron elongatum (2n=14, EE). The first described representative of this family was pAs 1 from Ae. squarrosa. Different sequences of one satellite DNA family were amplified from Ae. squarrosa, A. elongatum and H. vulgare using PCR. Characteristic differences between members of the D and H genome occurred in a variable region which is flanked by two conserved segments. The heterogeneity within this element was exploited for the cytogenetic analysis of Triticeae genomes and chromosomes. Comparative ISH with pHvMWG2315 identified individual wheat and barley chromosomes under low (75%) and high (85%) hybridization stringency in homologous and heterologous systems. We propose the designation Tas330 for the Triticeae amplified sequence (Tas) satellite family with a 330 bp average monomer length.
