Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
MOTIVATION: Algorithm development for finding typical patterns in sequences, especially multiple pseudo-repeats (pseudo-periodic regions), is at the core of many problems arising in biological sequence and structure analysis. In fact, one of the most significant features of biological sequences is their high quasi-repetitiveness. Variation in the quasi-repetitiveness of genomic and proteomic texts demonstrates the presence and density of different biologically important information. It is very important to develop sensitive automatic computational methods for the identification of pseudo-periodic regions of sequences through which we can infer, describe and understand biological properties, and seek precise molecular details of biological structures, dynamics, interactions and evolution. RESULTS: We develop a novel, powerful computational tool for partitioning a sequence to pseudo-periodic regions. The pseudo-periodic partition is defined as a partition, which intuitively has the minimal bias to some perfect-periodic partition of the sequence based on the evolutionary distance. We devise a quadratic time and space algorithm for detecting a pseudo-periodic partition for a given sequence, which actually corresponds to the shortest path in the main diagonal of the directed (acyclic) weighted graph constructed by the Smith-Waterman self-alignment of the sequence. We use several typical examples to demonstrate the utilization of our algorithm and software system in detecting functional or structural domains and regions of proteins. A big advantage of our software program is that there is a parameter, the granularity factor, associated with it and we can freely choose a biological sequence family as a training set to determine the best parameter. In general, we choose all repeats (including many pseudo-repeats) in the SWISS-PROT amino acid sequence database as a typical training set. 
We show that the granularity factor is 0.52 and that the average agreement accuracy of pseudo-periodic partitions, detected by our software for all pseudo-repeats in the SWISS-PROT database, is as high as 97.6%.
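The self-alignment step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' software: it only fills a Smith-Waterman local-alignment matrix for a sequence against itself, restricted to the cells above the main diagonal so that the trivial identity alignment is excluded, and reports the best-scoring repeat. The shortest-path partition and the granularity factor are not implemented. The function names and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def self_alignment_matrix(seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman matrix of seq against itself, upper triangle only.

    Cells on and below the main diagonal stay at 0, which excludes the
    trivial alignment of the sequence with itself. Scoring values are
    illustrative assumptions, not taken from the abstract.
    """
    n = len(seq)
    H = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(
                0,                    # local alignment: never go negative
                H[i - 1][j - 1] + s,  # extend with a match or mismatch
                H[i - 1][j] + gap,    # gap in the second copy
                H[i][j - 1] + gap,    # gap in the first copy
            )
    return H


def best_repeat(seq):
    """Return (score, i, j): the best local self-alignment score and the
    1-based end positions of the two aligned copies; (0, 0, 0) if none."""
    H = self_alignment_matrix(seq)
    best = (0, 0, 0)
    for i, row in enumerate(H):
        for j, score in enumerate(row):
            if score > best[0]:
                best = (score, i, j)
    return best


if __name__ == "__main__":
    # "ABCD" occurs twice; the copies end at positions 4 and 10.
    print(best_repeat("ABCDXYABCD"))
```

Like the algorithm described in the abstract, this fill takes time and space quadratic in the sequence length; turning the matrix into a pseudo-periodic partition would need the weighted-graph construction that the abstract only outlines.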

2.
We compared the nucleotide sequences of 3 yeast invertase genes in regions where the homology is better than 90%. In the noncoding region 40 gaps of 1-61 bases were found. This is about half the number of nucleotide substitutions in the same sequences. We grouped the gaps into 5 categories by their length and the characteristics of their sequences. Group I gaps are about 20 nucleotides long and are flanked by a repeated sequence of 6 bases which may trigger the deletion of one of the repeats and the sequence between the repeats. Group II gaps are characterized by a small repeated sequence which is missing in one of the invertase genes. Gaps which occur in sequences exclusively made up of one of the 4 bases are summarized in group III. The 4 gaps in group IV do not show any of these sequence characteristics and they are all just one base long. A 61 nucleotide sequence found in only one of the invertase genes seems to be of complex origin. We conclude that small repeated sequences or monotonous sequences are prone to deletion or insertion mutations.

3.
We present an analysis of a chromosomal walk in the region of the euchromatin-heterochromatin transition at the base of the X chromosome of Drosophila melanogaster. This region is difficult to analyse because of the presence of repeated sequences, and we have used cosmids to walk from the last euchromatic gene, suppressor of forked, towards the pericentric heterochromatin. The proximal 30-kb sequence we have isolated consists of repetitive DNA, including four tandem copies of a 5.9-kb sequence. This tandem repeat is itself a mosaic of other, mostly repeated, sequences, including part of a retrotransposon without long terminal repeats, a simple-sequence region of TAA repeats and part of a retrotransposon with long terminal repeats that has not been previously described. Although sequences homologous to these components are found elsewhere in the genome, this arrangement of repeated sequences is only found at the base of the X chromosome. It is conserved in D. melanogaster strains of different geographic origin, but is not conserved in even closely related species.

4.
The long (4.6-kb) A+T region of Drosophila melanogaster mitochondrial DNA has been cloned and sequenced. The A+T region is organized in two large arrays of tandemly repeated DNA sequence elements, with nonrepetitive intervening and flanking sequences comprising only 22% of its length. The first repeat array consists of five repeats of 338-373 bp. The second consists of four intact 464-bp repeats and a fifth partial repeat of 137 bp. Three DNA sequence elements are found to be highly conserved in D. melanogaster and in several Drosophila species with short A+T regions. These include a 300-bp DNA sequence element that overlaps the DNA replication origin and two thymidylate stretches identified on opposite DNA strands. We conclude that the length heterogeneity observed in the A+T regulatory region in mitochondrial DNAs from the genus Drosophila results from the expansion (and contraction) of the number of repeated DNA sequence elements. We also propose that the 300-bp conserved DNA sequence element, in conjunction with another primary sequence determinant, perhaps the adjacent thymidylate stretch, functions in the regulation of mitochondrial DNA replication.

5.
6.
7.
The heavy chain isotype switch is mediated by a DNA rearrangement between a donor switch region (usually mu) and a recipient switch region (gamma, epsilon, or alpha). Switch regions lie upstream of the appropriate heavy chain constant region gene and are composed of simple sequences repeated in tandem. It is not known to what extent the tandemly repeated sequences are important to the heavy chain switch recombination, and to what extent other features of switch region sequences might contribute to the switch process. We studied switches to the gamma 3 isotype by sequencing the entire gamma 3 switch region. This switch region is composed of forty-four 49 base pair units repeated in tandem. These repeated units share modest homology with the mu switch region repeated elements. Evolution of the gamma 3 switch region seems to involve insertions and deletions of the 49mer elements. We also molecularly cloned rearranged switch regions from two gamma 3-expressing hybridomas and determined the DNA sequences at the mu-gamma 3 recombination sites. We located these switch recombination sites within the germ-line gamma 3 switch region, as well as switch recombination sites from two myelomas. All four sites are found in the 5' one-third of the gamma 3 switch region. We discuss some additional trends in the sequence data near these four recombination sites.

8.
We have analysed the sequence organization of the DNA in the pericentric region of the long arm of the human Y chromosome. The structures of one cosmid and three yeast artificial chromosome clones were determined. The region consists of a mosaic of the known 5, 48 and 68 base-pair tandemly repeated sequences and at least five novel repeated sequence families. A long-range map of approximately 3.5 × 10^6 base-pairs of genomic DNA was constructed that placed the clones between about 500 × 10^3 and 850 × 10^3 base-pairs from the long arm edge of the centromeric alphoid DNA array.

9.
10.
We present the first comprehensive analysis of the crocodilian control region. We have analyzed sequences from all three families of Crocodylia (Crocodylidae, Gavialidae, Alligatoridae), incorporating all genera except Paleosuchus and Melanosuchus. Within the control region of other vertebrates, several sequence motifs and their order appear to be conserved. Herein, we compare aligned crocodilian D-loop sequences to homologous sequences from other vertebrates ranging from fish to birds. Among other findings, we have discovered that while domain I tends to be shorter than the same region in mammals and birds, it contains sequences similar in structure to both the goose-hairpin and termination associated sequences (TAS). Domain II is highly conservative with regard to size among the taxa examined and contains several of the conserved sequence boxes characterized in other vertebrates. Domain III contains several interesting sequence motifs including tandemly repeated sequences, a long poly-A region in the Crocodylidae, and possible bidirectional promoter sequences.

11.
12.
13.
The transcriptional control regions of the copia retrotransposon   Total citations: 4 (self-citations: 3; citations by others: 1)

14.
A cDNA encoding a putative RNA and/or DNA helicase has been isolated from Arabidopsis thaliana cDNA libraries. The cloned cDNA is 5166 bases long, and its largest open reading frame encodes 1538 amino acids. The central region of the predicted protein is homologous to a group of nucleic acid helicases from the DEAD/H family. However, the N- and C-terminal regions of the Arabidopsis cDNA product are distinct from these animal DEIH proteins. We have found that the C-terminal region contains three characteristic sequences: (i) two DNA-binding segments that form a probe helix (PH) involved in DNA recognition; (ii) an SV40-type nuclear localization signal; and (iii) 11 novel tandem-repeat sequences each consisting of about 28 amino acids. We have designated this cDNA as NIH (nuclear DEIH-box helicase). Functional characterization of a recombinant fusion product containing the repeated region indicates that NIH may form homodimers, and that this is the active form in solution. Based on this information and the observation that the sequence homology is limited to the DEAH regions, we conclude that the biological roles of the plant helicase NIH differ from those of the animal DEIH family.

15.
16.
A complete single unit of a ribosomal RNA gene (rDNA) of M. croslandi was sequenced. The ends of the 18S, 5.8S and 28S rRNA genes were determined by using the sequences of D. melanogaster rDNAs as references. Each of the tandemly repeated rDNA units consists of coding and non-coding regions whose arrangement is the same as that of D. melanogaster rDNA. The intergenic spacer (IGS) contains, as in other species, a region with subrepeats, of which the sequences are different from those previously reported in other insect species. The length of IGSs was estimated to be 7-12 kb by genomic Southern hybridization, showing that an rDNA repeating unit of M. croslandi is 14-19 kb long. The sequences of the coding regions are highly conserved, whereas IGS and ITS (internal transcribed spacer) sequences are not. We obtained clones with insertions of various sizes of R2 elements, the target sequence of which was found in the 28S rRNA coding region. A short segment in the IGS that follows the 3' end of the 28S rRNA gene was predicted to form a secondary structure with long stems.

17.
The beta antigen of the Ibc protein complex of Group B streptococci is a cell-surface receptor which binds the Fc region of human immunoglobulin A (IgA). Determination of the nucleotide sequence of the beta antigen gene shows that it encodes a preprotein having a molecular weight of 130,963 daltons and a polypeptide of 1164 amino acid residues that is typical of other Gram-positive cell-wall proteins. There is a long signal sequence of 37 amino acids at the N-terminus. Four of the five C-terminal amino acid residues are basic and are preceded by a hydrophobic stretch that appears to anchor the C-terminus in the cell membrane. To the N-terminal side of this hydrophobic stretch is a putative cell-wall-spanning region containing proline-rich repeated sequences. An unusual feature of these repeated sequences is a three-residue periodicity, whereby every first residue is a proline, the second residue is alternately positively or negatively charged, and the third residue is uncharged. The IgA-binding activity was approximately localized by expressing subfragments of the beta antigen as fusion proteins. Two distinct but adjacent DNA segments specified peptides that bound IgA, which indicates that the IgA-binding activity is located in two distinct regions of the protein.

18.
Hybrids formed by insertion of the plasmid maintenance regions of P1 or F into a lambda delta att vector form stable unit-copy plasmids in their Escherichia coli host. They must therefore both be substrates for an accurate cellular partition apparatus that ensures that all daughter cells inherit a plasmid copy. Analysis of deletion mutants of both types of hybrid showed that, although the P1 and F plasmid maintenance regions differ in sequence and specificity, they are similar in general organization. Each contains an approximately 3 × 10^3 base-pair region that is essential for replication (rep) and an adjacent but separable 3 × 10^3 base-pair region that is essential for the stability of plasmid maintenance (par). Each par region is thought to specify the recognition of the plasmid as a substrate for equipartition. The deletion mutants provide sources of isolated rep and par sequences from both P1 and F DNA. These elements were then used to construct composite plasmids with novel combinations and arrangements of rep and par sequences. Heterologous constructions containing P1 rep and F par or F rep and P1 par sequences were maintained faithfully. We conclude that par regions are both necessary and sufficient to promote equipartition of replicating plasmid DNA. This activity is exerted only in cis but otherwise seems to be independent of the position or orientation of the par sequences within the DNA. Both P1 and F par regions include DNA sequences (incB of P1, incD of F) that we propose are analogues of the centromeres of eukaryotic chromosomes. The remaining portions of the par regions are known to encode protein products that, we believe, act at the inc sites. Extra copies of these inc sites appear to exert incompatibility by competition for the cellular partition apparatus.

19.
The centromeric regions of human chromosomes contain long tracts of tandemly repeated DNA, of which the most extensively characterized is alpha satellite. In a screen for additional centromeric DNA sequences, four phage clones were obtained which contain alpha satellite as well as other sequences not usually found associated with tandemly repeated alpha satellite DNA, including L1 repetitive elements, an Alu element, and a novel AT-rich repeated sequence. The alpha satellite DNA contained within these clones does not demonstrate the higher-order repeat structure typical of tandemly repeated alpha satellite. Two of the clones contain inversions; instead of the usual head-to-tail arrangement of alpha satellite monomers, the direction of the monomers changes partway through each clone. The presence of both inversions was confirmed in human genomic DNA by polymerase chain reaction amplification of the inverted regions. One phage clone contains a junction between alpha satellite DNA and a novel low-copy repeated sequence. The junction between the two types of DNA is abrupt and the junction sequence is characterized by the presence of runs of A's and T's, yielding an overall base composition of 65% AT with local areas > 80% AT. The AT-rich sequence is found in multiple copies on chromosome 7 and homologous sequences are found in (peri)centromeric locations on other human chromosomes, including chromosomes 1, 2, and 16. As such, the AT-rich sequence adjacent to alpha satellite DNA provides a tool for the further study of the DNA from this region of the chromosome. The phage clones examined are located within the same 3.3-Mb SstII restriction fragment on chromosome 7 as the two previously described alpha satellite arrays, D7Z1 and D7Z2. These new clones demonstrate that centromeric repetitive DNA, at least on chromosome 7, may be more heterogeneous in composition and organization than had previously been thought.

20.