Similar articles
1.
We calculated correlations of the nucleotide distributions along the E. coli genome. Subsequent cluster analysis of the correlation distributions showed that the genome was composed of two qualitatively different types of nucleotide sequences. The first type exhibited strong correlations of the genomic distributions of A with T and G with C, and strong anticorrelations of A with C and G with T. In contrast, the second type was characterized by weak or negligible correlations typical of randomized sequences. Both types of sequences were almost equally abundant in the E. coli genome, and their lengths varied from several hundred nucleotides to about 70 kilobases. The two types were not disjoint with respect to their (G + C) content, but the strong correlations and anticorrelations were characteristic mainly of (A + T)-rich genomic segments. We offer possible explanations of the mosaic structure of the E. coli genome.

2.
We have analyzed correlations of nucleotide distributions along more than 50 megabases of the longest sequenced parts of the human, mouse, Drosophila, Arabidopsis, yeast, E. coli and three kinds of viral genomes. The strongest correlations were observed between the distributions of C and G, in particular in the genome of Drosophila. This correlation was much weaker, though still strong, in the human and E. coli genomes, which exhibited the same level of correlation. The C/G correlation is unlikely to originate from isochores, because isochores have not been reported to occur in the genomes of Drosophila and E. coli. The genomic distribution curves of adenine and thymine were also positively correlated in all analyzed organisms except yeast, where they were anticorrelated. Still stronger anticorrelations were, however, observed between the genomic distributions of A and C and between G and T. These genomic distributions were anticorrelated almost universally and very strongly. These anticorrelations are likely to originate from point mutations resulting from unrepaired GA mispairing as a replication intermediate. The C/A or G/T anticorrelation, or compensation, is a very strong and general new phenomenon that shapes genomic nucleotide sequences.
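As an illustration of what a correlation between nucleotide distributions can mean in practice, the following is a minimal sketch, not the authors' procedure. It assumes that each distribution is the count of one base in consecutive non-overlapping windows and that the two count profiles are compared with a Pearson coefficient. The function names and the default window size are hypothetical.

```python
from math import sqrt


def window_counts(seq, base, window):
    """Count occurrences of `base` in consecutive non-overlapping windows."""
    seq = seq.upper()
    n_windows = len(seq) // window
    return [seq[i * window:(i + 1) * window].count(base) for i in range(n_windows)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def base_correlation(seq, base_a, base_b, window=100):
    """Correlate the windowed count profiles of two bases along one sequence.

    The window size is an assumption made for this sketch.
    """
    return pearson(window_counts(seq, base_a, window),
                   window_counts(seq, base_b, window))


# Toy sequence alternating AT-only and CG-only windows of length 4.
toy = ("ATAT" + "CGCG") * 5
print(base_correlation(toy, "A", "T", window=4))  # A and T rise and fall together
print(base_correlation(toy, "A", "C", window=4))  # A and C alternate
```

On the toy sequence A and T are perfectly correlated and A and C perfectly anticorrelated, which mirrors the sign pattern the abstract describes but says nothing about real genomes.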

3.
We used synthetic oligonucleotide DNA probes specific for the four-base repetitive core sequences (GACA)n and (AGGC)n to examine human genomic variation. The results of hybridizing these oligonucleotides to human genomic digests indicate that they are useful and accessible markers for ubiquitously repeated regions of DNA in the human genome. Furthermore, these sequences appear to be highly conserved in eukaryotic genomes, but their function remains largely unknown.

4.
5.
A single 880-base-pair region within the genome of simian cytomegalovirus strain Colburn contains sequences that hybridize intensely with both human and mouse total genome DNA probes. This sequence was also found in a second simian cytomegalovirus isolate and was retained in both plaque-purified virus subclones and in plasmid DNA clones containing the SalI-P fragment. Cleaved genomic DNAs from several mammalian species all exhibited strong dispersed hybridization with the SalI-P probes, and over 70% of the lambda clones in a mouse genomic library plus several selected clones containing globin, 45S rDNA, or 5S rDNA genes all formed hybrids with SalI-P. The appropriate region of cytomegalovirus SalI-P contains relatively A + T-rich unique sequences interrupted by three stretches of the simple alternating dinucleotides, (CA)15, (CA)22, and (CA)21, which we show to be responsible for most of the cell-virus homology. We conclude that discrete, tandemly repeated (CA) dinucleotide tracts capable of forming left-handed Z-DNA helices punctuate mammalian genomes at greater than 10^5 copies per cell and that three adjacent copies of what appear to be a family of interspersed repetitive elements containing these (CA)n stretches are carried in the genomes of simian cytomegaloviruses.

6.
Ninety-nine members of the salmonid HpaI and AvaIII families of short interspersed repetitive elements (SINEs) were aligned and a general consensus sequence was deduced. The presence of 26 correlated changes in nucleotides (diagnostic nucleotides) from those in the consensus sequence allowed us to divide the members of the HpaI family into 12 subfamilies and those of the AvaIII family into two subfamilies. On the basis of the average sequence divergences and the phylogenetic distributions of the subfamilies, the relative antiquity of the subfamilies and the process of sequential changes in the respective source sequences were inferred. Despite the higher mutation rates of CG dinucleotides in individual dispersed members, no hypermutability of CG positions was observed in changes in the source sequences. This result suggests that SINE sequences located in a nonmethylated or hypomethylated genomic region could have been selected as source sequences for retroposition and/or that some CG sites are part of the recognition sequences of the retropositional machinery. Correspondence to: N. Okada

7.
The formation of triplex DNA is a site-specific recognition method that directly targets duplex DNA. However, triplex DNA is generally formed only at the GC and AT base pairs of duplex DNA, and there are no natural nucleotides that recognize the CG and TA base pairs, or even the 5-methyl-CG (5mCG) base pair. Moreover, duplex DNA containing 5mCG base pairs epigenetically regulates gene expression in vivo, and thus strategies for targeting it are of biological importance. Therefore, the development of triplex-forming oligonucleotides (TFOs) with artificial nucleosides that selectively recognize these base pairs with high affinity is needed. We recently reported that 2′-deoxy-2-aminonebularine derivatives exhibited the ability to recognize 5mCG and CG base pairs in triplex formation; however, this ability was sequence dependent. Therefore, we designed and synthesized new nucleoside derivatives based on the 2′-deoxy-nebularine (dN) skeleton to shorten the linker connecting to the hydrogen-bonding unit in formation of the antiparallel motif triplex. We successfully demonstrated that TFOs with 2-guanidinoethyl-2′-deoxynebularine (guanidino-dN) recognized 5mCG and CG base pairs with very high affinity in all four DNA sequences with different nucleobases adjacent to guanidino-dN, as well as in the promoter sequences of human genes containing 5mCG base pairs with a high DNA methylation frequency.

8.
The spectra of k-mer frequencies can reveal the structures and evolution of genome sequences. We confirmed that the trimodal spectrum of 8-mers in human genome sequences is distinguished only by the CG2, CG1 and CG0 8-mer sets, containing 2, 1 or 0 CpG, respectively. We call this phenomenon the independent selection law. The three types of CG 8-mers were considered to be different functional elements. We conjectured that (1) nucleosome binding motifs are mainly characterized by CG1 8-mers and (2) the core structural units of CpG island (CGI) sequences are predominantly characterized by CG2 8-mers. To validate our conjectures, nucleosome-occupied sequences and CGI sequences were extracted, and sequence parameters were then constructed from the information in each of the three CG 8-mer sets. ROC analysis showed that CG1 8-mers are preferred in nucleosome-occupied segments (AUC > 0.7) and CG2 8-mers are preferred in CGI sequences (AUC > 0.99). This validates our conjectures in principle.
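The partition of 8-mers by CpG content described above can be sketched roughly as follows. This is an illustration under assumptions rather than the authors' pipeline: the function names are hypothetical, k-mers containing characters other than A, C, G and T are skipped, and k-mers with more than two CpGs (which the abstract does not discuss) are simply stored under their own count.

```python
from collections import Counter


def cpg_count(kmer):
    """Number of CG dinucleotides in a k-mer (CG cannot overlap itself)."""
    return kmer.upper().count("CG")


def kmer_counts(seq, k=8):
    """Count every overlapping k-mer that consists only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def split_by_cpg(counts):
    """Group k-mer counts by CpG content: key 0 ~ CG0, 1 ~ CG1, 2 ~ CG2.

    Keys above 2 may appear; how to treat them is left open in this sketch.
    """
    groups = {}
    for kmer, count in counts.items():
        groups.setdefault(cpg_count(kmer), {})[kmer] = count
    return groups


groups = split_by_cpg(kmer_counts("ACGTACGTAA"))
for n_cpg in sorted(groups):
    print(n_cpg, sorted(groups[n_cpg]))
```

A frequency spectrum for one subset would then be a histogram of the counts stored in `groups[n_cpg]`. The choice of binning is not specified in the abstract and is left to the reader.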

9.
10.
Affinity chromatography on uracil-coupled cellulose was carried out for the separation of nucleosides, nucleotides and oligonucleotides. Adenine derivatives exhibited a high affinity for uracil-cellulose, and sequential isomers of oligonucleotides containing adenine residues were resolved. Poly(A) was strongly bound to uracil-cellulose and was recovered by elution with 7 M urea. This procedure was extended to the isolation of mRNA containing poly(A) sequences.

11.
The ability of certain azole-substituted oligodeoxyribonucleotides to promote antiparallel triple helix formation with duplex targets having CG or TA interruptions in an otherwise homopurine sequence was examined. 2'-Deoxyribonucleosides of the azoles, which include pyrazole, imidazole, 1,2,4-triazole and 1,2,3,4-tetrazole, were synthesized using the stereospecific sodium salt glycosylation procedure. These nucleosides were successfully incorporated, using solid-support phosphoramidite chemistry, into oligonucleotides designed to interact with the non-homopurine duplex targets. The interaction of these modified oligonucleotides with all four possible base pairs was evaluated and compared to similar data for a series of natural oligonucleotides. The oligonucleotides containing simple azoles enhanced the triplex-forming ability considerably at non-homopurine targets. Binding of these modified oligonucleotides to duplex targets containing TA inversion sites was particularly noteworthy, and compared favorably to that of unmodified oligonucleotides for binding to duplex targets containing CG as well as TA base pairs. The selectivity exhibited by certain azoles is suggestive of base pair-specific interactions. Thus, the azoles evaluated during this study show considerable promise for efforts to develop generalized triplex formation at non-homopurine duplex sequences.

12.
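The non-random usage of k-mers in genome sequences, and its biological significance, have long attracted attention, but no fundamental progress has yet been made. Using the complete gene sequences of seven species as samples, we obtained the 8-mer frequency spectrum of each species' genome sequence. The spectra of dog and cow have three peaks, whereas those of zebrafish, medaka, Caenorhabditis elegans and Saccharomyces cerevisiae have only one peak, and the shape of the chicken spectrum lies between the two. When the set of 8-mers was classified by XY dinucleotide content, only the classification by CG dinucleotides yielded three subsets (0CG, 1CG and 2CG) whose spectra each form an independent unimodal distribution. Comparison with random sequences showed that 0CG motifs evolve randomly, whereas 1CG and 2CG motifs evolve directionally; their usage frequencies are far lower than the random frequencies, and this law of independent evolutionary separation holds across species. The distances between the spectra of the three CG subsets are the root cause of the unimodal or multimodal appearance. After the genome sequences of the seven species were standardized to 10^9 bp, comparison showed that the 1CG and 2CG subset spectra are significantly correlated with species evolution, whereas the 0CG subset spectrum is not. We therefore suggest that the three kinds of CG motifs each perform different biological functions. The independent separation law of 8-mers in genome sequences offers a new way of thinking about genome structure, genome evolution and the biological functions of motifs.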

13.
Recombinant DNA plasmids containing sequences coding for the alpha subunit of the bovine pituitary glycoprotein hormones have been isolated. The nucleotide sequences of three different cDNA clones have been determined. The largest alpha-subunit cDNA clone was found to contain 713 bases including 77 nucleotides from the 5'-untranslated region, 72 nucleotides coding for a precursor segment, 288 nucleotides coding for the mature alpha subunit, and 276 nucleotides from the 3'-untranslated region of the mRNA followed by a poly(A) segment. This cDNA likely represents most of the bovine alpha-subunit mRNA sequence. Nucleotide sequences were obtained from the cDNA inserts of two other alpha-subunit clones, and several differences among the three cDNA sequences have been detected. These differences in nucleotide sequence may represent either individual variation in genomic sequence or cloning artifacts. Comparison of the bovine alpha-subunit cDNA sequence to the sequences of human, rat, and mouse alpha-subunit cDNAs reveals that the bovine sequence has greater than 70% homology with the other cDNAs. The cloned alpha-subunit cDNA should provide a useful probe for further studies of the structure and expression of this interesting gene.

14.
Characterization of the segmental duplication LCR7-20 in the human genome (total citations: 1; self-citations: 0; citations by others: 1)
Liu X, Li X, Li M, Acimovic YJ, Li Z, Scherer SW, Estivill X, Tsui LC. Genomics, 2004, 83(2): 262-269
Our previous study described the amplification of a genomic sequence containing exon 9 of CFTR in the human genome. Here we report that this CFTR sequence is part of a large duplicated sequence unit, provisionally named LCR7-20. Through successive screening of two human chromosome 7-specific cosmid libraries to construct a cosmid contig, we assembled two sequenced BAC clones into a single contig containing a prototypic LCR7-20 unit. Subsequent searches of existing human genome sequences identified six additional copies of LCR7-20-like sequences with more than 90% sequence homology. Additional genomic clones containing LCR7-20-like sequences were then isolated from total genomic BAC and PAC libraries. Restriction fragment analysis and limited sequencing data indicated that there could be around 30 copies of LCR7-20-like sequences in the human genome and that the average region of homology could extend over 120 kb. As indicated by fluorescence in situ hybridization analysis, LCR7-20-like sequences are dispersed on different chromosomes, mainly in the centromeric and pericentromeric regions, and some may exist in tandem copies. Our study also indicates that many genomic regions containing LCR7-20 units either have been misassembled or are missing in current versions of the human genome sequence.

15.
16.
The deviation from randomness in the distribution of nucleotides in genomic sequences is quantified and studied using a modified standard deviation (MSD). This method involves a "per block" computation of the standard deviation of the nucleotide frequencies of occurrence, using local means (means taken in a neighborhood of each block). This quantity may serve as a scale-dependent measure of nucleotide clustering. In the present work, the meso-scale of tens of nucleotides is principally explored, by means of suitably adjusted filter parameters. This length scale is of an order of magnitude not directly affected by the grammar and syntax rules of the protein-coding procedure, while remaining shorter than the scale at which large-scale characteristics of the genome appear. MSD has been found to distinguish systematically between sequences of different origin and functionality. The most nearly random are found to be coding sequences of prokaryotes, while in intronic and intergenic regions of eukaryotic genomes, extended clustering of similar nucleotides is observed. The distributions of MSD values of large collections of sequences are found to be in most cases characteristic of their biological role and origin. Protein-coding and non-coding, prokaryotic and eukaryotic DNA as well as promoter, rRNA, viral and organelle sequences have been examined. The presented results corroborate a recently proposed model for genome evolution. The method is also applied to an assessment of the annotation of ORFs taken from the complete genome of Saccharomyces cerevisiae.
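The abstract does not give the exact formula for the MSD, so the following is only one possible reading of a "per block" standard deviation with local means. The block size, the neighborhood half-width and both function names are assumptions introduced for this sketch: each block's frequency of a chosen base is compared with the mean frequency over the neighboring blocks, and the squared deviations are averaged.

```python
from math import sqrt


def block_frequencies(seq, base, block):
    """Frequency of `base` in consecutive non-overlapping blocks."""
    seq = seq.upper()
    n_blocks = len(seq) // block
    return [seq[i * block:(i + 1) * block].count(base) / block
            for i in range(n_blocks)]


def modified_std(freqs, half_width=2):
    """Root-mean-square deviation of each block from its local mean.

    The local mean is taken over the block itself and up to `half_width`
    blocks on each side (truncated at the sequence ends). This definition
    is an assumed reading of the abstract, not the published formula.
    """
    n = len(freqs)
    if n == 0:
        raise ValueError("at least one block is required")
    total = 0.0
    for i, freq in enumerate(freqs):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        local_mean = sum(freqs[lo:hi]) / (hi - lo)
        total += (freq - local_mean) ** 2
    return sqrt(total / n)


uniform = block_frequencies("ACGT" * 10, "A", block=4)
clustered = block_frequencies("AAAACCCC" * 5, "A", block=4)
print(modified_std(uniform), modified_std(clustered))
```

In this reading, a sequence whose blocks all have the same composition scores zero, and block-to-block clustering of a base raises the score. The scale probed depends on the block size and the neighborhood width.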

17.
18.
Two human genomic clones containing the lactate dehydrogenase-B processed pseudogene were isolated from two patients deficient in the lactate dehydrogenase-B isozyme. The sequences of 3,287 nucleotides, including the pseudogene and its flanking regions, from both clones were found to be identical except for three differences in the pseudogenes. The sequences of 1,286 nucleotides from these two pseudogenes exhibited 93% homology with the cDNA sequence of the lactate dehydrogenase-B functional gene, and the pseudogene contained 75/76 base substitutions, 11/12 single-base deletions, and 5 single-base insertions. This pseudogene was mapped to the X chromosome by dot-blot analysis using a probe for the pseudogene or its 5' flanking sequence.

19.
We have recently shown that a newly isolated avian sarcoma virus, UR2, is defective in replication and contains no sequences homologous to the src gene of Rous sarcoma virus. In this study, we analyzed the genetic structure and transforming sequence of UR2 by oligonucleotide fingerprinting. The sizes of the genomic RNAs of UR2 and its associated helper virus, UR2AV, were determined to be 24S and 35S, respectively, by sucrose gradient sedimentation. The molecular weight of the 24S UR2 genomic RNA was estimated to be 1.1 × 10^6, corresponding to 3,300 nucleotides, by gel electrophoresis under native and denaturing conditions. RNase T1 oligonucleotide mapping indicated that UR2 RNA contains seven unique oligonucleotides in the middle of the genome and shares eight 5'- and six 3'-terminal oligonucleotides with UR2AV RNA. From these data, we estimated that UR2 RNA contains a unique sequence of about 1.2 kilobases in the middle of the genome, and contains 1.4 and 0.7 kilobases of sequences shared with UR2AV RNA at the 5' and 3' ends, respectively. Partial sequence analysis of the UR2-specific oligonucleotides by RNase A digestion revealed that there are no homologous counterparts to these oligonucleotides in the RNAs of other avian sarcoma and acute leukemia viruses studied to date. UR2-transformed non-virus-producing cells contain a single 24S viral RNA, which is most likely the message coding for the transforming protein of UR2. On the basis of the uniqueness of the transforming sequence, we concluded that UR2 is a new member of the defective avian sarcoma viruses.

20.

Background

The periodic occurrence of dinucleotides with a period of 10.4 bases is now undeniably a hallmark of nucleosome positioning. Whereas many eukaryotic genomes contain visible and even strong signals for periodic distribution of dinucleotides, the human genome is rather featureless in this respect. The exact sequence features in the human genome that govern nucleosome positioning remain largely unknown.

Results

When analyzing the human genome sequence with the positional autocorrelation method, we found that only the dinucleotide CG shows the 10.4 base periodicity, which is indicative of the presence of nucleosomes. There is a high occurrence of CG dinucleotides that are either 31 (10.4 × 3) or 62 (10.4 × 6) base pairs apart from one another, a sequence bias known to be characteristic of Alu sequences. In a similar analysis with repetitive sequences removed, peaks of repeating CG motifs can be seen at positions 10, 21 and 31, the nearest integers to multiples of 10.4.

Conclusions

Although the CG dinucleotides are dominant, other elements of the standard nucleosome positioning pattern are present in the human genome as well. The positional autocorrelation analysis of the human genome demonstrates that the CG dinucleotide is, indeed, one visible element of the human nucleosome positioning pattern, which appears both in Alu sequences and in sequences without repeats. The dominant role that CG dinucleotides play in organizing human chromatin indicates the involvement of human nucleosomes in tuning the regulation of gene expression and chromatin structure, very likely through cytosine methylation and demethylation in the CG dinucleotides contained in human nucleosomes. This is further confirmed by the positions of CG-periodical nucleosomes on Alu sequences. Alu repeats appear as monomers, dimers and trimers, harboring two to six nucleosomes in a run. Considering the exceptional role CG dinucleotides play in nucleosome positioning, we hypothesize that Alu nucleosomes, especially those that form tightly positioned runs, could serve as "anchors" in organizing the chromatin in human cells.
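The positional autocorrelation used in the Results can be approximated, for illustration, by a histogram of distances between occurrences of the same dinucleotide. The sketch below is written under that assumption and is not the authors' implementation. The function names and the `max_dist` cutoff are hypothetical, and no normalization against background frequencies is attempted.

```python
from collections import Counter


def dinucleotide_positions(seq, dinuc="CG"):
    """Start positions (0-based) of every occurrence of `dinuc` in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def distance_histogram(seq, dinuc="CG", max_dist=100):
    """Count pairs of `dinuc` occurrences separated by each distance <= max_dist."""
    positions = dinucleotide_positions(seq, dinuc)
    hist = Counter()
    for i, start in enumerate(positions):
        for other in positions[i + 1:]:
            dist = other - start
            if dist > max_dist:
                break  # positions are sorted, so later ones are farther away
            hist[dist] += 1
    return hist


# Toy sequence with CG at positions 0, 10 and 21.
toy = "CG" + "A" * 8 + "CG" + "A" * 9 + "CG"
hist = distance_histogram(toy)
print(sorted(hist.items()))
```

In a real analysis, peaks of such a histogram near 10, 21 and 31 would correspond to the periodicity reported above. The toy sequence is constructed to contain those spacings and demonstrates only the bookkeeping.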
