Similar literature (20 results)
1.
2.
Tandemly arrayed genes (TAGs) account for about one-third of the duplicated genes in eukaryotic genomes. They provide raw genetic material for biological evolution and play important roles in genome evolution. The 22-kDa prolamin genes in cereal genomes represent typical TAG organization and provide good material for investigating gene amplification of TAGs in closely related grass genomes. Here, we isolated and sequenced the Coix 22-kDa prolamin (coixin) gene cluster (283 kb) and carried out a comparative analysis with orthologous 22-kDa prolamin gene clusters from maize and sorghum. The 22-kDa prolamin gene clusters descended from orthologous ancestor genes but underwent independent gene amplification paths after the separation of these species, and therefore varied dramatically in sequence and organization. Our analysis indicated that the gene amplification model of 22-kDa prolamin gene clusters can be divided into three major stages. In the first stage, rare gene duplications arose by chance from the ancestral gene copy. In the second stage, rounds of gene amplification occurred by unequal crossing over to form tandem gene array(s). In the third stage, the gene array diverged further through other genomic activities, such as transposon insertions and segmental rearrangements. In contrast to their highly conserved sequences, the amplified 22-kDa prolamin genes diverged rapidly in their expression capacities and expression levels. Such processes had no apparent correlation to the age or order of amplified genes within the TAG cluster, suggesting a fast-evolving nature of TAGs after gene amplification. These results provide insights into the amplification and evolution of TAG families in grasses.

3.
Recently, the complete chloroplast genome sequences of many important crop plants were determined, and this can be considered a major step forward toward exploiting the usefulness of chloroplast genetic engineering technology. Economically, cotton is one of the most important crop plants for many countries. To further our understanding of this important crop, we determined the complete nucleotide sequence of the chloroplast genome from cotton (Gossypium barbadense L.). The chloroplast genome of cotton is 160,317 base pairs (bp) in length, and is composed of a large single copy (LSC) of 88,841 bp, a small single copy (SSC) of 20,294 bp, and two identical inverted repeat (IR) regions of 25,591 bp each. The genome contains 114 unique genes, of which 17 genes are duplicated in the IRs. In addition, many open reading frames (ORFs) and hypothetical chloroplast reading frames (ycfs) with unknown functions were deduced. Compared to the chloroplast genomes from 8 other dicot plants, the cotton chloroplast genome showed a high degree of similarity of the overall structure, gene organization, and gene content. Furthermore, the sequences of the genes showed high degrees of identity at the DNA and amino acid levels. The cotton chloroplast genome was somewhat longer than the chloroplast genomes of most of the other dicot plants compared here. However, this elongation of the cotton chloroplast genome was found to be due mainly to expansions of the intergenic regions and introns (non-coding DNA). Moreover, these expansions occurred predominantly in the LSC and SSC regions.
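
The region sizes quoted in this abstract can be checked against the reported total length. The following is a minimal sketch, assuming the usual quadripartite layout (one LSC, one SSC, two IR copies); the function name `quadripartite_length` is hypothetical and used only for illustration.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular chloroplast genome made of one large
    single-copy region, one small single-copy region, and two copies
    of the inverted repeat (all sizes in base pairs)."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported in the cotton abstract.
cotton_total = quadripartite_length(lsc=88_841, ssc=20_294, ir=25_591)
print(cotton_total)  # 160317, matching the reported 160,317 bp
```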

4.
Song R, Messing J. Plant Physiology, 2002, 130(4): 1626-1635
A new approach has been undertaken to analyze the sequences and linear organization of the 19-kD zein genes in maize (Zea mays). A high-coverage, large-insert genomic library of the inbred line B73 based on bacterial artificial chromosomes was used to isolate a redundant set of clones containing members of the 19-kD zein gene family, which previously had been estimated to consist of 50 members. The redundant set of clones was used to create bins of overlapping clones that represented five distinct genomic regions. Representative clones containing the entire set of 19-kD zein genes were chosen from each region and sequenced. Seven bacterial artificial chromosome clones yielded 1,160 kb of genomic DNA. Three of them formed a contiguous sequence of 478 kb, the longest contiguous sequenced region of the maize genome. Altogether, these DNA sequences provide the linear organization of 25 19-kD zein genes, one-half the number previously estimated. It is suggested that the difference is because of haplotypes exhibiting different degrees of gene amplification in the zein multigene family. About one-half the genes present in B73 appear to be expressed. Because some active genes have only been duplicated recently, they are so conserved in their sequence that previous cDNA sequence analysis resulted in "unigenes" that were actually derived from different gene copies. This analysis also shows that the 22- and 19-kD zein gene families shared a common ancestor. Although both ancestral genes had the same incremental gene amplification, the 19-kD zein branch exhibited a greater degree of far-distance gene translocations than the 22-kD zein gene family.

5.
6.
We have determined the complete nucleotide sequence of an infectious cloned genome of ground squirrel hepatitis virus (GSHV), a nonpathogenic member of the hepadnavirus group. The genome is 3,311 base pairs long and contains the major open reading frames described for the related human and woodchuck hepatitis B viruses (HBV and WHV, respectively). These reading frames include genes for the major structural proteins (the surface and core antigens), unassigned open reading frames (A and B), the longer of which is presumed to encode the viral DNA polymerase, and an open reading frame preceding and continuous with the surface antigen gene. The arrangement of these open reading frames is similar to that encountered in the genomes of HBV and WHV: all of the reading frames are encoded on the same strand, they are positioned in the same fashion with respect to each other, and a large portion (at least 51%) of the genome can be translated in two reading frames. Comparisons of the predicted translational products of the three mammalian hepadnaviruses reveal 78% amino acid homology between the proteins of GSHV and WHV and 43% homology between those of GSHV and HBV. In addition, a perfect direct repeat of 10 to 11 base pairs, separated by ca. 46 to 223 base pairs, is present in the three mammalian viruses and in duck hepatitis B virus; the position of the repeats near the 5' termini of the two strands of virion DNA suggests a role in viral replication.

7.
The nucleotide sequence of a 9937 base-pair portion of human chromosome 9, which contains two complete leukocyte interferon genes (LeIF-L and J), the complete intergenic region, and part of a third related possible pseudogene (LeIF-M), has been determined. The coding regions of the L and J genes are separated by 4363 nucleotides. The coding regions for the putative L and J interferons are 96% homologous and are each surrounded by about 3500 nucleotides of flanking sequences, which are also highly homologous. The L and J genes and their respective flanking sequences comprise a 4000 nucleotide leukocyte interferon gene repeat unit; the L gene repeat unit contains two major insertions not present in the J gene repeat unit. The J gene repeat unit is flanked by sequence features reminiscent of those found surrounding transposable elements. Both the L and J gene repeat units are embedded within sequences that are highly repeated in the human genome. Structural features identified within this portion of chromosome 9 may have been important for the generation of this interferon gene cluster.

8.
Nucleotide sequence and genome organization of canine parvovirus. Cited 30 times (self-citations: 13; citations by others: 17)
The genome of a canine parvovirus isolate strain (CPV-N) was cloned, and the DNA sequence was determined. The entire genome, including ends, was 5,323 nucleotides in length. The terminal repeat at the 3' end of the genome shared similar structural characteristics but limited homology with the rodent parvoviruses. The 5' terminal repeat was not detected in any of the clones. Instead, a region of DNA starting near the capsid gene stop codon and extending 248 base pairs into the coding region had been duplicated and inserted 75 base pairs downstream from the poly(A) addition site. Consensus sequences for the 5' donor and 3' acceptor sites as well as promoters and poly(A) addition sites were identified and compared with the available information on related parvoviruses. The genomic organization of CPV-N is similar to that of feline parvovirus (FPV) in that there are two major open reading frames (668 and 722 amino acids) in the plus strand (mRNA polarity). Both coding domains are in the same frame, and no significant open reading frames were apparent in any of the other frames of both minus and plus DNA strands. The nucleotide and amino acid homologies of the capsid genes between CPV-N and FPV were 98 and 99%, respectively. In contrast, the nucleotide and amino acid homologies of the capsid genes for CPV-N and CPV-b (S. Rhode III, J. Virol. 54:630-633, 1985) were 95 and 98%, respectively. These results indicate that very few nucleotide or amino acid changes differentiate the antigenic and host range specificity of FPV and CPV.

9.
10.
S T Hu, L C Lee, G S Lei. Journal of Bacteriology, 1996, 178(19): 5652-5659
The genome of the transposable element IS2 contains five open reading frames that are capable of encoding proteins greater than 50 amino acids; however, only one IS2 protein of 14 kDa had been detected. By replacing the major IS2 promoter located in the right terminal repeat of IS2 with the T7 promoter to express IS2 genes, we have detected another IS2 protein of 46 kDa. This 46-kDa protein was designated InsAB'. Analyses of the InsAB' sequence revealed motifs that are characteristic of transposases of other transposable elements. InsAB' has the ability to bind both terminal repeat sequences of IS2. It was shown to bind a 27-bp sequence (5'-GTTAAGTGATAACAGATGTCTGGAAAT-3', positions 1316 to 1290 by our numbering system [16 to 42 by the previous numbering system]) located at the inner end of the right terminal repeat and a 31-bp sequence (5'-TTATTTAAGTGATATTGGTTGTCTGGAGATT-3', positions 46 to 16 [1286 to 1316]), including the last 27 bp of the inner end and the adjacent 4 bp of the left terminal repeat of IS2. This result suggests that InsAB' is a transposase of IS2. Since there is no open reading frame capable of encoding a 46-kDa protein in the entire IS2 genome, this 46-kDa protein is probably produced by a translational frameshifting mechanism.

11.
12.
ABSTRACT. A fragment from the genome of rat-derived Pneumocystis carinii was found to contain two MSG genes arranged as a direct repeat. The sequences from one gene (MSG B), the region between the two genes, and part of the second gene (MSG A) were determined. The two MSG genes were not identical in sequence. The open reading frames of MSG A and MSG B encode non-identical proteins, both of which are similar to that encoded by a previously published cDNA. The MSG B gene sequence showed no evidence of introns. The 5' and 3' untranslated regions of the MSG gene pair were highly conserved, but the regions immediately upstream of the open reading frames of MSG A and B were different from the region upstream of a previously characterized MSG cDNA. Primers designed to extend upstream of the 5' end of MSG and downstream of the 3' end of MSG were used in a polymerase chain reaction with total genomic P. carinii DNA as template. Presumptive intergenic amplification products from this reaction were cloned and sequenced. The sequences of these regions were similar but distinct, indicating that tandem arrangement of MSG genes is a common organizational motif.

13.
The nucleotide sequence of the entire beta-like globin gene cluster of rabbits has been determined. This sequence of a continuous stretch of 44.5 × 10^3 base pairs (bp) starts about 6 × 10^3 bp upstream from epsilon (the 5'-most gene) and ends about 12 × 10^3 bp downstream from beta (the 3'-most gene). Analysis of the sequence reveals that: (1) the sequence is relatively A + T rich (about 60%); (2) regions with high G + C content are associated with OcC repeats, a short interspersed repeated DNA in rabbits; (3) the distribution of polypurines, polypyrimidines and alternating purine/pyrimidine tracts is not random within the cluster; (4) most open reading frames are associated with known globin coding regions, OcC repeats or long interspersed repeats (L1 repeats); (5) the most prominent open reading frames are found in the L1 repeats; (6) different strand asymmetries in base composition are associated with embryonic and adult genes as well as the tandem L1 repeats at the 3' end of the cluster; and (7) essentially all the repeats appear to have been inserted by a transposon mechanism. A comparison of the sequence with itself by a dot-plot analysis has revealed nine new members of the OcC family of repeats in addition to the six previously reported. The OcC repeats tend to be clustered, particularly in the epsilon-gamma and gamma-psi delta intergenic regions. Dot-plot comparisons between the rabbit and the human clusters have revealed extensive sequence matches. Homology starts about 6 × 10^3 bp 5' to epsilon or as far upstream as the rabbit sequence is available. It continues throughout the entire cluster and stops about 0.7 × 10^3 bp 3' to beta, at which point several repeats have inserted in both rabbits and humans. Throughout the gene cluster, the homology is interrupted mainly by insertions or deletions in either the rabbit or the human genome. Almost all of the insertions are of known short or long repeated DNAs. The positions of the insertions are different in the two gene clusters, which indicates that both short and long repeats have been transposing throughout the genome for the time since the mammalian radiation. An alignment of rabbit and human sequences allows the calculation of the substitution rate around epsilon. Sequences far removed from the gene are evolving at a rate equivalent to the pseudogene rate, although some short regions show an apparently higher rate. (ABSTRACT TRUNCATED AT 400 WORDS)

14.
A cluster of genes encoding subunits of ATP synthase of Anabaena sp. strain PCC 7120 was cloned, and the nucleotide sequences of the genes were determined. This cluster, denoted atp1, consists of four F0 genes and three F1 genes encoding the subunits a (atpI), c (atpH), b' (atpG), b (atpF), delta (atpD), alpha (atpA), and gamma (atpC) in that order. Closely linked upstream of the ATP synthase subunit genes is an open reading frame denoted gene 1, which is equivalent to the uncI gene of Escherichia coli. The atp1 gene cluster is at least 10 kilobase pairs distant in the genome from atp2, a cluster of genes encoding the beta (atpB) and epsilon (atpE) subunits of the ATP synthase. This two-clustered ATP synthase gene arrangement is intermediate between those found in chloroplasts and E. coli. A unique feature of the Anabaena atp1 cluster is overlap between the coding regions for atpF and atpD. The atp1 cluster is transcribed as a single 7-kilobase polycistronic mRNA that initiates 140 base pairs upstream of gene 1. The deduced translation products for the Anabaena sp. strain PCC 7120 subunit genes are more similar to chloroplast ATP synthase subunits than to those of E. coli.

15.
Wang F, Wang J, Jian H, Zhang B, Li S, Wang F, Zeng X, Gao L, Bartlett DH, Yu J, Hu S, Xiao X. PLoS ONE, 2008, 3(4): e1937
Shewanella species are widespread in various environments. Here, the genome sequence of Shewanella piezotolerans WP3, a piezotolerant and psychrotolerant iron-reducing bacterium from deep-sea sediment, was determined, along with related functional analysis, to study its environmental adaptation mechanisms. The genome of WP3 consists of 5,396,476 base pairs (bp) with 4,944 open reading frames (ORFs). It possesses numerous genes or gene clusters that help it cope with extreme living conditions, such as genes for two sets of flagellum systems, structural RNA modification, eicosapentaenoic acid (EPA) biosynthesis, and osmolyte transport and synthesis. WP3 also contains 55 open reading frames encoding putative c-type cytochromes, which are important to its wide environmental adaptation ability. The mtr-omc gene cluster involved in insoluble metal reduction in the Shewanella genus was identified and compared. The two sets of flagellum systems were found to be differentially regulated under low temperature and high pressure; the lateral flagellum system was found to be essential for motility and survival at low temperature.

16.
Small repeat sequences in bacterial genomes, which represent non-autonomous mobile elements, have close similarities to archaeal and eukaryotic miniature inverted repeat transposable elements. These repeat elements are found in both intergenic and intragenic chromosomal regions, and contain an array of diverse motifs. These can include DNA sequences containing an integration host factor binding site and a proposed DNA methyltransferase recognition site, transcribed RNA secondary structural motifs, which are involved in mRNA regulation, and translated open reading frames found fused to other open reading frames. Some bacterial mobile element fusions are in evolutionarily conserved protein and RNA genes. Others might represent or lead to creation of new protein genes. Here we review the remarkable properties of these small bacterial mobile elements in the context of possible beneficial roles resulting from random insertions into the genome.

17.
The structural gene of the Paracoccus denitrificans NADH-ubiquinone oxidoreductase encoding a homologue of the 75-kDa subunit of bovine complex I (NQO3) has been located and sequenced. It is located approximately 1 kbp downstream of the gene coding for the NADH-binding subunit (NQO1) [Xu, X., Matsuno-Yagi, A., and Yagi, T. (1991) Biochemistry 30, 6422-6428] and is composed of 2019 base pairs and codes for 673 amino acid residues with a calculated molecular weight of 73,159. The M(r) 66,000 polypeptide of the isolated Paracoccus NADH dehydrogenase complex is assigned the NQO3 designation on the basis of N-terminal protein sequence analysis, amino acid analysis, and immuno-cross-reactivity. The encoded protein contains a putative tetranuclear iron-sulfur cluster (probably cluster N4) and possibly a binuclear iron-sulfur cluster. An unidentified reading frame (URF3) which is composed of 396 base pairs and possibly codes for 132 amino acid residues was found between the NQO1 and NQO3 genes. When partial DNA sequencing of the regions downstream of the NQO3 gene was performed, sequences homologous to the mitochondrial ND-1, ND-5, and ND-2 gene products of bovine complex I were found, suggesting that the gene cluster carrying the Paracoccus NADH dehydrogenase complex contains not only structural genes encoding water-soluble subunits but also structural genes encoding hydrophobic subunits.
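
The base-pair and residue counts quoted in this abstract are related by the triplet code (three nucleotides per amino acid). The following is a minimal sketch of that arithmetic; the helper name `codons_in` is hypothetical, and whether the quoted lengths include a stop codon is not stated in the abstract, so the sketch simply divides by three.

```python
def codons_in(length_bp: int) -> int:
    """Number of complete codons in a coding stretch of the given length.
    Raises ValueError if the length is not a multiple of three."""
    if length_bp % 3 != 0:
        raise ValueError(f"{length_bp} bp is not a whole number of codons")
    return length_bp // 3


# Lengths (bp) as quoted in the abstract.
nqo3_codons = codons_in(2019)
urf3_codons = codons_in(396)
print(nqo3_codons, urf3_codons)  # 673 132, matching the reported residue counts
```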

18.
We have cloned and sequenced a 5200 base restriction fragment and an overlapping 3100 base fragment of the large single copy region of the chloroplast genome of the diatom Odontella sinensis, which hybridized to several ATPase gene probes. These fragments contain six closely linked reading frames that were identified as atpI, atpH, atpG, atpF, atpD, and atpA, coding for subunits IV, III, II, I, delta, and alpha, respectively. Remarkably, the genes atpG and atpD, which are nucleus-encoded in chlorophyll a + b plants, are present in the Odontella chloroplast gene cluster. They map at the same positions as in cyanobacteria. The genes atpD and atpF overlap by four base-pairs as in certain photosynthetic and heterotrophic eubacteria. Upstream from the atpA gene cluster an open reading frame coding for 251 amino acid residues was found, which shows sequence similarity to ATP-binding subunits of periplasmic prokaryotic and eukaryotic transport systems. No similar reading frame is present in the land plant chloroplast genomes analysed so far. Sequences and arrangement of the genes are discussed with respect to the peculiar evolution of the chlorophyll a + c-containing chromophytic plastids.

19.
The nucleotide sequence of the Korean ginseng (Panax schinseng Nees) chloroplast genome has been completed (AY582139). The circular double-stranded DNA, which consists of 156,318 bp, contains a pair of inverted repeat regions (IRa and IRb) with 26,071 bp each, which are separated by large and small single copy regions of 86,106 bp and 18,070 bp, respectively. The inverted repeat region is further extended into a large single copy region which includes the 5' parts of the rps19 gene. Four short inversions associated with short palindromic sequences that form stem-loop structures were also observed in the chloroplast genome of P. schinseng compared to that of Nicotiana tabacum. The genome content and the relative positions of 114 genes (75 peptide-encoding genes, 30 tRNA genes, 4 rRNA genes, and 5 conserved open reading frames [ycfs]), however, are identical with the chloroplast DNA of N. tabacum. Sixteen genes contain one intron while two genes have two introns. Of these introns, only one (trnL-UAA) belongs to the self-splicing group I; all remaining introns have the characteristics of six domains belonging to group II. Eighteen simple sequence repeats have been identified from the chloroplast genome of Korean ginseng. Several of these SSR loci show infra-specific variations. A detailed comparison of 17 known completed chloroplast genomes from the vascular plants allowed the identification of evolutionary modes of coding segments and intron sequences, as well as the evaluation of the phylogenetic utilities of chloroplast genes. Furthermore, through the detailed comparisons of several chloroplast genomes, evolutionary hotspots predominated by the inversion end points, indel mutation events, and high frequencies of base substitutions were identified. Large-sized indels were often associated with direct repeats at the end of the sequences facilitating intra-molecular recombination.

20.