Similar Documents
20 similar documents found.
1.
Behura SK, Severson DW. Gene, 2012, 504(2): 226-232
We present a detailed genome-scale comparative analysis of simple sequence repeats within protein-coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes and is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats differs significantly (p < 0.001) between Drosophila and non-Drosophila species. However, the frequencies of synonymous and non-synonymous codon pair repeats vary in a correlated manner (r² = 0.79) among all the species. Results further show that perfect and imperfect repeats are significantly associated with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show a significant association between the numbers of perfect/imperfect hexamers and repeats coding for single amino acid or amino acid pair runs. Our data further suggest that genes containing simple sequence coding repeats may be under negative selection, as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny of the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.
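The abstract does not spell out how coding repeats were detected. As an illustration only, the minimal Python sketch below scans a coding sequence in frame and reports runs of codons drawn from a chosen set, here CAG and CAA, which together encode polyglutamine. The function name `find_codon_runs` and the five-codon minimum are assumptions, not the study's criteria.

```python
# Minimal sketch: find in-frame runs of codons drawn from a chosen set.
# The function name and the 5-codon minimum are illustrative assumptions.

def find_codon_runs(cds, codon_set, min_codons=5):
    """Return (start_nt, n_codons, codons) for each in-frame run of codons from codon_set."""
    cds = cds.upper()
    runs = []
    run_start, run = None, []
    # Walk the sequence codon by codon in reading frame 0.
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in codon_set:
            if run_start is None:
                run_start = i
            run.append(codon)
        else:
            # A codon outside the set ends the current run.
            if len(run) >= min_codons:
                runs.append((run_start, len(run), run))
            run_start, run = None, []
    # Close a run that reaches the end of the sequence.
    if len(run) >= min_codons:
        runs.append((run_start, len(run), run))
    return runs


GLN_CODONS = {"CAG", "CAA"}  # the two glutamine codons
seq = "ATG" + "CAG" * 4 + "CAA" + "CAG" * 2 + "GCT" + "CAG" * 2 + "TAA"
for start, n, codons in find_codon_runs(seq, GLN_CODONS):
    kind = "pure" if len(set(codons)) == 1 else "mixed CAG/CAA"
    print(start, n, kind)  # prints: 3 7 mixed CAG/CAA
```

Because the run is defined by the encoded amino acid rather than a single codon, a CAA inside a CAG tract extends the run instead of breaking it; the short CAG pair after GCT falls below the minimum and is not reported.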

2.
A system for the computer analysis of nucleic acid and protein sequences ("Helix") is described. The DNA sequence format is EMBL-compatible, and sequences can easily be annotated using convenient menus. "Helix" also provides the following functions: efficient alignment of gel reading data and assembly of the final sequence; simple construction of recombinant molecules in silico; calculation of nucleotide and dinucleotide distribution along the sequence; searching for coding frames; calculation of codon and amino acid percentages in coding frames; searching for direct and inverted repeats; sequence alignment; protein secondary structure prediction; restriction mapping; and DNA-to-protein translation. "Helix" also contains programs for RNA structure prediction, homology searching throughout the EMBL bank, choosing optimal probe sequences, and searching for promoters. All programs are written in FORTRAN-77 and automatically translated into FORTRAN IV. "Helix" requires only 64 KB.
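Helix itself was written in FORTRAN. As a hedged modern illustration of two of the listed functions, the dinucleotide distribution and the codon percentages in a coding frame, here is a short Python sketch; the function names are hypothetical and do not correspond to Helix routines.

```python
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides along the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def codon_percentages(seq, frame=0):
    """Percentage of each codon among the complete codons in reading frame 0, 1 or 2."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: 100.0 * n / len(codons) for codon, n in counts.items()}


print(codon_percentages("ATGGCTGCTTAA"))  # {'ATG': 25.0, 'GCT': 50.0, 'TAA': 25.0}
print(dinucleotide_counts("ATAT"))
```

Incomplete trailing codons are ignored, so the percentages always refer to whole codons in the chosen frame.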

3.
The complete nucleotide sequence of a genomic clone encoding the mouse skeletal alpha-actin gene has been determined. This single-copy gene codes for a protein identical in primary sequence to the rabbit skeletal alpha-actin. It has a large intron in the 5'-untranslated region 12 nucleotides upstream from the initiator ATG and five small introns in the coding region at codons specifying amino acids 41/42, 150, 204, 267, and 327/328. These intron positions are identical to those for the corresponding genes of chickens and rats. Similar to other skeletal alpha-actin genes, the nucleotide sequence codes for two amino acids, Met-Cys, preceding the known N-terminal Asp of the mature protein. Comparison of the nucleotide sequences of rat, mouse, chicken, and human skeletal muscle alpha-actin genes reveals conserved sequences (some not previously noted) outside of the protein-coding region. Furthermore, several inverted repeat sequences, partially within these conserved regions, have been identified. These sequences are not present in the vertebrate cytoskeletal beta-actin genes. The strong conservation of the inverted repeat sequences suggests that they may have a role in the tissue-specific expression of skeletal alpha-actin genes.

4.
5.
The complete nucleotide sequence of the gene for chain c of hemoglobin of the earthworm Lumbricus terrestris has been determined. The sequence of 4037 base pairs (bp) includes about 310 bp of 5'-flanking sequence and 110 bp 3' to the poly(A) site. Comparison of cDNA and genomic sequences shows four silent differences in codons that suggest the presence of at least two genes. The coding sequence is split by two introns of 1344 and 1169 bp at highly conserved positions (Jhiang, S. M., Garey, J. R., and Riggs, A. F. (1988) Science 240, 334-336). The first intron possesses the unusual 5' splice junction sequence GC instead of GT. Many tandem triplet repeats based on (GAT) and (CCT) are present in the first intron. The second intron has nine tandem repeats based on the consensus sequence AAGGAAGGAGGTC. Each intron has several exact inverted repeats of 9-10 bp that might result in loops of 78-140 nucleotides in the RNA prior to splicing. The sequence in the second intron at positions 2423-2644 is about 65% identical to parts of several genes found in yeast mitochondria and in DNA from several other organisms.
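To make the "exact inverted repeats that might result in loops" concrete, here is a brute-force Python sketch that looks for a stem whose downstream partner is its exact reverse complement, separated by a loop within a given length range. The defaults (9-nt stem, 78-140-nt loop) are borrowed from the values quoted in the abstract; the function names are hypothetical, and this is not the authors' procedure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq, stem=9, min_loop=78, max_loop=140):
    """Return (left_start, right_start, loop_len) for exact inverted repeats.

    The right arm must be the exact reverse complement of the left arm;
    loop_len counts the bases strictly between the two arms.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = revcomp(seq[i:i + stem])
        first = i + stem + min_loop
        last = min(i + stem + max_loop, len(seq) - stem)
        for j in range(first, last + 1):
            if seq[j:j + stem] == target:
                hits.append((i, j, j - (i + stem)))
    return hits


hairpin = "GATTACAGG" + "A" * 80 + "CCTGTAATC"
print((0, 89, 80) in inverted_repeats(hairpin))  # True
```

The search is exhaustive over every stem position and loop length, which is fine for gene-sized sequences but would need indexing for whole genomes.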

6.
7.
Direct repeats of the F plasmid incC region express F incompatibility   Total citations: 22, self-citations: 0, citations by others: 22
Tolun A, Helinski DR. Cell, 1981, 24(3): 687-694
The nucleotide sequence of the incompatibility region incC, located at 45.8-46.4 kb on the F plasmid map, was determined. This region consists of 543 bp and contains sufficient information to code for only two small polypeptides of 34 and 30 amino acids each. Deletion of the ATG start codons for these two polypeptides has no effect on expression of incC incompatibility. A prominent feature of this sequence is the presence of five 22 bp direct repeats. A 58 bp segment of the incC region that contains two of these direct repeats was inserted into plasmid pACYC184, which is compatible with the F plasmid. The pACYC184 plasmid containing the direct-repeat sequences now expresses incompatibility with the F'lac plasmid and replication-proficient derivatives of the mini-F plasmid.
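Exact direct repeats such as 22 bp units can be located by indexing every k-mer and keeping those seen more than once. The Python sketch below reports only perfect repeats; imperfect copies would need an approximate-matching step not shown here. The function name `direct_repeats` and the toy 22-mer are illustrative, not the incC sequence.

```python
from collections import defaultdict


def direct_repeats(seq, k=22, min_copies=2):
    """Map each k-mer seen at least min_copies times to its start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    # Index every overlapping k-mer by its start coordinate.
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) >= min_copies}


unit = "ACGTTGCAAGGCTTACCGATGC"  # a made-up 22-mer, not the incC sequence
repeats = direct_repeats(unit + "TTTT" + unit)
print(repeats[unit])  # [0, 26]
```

A dictionary of k-mer positions keeps the scan linear in sequence length, at the cost of storing one entry per distinct k-mer.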

8.
Complete chromosome/genome sequences available from humans, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae were analyzed for the occurrence of mono-, di-, tri-, and tetranucleotide repeats. In all of the genomes studied, dinucleotide repeat stretches tended to be longer than other repeats. Additionally, tetranucleotide repeats in humans and trinucleotide repeats in Drosophila also seemed to be longer. Although the trends for different repeats are similar between different chromosomes within a genome, the density of repeats may vary between different chromosomes of the same species. The abundance or rarity of various di- and trinucleotide repeats in different genomes cannot be explained by nucleotide composition of a sequence or potential of repeated motifs to form alternative DNA structures. This suggests that in addition to nucleotide composition of repeat motifs, characteristic DNA replication/repair/recombination machinery might play an important role in the genesis of repeats. Moreover, analysis of complete genome coding DNA sequences of Drosophila, C. elegans, and yeast indicated that expansions of codon repeats corresponding to small hydrophilic amino acids are tolerated more, while strong selection pressures probably eliminate codon repeats encoding hydrophobic and basic amino acids. The locations and sequences of all of the repeat loci detected in genome sequences and coding DNA sequences are available at http://www.ncl-india.org/ssr and could be useful for further studies.
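The abstract does not state the detection thresholds. A minimal Python sketch of perfect mono- to tetranucleotide repeat detection follows; the minimum copy numbers in `MIN_COPIES` are hypothetical, and motifs that are themselves repeats of a shorter unit (for example ATAT) are skipped so that each tract is reported once, at its shortest motif.

```python
# Hypothetical minimum copy numbers per motif length (not the study's thresholds).
MIN_COPIES = {1: 8, 2: 5, 3: 4, 4: 3}


def is_primitive(motif):
    """False if motif is a repeat of a shorter unit, e.g. 'ATAT' = 'AT' * 2."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq, min_copies=MIN_COPIES):
    """Return sorted (start, motif, copies) for perfect mono- to tetranucleotide repeats."""
    seq = seq.upper()
    hits = []
    for n, needed in min_copies.items():
        i = 0
        while i + n <= len(seq):
            motif = seq[i:i + n]
            copies = 1
            # Count how many times the motif repeats back to back from position i.
            while seq[i + copies * n:i + (copies + 1) * n] == motif:
                copies += 1
            if copies >= needed and is_primitive(motif):
                hits.append((i, motif, copies))
                i += copies * n  # jump past the whole tract
            else:
                i += 1
    return sorted(hits)


seq = "A" * 10 + "T" + "CAG" * 5 + "G" + "AT" * 6 + "G"
print(find_ssrs(seq))  # [(0, 'A', 10), (11, 'CAG', 5), (27, 'AT', 6)]
```

Each tract is reported in the first phase met while scanning left to right, so a run preceded by a matching base may appear under a rotated motif (for example TA instead of AT).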

9.
10.
11.
Structure and evolution of the apolipoprotein multigene family   Total citations: 8, self-citations: 0, citations by others: 8
We present the complementary DNA and deduced amino acid sequence of rat apolipoprotein A-II (apoA-II), and the results of a detailed statistical analysis of the nucleotide and amino acid sequences of all the apolipoprotein gene sequences published to date: namely, those of human and rat apoA-I, apoA-II and apoE, rat apoA-IV, and human apoC-I, C-II and C-III. Our results indicate that the apolipoprotein genes have very similar genomic structures, each having a total of three introns at the same locations. Using the exon/intron junctions as reference points, we have obtained an alignment of the coding regions of all the genes studied. It appears that the mature peptide regions of these genes are almost completely made up of tandem repeats of 11 codons. The part of the mature peptide region encoded by exon 3 contains a common block of 33 codons, whereas the part encoded by exon 4 contains a much more variable number of internal repeats of 11 codons. These genes have apparently evolved from a primordial gene through multiple partial (internal) and complete gene duplications. On the basis of the degree of homology of the various sequences, and the pattern of the internal repeats in these genes, we propose an evolutionary tree for the apolipoprotein genes and give rough estimates of the divergence times between these genes. Our results show that apoA-II has evolved extremely rapidly and that apoA-I and apoE have also evolved at high rates, although some of their regions are better conserved than others. The rate of evolution of individual regions seems to be related to the stringency of their functional requirements.

12.
Paramecium tetraurelia, like some other ciliate species, uses an alternative nuclear genetic code in which UAA and UAG are translated as glutamine and UGA is the only stop codon. It has been postulated that the use of stop codons as sense codons depends on the presence of specific tRNAs and on modification of eukaryotic release factor one (eRF1), a factor involved in stop codon recognition during translation termination. We describe here the isolation and characterisation of two genes, eRF1-a and eRF1-b, coding for eRF1 in P. tetraurelia. The two genes are very similar, both in genomic organization and in sequence, and might result from a recent duplication event. The two coding sequences are 1,314 nucleotides long and encode two putative proteins of 437 amino acids with 98.5% identity. Interestingly, when compared with the eRF1 sequences of ciliates having the same variant genetic code, or of other eukaryotes, the eRF1 of P. tetraurelia exhibits significant differences in the N-terminal region, which is thought to interact with stop codons. We discuss the consequences of these changes in the light of recent models proposed to explain the mechanism of stop codon recognition in eukaryotes. In addition, analysis of the expression of the two genes by Northern blotting and primer extension reveals that they are differentially expressed during vegetative growth and autogamy.
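The variant code described here corresponds to the one NCBI lists as translation table 6 (ciliate nuclear code). A minimal Python sketch of translating with it, built by overriding two entries of the standard table; the helper names are illustrative, and codons containing ambiguous bases are not handled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; "*" marks a stop codon.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Ciliate nuclear code (NCBI table 6): UAA and UAG read as glutamine, UGA is the only stop.
CILIATE = dict(STANDARD, TAA="Q", TAG="Q")


def translate(cds, code):
    """Translate frame 0 until the first stop codon; accepts DNA or RNA letters."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = code[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


orf = "ATGTAACAGTAGTGA"
print(translate(orf, STANDARD))  # M
print(translate(orf, CILIATE))   # MQQQ
```

The same open reading frame stops after one residue under the standard code but runs to the UGA under the ciliate code, which is the reassignment that eRF1 must accommodate.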

13.
14.
15.
A Dictyostelium discoideum DNA fragment, isolated on the basis of its ability to complement the ura1 mutation of yeast, codes for a dihydroorotate dehydrogenase activity. The complete nucleotide sequence of this 1898 bp fragment has been determined and reveals an open reading frame capable of coding for a 369-amino-acid polypeptide with a molecular mass of 47,000. The gene preferentially uses codons with weak pairing forces. Eleven codons, mainly those with a G in the third position, are absent. The flanking sequences are unusually rich in A + T (80%). Several direct and inverted repeats exist in the 5' flanking sequence.
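The two compositional observations, absent codons and A + T richness, are simple to compute. A Python sketch, assuming the open reading frame starts at the first base; the names are illustrative and the toy sequence is not the Dictyostelium gene.

```python
from itertools import product


def absent_codons(orf):
    """Codons (out of all 64) that never occur in reading frame 0 of the ORF."""
    orf = orf.upper()
    used = {orf[i:i + 3] for i in range(0, len(orf) - 2, 3)}
    all_codons = ("".join(c) for c in product("ACGT", repeat=3))
    return sorted(c for c in all_codons if c not in used)


def at_percent(seq):
    """A + T content as a percentage of sequence length."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


missing = absent_codons("ATGAAAGCGTTTTAA")
g_ending = [c for c in missing if c.endswith("G")]  # absent codons with G in the third position
print(len(missing), len(g_ending))  # 59 14
print(at_percent("AAAATTTTGC"))     # 80.0
```

On a real gene, filtering the absent set by third-position base is how a bias like "mainly those with a G in the third position" would show up.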

16.
17.
Ubiquitin genes as a paradigm of concerted evolution of tandem repeats   Total citations: 8, self-citations: 0, citations by others: 8
Ubiquitin is remarkable for its ubiquitous distribution and its extreme protein sequence conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences of several ubiquitin repeats from each of humans, chicken, Xenopus, Drosophila, barley, and yeast have recently been determined. By analysis of these data we show that ubiquitin is evolving more slowly than any other known protein, and that this (together with its gene organization) contributes to an ideal situation for the occurrence of concerted evolution of tandem repeats. By contrast, there is little evidence of between-cluster concerted evolution. We deduce that in ubiquitin genes, concerted evolution involves both unequal crossover and gene conversion, and that the average time since two repeated units within the polyubiquitin locus most recently shared a common ancestor is approximately 38 million years (Myr) in mammals, but perhaps only 11 Myr in Drosophila. The extreme conservatism of ubiquitin evolution also allows the inference that certain synonymous serine codons differing at the first two positions were probably mutated at single steps.
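The last inference rests on the fact that serine is encoded by two codon families, TCN and AGT/AGC, which differ at both the first and second positions. A small Python check of that fact follows; it is not the authors' analysis, and the function name `differing_positions` is illustrative.

```python
# Serine codons in the standard genetic code: the TCN family and the AGY pair.
SERINE_CODONS = ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"]


def differing_positions(a, b):
    """1-based codon positions at which two codons differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


tc_family = [c for c in SERINE_CODONS if c.startswith("TC")]
ag_family = [c for c in SERINE_CODONS if c.startswith("AG")]
# Fewest substitutions needed to move between the two families.
min_changes = min(len(differing_positions(a, b)) for a in tc_family for b in ag_family)
print(min_changes)                        # 2
print(differing_positions("TCT", "AGT"))  # [1, 2]
```

Each one-step route from TCT toward AGT passes through a non-serine codon (ACT, threonine, or TGT, cysteine), which is why, in a protein as conserved as ubiquitin, a switch between the families is taken to have occurred as a single double change.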

18.
Nucleotide sequence of the glnA control region of Escherichia coli   Total citations: 10, self-citations: 0, citations by others: 10
The RNA polymerase binding sites present along a DNA segment encompassing the glnA, glnL, and glnG genes have been identified in a hybrid plasmid carrying this chromosomal region of Escherichia coli. The DNA sequence of an 817 base pair segment was determined; this segment contains the region coding for the first 42 NH2-terminal amino acids of the glnA structural gene, as well as its regulatory region. Analysis of this nucleotide sequence revealed three probable RNA polymerase recognition sites, imperfect palindromes, inverted repeats, and direct repeats.

19.
Filaggrins are an important class of intermediate filament-associated proteins that are involved in the organization of keratin filaments in the terminal stages of mammalian epidermal differentiation. Filaggrins are initially synthesized as very large polyprotein precursors consisting of many tandemly arranged repeats that are later liberated by proteolytic processing to yield many copies of the functional protein. We have recently characterized a cDNA clone for mouse filaggrin (Rothnagel, J. A., Mehrel, T., Idler, W. W., Roop, D. R., and Steinert, P. M. (1987) J. Biol. Chem. 262, 15643-15648) which encodes a 750-base pair (250-amino acid) repeating element having properties consistent with a filaggrin molecule. Southern blot analysis of total mouse DNA and of the mouse gene isolated from a cosmid library (cosmid clone cFM6.1A2) has also revealed a repeat length of about 750 base pairs. The cosmid clone contains most of the mouse filaggrin gene, but it is missing the 5'-noncoding sequences and possibly some coding sequences as well. We report here that cosmid clone cFM6.1A2 contains 20 filaggrin repeats and 15,213 base pairs of coding sequence. Sequence analysis of this clone has revealed at least two different types of repeating element. Type B has a repeat length of 750 base pairs (250 amino acids), whereas type A is 765 base pairs (255 amino acids) long and contains five additional amino acids inserted next to an acidic sequence that delineates the amino and carboxyl termini of the filaggrin repeats. These five additional amino acids may alter the proteolytic sensitivity of the acidic linker sequence, thereby affecting the processing of the precursor. The random distribution of the two types of repeats in the precursor indicates that the mouse filaggrin gene arose by a complicated series of duplications and/or rearrangements.

20.
It has been shown that codons coding for strongly hydrophilic amino acids are complemented by codons that code for strongly hydrophobic ones, leading to the hypothesis that peptides thus encoded should interact. Although the principle has been validated in a number of experimental models, its general applicability has been questioned. I have discussed this principle, showing that the correlation between coding-strand and noncoding-strand amino acids was maintained, indeed slightly improved, when weighted averages based on codon usage tables were used to determine noncoding-strand amino acid hydropathies. The coding capacity of the noncoding strand and its content of open reading frames were also discussed. Another point of contention, the chemical plausibility of the interactions between hydrophobic and hydrophilic amino acids implicit in this concept, was further clarified. The extension of complementary domains was also addressed. Finally, I have discussed what I call the evolutionary drift of primary structure, and I have shown as an example that although the nucleotide sequence coding for the substance K receptor bears little resemblance to the inverse complement of the sequence that codes for the substance K (SK) peptide, a receptor peptide spanning residues 130-139 is hydropathically very similar to the peptide predicted from that inverse complement.
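The complementarity idea can be checked codon by codon: read each codon's reverse complement 5' to 3' in the same register, translate both, and compare Kyte-Doolittle hydropathies. The Python sketch below assumes that register and uses plain unweighted codons, whereas the abstract describes codon-usage-weighted averages, so it will not reproduce the author's figures; the helper names are illustrative.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(codon):
    """Reverse complement, i.e. the noncoding-strand codon read 5' to 3'."""
    return codon.translate(COMPLEMENT)[::-1]


def complementary_pair(codon):
    """(aa, hydropathy, complementary aa, its hydropathy) for one sense codon."""
    aa, anti = CODE[codon], CODE[revcomp(codon)]
    return aa, KD[aa], anti, KD[anti]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Keep codons where neither the codon nor its reverse complement is a stop.
pairs = [complementary_pair(c) for c in CODE
         if CODE[c] != "*" and CODE[revcomp(c)] != "*"]
r = pearson([p[1] for p in pairs], [p[3] for p in pairs])
print(complementary_pair("TTT"))  # ('F', 2.8, 'K', -3.9)
print(round(r, 2))
```

The negative sign of r comes mainly from the middle base: codons with T in the second position (hydrophobic residues) always complement codons with A in the second position (mostly hydrophilic residues).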


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417