Similar articles
 20 similar articles found (search time: 15 ms)
1.
The general property of asymmetry in word use in meaningful texts written in a variety of languages motivates a quantification of the differences in the use of mutually symmetric triplets in genomic sequences. When this is done in the three reading frames, high values found for one of them are used as an indication that the sequence codes for a protein. Moreover, a similar quantification of the differences in the use of complementary triplets is introduced, again with predictive power for the coding character of a sequence. This method reflects the non-equivalence between the sense and anti-sense strands of a coding segment. In both approaches, "linguistic asymmetry" in coding sequences is related to the form of the genetic code and to the skews in codon usage and amino acid use.
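The abstract above does not state the formula it uses, so the following is only a minimal sketch of one way such a per-frame asymmetry could be quantified. The score (sum of absolute count differences between partner triplets, normalized by the number of triplets in the frame), the function names, and the reading of "complementary" as reverse complement are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mirror(triplet):
    """Mutually symmetric partner: the triplet read backwards (ACG <-> GCA)."""
    return triplet[::-1]


def reverse_complement(triplet):
    """Assumed meaning of 'complementary triplet': the anti-sense strand triplet."""
    return triplet.translate(_COMPLEMENT)[::-1]


def frame_counts(seq, frame):
    """Count non-overlapping triplets read from offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


def asymmetry(seq, frame, partner):
    """Hypothetical score: sum of |count(t) - count(partner(t))| over
    unordered pairs, divided by the number of triplets in the frame."""
    counts = frame_counts(seq, frame)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    seen = set()
    diff = 0
    for t in list(counts):
        p = partner(t)
        pair = frozenset((t, p))
        if p == t or pair in seen:  # skip self-partners and pairs already counted
            continue
        seen.add(pair)
        diff += abs(counts[t] - counts[p])  # Counter returns 0 for absent triplets
    return diff / total
```

Under these assumptions, one would evaluate `asymmetry(seq, f, mirror)` and `asymmetry(seq, f, reverse_complement)` for `f` in 0, 1, 2 and look for a frame whose score stands out from the other two.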

2.
3.
D W Chung  E W Davie 《Biochemistry》1984,23(18):4232-4236
cDNAs and the genomic DNA coding for the gamma and gamma' chains of human fibrinogen have been isolated and characterized by sequence analysis. The cDNAs coding for the gamma and gamma' chains share a common nucleotide sequence coding for the first 407 amino acid residues in each polypeptide chain. The predominant gamma chain contains an additional four amino acids on its carboxyl-terminal end (residues 408-411). These four amino acids, together with the 3' noncoding sequences, are encoded by the tenth exon. Removal of the ninth intervening sequence following the processing and polyadenylation reactions yields a mature mRNA coding for the predominant gamma chain. The less prevalent gamma' chain contains 20 amino acids at its carboxyl-terminal end (residues 408-427). These 20 amino acids are encoded by the immediate 5' end of the ninth intervening sequence. This results from an occasional processing and polyadenylation reaction that occurs within the region normally constituting the ninth intervening sequence. Accordingly, the gene for the gamma chain of human fibrinogen gives rise to two mRNAs that differ in sequence on their 3' ends. These mRNAs code for polypeptide chains with different carboxyl-terminal sequences. Both of these polypeptides are incorporated into the fibrinogen molecule present in plasma.

4.
5.
Cloned cDNA and genomic sequences have been analyzed to deduce the amino acid sequence of phytochrome from etiolated Avena. Restriction endonuclease site polymorphism between clones indicates that at least four phytochrome genes are expressed in this tissue. Sequence analysis of two complete and one partial coding region shows approximately 98% homology at both the nucleotide and amino acid levels, with the majority of amino acid changes being conservative. High sequence homology is also found in the 5'-untranslated region, but significant divergence occurs in the 3'-untranslated region. The phytochrome polypeptides are 1128 amino acid residues long, corresponding to a molecular mass of 125 kDa. The known protein sequence at the chromophore attachment site occurs only once in the polypeptide, establishing that phytochrome has a single chromophore per monomer covalently linked to Cys-321. Computer analyses of the amino acid sequences have provided predictions regarding a number of structural features of the phytochrome molecule.

6.
A reverse-phase HPLC system was developed for isolating the water-insoluble alpha- and beta-polypeptides of the light-harvesting complex II (LH II) of Rhodopseudomonas (Rps.) palustris without the use of any detergent. The material obtained was of high purity and suitable for direct microsequence analysis. Chromatographic analysis could resolve at least two major beta-polypeptides, beta a and beta b, two major alpha-polypeptides, alpha a and alpha b, and two additional minor polypeptides. N-terminal amino acid sequencing shows that the resolved peaks correspond to different polypeptide species and that the minor species have an N-terminal sequence identical to that of the alpha b polypeptide. An oligonucleotide derived from the amino-terminal sequence of the alpha a polypeptide was used to screen a genomic library from Rps. palustris. Several independent clones have been characterized by Southern blot and nucleotide sequence analysis. We show that Rps. palustris contains at least four different clusters of beta and alpha genes. Two clones contain sequences potentially coding for the beta a-alpha a and beta b-alpha b polypeptides, and two additional clones potentially code for beta and alpha peptides that we named beta c-alpha c and beta d-alpha d, which did not correspond to the major purified polypeptides. In addition to the protein chemistry data, the conservation at the amino acid level and the presence of canonical ribosomal binding sites upstream of each of the identified genes strongly suggest that all four coding regions are expressed.

7.
Prostatic steroid binding protein: organisation of C1 and C2 genes.   Cited by: 5 (self-citations: 5; citations by others: 0)
M Parker  M Needham  R White  H Hurst  M Page 《Nucleic Acids Research》1982,10(17):5121-5132
Prostatic steroid binding protein, whose expression is stimulated by androgens, consists of two subunits: one containing the polypeptides C1 and C3 and the other containing C2 and C3. We have characterised genomic clones containing the C1 and C2 genes by restriction enzyme analysis and DNA sequencing. Both genes are 3.2 kb long, have similar exon/intron arrangements and share considerable DNA sequence homologies in their coding regions, intervening sequences and 5' upstream DNA sequence, which suggests that they have probably arisen from the duplication of an ancestral gene. The 5' termini of C1 and C2 mRNA have been mapped; the sequence TATAAA appears 30 nucleotides upstream, but a CAAT-like sequence at -60 to -80 is absent. Finally, homologous human genes have not been detected.

8.
9.
A cDNA coding for the non-histone chromosomal protein HMG-I, or its isoform HMG-Y, was isolated from a murine Friend cell library using synthetic oligonucleotide hybridization probes. Sequence analysis showed that the 1670-base pair full length cDNA insert consists of a 201-base pair, G/C-rich (74%), 5'-untranslated region, a 288-base pair amino acid coding sequence, and an unusually long 1182-base pair 3'-untranslated region. The deduced 96-residue amino acid coding sequence of the murine HMG-I(Y) cDNA is very similar to the reported amino acid sequence of human HMG-I, except that it lacks 11 internal amino acids reported in the human protein. Based on Southern blot hybridization analysis of genomic DNA, there appear to be fewer than five copies of HMG-I(Y) genes in the haploid murine genome. These murine HMG-I(Y) genes contain a large (at least 890 base pairs) exon that includes most, or all, of the 3'-untranslated region; whereas the much shorter 5'-untranslated region and amino acid coding sequences are interrupted by at least one intron. A single size class (approximately 1700 nucleotides in murine cells and 2000 nucleotides in human cells) of HMG-I(Y) mRNAs was detected at high levels in total RNA extracts from rapidly dividing, transformed cells, but to a lesser extent, or not at all, in extracts from slowly or non-dividing cells.

10.
Ten new wheat γ-gliadin gene sequences are reported and an analysis of γ-gliadin gene family structure is carried out using all known γ-gliadin sequences. The new sequences comprise four genomic clones with significantly more flanking DNA than previously reported, and six cDNA clones from a wheat endosperm EST project. Analysis of extended flanking DNA from the genomic clones indicates limits of conservation of γ-gliadin DNA sequence that are similar to those previously found with other gliadin and glutenin genes and that are theorized to define the DNA sequence necessary for gene control. Most of the flanking DNA is not homologous to any reported DNA sequence, and one flanking region contains the first MITE-like (miniature inverted transposable element) DNA sequence associated with gliadin genes. About a quarter of the encoded polypeptides would contain a free cysteine residue, an observation that may relate to reports that at least some gliadins can participate in wheat endosperm glutenin polymer formation. The new sequences represent both genes closely related to those previously reported and a new sub-class of γ-gliadins.

11.
A simple model is put forward to explain the long-known three-base periodicity in coding DNA. We propose the concept of same-phase triplet clustering, i.e. a condition wherein a triplet appears several times in one phase without interruption by the two other possible phases. For instance, in the sequence (i): NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN (where N is any nucleotide but combinations producing TTG are excluded) there would be clustering of same-phase TTG because this triplet appears uninterruptedly in phase 2. In contrast, in the sequence (ii): TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN there is no same-phase clustering because neighboring TTGs are all in different phases. Observe also that in sequence (i) TTG triplets are separated by 3, 3 and 6 nucleotides (3n distances), while in sequence (ii) they are separated by 1, 4 and 5 nucleotides (non-3n distances). In this work, we demonstrate that in coding DNA the 3n distances generated by (i)-type sequences proportionally outnumber the non-3n distances generated by (ii)-type sequences; this condition would be the basis of three-base periodicity. Randomized sequences also contained (i)- and (ii)-type sequences, but their clustering was statistically different. To test our model, we generated (i)-type sequences in a randomized sequence by inducing clustering of same-phase triplets. In agreement with the model, this sequence displayed three-base periodicity. Furthermore, two- and four-base periodicities could also be induced by artificially inducing clustering of duplets and tetraplets.
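The 3n versus non-3n distances in the two example sequences of this abstract can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and substituting A for every N is an arbitrary choice made here that satisfies the abstract's condition of not creating extra TTG triplets.

```python
import re


def triplet_starts(seq, triplet):
    """0-based start positions of every occurrence (lookahead allows overlaps)."""
    return [m.start() for m in re.finditer(f"(?={re.escape(triplet)})", seq)]


def separations(seq, triplet):
    """Number of nucleotides lying between consecutive occurrences."""
    starts = triplet_starts(seq, triplet)
    return [b - a - len(triplet) for a, b in zip(starts, starts[1:])]


def split_3n(distances):
    """Partition distances into multiples of three and all others."""
    same_phase = [d for d in distances if d % 3 == 0]
    other_phase = [d for d in distances if d % 3 != 0]
    return same_phase, other_phase


def from_template(template, filler="A"):
    """Turn the abstract's N/underscore notation into a plain sequence.
    Using 'A' for N is an assumption for illustration only."""
    return template.replace("_", "").replace("N", filler)


seq_i = from_template("NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN")
seq_ii = from_template("TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN")

print(separations(seq_i, "TTG"))   # [3, 3, 6]
print(separations(seq_ii, "TTG"))  # [1, 4, 5]
```

Because the triplet length is itself a multiple of three, testing the separation modulo 3 is equivalent to asking whether two occurrences share the same phase.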

12.
13.
Ubiquitin coding sequences were isolated from a human genomic library and two cDNA libraries. One human ubiquitin gene consists of 2055 nucleotides and codes for a polyprotein consisting of 685 amino acid residues. The polyprotein contains nine direct repeats of the ubiquitin amino acid sequence and the last ubiquitin sequence is extended with an additional valyl residue at the C-terminal end. No spacer sequences separate the ubiquitin repeats and the coding regions are not interrupted by intervening sequences. This particular gene is transcribed since cDNAs corresponding to the genomic sequence have been isolated. At least two more types of ubiquitin genes are encoded in the human genome, one coding for a ubiquitin monomer while another presumably codes for three or four direct repeats of the ubiquitin sequence. Human DNA contains many copies of the ubiquitin sequence. Ubiquitin is therefore encoded in the human genome as a multigene family.

14.
15.
This report studies the compositional properties of human gene sequences, evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor-base-dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters, here termed triplet composons. The triplets containing the same set of distinct bases, in whatever order and number, form a triplet composon. The DNA sequence is read starting at any letter of the initial triplet and then moving, triplet-to-triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content in terms of the composon usage frequencies of the gene sequences shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even though their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans do not have a counterpart in mouse, whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
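The reduction from 64 triplets to 14 composons follows from counting the non-empty sets of at most three distinct bases: 4 single-base sets, 6 two-base sets and 4 three-base sets. The sketch below illustrates this mapping. The function names and the string labels are hypothetical, and the sliding window with a step of one base is an assumed reading of "overlapping" in the abstract.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def composon(triplet):
    """Label a triplet by its set of distinct bases, e.g. 'GAG' -> 'AG'."""
    return "".join(sorted(set(triplet.upper())))


# Every one of the 64 triplets falls into one of 14 composons.
ALL_COMPOSONS = {composon("".join(t)) for t in product(BASES, repeat=3)}


def composon_usage(seq):
    """Relative composon frequencies over overlapping triplets (step of 1,
    an assumption). Input is expected to contain only A, C, G and T."""
    seq = seq.upper()
    counts = Counter(composon(seq[i:i + 3]) for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


print(len(ALL_COMPOSONS))      # 14
print(composon_usage("AAAC"))  # {'A': 0.5, 'AC': 0.5}
```

Since a composon ignores base order and repetition, the usage profile does not depend on which strand orientation or codon position a triplet starts from, which is consistent with the frame-independent analysis the abstract describes.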

16.
We have cloned and expressed in Escherichia coli a gene encoding a 15,000-apparent-molecular-weight peptidoglycan-associated outer membrane lipoprotein (PAL) of Haemophilus influenzae. The nucleotide sequence of this gene encodes an open reading frame of 153 codons with a predicted mature protein of 134 amino acids. The amino acid composition and sequence of the predicted mature protein agree with the chemically determined composition and partial amino acid sequence of PAL purified from H. influenzae outer membranes. We have also identified a second gene from H. influenzae that encodes a second 15,000-apparent-molecular-weight protein which is recognized by antiserum against PAL. This protein has been shown to be a lipoprotein. The nucleotide sequence of this gene encodes an open reading frame of 154 codons with a predicted mature protein of 136 amino acids and has limited sequence homology with that of the gene encoding PAL. Southern hybridization analysis indicates that both genes exist as single copies in H. influenzae chromosomal DNA. Both genes encode polypeptides which have amino-terminal sequences similar to those of reported membrane signal peptides and are associated primarily with the outer membrane when expressed in E. coli.

17.
18.
19.
In this study, we collected and analyzed DNA sequence data for 789 previously mapped RFLP probes from Sorghum bicolor (L.) Moench. DNA sequences, comprising 894 non-redundant contigs and end sequences, were searched against three GenBank databases, nucleotide (nt), protein (nr) and EST (dbEST), using BLAST algorithms. Matching ESTs were also searched against nt and nr. Translated DNA sequences were then searched against the conserved domain database (CDD) to determine if functional domains/motifs were congruent with the proteins identified in previous searches. More than half (500/894 or 56%) of the query sequences had significant matches in at least one of the GenBank searches. Overall, proteins identified for 148 sequences (17%) were consistent among all searches, of which 66 sequences (7%) contained congruent coding domains. The RFLP probe sequences were also evaluated for the presence of simple sequence repeats (SSRs), and 60 SSRs were developed and assayed in an array of sorghum germplasm comprising inbreds, landraces and wild relatives. Overall, these SSR loci had lower levels of polymorphism (D = 0.46, averaged over 51 polymorphic loci) compared with sorghum SSRs that were isolated by library hybridization screens (D = 0.69, averaged over 38 polymorphic loci). This result was probably due to the relatively small proportion of di-nucleotide repeat-containing markers (42% of the total SSR loci) obtained from the DNA sequence data. These di-nucleotide markers also contained shorter repeat motifs than those isolated from genomic libraries. Based on BLAST results, 24 SSRs (40%) were located within, or near, previously annotated or hypothetical genes. We determined the location of 19 of these SSRs relative to putative coding regions. In general, SSRs located in coding regions were less polymorphic (D = 0.07, averaged over three loci) than those from gene flanking regions, UTRs and introns (D = 0.49, averaged over 16 loci).
The sequence information and SSR loci generated through this study will be valuable for application to sorghum genetics and improvement, including gene discovery, marker-assisted selection, diversity and pedigree analyses, comparative mapping and evolutionary genetic studies.

20.
