Similar literature
Found 20 similar documents (search time: 31 ms)
1.
The AMmtDB database (http://bio-www.ba.cnr.it:8000/srs6/) has been updated by collecting the multi-aligned sequences of Chordata mitochondrial genes coding for proteins and tRNAs. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operating system. The multiple alignments have been produced with CLUSTALW and PILEUP programs and then carefully optimized manually.

2.
The present paper describes AMmtDB, a database collecting the multi-aligned sequences of vertebrate mitochondrial genes coding for proteins and tRNAs, as well as the multiple alignment of the mammalian mtDNA main regulatory region (D-loop) sequences. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. As far as the genes coding for tRNAs are concerned, the multi-alignments based on the primary and the secondary structures are both provided; for the mammalian D-loop multi-alignments we report the conserved regions of the entire D-loop (CSB1, CSB2, CSB3, the central region, ETAS1 and ETAS2) as defined by Sbisà et al. [Gene (1997), 205, 125-140]. A flatfile format for AMmtDB has been designed allowing its implementation in SRS (http://bio-www.ba.cnr.it:8000/BioWWW/#AMMTDB). Data selected through SRS can be managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operating system. The multiple alignments have been produced with CLUSTALV and PILEUP programs and then carefully optimized manually.

3.
The design of synthetic genes   Total citations: 1, self-citations: 1, citations by others: 0
Computer programs are described that aid in the design of synthetic genes coding for proteins that are targets of a research program in site-directed mutagenesis. These programs "reverse-translate" protein sequences into general nucleic acid sequences (those where codons have not yet been selected), map restriction sites into general DNA sequences, identify points in the synthetic gene where unique restriction sites can be introduced, and assist in the design of genes coding for hybrids and evolutionary intermediates between homologous proteins. Application of these programs therefore facilitates the use of modular mutagenesis to create variants of proteins, and the implementation of evolutionary guidance as a strategy for selecting mutants.
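
As a rough illustration of what "reverse-translating" into a general nucleic acid sequence can mean, the sketch below maps each amino acid to an IUPAC degenerate codon. It is not the software described in the abstract: the function name `reverse_translate` and the table are assumptions made for illustration, the standard genetic code is assumed, and the six-fold degenerate amino acids (Leu, Ser, Arg) are deliberately left out because no single degenerate codon covers exactly their codon sets.

```python
# Illustrative sketch only (hypothetical helper, not the published programs).
# Assumes the standard genetic code and IUPAC ambiguity symbols:
# Y = C/T, R = A/G, H = A/C/T, N = any base.
DEGENERATE_CODONS = {
    "M": "ATG", "W": "TGG",
    "F": "TTY", "Y": "TAY", "H": "CAY", "Q": "CAR",
    "N": "AAY", "K": "AAR", "D": "GAY", "E": "GAR",
    "C": "TGY", "I": "ATH",
    "V": "GTN", "P": "CCN", "T": "ACN", "A": "GCN", "G": "GGN",
}


def reverse_translate(protein: str) -> str:
    """Return a degenerate DNA sequence encoding the given protein."""
    codons = []
    for residue in protein.upper():
        if residue not in DEGENERATE_CODONS:
            # Leu, Ser and Arg (six codons each) need two patterns apiece,
            # so this minimal sketch refuses them instead of guessing.
            raise ValueError(f"no single degenerate codon for {residue!r}")
        codons.append(DEGENERATE_CODONS[residue])
    return "".join(codons)


example = reverse_translate("MKW")
print(example)
```

A restriction-site mapper of the kind mentioned in the abstract would then search such a degenerate sequence for positions where some codon choice creates a recognition site; that step is not shown here.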

4.
The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
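
The percentages quoted in this abstract follow directly from the reported counts. The short check below recomputes them; the variable names and the helper `pct` are illustrative assumptions, and only the numbers come from the abstract.

```python
# Counts taken from the abstract; names are illustrative only.
genemark_orfs = 354      # putative expressed ORFs predicted by GeneMark
blast_orfs = 208         # ORFs with significant BLAST similarity
supported_by_both = 182  # ORFs supported by GeneMark and BLAST
conserved_family = 73    # ORFs in ancient conserved protein families


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


share_of_genemark = pct(supported_by_both, genemark_orfs)
share_of_blast = pct(supported_by_both, blast_orfs)
share_conserved = pct(conserved_family, genemark_orfs)
print(share_of_genemark, share_of_blast, share_conserved)
```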

5.
6.
Targeting of nucleus-encoded proteins into chloroplasts is mediated by N-terminal presequences. During evolution of plastids from formerly free-living cyanobacteria by endocytobiosis, genes for most plastid proteins have been transferred from the plastid genome to the nucleus and subsequently had to be equipped with such plastid targeting sequences. So far it is unclear how the gene domains coding for presequences and the respective mature proteins may have been assembled. While land plant plastids are supposed to originate from a primary endocytobiosis event (a prokaryotic cyanobacterium was taken up by a eukaryotic cell), organisms with secondary plastids like diatoms experienced a second endocytobiosis step involving a eukaryotic alga taken up by a eukaryotic host cell. In this group of algae, apparently most genes encoding chloroplast proteins have been transferred a second time (from the nucleus of the endosymbiont to the nucleus of the secondary host) and thus must have been equipped with additional targeting signals. We have analyzed cDNAs and the respective genomic DNA fragments of seven plastid preproteins from the diatom Phaeodactylum tricornutum. In all of these genes we found single spliceosomal introns, generally located within the region coding for the N-terminal plastid targeting sequences or shortly downstream of it. The positions of the introns can be related to the putative phylogenetic histories of the respective genes, indicating that the bipartite targeting sequences in these secondary algae might have evolved by recombination events via introns. The nucleotide sequences have been deposited at GenBank under accession numbers AY191862, AY191863, AY191864, AY191865, AY191866, AY191867, and AY191868.

7.
The diversity of connexin genes encoding gap junctional proteins.   Total citations: 24, self-citations: 0, citations by others: 24
The multigene family of connexins is larger than previously anticipated. Ten different connexin homologous sequences have been characterized in the mouse genome, five of which are probably the mouse analogues of the known rat connexins26, -31, -32, -43, and -46. Since the additional 5 sequences have been isolated as cDNAs or hybridize specifically to distinct mRNA species, they most likely represent functional connexin genes. Since seven of the genomic connexin sequences have been shown to contain no intron in the coding sequence, this may apply to all mammalian connexin genes. Some of the structural features based on amino acid sequences deduced from cDNA or genomic sequences and the RNA expression pattern of the new connexins are compared with previously described connexins. The structural diversity of the connexin genes suggests that they fulfill different functions coordinated with, and perhaps required for, different programs of cellular differentiation.

8.
Napin is a 2S storage protein found in the seeds of oilseed rape (Brassica napus L.) and related species. Using protein structural prediction programs we have identified a region in the napin protein sequence which forms a 'hydrophilic loop' composed of amino acid residues located at the protein surface. Targeting this region, we have constructed two napin chimeric genes containing the coding sequence for the peptide hormone leucine-enkephalin as a topological marker. One version has a single enkephalin sequence of 11 amino acids including linkers and the second contains a tandem repeat of this peptide comprising 22 amino acids, inserted into the napin large subunit. The inserted peptide sequences alter the balance of hydrophilic to hydrophobic amino acids and introduce flexibility into this region of the polypeptide chain. The chimeric genes have been expressed in tobacco plants under the control of the seed-specific napA gene promoter. Analyses indicate that the engineered napin proteins are expressed, transported, post-translationally modified and deposited inside the protein bodies of the transgenic seeds, demonstrating that the altered napin proteins behave in a similar fashion to the authentic napin protein. Detailed immunolocalisation studies indicate that the insertion of the peptide sequences has a significant effect on the distribution of the napin proteins within the tobacco seed protein bodies.

9.
The genetic events that produce diversity in class I MHC genes and proteins have been investigated by using a family of closely related HLA-A alleles. Five genes coding for HLA-A2.2Y, HLA-A2.3, and HLA-Aw68.2 have been isolated. Exon sequences are compared with the known sequences for HLA-A2.1, HLA-A2.2F, HLA-A2.4, HLA-Aw68.1, and HLA-Aw69. Pairwise comparison of the eight unique sequences shows that point mutation, reciprocal recombination, and gene conversion have all contributed significantly to the diversification of this family of alleles. These results are compared with those of other studies that have emphasized the role of gene conversion. A predominance of coding substitutions in the alpha 1 and alpha 2 domains is found, consistent with positive selection for polymorphism being a major factor in the fixation of these alleles. In the three cases examined, genes for phenotypically identical proteins gave identical nucleotide sequences, indicating that most, if not all, of the class I polymorphism is detectable by immunological methods. The apparent stability of the sequences suggests that the events generating some of the alleles occurred before the origin of modern Homo sapiens.

10.
Many clostridial proteins are poorly produced in Escherichia coli. It has been suggested that this phenomenon is due to the fact that several types of codons common in clostridial coding sequences are rarely used in E. coli and the quantities of the corresponding tRNAs in E. coli are not sufficient to ensure efficient translation of the corresponding clostridial sequences. To address this issue, we amplified three E. coli genes, ileX, argU, and leuW, in E. coli; these genes encode tRNAs that are rarely used in E. coli (the tRNAs for the ATA, AGA, and CTA codons, respectively). Our data demonstrate that amplification of ileX dramatically increased the level of production of most of the clostridial proteins tested, while amplification of argU had a moderate effect and amplification of leuW had no effect. Thus, amplification of certain tRNA genes for rare codons in E. coli improves the expression of clostridial genes in E. coli, while amplification of other tRNAs for rare codons might not be needed for improved expression. We also show that amplification of a particular tRNA gene might have different effects on the level of protein production depending on the prevalence and relative positions of the corresponding codons in the coding sequence. Finally, we describe a novel approach for improving expression of recombinant clostridial proteins that are usually expressed at a very low level in E. coli.
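
Since the abstract ties expression to the prevalence and positions of ATA, AGA, and CTA codons, a simple in-frame tally can make that idea concrete. The sketch below is a hypothetical helper, not the authors' analysis; it assumes the input is a coding sequence that starts in frame, and the example sequence is invented.

```python
from collections import Counter

# Codons named in the abstract as read by the ileX, argU and leuW tRNAs.
RARE_CODONS = ("ATA", "AGA", "CTA")


def rare_codon_positions(cds: str) -> dict:
    """Map each rare codon to the 0-based codon indices where it occurs.

    Assumes `cds` starts in frame; trailing bases that do not fill a
    complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    positions = {codon: [] for codon in RARE_CODONS}
    for index in range(0, usable, 3):
        codon = cds[index:index + 3]
        if codon in positions:
            positions[codon].append(index // 3)
    return positions


# Invented example: ATG ATA AGA ATA CTA TAA
hits = rare_codon_positions("ATGATAAGAATACTATAA")
counts = Counter({codon: len(where) for codon, where in hits.items()})
print(hits, counts)
```

Reporting positions as well as counts reflects the abstract's point that where the rare codons fall in the coding sequence may matter, not only how many there are.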

11.
12.
Codon usage and base composition in sequences from the A + T-rich genome of Rickettsia prowazekii, a member of the alpha Proteobacteria, have been investigated. Synonymous codon usage patterns are roughly similar among genes, even though the data set includes genes expected to be expressed at very different levels, indicating that translational selection has been ineffective in this species. However, multivariate statistical analysis differentiates genes according to their G + C contents at the first two codon positions. To study this variation, we have compared the amino acid composition patterns of 21 R. prowazekii proteins with that of a homologous set of proteins from Escherichia coli. The analysis shows that individual genes have been affected by biased mutation rates to very different extents: genes encoding proteins highly conserved among other species being the least affected. Overall, protein coding and intergenic spacer regions have G + C content values of 32.5% and 21.4%, respectively. Extrapolation from these values suggests that R. prowazekii has around 800 genes and that 60–70% of the genome may be coding. Correspondence to: S.G.E. Andersson
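
G + C content at individual codon positions, the quantity the abstract uses to differentiate genes, can be computed as in the minimal sketch below. The function name and the toy sequence are assumptions for illustration; the study's multivariate analysis is not reproduced here.

```python
def gc_by_codon_position(cds: str) -> tuple:
    """Return the G+C fraction at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence; trailing bases that do
    not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    fractions = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base from this offset
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


# Toy sequence: ATG GCC AAA
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAA")
print(gc1, gc2, gc3)
```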

13.
14.
Gene prediction relies on the identification of characteristic features of coding sequences that distinguish them from non-coding DNA. The recent large-scale sequencing of entire genomes from higher eukaryotes, in conjunction with currently used gene prediction algorithms, has provided an abundance of putative genes that can now be analysed for their compositional properties. Strong, systematic differences still exist, in several species, between the compositional properties of sets of ex novo predicted genes and genes that have been experimentally detected and/or verified. This is particularly evident in the estimated gene set (>45,000 genes) of the recently sequenced rice genome, where roughly half the predicted genes are compositionally unusual and have no known orthologues in the dicot Arabidopsis. In a few cases such differences might suggest a bias in experimental gene-finding protocols, but the quasi-random nature of the compositionally aberrant predicted genes is a strong indication that many, if not most, of them are false positives. It therefore appears that some important features of coding regions have not yet been taken into account in existing gene prediction programs. Statistical base compositional properties of curated gene data sets from vertebrates, which we briefly review here, should therefore provide a useful benchmark for fine-tuning probabilistic gene models and model parameters that are currently in use.

15.
Porifera (sponges) represent the most ancient, extant metazoan phylum. They already existed prior to the 'Cambrian Explosion'. Based on the analysis of aa sequences of informative proteins, it is highly likely that all metazoan phyla evolved from only one common ancestor (monophyletic origin). As 'autapomorphic' proteins which are restricted to Metazoa only, integrin receptors, receptors with scavenger receptor cysteine-rich repeats, neuronal-like receptors and protein-tyrosine kinases (PTKs) have been identified in Porifera. From the marine sponge Geodia cydonium, a receptor tyrosine kinase (RTK) has been cloned that comprises the characteristic structural topology known from other metazoan RTKs: an extracellular domain, the transmembrane region, the juxtamembrane region and the TK domain. Only two introns, within the coding region of the RTK gene, could be found, which separate the two highly polymorphic immunoglobulin-like domains found in the extracellular region of the enzyme. The functional role of this sponge RTK could be demonstrated both in situ (grafting experiments) and in vitro (increase of intracellular Ca2+ level). Upstream of this RTK gene, two further genes coding for tyrosine kinases (TK) have been identified. Both are intron-free. The deduced aa sequence of the first gene shows no transmembrane segment; from the second gene, so far, only half of its catalytic domain is known. A phylogenetic analysis with the TK domains from these sequences and a fourth, from a novel scavenger RTK (all domains comprise the signature for the TK class II receptors), showed that they are distantly related to the insulin and insulin-like receptors. The presented findings support the 'introns-late' hypothesis for such genes that encode 'metazoan' proteins. It is proposed that the TKs evolved from protein-serine/threonine kinases through modularization and subsequent exon shuffling. After formation of the ancestral TKs, the modules lost the framing introns to protect the evolutionary novelty. Since cell culture systems of sponges are now available, it can be expected that the mechanisms that control the developmental programs will soon be unravelled as well.

16.
Recombinant plasmids were made containing cDNAs synthesized on hamster mRNAs coding for cytoskeletal (beta- or gamma-) actins and for vimentin. Hybridization of the actin probe on restriction digests of one avian and five mammalian DNAs yielded multiple bands; the vimentin probe revealed only one band (accompanied by 2-3 faint bands in some DNAs). The results obtained with the vimentin probe indicate that the corresponding coding sequences: (a) are highly conserved in warm-blooded vertebrates like the actin sequences; (b) have strongly diverged from those coding for other intermediate filament proteins, since hybridization of the vimentin probe does not lead to a diagnostic multiband pattern; and (c) most likely contribute to a single gene, in contrast to the sequences coding for other cytoskeletal proteins. Hybridization of the probes on mRNAs from the different sources used showed that the non-coding sequences of both vimentin and actin genes are conserved in length.

17.
Bordetella pertussis, a gram-negative beta-proteobacterium, is the agent of whooping cough in humans. Whooping cough remains a public health problem worldwide, despite well-implemented infant/child vaccination programs. It continues to be endemic and is observed cyclically in vaccinated populations. Classical molecular subtyping methods indicate that genome diversity among B. pertussis isolates is limited. Although the whole bacterial genome has been studied by pulsed-field gel electrophoresis, the genes implicated in the diversity have not been identified. We developed a B. pertussis whole-genome DNA microarray representing over 91% of the predicted coding sequences of the sequenced strain Tohama I. Genomic DNA from clinical isolates with various pulsed-field gel electrophoresis profile patterns was competitively hybridized with the DNA microarray and coding sequences were classified as present, absent or duplicated. Our data strongly suggest that the B. pertussis population is dynamic. In France, with a highly vaccinated population, the genetic diversity is low and decreasing with time, and clonal expansion correlates with cycles of the disease. This decrease in diversity is essentially due to loss of genes and pseudogenes. The deleted genes are usually flanked by insertion sequences.

18.
Most of the gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which imply the optimization of a high number of parameters. The present paper presents a novel method for the classification of coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of this new method are the non-parametric logic and the construction of a dictionary of words extracted from the sequences. These dictionaries can be very useful to perform further analyses on the genomic sequences themselves. The proposed approach has been applied to some prokaryotic complete genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, which have revealed that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes; regions hosting promoters or other protein-binding sequences). We perform an overall comparison with other gene-finder software, since at this step we are not interested in building another gene-finder system, but only in exploring the possibility of the suggested approach.
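
The abstract does not define its compression index, so the sketch below only conveys the general intuition that structured sequences compress better than quasi-random ones. It swaps in a general-purpose compressor (Python's standard `zlib`) for the paper's dictionary-based index; the function name `compression_ratio` and both test sequences are assumptions, and no classification threshold is implied.

```python
import random
import zlib


def compression_ratio(sequence: str) -> float:
    """Compressed size divided by original size (lower = more compressible).

    Stand-in measure using zlib, not the compression index of the paper.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    raw = sequence.upper().encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


# Invented inputs: a highly repetitive sequence and a pseudo-random one
# of the same length (seeded so the example is reproducible).
repetitive = "ATGGCC" * 100
rng = random.Random(0)
quasi_random = "".join(rng.choice("ACGT") for _ in range(600))

ratio_repetitive = compression_ratio(repetitive)
ratio_random = compression_ratio(quasi_random)
print(ratio_repetitive < ratio_random)
```

The failure modes listed in the abstract fit this intuition: highly structured coding regions and quasi-random non-coding regions would shift such a measure in the "wrong" direction.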

19.
The large subunit of eukaryotic ribosomes contains acidic phosphoproteins which are related to L7/L12 from Escherichia coli. In the brine shrimp Artemia these proteins are designated eL12 and eL12'. We have isolated cDNA clones for these proteins from a cDNA bank that was constructed by the use of size-fractionated poly(A)-rich RNA (8-10S fraction) from Artemia and a synthetic oligonucleotide as primer. Clones containing DNA sequences coding for eL12 and eL12' were characterized by hybrid-selected translation and DNA sequencing. The proteins eL12 and eL12' share an identical peptide of 22 amino acids at their carboxy termini whereas the remaining part of the protein shows little sequence homology. The nucleotide sequences show a different codon use for the amino acids in the common carboxy terminus, thereby excluding a common exon coding for this part of both proteins. Despite the differences in amino acid sequence in the major part of eL12 and eL12' the proteins have a considerable degree of homology on the basis of the distribution of hydrophobic and hydrophilic amino acids over the polypeptide chains, in agreement with a related folding and function of both proteins. Relative levels of mRNA coding for eL12, eL12' and elongation factor 1 alpha were determined during the development of Artemia from a dormant cyst to a nauplius. The data show a coordinate expression of the genes for EF-1 alpha and both ribosomal proteins, excluding a differential expression of the genes for these related ribosomal proteins during embryogenesis. Analysis of the gene copy number for eL12 and eL12' indicates the presence of a few genes for each protein.

20.