Similar articles
 20 similar articles found (search time: 31 ms)
1.
2.
We continued our effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. As in previous editions, the genetic names are consistently associated with each sequence that has a known and confirmed ORF. If necessary, synonyms are given in the case of allelic duplicated sequences. Although, according to our rules, the first publication of a sequence gives the genetic name of a gene, in some instances more commonly used names are given to avoid nomenclature problems and the use of obsolete designations. In these cases the old designation is given as a synonym. Thus sequences can be found either by the name or by synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, the reference of the publication of the sequence, the chromosomal location as far as known, and the SWISSPROT and EMBL accession numbers. New entries will also contain the name from the systematic sequencing efforts. Since the release of LISTA4.1 we have updated the database continuously. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections, resulting in LISTA-HON and LISTA-HOP. This release includes reports from full Smith-Waterman peptide-level searches against a non-redundant protein sequence database. The LISTA database can be linked to the associated data sets or to nucleotide and protein banks by the Sequence Retrieval System (SRS). The database is available by FTP and on the World Wide Web.

3.
Virus-like particles generated by the heterologous expression of virus structural proteins are able to potentiate the immunogenicity of foreign epitopes presented on their surface. In recent years epitopes of various origins have been inserted into the core antigen of hepatitis B virus (HBV), allowing the formation of chimaeric HBV core particles. Chimaeric core particles carrying the 45 N-terminal amino acids of the Puumala hantavirus nucleocapsid protein induced protective immunity in bank voles, the natural host of this hantavirus. Particles applied in the absence of adjuvant are still immunogenic and partially protective in bank voles. Although a C-terminally truncated core antigen of HBV (HBcAg delta) tolerates the insertion of extended foreign sequences, the limited insertion capacity is still a critical factor for the construction of multivalent vaccines. Recently, we have described a new system for generating HBV 'mosaic particles' in an Escherichia coli suppressor strain, based on a readthrough mechanism on a stop linker located in front of the insert. Those mosaic particles are built up by both HBcAg delta and the HBcAg delta/Puumala nucleocapsid readthrough protein. The particles formed presented the 114 amino acid (aa) long hantavirus sequence, at least in part, on their surface and induced antibodies against the hantavirus sequence in bank voles. Variants of the stop linker still allowed the formation of mosaic particles, demonstrating that stop codon suppression alone is sufficient for the packaging of longer foreign sequences in mosaic particles. Another approach to increase the insertion capacity is based on the simultaneous insertion of different Puumala nucleocapsid protein sequences (aa 1-45 and aa 75-119) into two different positions (aa 78 and behind aa 144) of a single HBcAg molecule. The data presented are of high relevance for the generation of multivalent vaccines requiring a high insertion capacity for foreign sequences.

4.
This article investigates periodicity in protein sequences. The notion of latent periodicity is introduced, and a mathematical method for searching for latent periodicity in protein sequences is developed. Application of the method to known cases of perfect and imperfect periodicity is demonstrated. The method reveals latent periodicity in many protein sequences from the SWISS-PROT data bank, and examples of latent periodicity of amino acid sequences are demonstrated for: the translation initiation factor EIF-2B (epsilon subunit) of Saccharomyces cerevisiae from the E2BE_YEAST sequence; the E. coli ferrienterochelin receptor from the FEPA_ECOLI sequence; the lysozyme of Bacteriophage SF6 from the LY_BPSF6 sequence; and lipoamide dehydrogenase of Azotobacter vinelandii from the DLDH_AZOVI sequence. These protein sequences have latent periods equal to six, two, seven and 19 amino acids, respectively. We propose that a possible purpose of the amino acid sequence latent periodicity is to determine certain protein structures.
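The abstract does not describe the mathematical method itself, so the following is only a minimal sketch of one plausible way to score a candidate period, not the authors' algorithm. It assumes that periodicity can be measured as the mutual information between residue type and position modulo the period; the function names `periodicity_information` and `scan_periods` are hypothetical.

```python
import math
from collections import Counter


def periodicity_information(seq: str, period: int) -> float:
    """Mutual information (in bits) between residue type and position modulo `period`.

    Assumption: this is an illustrative score, not the method from the abstract.
    """
    n = len(seq)
    # Joint counts of (residue, phase) and the two marginal counts.
    joint = Counter((res, i % period) for i, res in enumerate(seq))
    res_counts = Counter(seq)
    phase_counts = Counter(i % period for i in range(n))
    mi = 0.0
    for (res, phase), c in joint.items():
        mi += (c / n) * math.log2(c * n / (res_counts[res] * phase_counts[phase]))
    return mi


def scan_periods(seq: str, max_period: int) -> dict:
    """Score every candidate period from 2 to `max_period` inclusive."""
    return {p: periodicity_information(seq, p) for p in range(2, max_period + 1)}


if __name__ == "__main__":
    toy = "ABC" * 10  # a perfectly periodic toy sequence with period 3
    for period, score in scan_periods(toy, 4).items():
        print(period, round(score, 3))
```

Two caveats apply to a score of this kind: multiples of the true period score at least as high as the period itself, and short sequences give upward-biased values, so in practice a score would need to be compared against shuffled sequences before being called significant.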

5.
We continued our effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. In this database each sequence has been attributed a single genetic name. In the case of duplicated sequences, a simple method has been applied to distinguish sequences of one and the same gene from non-allelic sequences of duplicated genes. If necessary, synonyms are given in the case of allelic duplicated sequences. Thus sequences can be found either by the name or by synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, the reference of the publication of the sequence, the chromosomal location as far as known, and the SWISSPROT and EMBL accession numbers. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections, resulting in LISTA-HON and LISTA-HOP. The LISTA database can be linked to the associated data sets or to nucleotide and protein banks by the Sequence Retrieval System (SRS).

6.
Silva PJ. Proteins, 2008, 70(4): 1588-1594
Hydrophobic cluster analysis (HCA) has long been used as a tool to detect distant homologies between protein sequences, and to classify them into different folds. However, it relies on expert human intervention, and is sensitive to subjective interpretations of pattern similarities. In this study, we describe a novel algorithm to assess the similarity of hydrophobic amino acid distributions between two sequences. Our algorithm correctly identifies as misattributions several HCA-based proposals of structural similarity between unrelated proteins present in the literature. We have also used this method to identify the proper fold of a large variety of sequences, and to automatically select the most appropriate structure for homology modeling of several proteins with low sequence identity to any other member of the Protein Data Bank. Automatic modeling of the target proteins based on these templates yielded structures with TM-scores (vs. experimental structures) above 0.60, even without further refinement. Besides enabling a reliable identification of the correct fold of an unknown sequence and the choice of suitable templates, our algorithm also shows that whereas most structural classes of proteins are very homogeneous in hydrophobic cluster composition, a tenth of the described families are compatible with a large variety of hydrophobic patterns. We have built a browsable database of every major representative hydrophobic cluster pattern present in each structural class of proteins, freely available at http://www2.ufp.pt/ pedros/HCA_db/index.htm.

7.
In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph to rank the residue sites with a reduced amino acid alphabet, and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from the PDB, and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we obtained some sequences from the non-redundant protein sequence database that are similar to ours with an E-value of the order of 10^-7. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment or that Nature does not push the optimization in the sequence space once the protein is able to perform its function.
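To make the idea of "energy of a sequence on a fixed conformation" concrete, here is a small sketch under stated assumptions. It treats the conformation as a list of residue-residue contacts and sums a pairwise energy over them. The two-letter H/P alphabet, the `ENERGY` values, the fixed-composition constraint and the exhaustive search are all illustrative assumptions; the abstract's own approach uses graph-based site ranking followed by continuous optimization, which is not reproduced here.

```python
from itertools import combinations

# Hypothetical contact energies for a reduced two-letter alphabet
# (H = hydrophobic, P = polar). Not one of the matrices used in the paper.
ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def sequence_energy(seq, contacts, energy=ENERGY):
    """Sum the pairwise energy over every contact (i, j) of the conformation."""
    return sum(energy[(seq[i], seq[j])] for i, j in contacts)


def min_energy_with_composition(n_sites, contacts, n_h, energy=ENERGY):
    """Exhaustively place `n_h` H residues on `n_sites` sites to minimize the energy.

    Only feasible for very small chains; the first minimum found is returned.
    """
    best_seq, best_e = None, float("inf")
    for h_sites in combinations(range(n_sites), n_h):
        seq = "".join("H" if i in h_sites else "P" for i in range(n_sites))
        e = sequence_energy(seq, contacts, energy)
        if e < best_e:
            best_seq, best_e = seq, e
    return best_seq, best_e


if __name__ == "__main__":
    toy_contacts = [(0, 3), (1, 4), (0, 4)]  # contact map of a toy 5-residue fold
    print(min_energy_with_composition(5, toy_contacts, n_h=3))
```

Exhaustive enumeration grows combinatorially with chain length, which is one reason a ranking-plus-optimization strategy like the one the abstract describes is needed for real proteins.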

8.
9.
Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.

10.
We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD), which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi, psi angle assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. We have used the database to analyse synonymous codon usage patterns in mRNA sequences, showing their correlation with three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship.

11.
Aminopeptidase N (APN) isoforms were identified as candidate receptors for Bacillus thuringiensis Cry toxins from the midgut of several insect species. In this study a partial cDNA encoding aminopeptidase (slfbAPN) was cloned from the fat body of the moth Spodoptera litura. In the deduced amino acid sequence the characteristic metallopeptidase sequences HEXXHX18E (HEXXH, then 18 arbitrary residues, then E) and GAMENWG were conserved, but the sequence showed only 33–39% identity to other insect APNs that were also reported to be Cry toxin receptors. The presence of a putative GPI anchor signal sequence at the C-terminus indicated that it is a membrane-anchored protein. Northern blot analysis of different tissues suggested that slfbAPN expression was restricted to the fat body. Biochemical analyses, including immunoblotting, ligand blotting and lectin blotting, demonstrated that slfbAPN is a membrane-anchored glycoprotein in the fat body and that it binds to Cry toxins. The nucleotide sequence shown here has been submitted to the GenBank sequence data bank and is available under accession number EF372603.
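The two conserved motifs named in this abstract map directly onto regular expressions. The sketch below assumes the usual reading of the motif notation, in which X stands for any residue and X18 for a run of exactly 18 arbitrary residues; the helper name `find_motifs` and the toy sequence are hypothetical, and no real slfbAPN sequence is reproduced here.

```python
import re

# HEXXHX18E: H, E, any two residues, H, any 18 residues, E.
ZINC_MOTIF = re.compile(r"HE..H.{18}E")
# GAMENWG is a literal heptapeptide.
GAMEN_MOTIF = re.compile(r"GAMENWG")


def find_motifs(protein: str) -> dict:
    """Return 0-based start positions of each motif (non-overlapping matches)."""
    return {
        "HEXXHX18E": [m.start() for m in ZINC_MOTIF.finditer(protein)],
        "GAMENWG": [m.start() for m in GAMEN_MOTIF.finditer(protein)],
    }


if __name__ == "__main__":
    # Toy sequence built so that both motifs occur once.
    toy = "MK" + "GAMENWG" + "AAAA" + "HEAAH" + "A" * 18 + "E" + "KK"
    print(find_motifs(toy))
```

A match of this kind only shows that the motif is present; it says nothing about overall similarity, which is why the abstract reports percent identity separately.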

12.
The Protein Identification Resource (PIR) protein sequence data bank was searched for sequence similarity between known proteins and human DNA polymerase beta (Pol beta) or human terminal deoxynucleotidyltransferase (TdT). Pol beta and TdT were found to exhibit amino acid sequence similarity only with each other and not with any other of the 4750 entries in release 12.0 of the PIR data bank. Optimal amino acid sequence alignment of the entire 39-kDa Pol beta polypeptide with the C-terminal two thirds of TdT revealed 24% identical aa residues and 21% conservative aa substitutions. The Monte Carlo score of 12.6 for the entire aligned sequences indicates highly significant aa sequence homology. The hydropathicity profiles of the aligned aa sequences were remarkably similar throughout, suggesting structural similarity of the polypeptides. The most significant regions of homology are aa residues 39-224 and 311-333 of Pol beta vs. aa residues 191-374 and 484-506 of TdT. In addition, weaker homology was seen between a large portion of the 'nonessential' N-terminal end of TdT (aa residues 33-130) and the first region of strong homology between the two proteins (aa residues 31-128 of Pol beta and aa residues 183-280 of TdT), suggestive of genetic duplication within the ancestral gene. On the basis of nucleotide differences between conserved regions of Pol beta and TdT genes (aligned according to optimally aligned aa sequences) it was estimated that Pol beta and TdT diverged on the order of 250 million years ago, corresponding roughly to a time before radiation of mammals and birds.
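The abstract does not define its "Monte Carlo score". A common convention, assumed here, is to express the observed alignment score as the number of standard deviations above the mean score obtained after repeatedly shuffling one sequence. The sketch below follows that convention with a deliberately simple ungapped identity count in place of an optimal alignment; `identity_score` and `monte_carlo_score` are hypothetical names.

```python
import random
import statistics


def identity_score(a, b) -> int:
    """Number of identical residues at matching positions (ungapped, toy score)."""
    return sum(x == y for x, y in zip(a, b))


def monte_carlo_score(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Observed score in standard-deviation units above the shuffled-sequence mean."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    observed = identity_score(a, b)
    residues = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps the composition of b, destroys its order
        null_scores.append(identity_score(a, residues))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; score is undefined")
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "ACDEFGHIKLMNPQRSTVWY" * 2
    print(round(monte_carlo_score(seq, seq), 1))
```

Shuffling preserves amino acid composition, so a high score reflects the order of residues and not merely a shared bias in composition.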

13.
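赵南星, 韩其晟, 黄建. 生态学杂志 (Chinese Journal of Ecology), 2017, 28(12): 3855-3861
To better restore and conserve natural forests of Pinus bungeana (lacebark pine), rhizosphere soil was collected from remnant P. bungeana stands in Shaanxi Province. Soil propagules of ectomycorrhizal fungi were obtained with a seedling bioassay, and the mycorrhizae were identified by combining morphological classification with ITS-PCR sequencing, in order to study the composition of the soil propagule bank of ectomycorrhizal fungi in P. bungeana stands. A total of 73 unique sequences were obtained from the mycorrhizae of P. bungeana seedlings; at a 97% similarity threshold, the sequences were grouped into 12 operational taxonomic units (OTUs). Rarefaction curve analysis indicated that the study captured most of the diversity of the ectomycorrhizal propagule bank in P. bungeana soil. Common species included Cenococcum geophilum, Tomentella sp. and Tuber sp. The most frequent OTU (80%) had low similarity (75%) to known taxa, suggesting that it may be a new ectomycorrhizal species. The ectomycorrhizal propagule bank of the remnant natural P. bungeana stands contained species that are common in the soil propagule banks of other Pinaceae, but the most frequent taxon could not be assigned to a known genus or family, indicating that the P. bungeana propagule bank is host-specific. This distinctive community composition also shows that studying and using the soil propagule bank of ectomycorrhizal fungi of P. bungeana is of particular importance.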

14.
The gene for Escherichia coli leucyl-tRNA synthetase, leuS, has been cloned by complementation of the temperature-sensitive leuS mutant KL231 with DNA from an E. coli gene bank. The resulting clones overexpress leucyl-tRNA synthetase (LeuRS) by a factor greater than 50. The DNA sequence of the complete coding region was determined. The derived N-terminal protein sequence of LeuRS was confirmed by independent protein sequencing of the first 8 amino acids. Comparison of the LeuRS sequence with all available aminoacyl-tRNA synthetase sequences reveals a significant homology with the valyl-, isoleucyl- and methionyl-enzymes, indicating that the genes of these enzymes could have derived from a common ancestor. Sequence comparison with the gene product of the yeast nuclear NAM2-1 suppressor allele, which cures a mitochondrial RNA maturation deficiency, reveals about 30% homology.

15.
A set of programs was developed for searching nucleic acid and protein sequence databases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR database Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2 M bases) in GenBank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.

16.
Chu CK, Feng LL, Wouters MA. Proteins, 2005, 60(4): 577-583
Structural data mining studies attempt to deduce general principles of protein structure from solved structures deposited in the Protein Data Bank (PDB). The entire database is unsuitable for such studies because it is not representative of the ensemble of protein folds. Given that novel folds continue to be unearthed, some folds are currently unrepresented in the PDB while other folds are overrepresented. Overrepresentation can easily be avoided by filtering the dataset. PDB_SELECT is a well-used representative subset of the PDB that has been deduced by sequence comparison. Specifically, structures with sequences that exhibit a pairwise sequence identity above a threshold value are weeded from the dataset. Although length criteria for pairwise alignments have a structural basis, this automated method of pruning is essentially sequence-based and runs into problems in the twilight zone, possibly resulting in some folds being overrepresented. The value-added structure databases SCOP and CATH are also a potential source of a nonredundant dataset. Here we compare the sequence-derived dataset PDB_SELECT with the structural databases SCOP (Structural Classification Of Proteins) and CATH (Class-Architecture-Topology-Homology). We show that some folds remain overrepresented in the PDB_SELECT dataset while other folds are not represented at all. However, SCOP and CATH also have their own problems such as the labor-intensiveness of the update process and the problem of determining whether all folds are equally or sufficiently distant. We discuss areas where further work is required.
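The pruning step described above, in which sequences whose pairwise identity exceeds a threshold are weeded out, can be illustrated with a greedy filter. This is a simplified sketch and not the actual PDB_SELECT procedure: it assumes the sequences are already aligned to equal length, ignores the length criteria the abstract mentions, and uses a hypothetical default threshold of 25%. The names `percent_identity` and `select_representatives` are illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned positions; '-' marks a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    return 100.0 * sum(x == y for x, y in aligned) / len(aligned)


def select_representatives(seqs: dict, threshold: float = 25.0) -> list:
    """Greedily keep a sequence only if it is at most `threshold` percent
    identical to every sequence kept so far. Input order decides ties."""
    kept = []
    for name, seq in seqs.items():
        if all(percent_identity(seq, seqs[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {
        "a": "ACDEFGHIKL",
        "b": "ACDEFGHIKV",  # 90% identical to "a", so it is dropped
        "c": "WYWYWYWYWY",  # unrelated to "a", so it is kept
    }
    print(select_representatives(toy))
```

Because the filter only sees sequence identity, two structures with the same fold but very low identity both survive, which is the twilight-zone overrepresentation problem the abstract raises.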

17.
A complete set of 6300 small molecule ligands was extracted from the Protein Data Bank and deposited online in PubChem as data source 'SMID'. This set's major improvement over prior methods is the inclusion of cyclic polypeptides and branched polysaccharides, including an unambiguous nomenclature, in addition to normal monomeric ligands. Only the best available example of each ligand structure is retained, and an additional dataset is maintained containing coordinates for all examples of each structure. Attempts are made to correct ambiguous atomic elements and other common errors, and a perception algorithm was used to determine bond order and aromaticity when no other information was available.

18.
When preparing data sets of amino acid or nucleotide sequences it is necessary to exclude redundant or homologous sequences in order to avoid overestimating the predictive performance of an algorithm. For some time methods for doing this have been available in the area of protein structure prediction. We have developed a similar procedure based on pair-wise alignments for sequences with functional sites. We show how a correlation coefficient between sequence similarity and functional homology can be used to compare the efficiency of different similarity measures and choose a nonarbitrary threshold value for excluding redundant sequences. The impact of the choice of scoring matrix used in the alignments is examined. We demonstrate that the parameter determining the quality of the correlation is the relative entropy of the matrix, rather than the assumed (PAM or identity) substitution model. Results are presented for the case of prediction of cleavage sites in signal peptides. By inspection of the false positives, several errors in the database were found. The procedure presented may be used as a general outline for finding a problem-specific similarity measure and threshold value for analysis of other functional amino acid or nucleotide sequence patterns.
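The abstract names the relative entropy of the scoring matrix as the decisive parameter but does not give a formula. The sketch below uses the usual definition for log-odds matrices, H = sum over residue pairs of q_ij * log2(q_ij / (p_i * p_j)), where q_ij are the target pair frequencies implied by the matrix and p_i the background frequencies. The function name `relative_entropy` and the two-letter toy frequencies are assumptions for illustration.

```python
import math


def relative_entropy(target_freqs: dict, background: dict) -> float:
    """Relative entropy (bits per aligned pair) of a log-odds scoring matrix.

    target_freqs maps ordered residue pairs (a, b) to q_ab, summing to 1;
    background maps each residue to its frequency p_a, summing to 1.
    """
    h = 0.0
    for (a, b), q in target_freqs.items():
        if q > 0:  # pairs with zero target frequency contribute nothing
            h += q * math.log2(q / (background[a] * background[b]))
    return h


if __name__ == "__main__":
    background = {"A": 0.5, "B": 0.5}
    # Toy target frequencies that favour identical pairs.
    conserved = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
    print(round(relative_entropy(conserved, background), 4))
```

A matrix whose target frequencies equal the product of the background frequencies has zero relative entropy and cannot separate related from unrelated sequences; higher values correspond to matrices tuned for closer relationships.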

19.
Reversible protein phosphorylation appears to be important at several stages in the signal transduction pathways in Dictyostelium discoideum. To elucidate its role, we have isolated sequences encoding putative protein kinases and phosphoprotein phosphatases by homology cloning using polymerase chain reactions (PCRs). Oligonucleotide primers were synthesized for use as forward and reverse primers with their nucleotide sequences deduced from the amino acid sequences of conserved domains of several protein kinases and phosphoprotein phosphatases. The fragments amplified by PCR were cloned, sequenced, and shown to encode parts of five different protein kinases and two phosphoprotein phosphatases. Several features such as the deduced amino acid sequence homology, location of invariant amino acids, GC content, and the codon usage confirmed that one set of clones encode parts of different protein kinases of Dictyostelium. Two clones derived from phosphoprotein phosphatase primers encode fragments of type 1 and type 2A phosphoprotein phosphatases. Amplified fragments were used to screen a lambda gt11 bank, and several cDNA clones for protein kinases were isolated. Some of these show differential expression during development or in response to exogenous cAMP.

20.
MOTIVATION: When analysing novel protein sequences, it is now essential to extend search strategies to include a range of 'secondary' databases. Pattern databases have become vital tools for identifying distant relationships in sequences, and hence for predicting protein function and structure. The main drawback of such methods is the relatively small representation of proteins in trial samples at the time of their construction. Therefore, a negative result of an amino acid sequence comparison with such a databank forces a researcher to search for similarities in the original protein banks. We developed a database of patterns constructed for groups of related proteins with maximum representation of SWISS-PROT amino acid sequences in the groups. RESULTS: Software tools and a new method have been designed to construct patterns of protein families. Using this method, a new version of the databank of protein family patterns, PROF_PAT 1.3, was produced. This bank is based on SWISS-PROT (r1.38) and TrEMBL (r1.11), and contains patterns of more than 13 000 groups of related proteins in a format similar to that of PROSITE. Pattern motifs with the lowest probability of being found in random sequences were selected. A flexible, fast search program accompanies the bank. The researcher can specify a similarity matrix (of the PAM, BLOSUM or another type). Variable levels of similarity can be set (permitting search strategies ranging from exact matches to increasing levels of 'fuzziness'). AVAILABILITY: The Internet address for comparing sequences with the bank is: http://wwwmgs.bionet.nsc.ru/mgs/programs/prof_pat/. The local version of the bank and search programs (approximately 50 Mb) is available via ftp: ftp://ftp.bionet.nsc.ru/pub/biology/vector/prof_pat/ and ftp://ftp.ebi.ac.uk/pub/databases/prof_pat/. Another way to use it externally is to mail amino acid sequences to bachin@vector.nsc.ru for comparison with PROF_PAT 1.3.
