Similar literature
Found 20 similar articles (search time: 31 ms)
1.
The paper focuses on the development of a software tool for clustering proteins according to their amino acid content. All known human proteins were clustered according to the relative frequencies of their amino acids, starting from the UniProtKB/Swiss-Prot reference database and making use of hierarchical cluster analysis. Results were compared to those based on sequence similarities. Results: Proteins display different clustering patterns according to type. Many extracellular proteins with highly specific and repetitive sequences (keratins, collagens etc.) cluster clearly, confirming the accuracy of the clustering method. In this case, clustering by sequence and clustering by amino acid content overlap. Proteins with a more complex structure with multiple domains (catalytic, extracellular, transmembrane etc.), even if classified as very similar according to sequence similarity and function (aquaporins, cadherins, steroid 5-alpha reductase etc.), showed different clustering according to amino acid content. Availability of essential amino acids according to local conditions (starvation, low or high oxygen, cell cycle phase etc.) may be a limiting factor in protein synthesis, whatever the mRNA level. This type of protein clustering may therefore prove a valuable tool in identifying so far unknown metabolic connections and constraints.
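
A minimal sketch of the composition-based clustering described in entry 1, assuming Python with only the standard library. The function names (`composition`, `average_linkage`), the Euclidean distance, the average-linkage rule, and the toy sequences are all illustrative assumptions; the abstract does not state which distance or linkage the authors used.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Relative frequency of each standard amino acid in seq."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two composition vectors (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def average_linkage(vectors, n_clusters):
    """Naive agglomerative clustering; returns clusters as lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(euclidean(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        _, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy input: two Gly/Pro-rich fragments and two Leu/Lys/Glu-rich fragments.
toy = ["GPPGPPGPP", "GPAGPPGPA", "LLLKKLLEEL", "LKELLKELLK"]
vectors = [composition(s) for s in toy]
print(average_linkage(vectors, 2))  # [[0, 1], [2, 3]]
```

The naive merge loop is cubic in the number of sequences, so it only suits toy inputs; a proteome-scale run would need an optimized hierarchical clustering routine.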

2.
The 3' regions of the gene encoding the cap binding protein eIF4E were successfully isolated from Agaricus bisporus and Verticillium fungicola using a degenerate primer within the eIF4E gene and an anchored oligo d(T) primer. The deduced amino acid sequences contained 173 residues for A. bisporus and 171 residues for V. fungicola. Analysis of these sequences shows that, despite conserved regions of homology centering around tryptophan residues, A. bisporus and V. fungicola are very diverse at the amino acid and DNA level. Percentage homology between the two fungi is low at both the nucleotide level (35%) and the amino acid level (29%). The highest degree of similarity between the A. bisporus sequence and other published sequences is with the Homo sapiens eIF4E sequence (32%). V. fungicola exhibited the highest homology with the eIF4E sequence from Caenorhabditis elegans (34%). Southern analysis of genomic DNA indicated a single copy of the gene within the A. bisporus genome.

3.
Remote homology detection refers to the detection of structural homology in evolutionarily related proteins with low sequence similarity. Supervised learning algorithms such as the support vector machine (SVM) are currently the most accurate methods. In most of these SVM-based methods, efforts have been dedicated to developing new kernels that make better use of pairwise alignment scores or sequence profiles. Moreover, amino acids' physicochemical properties are not generally used in the feature representation of protein sequences. In this article, we present a remote homology detection method that incorporates two novel features: (1) a protein's primary sequence is represented using amino acids' physicochemical properties and (2) the similarity between two proteins is measured using recurrence quantification analysis (RQA). An optimization scheme was developed to select the amino acid indices (up to 10 for a protein family) that best characterize the given protein family. The selected amino acid indices may enable us to draw a better biological explanation of the protein family classification problem than other alignment-based methods allow. An SVM-based classifier then works on the space described by the RQA metrics. The classification scheme is named SVM-RQA. Experiments at the superfamily level of the SCOP1.53 dataset show that, without using alignment or sequence profile information, the features generated from amino acid indices are able to produce results that are comparable to those obtained by the published state-of-the-art SVM kernels. In the future, better prediction accuracies can be expected by combining the alignment-based features with our amino acid property-based features. Supplementary information including the raw dataset, the best-performing amino acid indices for each protein family and the computed RQA metrics for all protein sequences can be downloaded from http://ym151113.ym.edu.tw/svm-rqa.

4.
D S Dwyer. Life Sciences, 1989, 45(5): 421-429
Mice were immunized with alpha-bungarotoxin (BGT), a nearly irreversible antagonist of the acetylcholine receptor (AChR), to produce monoclonal antibodies (Mabs). One of the Mabs (JMC2.7) bound not only to BGT but also to the AChR. To understand the molecular basis for this novel cross-reaction, the amino acid sequences of these proteins were searched for areas of similarity which might constitute the shared epitope. A number of short segments of sequence homology were found, one of them representing the BGT-binding site of the AChR. Because a portion of BGT resembles that part of the AChR that binds toxin, the self-binding of BGT was evaluated. As shown here, BGT binds specifically to itself to form dimers. In order to extend these observations, other ligand-receptor pairs were examined for sequence homology. The sodium channel and alpha-scorpion toxins were found to have distinct areas of similarity, as do interleukin 2 (IL-2) and the IL-2 receptor. As a general principle, we propose that peptide ligands and their receptors may often share amino acid sequence homology. In fact, the sites of interaction between two proteins may largely be determined by these regions of similarity.

5.
6.
We have identified and cloned a new member of the mammalian tandem pore domain K+ channel subunit family, TWIK-originated similarity sequence, from a human testis cDNA library. The 939 bp open reading frame encodes a 313 amino acid polypeptide with a calculated Mr of 33.7 kDa. Despite the same predicted topology, there is relatively low sequence homology between TWIK-originated similarity sequence and other members of the mammalian tandem pore domain K+ channel subunit family. TWIK-originated similarity sequence shares low (< 30%) identity with the other family members and the highest identity (34%) with TWIK-1 at the amino acid level. Similarly low levels of sequence homology exist between all members of the family. Potential glycosylation and consensus PKC sites are present. Northern analysis revealed species- and tissue-specific expression patterns. Expression of TWIK-originated similarity sequence is restricted to human pancreas, placenta and heart, while in the mouse, TWIK-originated similarity sequence is expressed in the liver. No functional currents were observed in Xenopus laevis oocytes or HEK293T cells, suggesting that TWIK-originated similarity sequence may be targeted to locations other than the plasma membrane or that it may represent a novel regulatory subunit of the mammalian tandem pore domain K+ channel subunit family.

7.
8.
A rat spleen cDNA library was screened for clones carrying the cDNAs for prothymosin alpha and parathymosin. Sequence analysis of a clone carrying the entire coding region for prothymosin alpha confirmed and completed the amino acid sequence for this polypeptide and established the number of amino acid residues as 111. Rat prothymosin alpha differs from human prothymosin alpha at six positions, including four substitutions and two insertions. The nucleotide sequences of the cDNAs for the rat and human polypeptides are more than 90% identical in the open reading frames, with significant homology extending into the 5' and 3' flanking regions. From the same library, we also isolated a clone carrying 80% of the coding region for rat parathymosin. The number of amino acid residues in rat parathymosin is 101, based on the sequence deduced from the cDNA insert and earlier information on the sequence in the amino-terminal portion of this polypeptide. Despite their similarity in size and amino acid composition, rat prothymosin alpha and rat parathymosin show only limited sequence homology, primarily in the segment including residues 14 through 25, where 10 of 12 positions are identical in the two polypeptides. This is also the region of significant sequence similarity to a 12-amino-acid segment in the p17 protein of the human immunodeficiency disease-associated virus (HTLV-IIIB).

9.
The three-dimensional structures of a set of 'never born proteins' (NBPs, random amino acid sequence proteins with no significant homology with known proteins) were predicted using two methods: Rosetta and a method based on the 'fuzzy-oil-drop' (FOD) model. More than 3000 different random amino acid sequences were generated, filtered against the non-redundant protein sequence database to remove sequences with significant homology with known proteins, and subjected to three-dimensional structure prediction. Comparison between the Rosetta and FOD predictions allowed selection of the ten top (highest structural similarity) and the ten bottom (lowest structural similarity) structures from the ranking list organized according to the RMS-D value. The selected structures were taken for detailed analysis to define the scale of structural accordance and discrepancy between the two methods. The structural similarity measurements revealed discrepancies between structures generated on the basis of the two methods. Their potential biological functions appeared to be quite different as well. The ten bottom structures appeared to be 'unfoldable' for the FOD model. Some aspects of the general characteristics of the NBPs are also discussed. The calculations were performed on the EUChinaGRID grid platform to test the performance of this infrastructure for massive protein structure predictions.

10.
The complete amino acid sequence of human retinal S-antigen (48 kDa protein), a retinal protein involved in the visual process, has been determined by cDNA sequencing. The largest cDNA was 1590 base pairs (bp) and contained the entire coding sequence. The similarity of nucleotide sequence between the human and bovine cDNAs is approximately 80%. The predicted amino acid sequence indicates that human S-antigen has 405 residues and a molecular mass of 45050 Da. The amino acid sequence homology between human and bovine S-antigen is 81%. There is no overall sequence similarity between S-antigen and other proteins listed in the National Biomedical Research Foundation (NBRF) protein database. However, local regions of sequence homology with alpha-transducin (T alpha) are apparent, including the putative rhodopsin binding and phosphoryl binding sites. In addition, human S-antigen has sequences identical to bovine uveitopathogenic sites, indicating that some types of human uveitis may in part be related to the animal model of experimental autoimmune uveitis (EAU).

11.
Fourier methods are applied to the pairwise comparison of Cα backbones in protein structures. The technique allows assessment of both the general similarity and the main origins of resemblance (coincident periodicities, similarity of fragments, or large-scale semblance of folding). Analogous methods can be extended to the study of correlations between the structural characteristics of the Cα backbone of one protein and the distribution of physico-chemical parameters along the primary amino acid sequence of another. Finally, we discuss the problem of clustering pairwise data into a tree-like hierarchical system.

12.
Summary. Adenovirus E1A and c-myc genes are known to be capable of transforming primary rat cells when they occur in combination with either polyoma middle-T or T24 Harvey-ras 1 genes. There was a low level of amino acid sequence homology between the nuclear adenovirus-12 (Ad12) E1A protein product (289 amino acids) and the c-myc protein, based on optimal alignment and percentage identity. In contrast to others [Ralston R, Bishop JM (1983) Nature 306:803–806], we concluded that this low level of amino acid sequence homology was not significant, since rabies glycoprotein (RGP), which has no transforming function and localizes to the cell surface, had a similarly low level of amino acid sequence homology to the c-myc protein. Furthermore, dot-matrix analysis, when used to test the overall level of amino acid sequence homology, showed no significant homology between c-myc and Ad12 E1A, E1B, or RGP. Thus, low levels of amino acid sequence homology between two proteins may not be sufficient to reliably predict structural and functional similarities between them, even if the two proteins appear to share a common function.
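
The two measures named in entry 12, percentage identity over an alignment and dot-matrix analysis, can be sketched roughly as follows. This is an illustration only: the function names, the window size, the match threshold, and the toy sequences are assumptions, and percentage identity is computed here over columns where neither sequence has a gap, which is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def dot_matrix(a, b, window=3, min_matches=3):
    """Return (i, j) positions where the windows a[i:i+window] and
    b[j:j+window] share at least min_matches identical residues."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Toy, pre-aligned input (hypothetical sequences).
print(percent_identity("ACD-F", "ACEGF"))  # 75.0
print(dot_matrix("MKVLA", "GKVLT"))        # [(1, 1)]
```

In a dot plot, related regions show up as runs of hits along a diagonal; isolated hits like those expected between unrelated sequences are the kind of background the abstract warns against over-interpreting.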

13.
Using several consensus sequences for the 106 amino acid residue alpha-spectrin repeat segment as probes, we searched animal sequence databases with the BLAST program in order to find proteins showing limited but significant similarity to spectrin. Among many spectrins and proteins from the spectrin-alpha-actinin-dystrophin family, as well as sequences showing a rather high degree of similarity in very short stretches, we found seven homologous animal sequences of low overall similarity to spectrin but showing the presence of one or more spectrin-repeat motifs. The homology relationship of these sequences to alpha-spectrin was further analysed using the SEMIHOM program. Depending on the probe, these segments showed the presence of 6 to 26 identical amino acid residues and a variable number of semihomologous residues. Moreover, we found six protein sequences that contained a fragment sharing 42-59% similarity with the SH3 (Src homology region 3) domain. Our data indicate the occurrence of motifs of significant homology to alpha-spectrin repeat segments among animal proteins which are not classical members of the spectrin-alpha-actinin-dystrophin family. This might indicate that these segments, together with the SH3 domain motif, are conserved in proteins which were possibly close cognates of spectrin-alpha-actinin-dystrophin progenitors at an early stage of evolution but then evolved separately.

14.
The full-length cDNA of a phospholipid transfer protein (PLTP) was isolated from Aspergillus oryzae by a RACE-PCR procedure using a degenerate primer pool selected from the N-terminal sequence of the purified phosphatidylinositol/phosphatidylglycerol transfer protein (PG/PI-TP). The cDNA encodes a 173 amino acid protein of 18823 Da. The deduced amino acid sequence from position 38 to 67 is 100% identical to the N-terminal sequence (first 30 amino acids) of the purified PG/PI-TP. This amino acid sequence is preceded by a leader peptide of 37 amino acids, which is predicted to be composed of a signal peptide of 21 amino acids followed by an extra sequence of 16 amino acids, or a membrane anchor protein signal (amino acids 5-29). This strongly suggests that the PG/PI-TP is a targeted protein. The deduced mature protein is 138 amino acids long with a predicted molecular mass of 14933 Da. Comparison of the deduced PG/PI-TP sequence with other polypeptide sequences available in databases revealed homology with a protein deduced from an open reading frame coding for an unknown protein in Saccharomyces cerevisiae (36% identity and 57% similarity). Apart from this homology, the PG/PI-TP is unique and specific to the filamentous fungi on the basis of comparison of PLTP protein sequences. Northern blot analysis of RNA isolated from A. oryzae cultures grown on glucose or glucose supplemented with phospholipids suggests that the PG/PI-TP is transcribed as only one RNA species and allows us to show that expression of the protein is regulated at the messenger RNA level.

15.
We report the identification and nucleotide sequence analysis of pKW1, a plasmid of the psychrotrophic bacterium Pseudoalteromonas sp. 643A isolated from the stomach of Antarctic krill Euphausia superba. pKW1 consists of 4583 bp, has a G+C content of 43% and contains seven putative open reading frames (ORFs). The deduced amino acid sequence from ORF-1 shared significant similarity with the plasmid replicase protein of Psychrobacter cryohalolentis, strain K5. The DNA region immediately downstream of ORF-1 showed some homology with the Rep-binding sequence of the theta-replicating ColE2-type plasmids. The ORF-3 amino acid sequence revealed homology with the mobilization proteins of Psychrobacter sp. PRwf-1 and Moraxella catarrhalis, with identities of 28% and 25%, respectively. ORF-4 showed 46% amino acid sequence homology with the putative relaxase/mobilization nuclease MobA of Hafnia alvei and 44% homology with the putative mobilization protein A of Pasteurella multocida. The copy number of pKW1 in Pseudoalteromonas sp. 643A was estimated at 15 copies per chromosome.

16.

Background  

The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.

17.
The complete amino acid sequence (186 amino acid residues) of a basic cytosolic protein from bovine brain has been determined. It was previously described as a phosphatidylethanolamine binding protein. Computer analyses have been used to calculate its hydropathy profile and to predict its secondary structure. Comparison with other proteins did not detect any significant sequence similarity, except for a short region which shows 53% sequence homology with bovine phosphatidylcholine transfer protein.
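
A hydropathy profile of the kind mentioned in entry 17 is commonly computed as a sliding-window average over a per-residue hydrophobicity scale. The sketch below assumes the Kyte-Doolittle scale and a window of 9 residues; the abstract does not say which scale or window was used, and the function name and test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every window of the given length along seq."""
    values = [KYTE_DOOLITTLE[c] for c in seq.upper()]  # KeyError on non-standard residues
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical sequence: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("LLIVALLVAKKEDKRDEK", window=9)
print([round(v, 2) for v in profile])
```

Positive values mark hydrophobic stretches and negative values hydrophilic ones; larger windows smooth the curve and are typically used when looking for membrane-spanning segments.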

18.
A DNA test to sex ratite birds   Cited by: 4 (self-citations: 0, citations by others: 4)
DNA-based sex tests now exist for many avian species. However, none of these tests are widely applicable to ratites. We present DNA sequence data for a locus that is W chromosome-linked in the kiwi, ostrich, cassowary, rhea, and emu. At the amino acid level, this sequence has significant homology to X-linked genes in platyfish and Caenorhabditis elegans. Polymerase chain reaction (PCR) primers designed for this locus allow the assignment of sex in all species of living ratites.

19.
K Inatomi. DNA Research, 1998, 5(6): 365-371
The structural gene, nosZ, for the monomeric N2O reductase has been cloned and sequenced from the denitrifying bacterium Achromobacter cycloclastes. The nosZ gene encodes a protein of 642 amino acid residues, and the deduced amino acid sequence showed homology to the previously derived sequences for the dimeric N2O reductases. The relevant DNA region of about 3.6 kbp was also sequenced and found to consist of four genes, nosDFYL, based on the similarity with the N2O reduction genes of Pseudomonas stutzeri. The gene product of A. cycloclastes nosF (299 amino acid residues) has a consensus ATP-binding sequence, and the nosY gene encodes a hydrophobic protein (273 residues) with five transmembrane segments, suggesting similarity to an ATP-binding cassette (ABC) transporter, which has two distinct domains: a highly hydrophobic region and ATP-binding sites. The nosL gene encodes a protein of 193 amino acid residues, and the derived sequence showed a consensus sequence for a lipoprotein modification/processing site. The expression of the nosZ gene in Escherichia coli cells and the comparison of the translated sequences of the nosDFYL genes with those of bacterial transport genes for inorganic ions are discussed.

20.
When preparing data sets of amino acid or nucleotide sequences it is necessary to exclude redundant or homologous sequences in order to avoid overestimating the predictive performance of an algorithm. For some time methods for doing this have been available in the area of protein structure prediction. We have developed a similar procedure based on pair-wise alignments for sequences with functional sites. We show how a correlation coefficient between sequence similarity and functional homology can be used to compare the efficiency of different similarity measures and choose a nonarbitrary threshold value for excluding redundant sequences. The impact of the choice of scoring matrix used in the alignments is examined. We demonstrate that the parameter determining the quality of the correlation is the relative entropy of the matrix, rather than the assumed (PAM or identity) substitution model. Results are presented for the case of prediction of cleavage sites in signal peptides. By inspection of the false positives, several errors in the database were found. The procedure presented may be used as a general outline for finding a problem-specific similarity measure and threshold value for analysis of other functional amino acid or nucleotide sequence patterns.
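
The idea in entry 20 of excluding sequences whose similarity to an already accepted sequence exceeds a threshold can be illustrated with a simple greedy filter. This is a sketch under stated assumptions, not the authors' procedure: `difflib.SequenceMatcher.ratio()` stands in for a proper pairwise alignment score, and the 0.8 threshold, function names, and toy sequences are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real pipeline would use an alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def remove_redundant(sequences, threshold=0.8, score=similarity):
    """Greedily keep each sequence unless it is too similar to one already kept."""
    kept = []
    for seq in sequences:
        if all(score(seq, other) < threshold for other in kept):
            kept.append(seq)
    return kept


# Hypothetical toy data: the second sequence differs from the first by one residue.
data = ["MKKLLPT", "MKKLLPS", "GAVWYFH"]
print(remove_redundant(data))  # ['MKKLLPT', 'GAVWYFH']
```

The result of a greedy filter depends on input order, so sorting the input first (for example by annotation quality) is a common way to control which member of a redundant pair survives.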
