Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
We have successfully established a novel protein microarray-based kinase assay, which we applied to identify target proteins of the barley protein kinase CK2alpha. As a source of recombinant barley proteins we cloned cDNAs specific for filial tissues of developing barley seeds into an E. coli expression vector. By using robot technology, 21,500 library clones were arrayed in microtiter plates and gridded onto high-density filters. Protein-expressing clones were detected using an anti-RGS-His6 antibody and rearrayed into a sublibrary of 4100 clones. All of these clones were sequenced from the 5'-end and the sequences were analysed by homology searches against protein databases. Based on these results we selected 768 clones expressing different barley proteins for protein purification. The purified proteins were robotically arrayed onto FAST slides. The generated protein microarrays were incubated with an expression library-derived barley CK2alpha in the presence of [gamma-33P]ATP, and signals were detected by X-ray film or phosphor imager. We were able to demonstrate the power of the protein microarray technology by identifying 21 potential targets out of 768 proteins, including such well-known substrates of CK2alpha as high mobility group proteins and calreticulin.

2.
The public EST (expressed sequence tag) databases represent an enormous but heterogeneous repository of sequences, including many from a broad selection of plant species and a wide range of distinct varieties. The significant redundancy within large EST collections makes them an attractive resource for rapid pre-selection of candidate sequence polymorphisms. Here we present a strategy that allows rapid identification of candidate SNPs in barley (Hordeum vulgare L.) using publicly available EST databases. Analysis of 271,630 EST sequences from different cDNA libraries, representing 23 different barley varieties, resulted in the generation of 56,302 tentative consensus sequences. In all, 8171 of these unigene sequences are members of clusters with six or more ESTs. By applying a novel SNP detection algorithm (SNiPpER) to these sequences, we identified 3069 candidate inter-varietal SNPs. In order to verify these candidate SNPs, we selected a small subset of 63 present in 36 ESTs. Of the 63 SNPs selected, we were able to validate 54 (86%) using a direct sequencing approach. For further verification, 28 ESTs were mapped to distinct loci within the barley genome. The polymorphism information content (PIC) and nucleotide diversity (π) values of the SNPs identified by the SNiPpER algorithm are significantly higher than those that were obtained by random sequencing. This demonstrates the efficiency of our strategy for SNP identification and the cost-efficient development of EST-based SNP-markers. The first two authors contributed equally to this work.
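
The abstract above compares SNP sets by polymorphism information content (PIC) and nucleotide diversity but does not say how either was computed. As a rough sketch only, the Python below assumes the commonly used Botstein-style PIC formula and defines nucleotide diversity as the mean number of pairwise differences per site; the function names and the toy sequences are hypothetical and are not taken from the SNiPpER work.

```python
from itertools import combinations


def pic(allele_freqs):
    """PIC for one locus: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.

    Assumption: this is the standard Botstein-style definition; the
    abstract does not state which definition the authors used.
    """
    homozygosity = sum(p * p for p in allele_freqs)
    correction = sum(
        2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2)
    )
    return 1.0 - homozygosity - correction


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site over equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    pairs = list(combinations(aligned_seqs, 2))
    # Count mismatching positions for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # A biallelic SNP with equal allele frequencies.
    print(pic([0.5, 0.5]))
    # Three toy aligned reads (hypothetical data).
    print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```

Under this definition a biallelic SNP cannot exceed a PIC of 0.375, which is reached at equal allele frequencies.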

3.
4.
The goal of this study is to understand the evolutionary relationships among members of the B-hordein gene family in hull-less barley by analysing their structure, and to explore their utility in grain quality improvement. Six copies of the B-hordein gene (Hn1-Hn3, Hn7-Hn9) were cloned from six hull-less barley cultivars collected from the Qinghai-Tibet Plateau and molecularly characterized. Comparison of their predicted polypeptide sequences with published sequences suggested that they all share the same basic protein structure. In addition, we found that the C-terminal end sequences of all B-hordeins shared a similar feature. Among the six clones and the three previously published ones (Hn4, Hn5 and Hn6) from hull-less barley, Hn2 and Hn7 contained the identical C-terminal end sequence DIMPVDFWH, while Hn3, Hn4, Hn5, Hn8 and Hn9 shared the common sequence DIMPPDFWH, which was similar to that of a B-hordein reported previously. Both Hn1 and Hn6 exhibited differences in their C-terminal end sequences, and they clustered into different subgroups. The B-hordeins with identical C-terminal end sequences were clustered into the same subgroup, so we believe that B-hordein gene subfamilies can possibly be classified on the basis of the conserved C-terminal end sequences of the predicted polypeptides. Phylogenetic analysis also indicated that there is relatively weak identity between our predicted B-hordeins and those reported from H. chilense and H. brevisubulatum. All nine of our predicted B-hordeins clustered together, and the other B-hordeins formed another cluster. The possible use of these genes in relation to barley quality is discussed.

5.
The transferrin family is a group of proteins, defined by conserved amino acid motifs and putative function, found in both vertebrates and invertebrates. Included in this group are molecules known to bind iron, including serum transferrin, ovotransferrin, lactotransferrin, and melanotransferrin (MTF). Additional members of this family include inhibitor of carbonic anhydrase (ICA; mammals), major yolk protein (sea urchins), saxiphilin (frog), pacifastin (crayfish), and TTF-1 (algae). Most family members contain two lobes (N and C) of around 340 amino acids, the result of an ancient duplication event. In this article, we review the known functions of these proteins and speculate as to when the different homologs arose. From multiple-sequence alignments and neighbor-joining trees using 71 transferrin family sequences from 51 different species, including several novel sequences found in the Takifugu and Ciona genome databases, we conclude that melanotransferrins are much older (>670 MY) and more pervasive than previously thought, and the serum transferrin/melanotransferrin split may have occurred not long after lobe duplication. All subsequent duplication events diverged from the serum transferrin gene. The creation of such a large multiple-sequence alignment provides important information and could, in the future, highlight the role of specific residues in protein function.

6.
The cadherin superfamily is a large protein family with diverse structures and functions. Because of this diversity and the growing biological interest in cell adhesion and signaling processes, in which many members of the cadherin superfamily play a crucial role, it is becoming increasingly important to develop tools to manage, distribute and analyze sequences in this protein family. Current profile and motif databases classify protein sequences into a broad spectrum of protein superfamilies; however, to provide a more specific functional annotation, the next step should include classification of subfamilies of these protein superfamilies. Here, we present a tool that classified more than 90% of the proteins belonging to the cadherin superfamily found in the SWISS-PROT database. Therefore, for most members of the cadherin superfamily, this tool can assist in adding more specific functional annotations than can be achieved with current profile and motif databases. Finally, the classification tool and the results of our analysis were integrated into a web-accessible database (http://calcium.uhnres.utoronto.ca/cadherin).

7.
8.
The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web.

9.
Choo KB, Hsu MC, Chong KY, Huang CJ. Gene, 2007, 387(1-2): 141-149
Based on bioinformatics analysis, we previously hypothesized the existence of a bipartite TDPOZ protein family, members of which carry the TRAF domain (TD) and POZ/BTB [Huang, C.-J., Chen, C.-Y., Chen, H.-H., Tsai, S.-F., Choo, K.-B., 2004. TDPOZ, a family of bipartite animal and plant proteins that contain the TRAF (TD) and POZ/BTB domains. Gene 324, 117-127.]. Conservation in animals and plants suggests important biological functions for the putative TDPOZ proteins. In this work, we report testis-specific expression of two new Tdpoz members, Rtdpoz-T1 and -T2, of the rat genome; the result clearly indicates that members of the hypothetical gene family are, indeed, expressed. T1 and T2 cDNA sequences were derived by rapid amplification of cDNA ends (RACE). The exons of the genes were determined by queries of the rat genome sequence draft and selectively confirmed in splicing assays. The results indicate that T1 and T2 share a common leader exon indicative of alternative splicing, and that the genes are uninterrupted by introns in their respective coding sequences. Database interrogations also reveal a combined 297 hits of Rtdpoz-like sequences on 7 chromosomes; however, the bulk of the hits (264) and 26 putative TDPOZ-encoding genes, including T1 and T2, are found in an approximately 2.5 Mb cluster in the Rn2_2148 supercontig on chromosome 2. Our data point to retrotransposition in the generation and expansion of the Rtdpoz repertoire in the rat genome. We also anticipate spatio-temporal-specific expression of many more TDPOZ members in the rat or other animals and plants.

10.
Mishra P, Pandey PN. Bioinformation, 2011, 6(10): 372-374
The number of amino acid sequences is increasing very rapidly in protein databases such as Swiss-Prot, UniProt, PIR and others, but the structures of only some of these sequences are found in the Protein Data Bank. Thus, an important problem in genomics is automatically clustering homologous protein sequences when only sequence information is available. Here, we use graph theoretic techniques for clustering amino acid sequences. A similarity graph is defined, and clusters in that graph correspond to connected subgraphs. Cluster analysis seeks a grouping of amino acid sequences into subsets based on the distance or similarity score between pairs of sequences. Our goal is to find disjoint subsets, called clusters, such that two criteria are satisfied: homogeneity (sequences in the same cluster are highly similar to each other) and separation (sequences in different clusters have low similarity to each other). We tested our method on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. The results show that for a given set of proteins the number of clusters we obtained is close to the number of superfamilies in that set; there are fewer singletons; and the method correctly groups most remote homologs.
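
The abstract above describes clusters as connected subgraphs of a similarity graph. One plausible reading, sketched here in Python, is to link two sequences whenever their pairwise score reaches a cutoff and then report the connected components. The function name, the score dictionary and the threshold value are all hypothetical; the published method may define edges and clusters differently.

```python
def cluster_by_similarity(names, scores, threshold):
    """Group sequences into connected components of a thresholded graph.

    names:     list of sequence identifiers
    scores:    dict mapping (name_a, name_b) -> similarity score (assumed input)
    threshold: minimum score for two sequences to be linked (assumed parameter)
    """
    parent = {name: name for name in names}

    def find(x):
        # Walk to the component representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the endpoints of every edge that passes the threshold.
    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy_scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "D"): 0.2}
    print(cluster_by_similarity(["A", "B", "C", "D"], toy_scores, 0.5))
```

Connected components give the separation criterion almost for free, since no above-threshold edge crosses clusters, but they do not guarantee homogeneity: A and C end up together here through B even though they have no direct score.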

11.
12.
Yona G, Linial N, Linial M. Proteins, 1999, 37(3): 360-378
We investigate the space of all protein sequences in search of clusters of related proteins. Our aim is to automatically detect these sets, and thus obtain a classification of all protein sequences. Our analysis, which uses standard measures of sequence similarity as applied to an all-vs.-all comparison of SWISSPROT, gives a very conservative initial classification based on the highest scoring pairs. The many classes in this classification correspond to protein subfamilies. Subsequently we merge the subclasses using the weaker pairs in a two-phase clustering algorithm. The algorithm makes use of transitivity to identify homologous proteins; however, transitivity is applied restrictively in an attempt to prevent unrelated proteins from clustering together. This process is repeated at varying levels of statistical significance. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Different indices of validity were applied to assess the quality of our classification and compare it with the protein families in the PROSITE and Pfam databases. Our classification agrees with these domain-based classifications for between 64.8% and 88.5% of the proteins. It also finds many new clusters of protein sequences which were not classified by these databases. The hierarchical organization suggested by our analysis reveals finer subfamilies in families of known proteins as well as many novel relations between protein families.

13.
The majority of verified plant disease resistance genes isolated to date are of the NBS-LRR class, encoding proteins with a predicted nucleotide binding site (NBS) and a leucine-rich repeat (LRR) region. We took advantage of the sequence conservation in the NBS motif to clone, by PCR, gene fragments from barley representing putative disease resistance genes of this class. Over 30 different resistance gene analogs (RGAs) were isolated from the barley cultivar Regatta. These were grouped into 13 classes based on DNA sequence similarity. Actively transcribed genes were identified from all classes but one, and cDNA clones were isolated to derive the complete NBS-LRR protein sequences. Some of the NBS-LRR genes exhibited variation with respect to whether and where particular introns were spliced, as well as frequent premature polyadenylation. DNA sequences related to the majority of the barley RGAs were identified in the recently expanded public rice genomic sequence database, indicating that the rice sequence can be used to extract a large proportion of the RGAs from barley and other cereals. Using a combination of RFLP and PCR marker techniques, representatives of all barley RGA gene classes were mapped in the barley genome, to all chromosomes except 4H. A number of the RGA loci map in the vicinity of known disease resistance loci, and the association between RGA S-120 and the nematode resistance locus Ha2 on chromosome 2H was further tested by co-segregation analysis. Most of the RGA sequences reported here have not been described previously, and represent a useful resource as candidates or molecular markers for disease resistance genes in barley and other cereals.

14.
In the era of metagenomics and amplicon sequencing, comprehensive analyses of available sequence data remain a challenge. Here we describe an approach exploiting metagenomic and amplicon data sets from public databases to elucidate phylogenetic diversity of defined microbial taxa. We investigated the phylum Chlamydiae whose known members are obligate intracellular bacteria that represent important pathogens of humans and animals, as well as symbionts of protists. Despite their medical relevance, our knowledge about chlamydial diversity is still scarce. Most of the nine known families are represented by only a few isolates, while previous clone library-based surveys suggested the existence of yet uncharacterized members of this phylum. Here we identified more than 22 000 high quality, non-redundant chlamydial 16S rRNA gene sequences in diverse databases, as well as 1900 putative chlamydial protein-encoding genes. Even when applying the most conservative approach, clustering of chlamydial 16S rRNA gene sequences into operational taxonomic units revealed an unexpectedly high species, genus and family-level diversity within the Chlamydiae, including 181 putative families. These in silico findings were verified experimentally in one Antarctic sample, which contained a high diversity of novel Chlamydiae. In our analysis, the Rhabdochlamydiaceae, whose known members infect arthropods, represents the most diverse and species-rich chlamydial family, followed by the protist-associated Parachlamydiaceae, and a putative new family (PCF8) with unknown host specificity. Available information on the origin of metagenomic samples indicated that marine environments contain the majority of the newly discovered chlamydial lineages, highlighting this environment as an important chlamydial reservoir.

15.
16.
Various sequence-motif and sequence-cluster databases have been integrated into a new resource known as InterPro. Because the contributing databases have different clustering principles and scoring sensitivities, the combined assignments complement each other for grouping protein families and delineating domains. InterPro and new developments in the analysis of both the phylogenetic profiles of protein families and domain fusion events improve the prediction of specific functions for numerous proteins.

17.
Many characterized plant disease resistance genes encode proteins which have conserved motifs such as the nucleotide binding site. Conservation extends across different species; therefore, resistance genes from one species can be used to isolate homologous regions from another by employing DNA sequences encoding conserved protein motifs as probes. Here we report the isolation and characterization of a barley (Hordeum vulgare L.) resistance gene analog family consisting of nine members homologous to the maize rust resistance gene Rp1-D. Five barley Rp1-D homologues are clustered within approximately 400 kb on chromosome 1(7H), near, but not co-segregating with, the barley stem rust resistance gene Rpg1, while others are localized on chromosomes 3(3H), 5(1H), 6(6H) and 7(5H). Analyses of predicted amino-acid sequences of the barley Rp1-D homologues and comparison with known plant disease resistance genes are presented.

18.
Kinesin superfamily proteins (KIFs) are key players or 'hub' proteins in the intracellular transport system, which is essential for cellular function and morphology. The KIF superfamily is also the first large protein family in mammals whose constituents have been completely identified and confirmed both in silico and in vivo. Numerous studies have revealed the structures and functions of individual family members; however, the relationships between members or a perspective of the whole superfamily structure until recently remained elusive. Here, we present a comprehensive summary based on a large, systematic phylogenetic analysis of the kinesin superfamily. All available sequences in public databases, including genomic information from all model organisms, were analyzed to yield the most complete phylogenetic kinesin tree thus far, comprising 14 families. This comprehensive classification builds on the recently proposed standardized nomenclature for kinesins and allows systematic analysis of the structural and functional relationships within the kinesin superfamily.

19.
Domains are considered the basic units of protein folding, evolution, and function. Decomposing each protein into modular domains is thus a basic prerequisite for accurate functional classification of biological molecules. Here, we present ADDA, an automatic algorithm for domain decomposition and clustering of all protein domain families. We use alignments derived from an all-on-all sequence comparison to define domains within protein sequences based on a global maximum likelihood model. In all, 90% of domain boundaries are predicted within 10% of domain size when compared with the manual domain definitions given in the SCOP database. A representative database of 249,264 protein sequences was decomposed into 450,462 domains. These domains were clustered on the basis of sequence similarities into 33,879 domain families containing at least two members with less than 40% sequence identity. Validation against family definitions in the manually curated databases SCOP and PFAM indicates almost perfect unification of various large domain families while contamination by unrelated sequences remains at a low level. The global survey of protein-domain space by ADDA confirms that most large and universal domain families are already described in PFAM and/or SMART. However, a survey of the complete set of mobile modules leads to the identification of 1479 new interesting domain families which shuffle around in multi-domain proteins. The data are publicly available at ftp://ftp.ebi.ac.uk/pub/contrib/heger/adda.
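
The boundary-accuracy figure quoted above (90% of boundaries within 10% of domain size) can be made concrete with a small calculation. The Python sketch below is only one possible interpretation: it assumes each predicted boundary is compared with a single reference boundary and that the tolerance is a fraction of the reference domain's length. The function names and example numbers are hypothetical and are not drawn from ADDA itself.

```python
def boundary_hit(predicted, reference, domain_length, tolerance=0.10):
    """True if the predicted boundary lies within tolerance * domain_length
    residues of the reference boundary (assumed interpretation)."""
    return abs(predicted - reference) <= tolerance * domain_length


def boundary_accuracy(cases, tolerance=0.10):
    """Fraction of (predicted, reference, domain_length) cases that are hits."""
    hits = sum(boundary_hit(p, r, n, tolerance) for p, r, n in cases)
    return hits / len(cases)


if __name__ == "__main__":
    # Hypothetical boundaries given as residue positions.
    toy_cases = [
        (105, 100, 120),  # 5 residues off, allowed 12
        (150, 100, 120),  # 50 residues off, allowed 12
        (98, 100, 50),    # 2 residues off, allowed 5
    ]
    print(boundary_accuracy(toy_cases))
```

Scaling the tolerance by domain length means a fixed residue offset is penalised more heavily on short domains than on long ones.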

20.
In this study, we introduce a novel bioinformatics program, Spore-associated Symbiotic Microbes Position-specific Function (SeSaMe PS Function), for position-specific functional analysis of short sequences derived from metagenome sequencing data of the arbuscular mycorrhizal fungi. The unique advantage of the program lies in databases created based on genus-specific sequence properties derived from protein secondary structure, namely amino acid usages, codon usages, and codon contexts of 3-codon DNA 9-mers. SeSaMe PS Function searches a query sequence against a reference sequence database, identifies 3-codon DNA 9-mers with structural roles, and creates a comparative dataset containing the codon usage biases of the 3-codon DNA 9-mers from 54 bacterial and fungal genera. The program applies correlation principal component analysis in conjunction with the K-means clustering method to the comparative dataset. 3-codon DNA 9-mers clustered as a sole member or with only a few members are often structurally and functionally distinctive sites that provide useful insights into important molecular interactions. The program provides a versatile means for studying functions of short sequences from metagenome sequencing and has a wide spectrum of applications. SeSaMe PS Function is freely accessible at www.fungalsesame.org.
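
To make the notion of a "3-codon DNA 9-mer" more tangible, the Python sketch below enumerates in-frame 9-mers from a coding sequence and tallies them. It assumes the windows are read in frame from the first base and slide one codon at a time, which the abstract does not specify. The function names and the example sequence are hypothetical, and none of this reproduces the SeSaMe PS Function databases or its PCA and K-means steps.

```python
from collections import Counter


def three_codon_9mers(cds):
    """Return in-frame 9-mers (three consecutive codons) from a coding sequence.

    Assumption: the reading frame starts at position 0 and the window
    advances one codon (3 bases) per step; trailing partial codons are dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return ["".join(codons[i:i + 3]) for i in range(len(codons) - 2)]


def count_9mers(sequences):
    """Tally 3-codon 9-mers across several coding sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(three_codon_9mers(seq))
    return counts


if __name__ == "__main__":
    # Hypothetical 12-base coding fragment: ATG GCC AAA TTT
    print(three_codon_9mers("ATGGCCAAATTT"))
    print(count_9mers(["ATGGCCAAATTT", "atggccaaa"]))
```

Counts of this kind, gathered per genus, would be one natural input to the usage-bias comparison the abstract describes.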
