Similar Articles
1.
Acyltransferases (ATs) are enzymes that catalyze the transfer of an acyl group to a receptor molecule. This review focuses on ATs that act on thioester-containing substrates. Although many ATs can recognize a wide variety of substrates, sequence similarity analysis allowed us to classify the ATs into fifteen distinct families. Each AT family originates from enzymes experimentally characterized as having AT activity, was classified according to sequence similarity, and was confirmed by tertiary structure similarity for families with crystal structures available. All the sequences and structures of the AT families described here are present in the thioester-active enzyme (ThYme) database. The AT sequences and structures classified into families and available in the ThYme database could improve understanding of acyl transfer to thioester-containing substrates (most commonly coenzyme A), which occurs in multiple metabolic pathways, mostly those involving fatty acids.

2.
MotifCluster finds related motifs in a set of sequences, and clusters the sequences into families using the motifs they contain. MotifCluster, at , lets users test whether proteins are related, cluster sequences by shared conserved motifs, and visualize motifs mapped onto trees, sequences and three-dimensional structures. We demonstrate MotifCluster's accuracy using gold-standard protein superfamilies; using recommended settings, families were assigned to the correct superfamilies with 0.17% false positive and no false negative assignments.

3.
The explosion of biological data resulting from genomic and proteomic research has created a pressing need for data analysis techniques that work effectively on a large scale. An area of particular interest is the organization and visualization of large families of protein sequences. An increasingly popular approach is to embed the sequences into a low-dimensional Euclidean space in a way that preserves some predefined measure of sequence similarity. This method has been shown to produce maps that exhibit global order and continuity and reveal important evolutionary, structural, and functional relationships between the embedded proteins. However, protein sequences are related by evolutionary pathways that exhibit highly nonlinear geometry, which is invisible to classical embedding procedures such as multidimensional scaling (MDS) and nonlinear mapping (NLM). Here, we describe the use of stochastic proximity embedding (SPE) for producing Euclidean maps that preserve the intrinsic dimensionality and metric structure of the data. SPE extends previous approaches in two important ways: (1) it preserves only local relationships between closely related sequences, thus allowing the map to unfold and reveal its intrinsic dimension, and (2) it scales linearly with the number of sequences and can therefore be applied to very large protein families. The merits of the algorithm are illustrated using examples from the protein kinase and nuclear hormone receptor superfamilies.
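The local update idea behind SPE can be sketched in a few lines. This is a minimal pure-Python sketch assuming the commonly described pairwise rule: pick a random pair, nudge both points toward their target distance, and enforce pairs beyond a cutoff `rc` only when they sit too close. The function names, the linear learning-rate schedule and all defaults are illustrative assumptions, not the authors' code.

```python
import math
import random


def spe(dist, dim=2, rc=float("inf"), cycles=100, steps=None,
        lam0=1.0, lam_min=0.01, seed=0, eps=1e-10):
    """Stochastic proximity embedding of an n x n target-distance matrix.

    Pairs with target distance r <= rc are pulled or pushed toward r; pairs
    beyond the cutoff are only pushed apart when they are closer than r.
    """
    rng = random.Random(seed)
    n = len(dist)
    if steps is None:
        steps = 10 * n * (n - 1) // 2  # illustrative: ~10 visits per pair per cycle
    x = [[rng.random() for _ in range(dim)] for _ in range(n)]
    lam = lam0
    dlam = (lam0 - lam_min) / max(cycles - 1, 1)  # assumed linear decay
    for _ in range(cycles):
        for _ in range(steps):
            i, j = rng.sample(range(n), 2)
            r = dist[i][j]
            d = math.dist(x[i], x[j])
            if r <= rc or d < r:
                coef = lam * 0.5 * (r - d) / (d + eps)
                for k in range(dim):
                    delta = coef * (x[i][k] - x[j][k])
                    x[i][k] += delta  # move i along (x_i - x_j)
                    x[j][k] -= delta  # move j symmetrically
        lam -= dlam
    return x


def local_stress(dist, x, rc):
    """Sum of squared errors over the pairs the embedding must preserve."""
    n = len(dist)
    return sum((math.dist(x[i], x[j]) - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n)
               if dist[i][j] <= rc)


# Five points on a line, embedded in 2-D while preserving only
# neighbours within distance 2.
line = [[abs(i - j) for j in range(5)] for i in range(5)]
start = spe(line, rc=2, cycles=0)  # cycles=0 returns the random start
fitted = spe(line, rc=2)
print(local_stress(line, start, 2), local_stress(line, fitted, 2))
```

Each update touches only one pair, so the cost per cycle is set by `steps` rather than by all n² pairs, which is the property that lets this style of method scale to large families.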

4.
The nucleotide sequences of neurotransmitter receptor genes now available make it possible to use oligonucleotides targeted to the mRNAs of these genes to selectively inactivate their expression (antisense knockdown) and thereby determine the function of individual receptor subtypes. Antisense knockdown may be especially important for receptor families whose members are pharmacologically similar. The advantages of antisense technology for investigating the role of brain neurotransmitter receptors in the regulation of behaviour are discussed.

5.
The genetic architecture of resistance (total citations: 13; self-citations: 0; citations by others: 13)
Plant resistance genes (R genes), especially the nucleotide binding site leucine-rich repeat (NBS-LRR) family of sequences, have been extensively studied in terms of structural organization, sequence evolution and genome distribution. These studies indicate that NBS-LRR sequences can be split into two related groups that have distinct amino-acid motif organizations, evolutionary histories and signal transduction pathways. One NBS-LRR group, characterized by the presence of a Toll/interleukin receptor domain at the amino-terminal end, seems to be absent from the Poaceae. Phylogenetic analysis suggests that a small number of NBS-LRR sequences existed among ancient Angiosperms and that these ancestral sequences diversified after the separation into distinct taxonomic families. There are probably hundreds, perhaps thousands, of NBS-LRR sequences and other types of R gene-like sequences within a typical plant genome. These sequences frequently reside in 'mega-clusters' consisting of smaller clusters with several members each, all localized within a few million base pairs of one another. The organization of R-gene clusters highlights a tension between diversifying and conservative selection that may be relevant to gene families that are unrelated to disease resistance.

6.
Evidence has accumulated to support a model for odorant detection in which individual olfactory receptor neurons (ORNs) express one of a large family of G protein-coupled receptor proteins that are activated by a small number of closely related volatile chemicals. However, whether an individual ORN expresses one or multiple types of receptor proteins has yet to be definitively addressed. Physiological data indicate that some individual ORNs can be activated by odorants differing substantially in structure and/or perceived quality, suggesting multiple receptors or one nonspecific receptor per cell. In contrast, molecular biological studies favor a scheme with a single, fairly selective receptor per cell. The present studies directly assessed whether individual rat ORNs can express multiple receptors, using single-cell PCR with degenerate primers designed to amplify a wide variety of receptor sequences. We found that whereas only a single olfactory receptor (OR) sequence was obtained from most ORNs examined, one ORN produced two distinct receptor sequences belonging to different receptor gene families. Double-label in situ hybridization studies indicated that a subset of ORNs co-express two distinct receptor mRNAs. A laminar segregation analysis of the nuclei of ORNs labeled with the two OR mRNA probes showed that, for one probe, the histogram of nuclear positions along the depth of the epithelium was bimodal, with one peak overlapping the (unimodal) histogram for the other probe. These results are consistent with co-expression of two OR mRNAs in a population of single ORNs.

7.
To evaluate the role of inherited variation in the estrogen receptor (ESR1) gene in human breast cancer, we determined the intronic sequences flanking each ESR1 exon; identified multiple SNPs and length polymorphisms in the ESR1 coding sequence, splice junctions and regulatory regions; and genotyped families at high risk of breast cancer as well as population-based breast cancer patients and controls. Of 10 polymorphic sites in ESR1, four are synonymous SNPs, two are nonsynonymous SNPs and four are length polymorphisms; five are novel. No ESR1 polymorphisms were associated with breast cancer, in either the high-risk families or the case-control study. We therefore conclude that inherited genetic variation is not a mechanism by which the estrogen receptor is commonly involved in breast cancer development.

8.
A novel complex mutation, consisting of a deletion and an insertion in close proximity within the same region, was detected in exon 8 of the LDL receptor gene in two apparently unrelated Japanese families with familial hypercholesterolemia (FH). In this mutant LDL receptor gene, the nine bases from nucleotide (nt) 1115 to nt 1123 (AGGGTGGCT) were replaced by six different bases (CACTGA); consequently, the four amino acids at codons 351 to 354 (Glu-Gly-Gly-Tyr) were replaced by three amino acids (Ala-Leu-Asn) in the conserved amino acid region of growth factor repeat B of the LDL receptor. The nature of the amino acid substitution and the family data suggest that this mutation very likely impairs LDL receptor function and causes FH. This complex mutation can be explained by the simultaneous occurrence of a deletion and an insertion through the formation of a hairpin-loop structure mediated by inverted repeat sequences. This mutation thus supports the hypothesis that inverted repeat sequences influence the stability of a given gene and promote human gene mutations.
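Why nine bases replaced by six turns four amino acids into three can be checked by translating both versions: the net loss is three bases, so the reading frame downstream is preserved and exactly one codon disappears. This sketch assumes the reading frame places nt 1115 at the second position of the Glu codon, so it adds one flanking base (G) on the 5′ side and two (AC) on the 3′ side. These flanks are an inference consistent with the stated amino acids (the last flanking base could equally be T), not sequence quoted from the paper, and the codon table is deliberately limited to the codons needed here.

```python
# Only the codons needed for this example (standard genetic code).
CODONS = {"GAG": "Glu", "GGT": "Gly", "GGC": "Gly", "TAC": "Tyr",
          "GCA": "Ala", "CTG": "Leu", "AAC": "Asn"}

FLANK_5 = "G"    # assumed: first base of the Glu codon (codon 351)
FLANK_3 = "AC"   # assumed: last two bases of the Tyr codon (codon 354)
WILD_TYPE_SEGMENT = "AGGGTGGCT"   # nt 1115-1123, from the abstract
MUTANT_SEGMENT = "CACTGA"         # replacement bases, from the abstract


def translate(seq):
    """Translate an in-frame sequence codon by codon."""
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return [CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3)]


wild = FLANK_5 + WILD_TYPE_SEGMENT + FLANK_3
mutant = FLANK_5 + MUTANT_SEGMENT + FLANK_3
lost = len(wild) - len(mutant)
print(translate(wild))    # codons 351-354
print(translate(mutant))  # the shortened, substituted stretch
print(lost, "bases lost; frame preserved:", lost % 3 == 0)
```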

9.
Reconstructing the evolutionary history of protein sequences will provide a better understanding of the divergence mechanisms of protein superfamilies and their functions. Long-term protein evolution often includes dynamic changes such as insertion, deletion, and domain shuffling. Such dynamic changes make reconstructing protein sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple alignments and phylogenetic methods. Unfortunately, currently available simulation methods are not sufficiently flexible and do not allow simulation of biologically realistic, dynamic protein sequence evolution. We introduce a new method, indel-Seq-Gen (iSG), that can simulate realistic evolutionary processes of protein sequences with insertions and deletions (indels). Unlike other simulation methods, iSG allows the user to simulate multiple subsequences according to different evolutionary parameters, which is necessary for generating realistic protein families with multiple domains. iSG tracks all evolutionary events, including indels, and outputs the "true" multiple alignment of the simulated sequences. iSG can also generate a larger sequence space by allowing the use of multiple related root sequences. With all these functions, iSG can be used to test the accuracy of, for example, multiple alignment methods, phylogenetic methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein family classification methods. We empirically evaluated the performance of iSG against currently available methods by simulating the evolution of the G protein-coupled receptor and lipocalin protein families. We examined their true multiple alignments, the reconstruction of the transmembrane regions and beta-strands, and the results of similarity searches against a protein database using the simulated sequences. We also present an example of using iSG to examine how phylogenetic reconstruction is affected by high indel rates.
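The bookkeeping idea of recording every event so that the "true" alignment is known by construction can be illustrated with a toy single-branch simulator. This is a hypothetical sketch, not iSG: it uses uniform random substitutions and uniformly distributed indel lengths (both assumptions), and it tracks one ancestor-descendant pair rather than a tree with multiple subsequence partitions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def evolve_with_alignment(root, n_events, p_sub=0.8, p_ins=0.1,
                          max_indel=3, seed=0):
    """Toy single-branch simulator returning the true pairwise alignment.

    Assumptions (not iSG's model): uniform substitutions, indel lengths
    drawn uniformly from 1..max_indel, deletions with prob 1 - p_sub - p_ins.
    """
    rng = random.Random(seed)
    # Each alignment column is [ancestral residue or '-', current residue or '-'].
    cols = [[c, c] for c in root]
    for _ in range(n_events):
        live = [k for k, col in enumerate(cols) if col[1] != "-"]
        u = rng.random()
        if u < p_sub and live:
            k = rng.choice(live)
            cols[k][1] = rng.choice(AMINO_ACIDS)
        elif u < p_sub + p_ins:
            # Insertions create new columns that are gaps in the ancestor.
            pos = rng.randint(0, len(cols))
            length = rng.randint(1, max_indel)
            cols[pos:pos] = [["-", rng.choice(AMINO_ACIDS)] for _ in range(length)]
        elif live:
            # Deletions gap out a run of consecutive surviving residues.
            start = rng.randrange(len(live))
            for k in live[start:start + rng.randint(1, max_indel)]:
                cols[k][1] = "-"
    # Residues inserted and later deleted leave all-gap columns; drop them.
    cols = [col for col in cols if col != ["-", "-"]]
    return "".join(c[0] for c in cols), "".join(c[1] for c in cols)


ancestor, descendant = evolve_with_alignment("MKTAYIAKQR", n_events=20)
print(ancestor)
print(descendant)
```

Because the alignment is built as the events happen rather than inferred afterwards, it can serve as a reference against which an alignment program's output is scored, which is the use the abstract describes.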

10.
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.

13.
In mammals, the vomeronasal organ (VNO) contains chemosensory receptor cells that bind to pheromones and induce a variety of social and reproductive behaviors. It has traditionally been assumed that the human VNO (Jacobson's organ) is a vestigial structure, although recent studies have provided limited evidence for a structurally intact and possibly functional VNO. The presence and function of the human VNO remain controversial, however, because pheromones and VNO receptors have not been well characterized. In this study we screened a human Bacterial Artificial Chromosome (BAC) library with multiple primer sets designed from human cDNA sequences homologous to mouse VNO receptor genes. Using these BAC sequences in addition to mouse VNO receptor sequences, we screened the High Throughput Genome Sequence (HTGS) database to find additional human putative VNO receptor genes. We report the identification of 56 BACs carrying 34 distinct putative VNO receptor gene sequences, all of which appear to be pseudogenes. Sequence analysis indicates substantial homology to the mouse V1R and V2R VNO receptor families. Furthermore, chromosomal localization by FISH analysis and RH mapping reveals that the majority of the BACs are localized to telomeric and centromeric chromosomal regions and may have arisen through duplication events. These data yield insight into the present state of pheromonal olfaction in humans and into the evolutionary history of human VNO receptors.

14.
We describe here a rapid and efficient method for the targeted isolation of specific members of gene families without the need for cloning. Using this strategy, we isolated full-length cDNAs for eight putative G-protein coupled neurotransmitter receptors (GPCnR) from the cattle tick Rhipicephalus (Boophilus) microplus. Gene-specific degenerate primers were designed using aligned amino acid sequences of similar receptor types from several insect and arachnid species. These primers were used to amplify and sequence a section of the target gene. Rapid amplification of cDNA ends (RACE) PCR was then used to generate full-length cDNA sequences. Phylogenetic analysis placed 7 of these sequences in Class A G-protein coupled receptors (GPCR) (Rm_α2AOR, Rm_β2AOR, Rm_Dop1R, Rm_Dop2R, Rm_INDR, Rm_5-HT(7)R and Rm_mAchR), and one in Class C GPCR (Rm_GABA(B)R). Of the 7 Class A sequences, only Rm_mAchR is not a member of the biogenic amine receptor family. The isolation of these putative receptor sequences provides an opportunity to understand acaricide resistance mechanisms such as amitraz resistance and might suggest possibilities for the development of new acaricides.
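Designing a degenerate primer from aligned amino acid sequences starts with back-translating a conserved peptide into IUPAC degenerate codons. The sketch below uses the standard genetic code and is an illustrative assumption, not the authors' primer-design procedure; the peptide "MWFK" is hypothetical. Merging codons position by position over-covers some amino acids (for Leu it yields YTN, which also matches the Phe codons TTT and TTC), and the pool size grows multiplicatively, which is why designs usually favour conserved stretches rich in low-degeneracy residues such as Met and Trp.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ..., GGG.
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AA_TABLE[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# IUPAC nucleotide ambiguity codes and the reverse lookup by base set.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
CODE_FOR_SET = {frozenset(bases): code for code, bases in IUPAC.items()}


def degenerate_codon(aa):
    """Merge all codons for one amino acid, position by position."""
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa}")
    return "".join(CODE_FOR_SET[frozenset(c[pos] for c in codons)]
                   for pos in range(3))


def back_translate(peptide):
    """Degenerate DNA primer for a peptide (may over-cover, see above)."""
    return "".join(degenerate_codon(aa) for aa in peptide)


def expand(primer):
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


primer = back_translate("MWFK")  # hypothetical conserved peptide
print(primer, len(expand(primer)))
```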

15.
FGFs (fibroblast growth factors) play major roles in a number of developmental processes. Recent studies of several human disorders, together with concurrent analyses of gene knockouts and of the properties of the corresponding recombinant proteins, have shown that FGFs and their receptors are prominently involved in the development of the skeletal system in mammals. We have compared the sequences of the nine known mammalian FGFs, FGFs from other vertebrates, and three additional sequences that we extracted from existing databases: two human FGF sequences that we tentatively designated FGF10 and FGF11, and an FGF sequence from Caenorhabditis elegans. Similarly, we have compared the sequences of the four FGF receptor paralogs found in chordates with four non-chordate FGF receptors, including one recently identified in C. elegans. The comparison of FGF and FGF receptor sequences in vertebrates and nonvertebrates shows that the FGF and FGF receptor families have evolved through phases of gene duplication, one of which may have coincided with the emergence of vertebrates, in relation to their new system of body scaffold. Received: 6 April 1996 / Accepted: 5 July 1996

16.
Lee D, Grant A, Marsden RL, Orengo C. Proteins 2005;59(3):603-615
Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244), and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior, such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution, whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structural genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/.
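The first PFscape step, Markov clustering (MCL), alternates expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalising columns) until the flow settles into clusters. This is a small dense pure-Python sketch of the general MCL idea, assuming an unweighted similarity graph with added self-loops and an inflation value of 2. It is not PFscape's implementation, and it omits both the pruning that practical MCL uses on large sparse graphs and the subsequent complete-linkage step.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9, threshold=1e-6):
    """Dense Markov clustering on a symmetric similarity matrix."""
    n = len(adj)
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)                                  # expansion
        inflated = normalize_columns([[v ** inflation for v in row]
                                      for row in expanded])      # inflation
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that retain mass (attractors) define the clusters.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > threshold)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy similarity graph: two separate triangles of "sequences".
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0
print(mcl(adj))
```

Higher inflation values sharpen the flow and tend to produce more, smaller clusters, so in practice it acts as the granularity knob.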

17.
Sequence conservation in Alu evolution (total citations: 25; self-citations: 8; citations by others: 17)
A statistical analysis of a set of human genomic Alu elements is presented, based on a published alignment and a recent classification of these sequences. After the Alu sequences are separated into families, the consensus sequence of each family is determined with correct weighting of the unidirectional decay of CG dinucleotides. This weighting is needed because the tenfold higher mutation rate at CG dinucleotides requires a separate, independent clock at every stage of the analysis. The distributions of substitutions relative to the new consensus sequences, with CG and non-CG nucleotide positions considered separately, lie far closer to the expected distributions than does the total diversity. Computer analysis of the folding of RNAs derived from these sequences indicates that RNA secondary structure is conserved among Alu families, suggesting its importance for Alu proliferation and/or function. The folding pattern, further substantiated by a number of compensatory mutations, includes secondary structure domains that are homologous to those observed in 7SL RNA and a defined region of interaction between the two Alu subunits. These results are consistent with a model in which a small number of conserved Alu master genes give rise, via retroposition, to the numerous copies of Alu pseudogenes, which then diversify by random substitution. The master genes appeared at different periods during evolution, giving rise to different families of Alu sequences.
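Counting substitutions separately at CG and non-CG positions, as described above, is straightforward once a consensus and an aligned copy are available. The sketch below assumes gap-free, equal-length aligned sequences and marks a consensus position as CG if it belongs to a CG dinucleotide; the function names, this convention and the toy sequences are illustrative assumptions, not the authors' procedure.

```python
def cg_positions(consensus):
    """Indices of consensus bases that belong to a CG dinucleotide."""
    pos = set()
    for i in range(len(consensus) - 1):
        if consensus[i:i + 2] == "CG":
            pos.update((i, i + 1))
    return pos


def split_substitutions(consensus, copy):
    """Return (CG-site substitutions, non-CG substitutions) of copy vs consensus."""
    if len(consensus) != len(copy):
        raise ValueError("sequences must be aligned to equal length")
    cg = cg_positions(consensus)
    cg_subs = non_cg_subs = 0
    for i, (a, b) in enumerate(zip(consensus, copy)):
        if a != b:
            if i in cg:
                cg_subs += 1
            else:
                non_cg_subs += 1
    return cg_subs, non_cg_subs


# Hypothetical toy pair: one CG->TG change plus one substitution elsewhere.
print(split_substitutions("GGCCGAGTCGA", "GGCTGAGACGA"))
```

Keeping the two tallies apart means the fast CG clock can be modelled on its own instead of inflating the apparent divergence of the whole element.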

18.
Cloning and linkage mapping of resistance gene homologues in apple (total citations: 8; self-citations: 0; citations by others: 8)
Apple (Malus x domestica Borkh.) sequences sharing homology with known resistance genes were cloned using a PCR-based approach with degenerate oligonucleotide primers designed from conserved regions of the nucleotide-binding site (NBS). Sequence analysis of the amplified fragments indicated the presence of at least 27 families of NBS-containing genes in apple, each composed of several very similar or nearly identical sequences. The NBS-leucine-rich repeat homologues appeared to include members of the two major groups that have been described in dicot plants: one possessing a Toll-interleukin receptor element and one lacking such a domain. Genetic mapping of the cloned sequences was achieved through the development of CAPS and SSCP markers using a segregating population from a cross between the two apple cultivars Fiesta and Discovery. Several of the apple resistance gene homologues mapped near, or at least on the same linkage group as, known loci controlling resistance to various pathogens. The utility of resistance gene-homologue sequences as molecular markers for breeding purposes and for gene cloning is discussed. Communicated by H. Nybom

19.
Cell surface protein receptors in oral streptococci (total citations: 19; self-citations: 0; citations by others: 19)
Streptococci have a vast repertoire of adherence properties, which include binding to human tissue components, to epithelial cells and to other bacterial cells. These interactions are determined by the expression of cell-surface receptors, some of which are species-specific. In the oral streptococci, two families of surface protein receptors with highly conserved amino acid sequences have been identified. The antigen I/II family of polypeptides are wall-associated high molecular mass proteins (158–166 kDa) with several binding functions that may be attributed to different domains of the receptor molecules. The LraI family of polypeptides are surface-associated lipoproteins (32–33 kDa) involved in the adherence of streptococci to the salivary glycoprotein pellicle and to oral Actinomyces. A region of amino acid sequence similarity is evident among members of the two protein families in Streptococcus gordonii. The ligand-binding specificities of these receptor polypeptides may account for species-specific adherence and site-directed colonization of streptococci within the human oral cavity.

20.
Sun H, Kondo R, Shima A, Naruse K, Hori H, Chigusa SI. Gene 1999;231(1-2):137-145
To better understand the origin, diversification and genomic organization of vertebrate olfactory receptor genes, we cloned and characterized four putative olfactory receptor genes, mfOR1, mfOR2, mfOR3 and mfOR4, from the genomic DNA of medaka fish (Oryzias latipes). The four sequences contained features commonly seen in known olfactory receptor genes and were phylogenetically most closely related to those of catfish and zebrafish. Among them, mfOR1 and mfOR2 showed the highest amino acid (aa) similarity (93%) and defined a novel olfactory receptor gene family that is the most divergent among all vertebrate olfactory receptor genes. Southern hybridization analyses suggested that mfOR1 and mfOR2 are tightly linked to each other (within 24 kb), although suitable marker genes were not available to locate their linkage group. Unlike what has been observed in catfish olfactory receptor sequences, nucleotide (nt) substitutions between the two sequences showed no evidence of positive natural selection. mfOR3 and mfOR4, however, showed much lower aa similarity (26%) and were both mapped to a region of medaka linkage group XX. When these medaka sequences were included, the olfactory receptors of terrestrial and aquatic animals formed significantly different clusters in the phylogenetic tree. Although each olfactory receptor gene subfamily has fewer member genes in fish than in mammals, fish seem to have maintained more diverse olfactory receptor gene families. Our finding of a novel olfactory receptor gene family in medaka fish may provide a step towards understanding the emergence of olfactory receptor genes in vertebrates.
