Similar Literature
 Found 20 similar articles; search took 15 ms
1.
Many databases are available that provide valuable data resources for the biotechnological researcher. According to their core data, they can be divided into different types. Some databases provide primary data, such as all published nucleotide sequences; others deal with protein sequences. In addition to these two basic types of databases, a huge number of more specialized resources are available, such as databases about protein structures, protein identification, special features of genes and/or proteins, or certain organisms. Furthermore, some resources offer integrated views on different types of data, allowing users to run customized queries over large datasets easily and to compare different types of data.

2.
Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.

3.
A new variant of concern for SARS-CoV-2, Omicron (B.1.1.529), was designated by the World Health Organization on November 26, 2021. This study analyzed the viral genome sequencing data of 108 samples collected from patients infected with Omicron. First, we found that the enrichment efficiency of viral nucleic acids was reduced due to mutations in the region where the primers anneal. Second, the Omicron variant possesses an excessive number of mutations compared to other variants circulating at the sam...

4.
Babnigg G; Giometti CS. Proteomics, 2006, 6(16): 4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
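
The idea of a sequence-derived identifier can be sketched in a few lines of Python. The abstract does not spell out the algorithm; the sketch below assumes the construction commonly described for SEGUID (SHA-1 of the upper-cased sequence, base64-encoded, with the trailing "=" padding removed). The whitespace stripping, the accession strings, and the toy sequences are illustrative assumptions, not data from the paper.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Return a SEGUID-style identifier for a protein sequence.

    Assumption: SHA-1 of the upper-cased sequence, base64-encoded,
    with trailing '=' padding removed. Whitespace is stripped first,
    which is an extra normalization added for this sketch.
    """
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical records: two databases, different accessions, same protein.
records = {
    "dbA|P00001": "MKTAYIAKQR",
    "dbB|XP_000001": "mktayiakqr",
    "dbB|XP_000002": "MKTAYIAKQK",
}

# Group accessions by the sequence-derived identifier.
by_seguid = {}
for accession, seq in records.items():
    by_seguid.setdefault(seguid(seq), []).append(accession)
```

Because the key depends only on the residues, the first two records fall into the same group regardless of which database issued the accession, while the third, which differs by one residue, gets its own key.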

5.
The Signal Recognition Particle Database (SRPDB). Cited by: 1 (self-citations: 0, citations by others: 1)
The signal recognition particle database (SRPDB) is located at the University of Texas Health Science Center at Tyler and includes tabulations of SRP RNA, SRP protein and SRP receptor sequences. The sequences are annotated with links to the primary databases. They are ordered alphabetically or phylogenetically and are available in aligned form. As of September 1998, there were 108 SRP RNA sequences, 83 SRP protein sequences and 28 sequences of the SRP receptor alpha subunit and its homologues. In addition, the SRPDB provides search motifs consisting of conserved amino acid and nucleotide residues, and a limited number of SRP RNA secondary structure diagrams and 3-D models. The data are available freely at the URL http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html

6.
Levels of oxygen can vary dramatically in aquatic environments. Aquatic organisms, including fishes, have adapted accordingly to survive. As there are both phylogenetically closely related fish species with differing oxygen requirements and distantly related species with similar oxygen requirements, fishes are good candidates for examining oxygen-related functions in vertebrates. We set out to investigate whether sequence variation in the hypoxia-inducible factor-1 alpha (HIF-1α) gene is associated with variations in oxygen requirements. Since the teleost HIF-1α sequences available in databases represent a very limited dataset both phylogenetically and with regard to oxygen requirements, we have sequenced the protein coding sequence for HIF-1α from an additional 9 fish species. Our results indicate that the deduced HIF-1α proteins of teleost fishes are somewhat shorter than those of tetrapods. Additionally, the results suggest that tetrapod sequences more closely resemble the ancestral form of the protein than do teleost sequences. No clear signatures that could be associated with the oxygen requirements of the species were found. This study suggests that if species-specific differences in HIF-1α function with regard to oxygen dependence have evolved, they do not occur in the protein coding sequence but at other levels of the HIF-1α pathway.

7.
MOTIVATION: The amount of genomic and proteomic data that is published daily in the scientific literature is outstripping the ability of experimental scientists to stay current. Reviews, the traditional medium for collating published observations, are also unable to keep pace. For some specific classes of information (e.g. sequences and protein structures), obligatory data deposition policies have helped. However, a great deal of other valuable information is spread throughout the literature hindering coherent access. We are involved in the Molecular Class-Specific Information System (MCSIS) project, a collaborative effort to design and automate the maintenance of protein family databases. The first two databases, the GPCRDB and NucleaRDB, are focused on G protein-coupled receptors (GPCRs) and nuclear hormone receptors (NRs), respectively. The main aim of the MCSIS project is to gather heterogeneous data from across a variety of electronic and literature sources in order to draw new inferences about the target protein families. RESULTS: We present a computational method that identifies and extracts mutation data from the scientific literature. We focused on the extraction of single point mutations for the GPCR and NR superfamilies. After validation by plausibility filters, the mutation data is integrated into the corresponding MCSIS where it is combined with structural and sequence information already stored in these databases. We extracted and validated 2736 true point mutations from 914 articles on GPCRs and 785 true point mutations from 1094 articles on NRs. The current version of our automated extraction algorithm identifies 49.3% of the GPCR point mutations with a specificity of 87.9%, and 64.5% of the NR point mutations with a specificity of 85.8%. MuteXt routinely analyzes 100 electronic articles in approximately 1 h.
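
To illustrate what extracting point mutations from text and then applying a plausibility filter might look like, here is a minimal Python sketch. It is not the method described in the abstract, whose details are not given here: the regular expressions, the function names, and the choice of filter (checking that the reported wild-type residue matches a reference sequence at the stated position) are all assumptions made for illustration.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Hypothetical patterns for mentions such as "D113N" or "Arg131Ala".
PATTERN_1 = re.compile(rf"\b([{ONE_LETTER}])(\d+)([{ONE_LETTER}])\b")
PATTERN_3 = re.compile(rf"\b({THREE})(\d+)({THREE})\b")


def extract_point_mutations(text):
    """Return a set of (wild_type, position, mutant) tuples found in text."""
    found = set()
    for wt, pos, mut in PATTERN_1.findall(text):
        if wt != mut:
            found.add((wt, int(pos), mut))
    for wt, pos, mut in PATTERN_3.findall(text):
        wt1, mut1 = THREE_TO_ONE[wt], THREE_TO_ONE[mut]
        if wt1 != mut1:
            found.add((wt1, int(pos), mut1))
    return found


def passes_plausibility(mutation, sequence):
    """Assumed filter: the wild-type residue must match the reference."""
    wt, pos, _ = mutation
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == wt


sentence = "The D113N and Arg131Ala mutants lost binding; H2O was a control."
mutations = extract_point_mutations(sentence)
```

A token such as "H2O" is rejected because "O" is not one of the twenty standard residue codes, which hints at why pattern matching alone yields false positives and why a check against the reference sequence is a natural second step.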

8.
Mutations of the tyrosinase gene associated with a partial or complete loss of enzymatic activity are responsible for tyrosinase related oculocutaneous albinism (OCA1). A large number of mutations have been identified and their analysis has provided insight into the biology of tyrosinase and the pathogenesis of these different mutations. Missense mutations produce their effect on the activity of an enzyme by altering an amino acid at a specific site. The location of these mutations in the peptide can be used to indicate potential domains important for enzymatic activity. Missense mutations of the tyrosinase polypeptide cluster in four regions, suggesting that these are important functional domains. Two of the potential domains involve the copper binding sites while the others are likely involved in substrate binding. More critical analysis of the copper binding domain of tyrosinase can be gained by analyzing the structure of hemocyanin, a copper-binding protein with a high degree of homology to tyrosinase in the copper binding region. This analysis indicates a single catalytic site in tyrosinase for all enzymatic activities.

9.
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.

12.
Two subunits of protein phosphatase 2A (PP2A) have been shown previously to bind to the small t and middle T antigens (ST and MT, respectively) of polyomavirus. To determine sequences important for binding of PP2A to ST and MT, we first constructed a series of ST mutants in regions known to be important for biological activity of ST and MT. Several mutations in two small regions just amino terminal to the Cys-X-Cys-X-X-Cys motifs of ST and MT abolished PP2A binding to ST in vitro. Parallel mutations were constructed in MT to investigate the role of PP2A binding in the function of polyomavirus MT. Wild-type and mutant MT proteins were stably expressed in NIH 3T3 cells and analyzed (i) for their ability to induce transformation and (ii) for associated cellular proteins and corresponding enzymatic activities previously described as associating with wild-type MT. A number of the mutant MTs were found to be defective in binding of PP2A as assayed by coimmunoprecipitation. In contrast, a deletion of the highly conserved stretch of amino acids 42 to 47 (His-Pro-Asp-Lys-Gly-Gly) in the ST-MT-large T antigen common region did not affect PP2A binding to MT. MT mutants defective for PP2A binding were also defective in transformation, providing further evidence that association with PP2A is important for the ability of MT to transform cells. All mutants which were impaired for PP2A binding were similarly or more dramatically impaired for associated protein and lipid kinase activities, supporting the possibility that PP2A binding is necessary for the formation and/or stability of an MT-pp60c-src complex.

13.
We have previously identified a Trypanosoma cruzi gene encoding a protein named Tc52, which shares structural and functional properties with the thioredoxin and glutaredoxin family involved in thiol-disulfide redox reactions. Gene targeting and immunological studies showed that Tc52 is among the T. cruzi virulence factors. Because the genetic variability of T. cruzi might be an important determinant of the different behaviour of T. cruzi clones in vitro and in vivo, we analysed the sequence polymorphism of the Tc52 gene in several reference clones. The DNA sequences of 12 clones, which represent the whole genetic diversity of T. cruzi, showed that 40 of the 400 amino-acid positions analysed are targets for mutations. A number of residues with a putative role in GSH binding and/or enzymatic function, and others located nearby, are subject to mutations. Although the immunological analysis showed that Tc52 is present in parasite extracts from different clones, it is possible that the amino-acid differences could affect the enzymatic and/or the immunomodulatory function of Tc52 variants and therefore the parasite phenotype.

14.
Histone Sequence Database: new histone fold family members. Cited by: 2 (self-citations: 0, citations by others: 2)
Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih.gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

15.
As the largest fraction of any proteome does not carry out enzymatic functions, and in order to leverage 3D structural data for the annotation of increasingly higher volumes of sequence data, we wanted to assess the strength of the link between coarse-grained structural data (i.e., homologous superfamily level) and the enzymatic versus non-enzymatic nature of protein sequences. To probe this relationship, we took advantage of 41 phylogenetically diverse (encompassing 11 distinct phyla) genomes recently sequenced within the GEBA initiative, for which we integrated structural information, as defined by CATH, with enzyme level information, as defined by Enzyme Commission (EC) numbers. This analysis revealed that only a very small fraction (about 1%) of domain sequences occurring in the analyzed genomes was associated with homologous superfamilies strongly indicative of enzymatic function. Resorting to less stringent criteria to define enzyme versus non-enzyme biased structural classes, or excluding highly prevalent folds from the analysis, had only a modest effect on this proportion. Thus, the low genomic coverage by structurally anchored protein domains strongly associated with catalytic activities indicates that, on its own, the power of coarse-grained structural information to infer the general property of being an enzyme is rather limited.

16.
The explosive growth in the number of protein sequences gives rise to the possibility of using the natural variation in sequences of homologous proteins to find residues that control different protein phenotypes. Because in many cases different phenotypes are each controlled by a group of residues, the mutations that separate one version of a phenotype from another will be correlated. Here we incorporate biological knowledge about protein phenotypes and their variability in the sequence alignment of interest into algorithms that detect correlated mutations, improving their ability to detect the residues that control those phenotypes. We demonstrate the power of this approach using simulations and recent experimental data. Applying these principles to the protein families encoded by Dscam and Protocadherin allows us to make testable predictions about the residues that dictate the specificity of molecular interactions.

17.
Understanding the mechanism of protein stability change is one of the most challenging tasks. Recently, the prediction of protein stability change caused by single point mutations has become an interesting topic in molecular biology. However, it is desirable to acquire further knowledge from large databases to provide new insights into the nature of these changes. This paper presents an interpretable prediction tree method (named iPTREE-2) that can accurately predict changes of protein stability upon mutation from sequence-based information and analyze sequence characteristics from the viewpoint of composition and order. Because it is based on a regression tree algorithm, iPTREE-2 is able to find important factors and develop rules for the purpose of data mining. On a dataset of 1859 different single point mutations from the thermodynamic database ProTherm, iPTREE-2 yields a correlation coefficient of 0.70 between predicted and experimental values. In the task of data mining, detailed analysis of sequences reveals the possibility of compositional specificity of residues in different ranges of stability change and implies the existence of certain patterns. In building rules, we found that the residues at the mutation site in the wild-type and mutant proteins play an important role. The present study demonstrates that iPTREE-2 can serve the purpose of predicting protein stability change, especially when one requires more understandable knowledge.

18.
It is well established that different sites within a protein evolve at different rates according to their role within the protein; identification of these correlated mutations can aid in tasks such as ab initio protein structure, structure function analysis or sequence alignment. Mutual information is a standard measure for coevolution between two sites, but its application is limited by the signal-to-noise ratio. In this work we report a preliminary study to investigate whether larger sequence sets could circumvent this problem by calculating mutual information arrays for two sets of drug-naïve sequences from the HIV gp120 protein for the B and C subtypes. Our results suggest that while the larger sequence sets can improve the signal-to-noise ratio, the gain is offset by the high mutation rate of HIV, which makes it more difficult to achieve consistent alignments. Nevertheless, we were able to predict a number of coevolving sites that were supported by previous experimental studies as well as a region close to the C terminus of the protein that was highly variable in the C subtype but highly conserved in the B subtype.
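
For readers unfamiliar with the measure, mutual information between two alignment columns can be computed directly from residue frequencies. The Python sketch below is a plain frequency-based estimate with no gap handling, sequence weighting, or finite-sample correction; it is an illustration under those assumptions rather than the pipeline used in the study, and the toy alignment is invented.

```python
import math
from collections import Counter


def column(alignment, index):
    """Return the residues found at one position of an alignment."""
    return [seq[index] for seq in alignment]


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    MI = sum over residue pairs (a, b) of
         p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
    with probabilities estimated as simple observed frequencies.
    """
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # c/n is p(a, b); c*n / (count_i[a]*count_j[b]) is p(a,b)/(p(a)p(b)).
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi


# Toy alignment (invented): columns 0 and 1 covary, column 2 is conserved.
alignment = ["AKL", "AKL", "GEL", "GEL"]

mi_covarying = mutual_information(column(alignment, 0), column(alignment, 1))
mi_conserved = mutual_information(column(alignment, 0), column(alignment, 2))
```

In this toy case the covarying pair scores 1 bit and the pair involving the conserved column scores 0. With few sequences, chance co-occurrences inflate such estimates, which is the signal-to-noise problem the abstract refers to.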

19.
A Monte Carlo simulation based sequence design method is proposed to investigate the role of site-directed point mutations in protein misfolding. Site-directed point mutations are incorporated in the designed sequences of selected proteins. While most mutated sequences correctly fold to their native conformation, some of them stabilize in other nonnative conformations and thus misfold/unfold. The results suggest that a critical number of hydrophobic amino acid residues must be present in the core of the correctly folded proteins, whereas proteins misfold/unfold if this number of hydrophobic residues falls below the critical limit. A protein can accommodate only a particular number of hydrophobic residues at the surface, provided a large number of hydrophilic residues are present at the surface and critical hydrophobicity of the core is preserved. Some surface sites are observed to be as sensitive to site-directed point mutations as the core sites. Point mutations with highly polar and charged amino acids increase the misfold/unfold propensity of proteins. Substitution of natural amino acids at sites with different numbers of nonbonded contacts suggests that both amino acid identity and its respective site-specificity determine the stability of a protein. A clash-match method is developed to calculate the number of matching and clashing interactions in the mutated protein sequences. While misfolded/unfolded sequences have a higher number of clashing and a lower number of matching interactions, the correctly folded sequences have a lower number of clashing and a higher number of matching interactions. These results are valid for different SCOP classes of proteins.

20.
Dal Nogare AR; Dan N; Lehrman MA. Glycobiology, 1998, 8(6): 625-632
The UDP-GlcNAc/MurNAc family of eukaryotic and prokaryotic enzymes uses UDP-GlcNAc or UDP-MurNAc-pentapeptide as donors and dolichol-P or polyprenol-P as acceptors, and generates sugar-P-P-polyisoprenols. A series of six conserved sequences, designated A through F and ranging from 5 to 13 amino acid residues, has been identified in this family. To determine whether these conserved sequences are required for enzyme function, various mutations were examined in hamster UDP-GlcNAc:dolichol-P GlcNAc-1-P transferase (GPT). Scramble mutations of sequences B-F, generated by scrambling the residues within each sequence, demonstrated that each is important in GPT. While E and F scrambles appeared to prevent stable expression of GPT, scrambling of B-D resulted in GPT mutants that could be stably expressed and bound tunicamycin, but lacked enzymatic activity. Further, the C and D scramble mutants had an unexpected sorting defect. Replacement of sequences B-F with prokaryotic counterparts from either the B. subtilis mraY or E. coli rfe genes also affected GPT by preventing expression of the mutant protein (B, F) or inhibiting its enzymatic activity (C-E). For the C-E replacements, no acquisition of acceptor activity for polyprenol-P, the fully unsaturated natural bacterial acceptor, was detected. These studies show that the conserved sequences of the UDP-GlcNAc/MurNAc family are important, and that the eukaryotic and prokaryotic counterparts are not freely interchangeable. Since several mutants were efficiently expressed and bound tunicamycin, yet lacked enzymatic activity, the data are consistent with these sequences having a direct role in product formation.
