Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
Navigating public microarray databases   Total citations: 1 (self-citations: 0, citations by others: 1)
With the ever-escalating amount of data being produced by genome-wide microarray studies, it is of increasing importance that these data are captured in public databases so that researchers can use this information to complement and enhance their own studies. Many groups have set up databases of expression data, ranging from large repositories, which are designed to comprehensively capture all published data, through to more specialized databases. The public repositories, such as ArrayExpress at the European Bioinformatics Institute, contain complete datasets in raw format in addition to processed data, whilst the specialist databases tend to provide downstream analysis of normalized data from more focused studies and data sources. Here we provide a guide to the use of these public microarray resources.

3.
4.
The domesticated silkworm, Bombyx mori, serves as an ideal representative of lepidopteran species for a variety of scientific studies. As a result, databases have been created to organize information pertaining to the silkworm genome that is subject to constant updating. Of these, four main databases are important for storing nucleotide information in the form of genomic data, ESTs and microsatellites. These databases also store data related to other lepidoptera and important insects, which helps in insect biological research. Though a considerable amount of nucleotide data is currently available, there is a paucity of data related to silkworm and other lepidopteran proteins. Hence, the focus of this article is to present the current status of nucleotide databases of silkworm, avenues for improvement and possibilities for databases that could be created in the future.

5.
Babnigg G  Giometti CS 《Proteomics》2006,6(16):4514-4522
In proteome studies, identification of proteins requires searching protein sequence databases. The public protein sequence databases (e.g., NCBInr, UniProt) each contain millions of entries, and private databases add thousands more. Although much of the sequence information in these databases is redundant, each database uses distinct identifiers for the identical protein sequence and often contains unique annotation information. Users of one database obtain a database-specific sequence identifier that is often difficult to reconcile with the identifiers from a different database. When multiple databases are used for searches or the databases being searched are updated frequently, interpreting the protein identifications and associated annotations can be problematic. We have developed a database of unique protein sequence identifiers called Sequence Globally Unique Identifiers (SEGUID) derived from primary protein sequences. These identifiers serve as a common link between multiple sequence databases and are resilient to annotation changes in either public or private databases throughout the lifetime of a given protein sequence. The SEGUID Database can be downloaded (http://bioinformatics.anl.gov/SEGUID/) or easily generated at any site with access to primary protein sequence databases. Since SEGUIDs are stable, predictions based on the primary sequence information (e.g., pI, Mr) can be calculated just once; we have generated approximately 500 different calculations for more than 2.5 million sequences. SEGUIDs are used to integrate MS and 2-DE data with bioinformatics information and provide the opportunity to search multiple protein sequence databases, thereby providing a higher probability of finding the most valid protein identifications.
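
The idea of a sequence-derived identifier acting as a common link between databases can be illustrated with a small sketch. The abstract above does not spell out how a SEGUID is computed; the recipe assumed here (SHA-1 digest of the upper-cased sequence, base64-encoded, with padding stripped) is the commonly documented one and should be treated as an assumption. The function name `seguid_sketch`, the accession numbers, and the sequences are all hypothetical.

```python
import base64
import hashlib


def seguid_sketch(sequence: str) -> str:
    """Derive an identifier from the primary sequence alone.

    Assumed recipe (not stated in the abstract): SHA-1 of the upper-cased
    sequence, base64-encoded, trailing '=' padding removed.
    """
    # Drop whitespace and normalize case so formatting differences
    # between databases do not change the identifier.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical records: the same protein stored under two database-specific
# accession numbers, plus one sequence that differs in its last residue.
records = {
    "dbA|X0001": "MKTAYIAKQR",
    "dbB|Y9999": "mktayiakqr",
    "dbA|X0002": "MKTAYIAKQK",
}

# Group accessions by the sequence-derived identifier.
by_seguid = {}
for accession, seq in records.items():
    by_seguid.setdefault(seguid_sketch(seq), []).append(accession)

for identifier, accessions in by_seguid.items():
    print(identifier, accessions)
```

Because the identifier depends only on the sequence, the two accessions for the identical protein fall into the same group, while renaming or re-annotating an entry leaves its identifier unchanged.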

6.
7.
8.
Lake trout (Salvelinus namaycush) are a top-predator species in the Laurentian Great Lakes that are often used as bioindicators of chemical stressors in the ecosystem. Although many studies use these fish to determine concentrations of stressors such as legacy persistent, bioaccumulative and toxic chemicals, there are currently no proteomic studies on the biological effects these stressors have on the ecosystem. This lack of proteomic studies on Great Lakes lake trout is because there is currently no complete, comprehensive protein database for this species. Here, we employed proteomics approaches to develop a lake trout protein database that could aid in future research on this fish, in particular exposomics and adductomics. The current study utilized heart tissue and blood from two lake trout. Our previous work using lake trout liver revealed 4194 potential protein hits in the NCBI databases and 3811 potential protein hits in the UniProtKB databases. In the current study, using the NCBI databases we identified 838 proteins for the heart and 580 proteins for the blood in biological replicate 1 (BR1) and 1180 potential protein hits for the heart and 561 potential protein hits for the blood in BR2. Similar results were obtained using the UniProtKB databases. This study builds on our previous work by continuing to build the first comprehensive lake trout protein database and provides insight into protein homology through evolutionary relationships. This data is available via the PRIDE partner repository with the dataset identifier PXD023970.

9.
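A stepwise search strategy for protein identification in human proteome expression profiling   Total citations: 3 (self-citations: 0, citations by others: 3)
Wu Songfeng  Zhu Yunping  He Fuchu 《遗传》 (Hereditas (Beijing)) 2005,27(5):687-693
Protein identification in large-scale proteome expression profiling studies generally relies on database searching, so the choice of database and the search strategy are crucial. Existing human protein databases are far from complete, and protein databases from other species can supplement them only to a very limited extent, whereas human genome databases may offer considerable room for supplementation. Based on a thorough survey and comparison of international human protein databases, we propose a stepwise search strategy. The strategy first uses a high-quality non-redundant database with relatively high coverage for basic identification, and then uses other protein and nucleic acid databases for supplementary identification and novel protein discovery. This strategy can effectively identify as many high-confidence proteins as possible and can further make full use of the mass spectrometry data for supplementary identification and novel protein discovery, which is of great significance for large-scale proteome expression profiling studies.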

10.
MOTIVATION: Sequence databases represent an enormous resource of phylogenetic information, but there is a lack of tools for accessing that information in order to assess the amount of evolutionary information in these databases that may be suitable for phylogenetic reconstruction and for identifying areas of the taxonomy that are under-represented for specific gene sequences. RESULTS: We have developed TreeGeneBrowser, which allows inspection and evaluation of gene sequence data for phylogenetic reconstruction. This program improves the efficiency of identification of genes that may be useful for particular phylogenetic studies and identifies taxa and taxonomic branches that are under-represented in sequence databases.

11.

Background  

Large databases of single nucleotide polymorphisms (SNPs) are available for use in genomics studies. Typically, investigators must choose a subset of SNPs from these databases to employ in their studies. The choice of subset is influenced by many factors, including estimated or known reliability of the SNP, biochemical factors, intellectual property, cost, and effectiveness of the subset for mapping genes or identifying disease loci. We present an evolutionary algorithm for multiobjective SNP selection.

12.
Susceptibility to genetically complex disorders is determined by an unknown number of genetic determinants, and decades of intensive research have yielded hundreds of such potential susceptibility loci for Alzheimer’s disease (AD), Parkinson’s disease (PD), schizophrenia (SZ), and multiple sclerosis (MS). The results of genome-wide association studies are now adding to an already vast and complicated body of data. To facilitate the evaluation and interpretation of these findings, we have recently created online databases for genetic association studies in AD, PD, SZ, and MS. In addition to providing detailed summaries for each eligible study, the databases present the results of allele-based meta-analyses for all polymorphisms with sufficient genotype data. In this review, we discuss the background and implications of the database approach developed by our group, using current findings from the AD (AlzGene) and PD (PDGene) databases as examples.

13.
Science is a social process with far-reaching impact on our modern society. In recent years, for the first time, we have been able to study science itself scientifically. This is enabled by massive amounts of data on scientific publications that are increasingly becoming available. The data are contained in several databases such as Web of Science or PubMed, maintained by various public and private entities. Unfortunately, these databases are not always consistent, which considerably hinders this study. Relying on the powerful framework of complex networks, we conduct a systematic analysis of the consistency among six major scientific databases. We found that identifying a single "best" database is far from easy. Nevertheless, our results indicate appreciable differences in mutual consistency of different databases, which we interpret as recipes for future bibliometric studies.

14.

Background

Public SNP databases are frequently used to choose SNPs for candidate genes in the association and linkage studies of complex disorders. However, their utility for such studies of diseases with ethnic-dependent background has never been evaluated.

Results

To estimate the accuracy and completeness of SNP public databases, we analyzed the allele frequencies of 41 SNPs in 10 candidate genes for obesity and/or osteoporosis in a large American-Caucasian sample (1,873 individuals from 405 nuclear families) by PCR-invader assay. We compared our results with those from the databases and other published studies. Of the 41 SNPs, 8 were monomorphic in our sample. Twelve were reported for the first time for Caucasians and the other 29 SNPs in our sample essentially confirmed the respective allele frequencies for Caucasians in the databases and previous studies. The comparison of our data with other ethnic groups showed significant differentiation between the three major world ethnic groups at some SNPs (Caucasians and Africans differed at 3 of the 18 shared SNPs, and Caucasians and Asians differed at 13 of the 22 shared SNPs). This genetic differentiation may have an important implication for studying the well-known ethnic differences in the prevalence of obesity and osteoporosis, and complex disorders in general.

Conclusion

A comparative analysis of the SNP data of the candidate genes obtained in the present study, as well as those retrieved from the public domain, suggests that the databases may currently have serious limitations for studying complex disorders with an ethnic-dependent background due to the incomplete and uneven representation of the candidate SNPs in the databases for the major ethnic groups. This conclusion attests to the imperative necessity of large-scale and accurate characterization of these SNPs in different ethnic groups.

15.
The influenza viruses contain highly variable genomes and are able to infect a wide range of host species. Large-scale sequencing projects have collected abundant influenza sequence data for assessing influenza genome diversity and evolution. This work reviews the characteristics and statistics of current influenza sequence databases, as well as recent studies utilizing these databases to unravel influenza virus diversity and evolution. Also discussed are the newest deep sequencing methods and their applications to influenza virus research.

16.
Nucleotide sequence databases: a gold mine for biologists.   Total citations: 5 (self-citations: 0, citations by others: 5)

17.
Roeder K  Luca D 《Genomics》2009,93(1):1-4
Data for genome-wide association studies are being collected for a myriad of phenotypes. Many of these studies do not include control samples selected to reflect ancestry similar to the case samples. At the same time "control databases" are becoming available to be utilized as a common resource. These data are often genotyped using a large-scale SNP array. Human populations exhibit complex structure that can lead to spurious associations if not properly handled. How to couple case and control databases effectively is a pressing question. We review available methods for modeling genetic ancestry based on the information gleaned from the SNP array. Methods for selecting control samples with genetic ancestry similar to the case samples are described.

18.
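Objective: To statistically analyze published domestic and international journal literature on Bifidobacterium, identify research hotspots and development trends, and provide a reference for researchers in the field. Methods: Research data were drawn from the CNKI and PubMed databases, and bibliometric methods were used to analyze the Bifidobacterium-related literature indexed in the two databases. Results: As of 15 August 2018, CNKI and PubMed had indexed 9277 and 5130 related articles, respectively. In terms of volume, research on Bifidobacterium both in China and abroad began to grow rapidly in the 1990s. In terms of subject distribution and keywords, Chinese and international research shared a focus on digestive tract diseases and pediatrics, while also showing clear differences in emphasis. Chinese research institutions contributed to a fairly large number of articles published in international journals, but were the first-listed institution on relatively few of them. Conclusion: Many aspects of basic and applied research on Bifidobacterium still need to be deepened and extended, and researchers should pursue substantive breakthroughs in related fields.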

19.
One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, there are some significant challenges that need to be addressed. We address here the challenge of detecting epistasis or gene–gene interactions in genome-wide association studies. Discovering epistatic interactions in high dimensional datasets remains a challenge due to the computational complexity resulting from the analysis of all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data may be analyzed more efficiently and yet still retain important biological information. One of the strongest demonstrations of the functional relationship between genes is protein-protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review we will discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We explore some of the fundamentals of protein interactions and the databases that are publicly available.

20.
《PLoS biology》2012,10(12)
Extracellular vesicles (EVs) are membranous vesicles released by a variety of cells into their microenvironment. Recent studies have elucidated the role of EVs in intercellular communication, pathogenesis, drug, vaccine and gene-vector delivery, and as possible reservoirs of biomarkers. These findings have generated immense interest, along with an exponential increase in molecular data pertaining to EVs. Here, we describe Vesiclepedia, a manually curated compendium of molecular data (lipid, RNA, and protein) identified in different classes of EVs from more than 300 independent studies published over the past several years. Even though databases are indispensable resources for the scientific community, recent studies have shown that more than 50% of the databases are not regularly updated. In addition, more than 20% of the database links are inactive. To prevent such database and link decay, we have initiated a continuous community annotation project with the active involvement of EV researchers. The EV research community can set a gold standard in data sharing with Vesiclepedia, which could evolve as a primary resource for the field.
