首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The Genome Warehouse (GWH) is a public repository housing genome assembly data for a wide range of species and delivering a series of web services for genome data submission, storage, release, and sharing. As one of the core resources in the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GWH accepts both full and partial (chloroplast, mitochondrion, and plasmid) genome sequences with different assembly levels, as well as an update of existing genome assemblies. For each assembly, GWH collects detailed genome-related metadata of biological project, biological sample, and genome assembly, in addition to genome sequence and annotation. To archive high-quality genome sequences and annotations, GWH is equipped with a uniform and standardized procedure for quality control. Besides basic browse and search functionalities, all released genome sequences and annotations can be visualized with JBrowse. By May 21, 2021, GWH has received 19,124 direct submissions covering a diversity of 1108 species and has released 8772 of them. Collectively, GWH serves as an important resource for genome-scale data management and provides free and publicly accessible data to support research activities throughout the world. GWH is publicly accessible at https://ngdc.cncb.ac.cn/gwh.  相似文献   

2.
With the rapid development of sequencing technologies towards higher throughput and lower cost, sequence data are generated at an unprecedentedly explosive rate. To provide an efficient and easy-to-use platform for managing huge sequence data, here we present Genome Sequence Archive(GSA; http://bigd.big.ac.cn/gsa or http://gsa.big.ac.cn), a data repository for archiving raw sequence data. In compliance with data standards and structures of the International Nucleotide Sequence Database Collaboration(INSDC), GSA adopts four data objects(Bio Project, Bio Sample,Experiment, and Run) for data organization, accepts raw sequence reads produced by a variety of sequencing platforms, stores both sequence reads and metadata submitted from all over the world,and makes all these data publicly available to worldwide scientific communities. In the era of big data, GSA is not only an important complement to existing INSDC members by alleviating the increasing burdens of handling sequence data deluge, but also takes the significant responsibility for global big data archive and provides free unrestricted access to all publicly available data in support of research activities throughout the world.  相似文献   

3.
KaKs_Calculator 3.0 is an updated toolkit that is capable of calculating selective pressure on both coding and non-coding sequences. Similar to the nonsynonymous/synonymous substitution rate ratio for coding sequences, selection on non-coding sequences can be quantified as the ratio of non-coding nucleotide substitution rate to synonymous substitution rate of adjacent coding sequences. As testified on empirical data, KaKs_Calculator 3.0 shows effectiveness to detect the strength and mode of sele...  相似文献   

4.
On January 22, 2020, China National Center for Bioinformation (CNCB) released the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access information resource for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). 2019nCoVR features a comprehensive integration of sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality evaluated by an automated in-house pipeline. Of particular note, 2019nCoVR offers systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and their detailed statistics for each virus isolate, and congregates the quality score, functional annotation, and population frequency for each variant. Spatiotemporal change for each variant can be visualized and historical viral haplotype network maps for the course of the outbreak are also generated based on all complete and high-quality genomes available. Moreover, 2019nCoVR provides a full collection of SARS-CoV-2 relevant literature on the coronavirus disease 2019 (COVID-19), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with NCBI. Collectively, SARS-CoV-2 is updated daily to collect the latest information on genome sequences, variants, haplotypes, and literature for a timely reflection, making 2019nCoVR a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.  相似文献   

5.
Despite the huge impact of data resources in genomics and structural biology, until now there has been no central archive for biological data for all imaging modalities. The BioImage Archive is a new data resource at the European Bioinformatics Institute (EMBL-EBI) designed to fill this gap. In its initial development BioImage Archive accepts bioimaging data associated with publications, in any format, from any imaging modality from the molecular to the organism scale, excluding medical imaging. The BioImage Archive will ensure reproducibility of published studies that derive results from image data and reduce duplication of effort. Most importantly, the BioImage Archive will help scientists to generate new insights through reuse of existing data to answer new biological questions, and provision of training, testing and benchmarking data for development of tools for image analysis. The archive is available at https://www.ebi.ac.uk/bioimage-archive/.  相似文献   

6.
Next-generation sequencing has yielded a vast amount of cattle genomic data for global characterization of population genetic diversity and identification of genomic regions under natural and artificial selection. However, efficient storage, querying, and visualization of such large datasets remain challenging. Here, we developed a comprehensive database, the Bovine Genome Variation Database (BGVD). It provides six main functionalities: gene search, variation search, genomic signature search, Genome Browser, alignment search tools, and the genome coordinate conversion tool. BGVD contains information on genomic variations comprising ~60.44 M SNPs, ~6.86 M indels, 76,634 CNV regions, and signatures of selective sweeps in 432 samples from modern cattle worldwide. Users can quickly retrieve distribution patterns of these variations for 54 cattle breeds through an interactive source of breed origin map, using a given gene symbol or genomic region for any of the three versions of the bovine reference genomes (ARS-UCD1.2, UMD3.1.1, and Btau 5.0.1). Signals of selection sweep are displayed as Manhattan plots and Genome Browser tracks. To further investigate and visualize the relationships between variants and signatures of selection, the Genome Browser integrates all variations, selection data, and resources, from NCBI, the UCSC Genome Browser, and Animal QTLdb. Collectively, all these features make the BGVD a useful archive for in-depth data mining and analyses of cattle biology and cattle breeding on a global scale. BGVD is publicly available at http://animal.nwsuaf.edu.cn/BosVar.  相似文献   

7.
DNA methylation is an important epigenetic mark that plays a vital role in gene expression and cell differentiation. The average DNA methylation level among a group of cells has been extensively documented. However, the cell-to-cell heterogeneity in DNA methylation, which reflects the differentiation of epigenetic status among cells, remains less investigated. Here we established a gold standard of the cell-to-cell heterogeneity in DNA methylation based on single-cell bisulfite sequencing (BS-seq) data. With that, we optimized a computational pipeline for estimating the heterogeneity in DNA methylation from bulk BS-seq data. We further built HeteroMeth, a database for searching, browsing, visualizing, and downloading the data for heterogeneity in DNA methylation for a total of 141 samples in humans, mice, Arabidopsis, and rice. Three genes are used as examples to illustrate the power of HeteroMeth in the identification of unique features in DNA methylation. The optimization of the computational strategy and the construction of the database in this study complement the recent experimental attempts on single-cell DNA methylomes and will facilitate the understanding of epigenetic mechanisms underlying cell differentiation and embryonic development. HeteroMeth is publicly available at http://qianlab.genetics.ac.cn/HeteroMeth.  相似文献   

8.
Previous studies have reported that some important loci are missed in single-locus genome-wide association studies (GWAS), especially because of the large phenotypic error in field experiments. To solve this issue, multi-locus GWAS methods have been recommended. However, only a few software packages for multi-locus GWAS are available. Therefore, we developed an R software named mrMLM v4.0.2. This software integrates mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO methods developed by our lab. There are four components in mrMLM v4.0.2, including dataset input, parameter setting, software running, and result output. The fread function in data.table is used to quickly read datasets, especially big datasets, and the doParallel package is used to conduct parallel computation using multiple CPUs. In addition, the graphical user interface software mrMLM.GUI v4.0.2, built upon Shiny, is also available. To confirm the correctness of the aforementioned programs, all the methods in mrMLM v4.0.2 and three widely-used methods were used to analyze real and simulated datasets. The results confirm the superior performance of mrMLM v4.0.2 to other methods currently available. False positive rates are effectively controlled, albeit with a less stringent significance threshold. mrMLM v4.0.2 is publicly available at BioCode (https://bigd.big.ac.cn/biocode/tools/BT007077) or R (https://cran.r-project.org/web/packages/mrMLM.GUI/index.html) as an open-source software.  相似文献   

9.
Whitmore L  Janes RW  Wallace BA 《Chirality》2006,18(6):426-429
The Protein Circular Dichroism Data Bank (PCDDB) is a new deposition data bank for validated circular dichroism spectra of biomacromolecules. Its aim is to be a resource for the structural biology and bioinformatics communities, providing open access and archiving facilities for circular dichroism and synchrotron radiation circular dichroism spectra. It is named in parallel with the Protein Data Bank (PDB), a long-existing valuable reference data bank for protein crystal and NMR structures. In this article, we discuss the design of the data bank structure and the deposition website located at http://pcddb.cryst.bbk.ac.uk. Our aim is to produce a flexible and comprehensive archive, which enables user-friendly spectral deposition and searching. In the case of a protein whose crystal structure and sequence are known, the PCDDB entry will be linked to the appropriate PDB and sequence data bank files, respectively. It is anticipated that the PCDDB will provide a readily accessible biophysical catalogue of information on folded proteins that may be of value in structural genomics programs, for quality control and archiving in industrial and academic labs, as a resource for programs developing spectroscopic structural analysis methods, and in bioinformatics studies.  相似文献   

10.
Exciting funding initiatives are emerging in Europe and the US for metabolomics data production, storage, dissemination and analysis. This is based on a rich ecosystem of resources around the world, which has been build during the past ten years, including but not limited to resources such as MassBank in Japan and the Human Metabolome Database in Canada. Now, the European Bioinformatics Institute has launched MetaboLights, a database for metabolomics experiments and the associated metadata (http://www.ebi.ac.uk/metabolights). It is the first comprehensive, cross-species, cross-platform metabolomics database maintained by one of the major open access data providers in molecular biology. In October, the European COSMOS consortium will start its work on Metabolomics data standardization, publication and dissemination workflows. The NIH in the US is establishing 6?C8 metabolomics services cores as well as a national metabolomics repository. This communication reports about MetaboLights as a new resource for Metabolomics research, summarises the related developments and outlines how they may consolidate the knowledge management in this third large omics field next to proteomics and genomics.  相似文献   

11.

Motivation

In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are being used by a multitude of open-source and cross-platform tools which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size.

Results

Here we present a fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold (compared to OpenMS 1.11), making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data.

Availability

Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at https://github.com/OpenMS/OpenMS.  相似文献   

12.
The Protein Data Bank   总被引:183,自引:20,他引:163  
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.  相似文献   

13.
Current pharmacogenetic studies have obtained many genetic models that can predict the therapeutic efficacy of anticancer drugs. Although some of these models are of crucial importance and have been used in clinical practice, these very valuable models have not been well adopted into cancer research to promote the development of cancer therapies due to the lack of integration and standards for the existing data of the pharmacogenetic studies. For this purpose, we built a resource investigating genetic model of drug response (iGMDR), which integrates the models from in vitro and in vivo pharmacogenetic studies with different omics data from a variety of technical systems. In this study, we introduced a standardized process for all integrations, and described how users can utilize these models to gain insights into cancer. iGMDR is freely accessible at https://igmdr.modellab.cn.  相似文献   

14.
Tumor-specific neoantigens have attracted much attention since they can be used as biomarkers to predict therapeutic effects of immune checkpoint blockade therapy and as potential targets for cancer immunotherapy. In this study, we developed a comprehensive tumor-specific neoantigen database (TSNAdb v1.0), based on pan-cancer immunogenomic analyses of somatic mutation data and human leukocyte antigen (HLA) allele information for 16 tumor types with 7748 tumor samples from The Cancer Genome Atlas (TCGA) and The Cancer Immunome Atlas (TCIA). We predicted binding affinities between mutant/wild-type peptides and HLA class I molecules by NetMHCpan v2.8/v4.0, and presented detailed information of 3,707,562/1,146,961 potential neoantigens generated by somatic mutations of all tumor samples. Moreover, we employed recurrent mutations in combination with highly frequent HLA alleles to predict potential shared neoantigens across tumor patients, which would facilitate the discovery of putative targets for neoantigen-based cancer immunotherapy. TSNAdb is freely available at http://biopharm.zju.edu.cn/tsnadb.  相似文献   

15.
16.
Ancestry composition of populations and individuals has been extensively investigated in recent years due to advances in the genotyping and sequencing technologies. As the number of populations and individuals used for ancestry inference increases remarkably, say more than 100 populations or 1000 individuals, it is usually challenging to present the ancestry composition in a traditional way using a rectangular graph. To address this issue, we developed a program, AncestryPainter, which can illustrate the ancestry composition of populations and individuals with a rounded and nice-looking graph to save space. Individuals are depicted as length-fixed bars partitioned into colored segments representing different ancestries, and the population of interest can be highlighted as a pie chart in the center of the circle plot. In addition, AncestryPainter can also be applied to display personal ancestry in a way similar to that for displaying population ancestry. AncestryPainter is publicly available at http://www.picb.ac.cn/PGG/resource.php.  相似文献   

17.
Circular RNAs (circRNAs) from back-splicing of exon(s) have been recently identified to be broadly expressed in eukaryotes, in tissue- and species-specific manners. Although functions of most circRNAs remain elusive, some circRNAs are shown to be functional in gene expression regulation and potentially relate to diseases. Due to their stability, circRNAs can also be used as biomarkers for diagnosis. Profiling circRNAs by integrating their expression among different samples thus provides molecular basis for further functional study of circRNAs and their potential application in clinic. Here, we report CIRCpedia v2, an updated database for comprehensive circRNA annotation from over 180 RNA-seq datasets across six different species. This atlas allows users to search, browse, and download circRNAs with expression features in various cell types/tissues, including disease samples. In addition, the updated database incorporates conservation analysis of circRNAs between humans and mice. Finally, the web interface also contains computational tools to compare circRNA expression among samples. CIRCpedia v2 is accessible at http://www.picb.ac.cn/rnomics/circpedia.  相似文献   

18.
  1. Download : Download high-res image (118KB)
  2. Download : Download full-size image
Highlights
  • •MSFragger now supports raw timsTOF PASEF data.
  • •IonQuant performs fast and accurate feature detection and quantification.
  • •MSFragger and IonQuant provide excellent performance for timsTOF PASEF data.
  • •Flexibility allows for complex analyses, such as semi-enzymatic and open search.
  相似文献   

19.
Small proteins specifically refer to proteins consisting of less than 100 amino acids translated from small open reading frames (sORFs), which were usually missed in previous genome annotation. The significance of small proteins has been revealed in current years, along with the discovery of their diverse functions. However, systematic annotation of small proteins is still insufficient. SmProt was specially developed to provide valuable information on small proteins for scientific community. Here we present the update of SmProt, which emphasizes reliability of translated sORFs, genetic variants in translated sORFs, disease-specific sORF translation events or sequences, and remarkably increased data volume. More components such as non-ATG translation initiation, function, and new sources are also included. SmProt incorporated 638,958 unique small proteins curated from 3,165,229 primary records, which were computationally predicted from 419 ribosome profiling (Ribo-seq) datasets or collected from literature and other sources from 370 cell lines or tissues in 8 species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Saccharomyces cerevisiae, Caenorhabditis elegans, and Escherichia coli). In addition, small protein families identified from human microbiomes were also collected. All datasets in SmProt are free to access, and available for browse, search, and bulk downloads at http://bigdata.ibp.ac.cn/SmProt/.  相似文献   

20.
Numerous studies on cancers, biopharmaceuticals, and clinical trials have necessitated comprehensive and precise analysis of protein O-glycosylation. However, the lack of updated and convenient databases deters the storage of and reference to emerging O-glycoprotein data. To resolve this issue, an O-glycoprotein repository named OGP was established in this work. It was constructed with a collection of O-glycoprotein data from different sources. OGP contains 9354 O-glycosylation sites and 11,633 site-specific O-glycans mapping to 2133 O-glycoproteins, and it is the largest O-glycoprotein repository thus far. Based on the recorded O-glycosylation sites, an O-glycosylation site prediction tool was developed. Moreover, an OGP-based website is already available (https://www.oglyp.org/). The website comprises four specially designed and user-friendly modules: statistical analysis, database search, site prediction, and data submission. The first version of OGP repository and the website allow users to obtain various O-glycoprotein-related information, such as protein accession Nos., O-glycosylation sites, O-glycopeptide sequences, site-specific O-glycan structures, experimental methods, and potential O-glycosylation sites. O-glycosylation data mining can be performed efficiently on this website, which will greatly facilitate related studies. In addition, the database is accessible from OGP website (https://www.oglyp.org/download.php).  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号