Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.
赵佳  郭华  郭飞马 《生物信息学》2006,4(3):121-123
BAC clones relevant to the initiation and progression of breast cancer were retrieved from the CCAP and UCSC databases, and the CGAP database was then used to design additional informative BAC clones. Results: 1286 BAC clones were obtained, which can be used to print CGH microarrays for breast cancer detection.

2.
Glycosylation is one of the most important post-translational modifications of proteins, known to be involved in pathogen recognition, innate immune response and protection of epithelial membranes. However, when compared to the tools and databases available for the processing of high-throughput proteomic data, the glycomic domain is severely lacking. While tools to assist the analysis of mass spectrometry (MS) and HPLC are continuously improving, there are few resources available to support liquid chromatography (LC)-MS/MS techniques for glycan structure profiling. Here, we present a platform for presenting oligosaccharide structures and fragment data characterized by LC-MS/MS strategies. The database is annotated with high-quality datasets and is designed to extend and reinforce those standards and ontologies developed by existing glycomics databases. AVAILABILITY: http://www.unicarb-db.org

3.
T J Krieger  V Y Hook 《Biochemistry》1992,31(17):4223-4231
Purification of potential tachykinin and enkephalin precursor cleaving enzymes from bovine chromaffin granules was undertaken using as substrates the model precursors 35S-(Met)-beta-preprotachykinin [35S-(Met)-beta-PPT] and 35S-(Met)-preproenkephalin [35S-(Met)-PPE]. Purification by concanavalin A-Sepharose, Sephacryl S200, and chromatofocusing resulted in a chromaffin granule aspartyl protease (CGAP) that preferred the tachykinin over the enkephalin precursor. CGAP was composed of 47-, 30-, and 16.5-kDa polypeptides migrating as a single band in a nondenaturing electrophoretic gel system, and coeluting with an apparent molecular mass of 45-55 kDa by size-exclusion chromatography. These results suggest that two forms exist: a single 47-kDa polypeptide and a complex of associated 30- and 16.5-kDa subunits. CGAP was optimally active at pH 5.0-5.5, indicating that it would be active within the acidic intragranular environment. Cleavage at basic residues was suggested by HPLC and HVE identification of 35S-(Met)-NKA-Gly-Lys as the major acid-soluble product generated from 35S-(Met)-beta-PPT. Neuropeptide K was cleaved at a Lys-Arg basic residue site, as determined by identification of proteolytic products by microsequencing and amino acid composition analyses. Structural studies showed that the three CGAP polypeptides were similar to bovine cathepsin D in NH2-terminal sequences and amino acid compositions, indicating that CGAP appears to be a cathepsin D-related protease or cathepsin D itself. The 47- and 16.5-kDa polypeptides of CGAP possessed identical NH2-terminal sequences, suggesting that the 16.5-kDa polypeptide may be derived from the 47-kDa form by proteolysis. (ABSTRACT TRUNCATED AT 250 WORDS)

4.
High-throughput phenotyping (HTP) platforms are capable of monitoring the phenotypic variation of plants through multiple types of sensors, such as red, green, and blue (RGB) cameras, hyperspectral sensors, and computed tomography, which can be associated with environmental and genotypic data. Because of the wide range of information provided, HTP datasets represent a valuable asset to characterize crop phenotypes. As HTP becomes widely employed with more tools and data being released, it is important that researchers are aware of these resources and how they can be applied to accelerate crop improvement. Researchers may exploit these datasets either for phenotype comparison or employ them as a benchmark to assess tool performance and to support the development of tools that are better at generalizing between different crops and environments. In this review, we describe the use of image-based HTP for yield prediction, root phenotyping, development of climate-resilient crops, detection of pathogen and pest infestation, and quantitative trait measurement. We emphasize the need for researchers to share phenotypic data, and offer a comprehensive list of available datasets to assist crop breeders and tool developers to leverage these resources in order to accelerate crop breeding.

Various approaches are used to analyze high-throughput phenotyping data, and tools can be developed and assessed using available image-based datasets.

5.
With the advances of genome-wide sequencing technologies and bioinformatics approaches, a large number of datasets of normal and malignant erythropoiesis have been generated and made public to researchers around the world. Collection and integration of these datasets greatly facilitate basic research and the clinical diagnosis and treatment of blood disorders. Here we provide a brief introduction to the most popular omics data resources of normal and malignant hematopoiesis, including some integrated web tools, to help users get better equipped to perform common analyses. We hope this review will promote awareness and facilitate the usage of public database resources in hematology research.

6.
An intricate network of interactions between organisms and their environment form the ecosystems that sustain life on earth. With a detailed understanding of these interactions, ecologists and biologists can make better informed predictions about the ways different environmental factors will impact ecosystems. Despite the abundance of research data on biotic and abiotic interactions, no comprehensive and easily accessible data collection is available that spans taxonomic, geospatial, and temporal domains. Biotic-interaction datasets are effectively siloed, inhibiting cross-dataset comparisons. In order to pool resources and bring to light individual datasets, specialized research tools are needed to aggregate, normalize, and integrate existing datasets with standard taxonomies, ontologies, vocabularies, and structured data repositories. Global Biotic Interactions (GloBI) provides such tools by way of an open, community-driven infrastructure designed to lower the barrier for researchers to perform ecological systems analysis and modeling. GloBI provides a tool that (a) ingests, normalizes, and aggregates datasets, (b) integrates interoperable data with accepted ontologies (e.g., OBO Relations Ontology, Uberon, and Environment Ontology), vocabularies (e.g., Coastal and Marine Ecological Classification Standard), and taxonomies (e.g., Integrated Taxonomic Information System and National Center for Biotechnology Information Taxonomy Database), (c) makes data accessible through an application programming interface (API) and various data archives (Darwin Core, Turtle, and Neo4j), and (d) houses a data collection of about 700,000 species interactions across about 50,000 taxa, covering over 1100 references from 19 data sources. GloBI has taken an open-source and open-data approach in order to make integrated species-interaction data maximally accessible and to encourage users to provide feedback, contribute data, and improve data access methods. 
The GloBI collection of datasets is currently used in the Encyclopedia of Life (EOL) and Gulf of Mexico Species Interactions (GoMexSI).
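The abstract above notes that GloBI exposes its species-interaction data through a web API. As a minimal, hedged sketch of composing such a query in Python (the endpoint URL and parameter names below are assumptions based on the abstract, not a verified API reference — check the GloBI documentation before relying on them):

```python
from urllib.parse import urlencode

# Assumed GloBI interaction-search endpoint (illustrative only).
GLOBI_API = "https://api.globalbioticinteractions.org/interaction"

def build_interaction_query(source_taxon: str, interaction_type: str = "interactsWith") -> str:
    """Compose a GloBI interaction-search URL; parameter names are assumptions."""
    params = {"sourceTaxon": source_taxon, "interactionType": interaction_type}
    return f"{GLOBI_API}?{urlencode(params)}"

# Example: what does the sea otter eat?
url = build_interaction_query("Enhydra lutris", "eats")
print(url)
```

The resulting URL could then be fetched with any HTTP client, and the returned records joined against the ontologies and taxonomies the abstract lists.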

7.
The number of large-scale experimental datasets generated from high-throughput technologies has grown rapidly. Biological knowledge resources such as the Gene Ontology Annotation (GOA) database, which provides high-quality functional annotation to proteins within the UniProt Knowledgebase, can play an important role in the analysis of such data. The integration of GOA with analytical tools has proved to aid the clustering, annotation and biological interpretation of such large expression datasets. GOA is also useful in the development and validation of automated annotation tools, in particular text-mining systems. The increasing interest in GOA highlights the great potential of this freely available resource to assist both the biological research and bioinformatics communities.

8.
Large-scale comparative and systematic studies rely on the seamless merging of multiple datasets. However, taxonomic nomenclature is constantly being revised, making it problematic to combine data from different resources or different years of publication, which use different synonyms. This is certainly true for amphibians, which have experienced a spike in taxonomic revisions, in part as the result of the widespread use of DNA barcoding to resolve cryptic species delimitation issues and large-scale collaborative efforts to revise the entire amphibian tree. The ‘Amphibian Species of the World Online Reference’ (ASW) is one of the most widely used and most regularly updated databases for amphibian taxonomy, but existing R tools for querying synonyms such as ‘taxize’ do not include this resource. ‘AmphiNom’ is a tool suite written in the R programming language designed to facilitate batch-querying amphibian species names against the ASW database. This facilitates the merging of datasets that use different nomenclature, and its functionality is easily integrated into customizable R workflows. Moreover, it allows direct querying of the ASW website using R and straightforward reporting of summary information on current amphibian systematics.
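AmphiNom itself is an R package, but the core idea — batch-mapping legacy names onto one accepted nomenclature before merging datasets — can be sketched in Python with a toy synonym table (the mappings below are illustrative examples, not ASW output):

```python
# Toy synonym table standing in for an ASW lookup; entries are illustrative only.
ASW_SYNONYMS = {
    "Rana esculenta": "Pelophylax kl. esculentus",
    "Hyla arborea": "Hyla arborea",  # already the accepted name
}

def normalize_names(names, synonym_table):
    """Map each species name to its accepted name; None flags names needing manual review."""
    return [(name, synonym_table.get(name)) for name in names]

dataset = ["Rana esculenta", "Hyla arborea", "Xenopus laevis"]
print(normalize_names(dataset, ASW_SYNONYMS))
```

Two datasets normalized this way share one nomenclature and can be merged on the accepted-name column; unmatched names are flagged rather than silently dropped.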

9.
10.

Background

Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise.

Results

We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic.

Conclusions

This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.

11.
Background
Research in bioinformatics generates tools and datasets at a very fast rate. Meanwhile, considerable effort is going into making these resources findable and reusable, to improve resource discovery by researchers in the course of their work.
Purpose
This paper proposes a semi-automated tool to assess a resource according to the Findability, Accessibility, Interoperability and Reusability (FAIR) criteria. The aim is to create a portal that presents the assessment score together with a report that researchers can use to gauge a resource.
Method
Our system uses internet searches to automate the process of generating FAIR scores. The process is semi-automated in that, if a particular property of the FAIR scores has not been captured by AutoFAIR, a user can amend and supply the information to complete the assessment.
Results
We compare our results against FAIRshake, which was used as the benchmark tool for comparing the assessments. The results show that AutoFAIR was able to match the FAIR criteria in FAIRshake with minimal intervention from the user.
Conclusions
We show that AutoFAIR can be a good repository for storing metadata about tools and datasets, together with comprehensive reports detailing the assessments of the resources. Moreover, AutoFAIR is also able to score workflows, giving an overall indication of the FAIRness of the resources used in a scientific study.
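The abstract does not give AutoFAIR's actual scoring scheme, so as a hedged illustration of FAIR-style scoring in general, here is a toy scheme that averages boolean check results per facet and overall (the facet names follow the FAIR acronym, but the individual checks and the unweighted averaging are invented for illustration):

```python
def fair_score(checks):
    """Toy FAIR scoring: fraction of passed checks per facet, plus the unweighted mean."""
    facet_scores = {facet: sum(results) / len(results) for facet, results in checks.items()}
    overall = sum(facet_scores.values()) / len(facet_scores)
    return facet_scores, overall

# Hypothetical check results for one resource (True = check passed).
checks = {
    "Findable": [True, True, False],       # e.g. persistent ID, rich metadata, indexed
    "Accessible": [True, True],            # e.g. open protocol, metadata persistence
    "Interoperable": [True, False],        # e.g. standard vocabularies, qualified links
    "Reusable": [True, True, True, False], # e.g. licence, provenance, community standards
}

facets, overall = fair_score(checks)
print(facets, round(overall, 3))
```

A real assessor would weight checks and pull evidence automatically (as AutoFAIR does via internet searches), but the per-facet breakdown is the part a researcher reads in the report.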

12.
13.

Background  

Mascot™ is a commonly used protein identification program for MS as well as for tandem MS data. When analyzing huge shotgun proteomics datasets with Mascot™'s native tools, limits of computing resources are easily reached. Up to now, no open-source application has been available that is capable of converting the full content of Mascot™ result files from the original MIME format into a database-compatible tabular format, allowing direct import into database management systems and efficient handling of huge datasets analyzed by Mascot™.

14.
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap'99, Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.

15.
Database resources of the National Center for Biotechnology Information (total citations: 66; self-citations: 11; citations by others: 55)
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI's Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing pages, GeneMap'99, Davis Human-Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP) pages, Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP) pages, SAGEmap, Online Mendelian Inheritance in Man (OMIM) and the Molecular Modeling Database (MMDB). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov

16.
Spliced alignment plays a central role in the precise identification of eukaryotic gene structures. Even though many spliced alignment programs have been developed, recent rapid progress in DNA sequencing technologies demands further improvements in software tools. Benchmarking algorithms under various conditions is an indispensable task for the development of better software; however, there is a dire lack of appropriate datasets usable for benchmarking spliced alignment programs. In this study, we have constructed two types of datasets: simulated sequence datasets and actual cross-species datasets. The datasets are designed to correspond to various real situations, i.e. divergent eukaryotic species, different types of reference sequences, and the wide divergence between query and target sequences. In addition, we have developed an extended version of our program Spaln, which incorporates two additional features to the scoring scheme of the original version, and examined this extended version, Spaln2, together with the original Spaln and other representative aligners based on our benchmark datasets. Although the effects of the modifications are not individually striking, Spaln2 is consistently most accurate and reasonably fast in most practical cases, especially for plants and fungi and for increasingly divergent pairs of target and query sequences.

17.
Recent developments in high-throughput sequencing technologies have generated considerable demand for tools to analyse large datasets of small RNA sequences. Here, we describe a suite of web-based tools for processing plant small RNA datasets. Our tools can be used to identify microRNAs and their targets, compare expression levels in sRNA loci, and find putative trans-acting siRNA loci. AVAILABILITY: The tools are freely available for use at http://srna-tools.cmp.uea.ac.uk.

18.
Thiol and aspartyl proteolytic activities in isolated secretory vesicles of neural (NL) and intermediate (IL) lobes of bovine pituitary were characterized with heterologous enkephalin and tachykinin precursor substrates, 35S-(Met)-preproenkephalin and 35S-(Met)-beta-preprotachykinin. IL and NL secretory vesicles contained thiol-dependent proteolytic activity that cleaved the enkephalin precursor with a pH optimum of 4.5; this activity resembled a novel "prohormone thiol protease" previously purified and characterized from adrenal medulla chromaffin granules. IL and NL vesicles also demonstrated aspartyl proteolytic activity with an acidic pH optimum, as shown by pepstatin A inhibition of tachykinin and enkephalin precursor cleaving activity. This activity may be related to a previously characterized chromaffin granule aspartyl protease (CGAP) related to cathepsin D (2), as indicated by the presence of immunoreactive CGAP in NL secretory vesicles by anti-CGAP immunoblots. These results show that pituitary secretory vesicles, like chromaffin granules, may contain similar thiol-dependent and aspartyl proteolytic activities.

19.
The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the area of bioinformatics. Further, access to high-performance computing resources is necessary to achieve results in reasonable time. To speed up applications and utilize available compute resources as efficiently as possible, software developers make use of parallelization mechanisms such as multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the performance of well-known bioinformatics applications in terms of scaling, different virtual environments and different datasets, using our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The mentioned tools were analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software, and a Docker environment. The measured performance values were compared to a bare-metal setup and among each other. The study reveals that the virtual environments used produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from using larger amounts of computing resources, whereas others showed an almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users find the best amount of resources for their analysis. Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community's awareness of the efficient usage of computing resources.
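The scaling behaviour described above is usually summarized as speedup and parallel efficiency relative to the single-threaded runtime. A minimal sketch (the runtimes below are made-up numbers for illustration, not BOOTABLE measurements):

```python
def scaling_metrics(runtimes):
    """Speedup and parallel efficiency per thread count, relative to the 1-thread runtime."""
    base = runtimes[1]
    return {n: {"speedup": base / t, "efficiency": base / (t * n)}
            for n, t in sorted(runtimes.items())}

# Hypothetical wall-clock times (seconds) for one tool at different thread counts.
times = {1: 100.0, 2: 55.0, 4: 30.0, 8: 20.0}
for n, m in scaling_metrics(times).items():
    print(n, round(m["speedup"], 2), round(m["efficiency"], 2))
```

Efficiency well below 1.0 at higher thread counts is exactly the "more compute power is not always helpful" pattern the study reports, whereas near-1.0 efficiency indicates the almost linear scaling seen for some tools.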

20.
The phylum Apicomplexa comprises over 5000 species of obligate intracellular parasites, many responsible for diseases that significantly impact human health and economies. To aid drug development programs, global sequencing initiatives are generating increasing numbers of apicomplexan genomes. The challenge is how best to exploit these resources to identify effective therapeutic targets. Because of its important role in growth and maintenance, much interest has centred on metabolism. However, in the absence of detailed biochemical data, reconstructing the metabolic potential from a fully sequenced genome remains problematic. In this review, current resources and tools facilitating the metabolic reconstruction for apicomplexans are examined. Furthermore, how these datasets can be utilized to explore the metabolic capabilities of apicomplexans is discussed, and targets for therapeutic intervention are prioritized.
