Similar Articles (20 results)
1.

Background  

Whole genome association studies using highly dense single nucleotide polymorphism (SNP) panels are a set of methods to identify DNA markers associated with variation in a particular complex trait of interest. One of the main outcomes of these studies is a subset of statistically significant SNPs. Finding the potential biological functions of such SNPs can be an important step towards further use in human and agricultural populations (e.g., for identifying genes related to susceptibility to complex diseases or genes playing key roles in development or performance). The current challenge is that the information holding the clues to SNP function is distributed across many different databases. Efficient bioinformatics tools are therefore needed to seamlessly integrate up-to-date functional information on SNPs. Many web services have arisen to meet this challenge, but most work only within the framework of human medical research. While we acknowledge the importance of human research, we see a clear need for SNP annotation tools for other organisms.

2.
3.

Background  

High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data.

4.

Background

Single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels) are the most common type of polymorphism and are frequently used for molecular marker development. Such markers have become very popular for all kinds of genetic analysis, including haplotype reconstruction. Haplotypes can be reconstructed for whole chromosomes but also for specific genes, based on the SNPs present; haplotypes in the latter context represent the different alleles of a gene. The computational approach to SNP mining is becoming increasingly popular because the continuously increasing number of sequences deposited in databases allows a more accurate identification of SNPs. Several software packages have been developed for SNP mining from databases. Of these, QualitySNP is the only tool that combines SNP detection with the reconstruction of alleles, which results in a lower number of false-positive SNPs, and it also works much faster than other programs. We have built a web-based SNP discovery and allele detection tool (HaploSNPer) based on QualitySNP.

Results

HaploSNPer is a flexible web-based tool for detecting SNPs and alleles in user-specified input sequences from both diploid and polyploid species. It includes BLAST for finding homologous sequences in public EST databases, CAP3 or PHRAP for aligning them, and QualitySNP for discovering reliable allelic sequences and SNPs. All possible and reliable alleles are detected by a mathematical algorithm using potential SNP information. Reliable SNPs are then identified based on the reconstructed alleles and on sequence redundancy.

Conclusion

Thorough testing of HaploSNPer (and the underlying QualitySNP algorithm) has shown that EST information alone is sufficient for the identification of alleles and that reliable SNPs can be found efficiently. Furthermore, HaploSNPer supplies a user-friendly interface for the visualization of SNPs and alleles. HaploSNPer is available from http://www.bioinformatics.nl/tools/haplosnper/.
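For illustration only, here is a minimal Python sketch of the BLAST -> CAP3 -> SNP-calling pipeline described in the Results section above. The command-line options, file names and the final SNP-calling step are assumptions made for this sketch; they are not taken from HaploSNPer itself.

import subprocess
from pathlib import Path

def run_pipeline(query_fasta: str, est_blast_db: str, workdir: str = "haplosnp_run") -> None:
    out = Path(workdir)
    out.mkdir(exist_ok=True)

    # 1. Find homologous EST sequences with BLAST (tabular output).
    hits_table = out / "hits.tsv"
    subprocess.run(
        ["blastn", "-query", query_fasta, "-db", est_blast_db,
         "-outfmt", "6", "-evalue", "1e-20", "-out", str(hits_table)],
        check=True,
    )

    # 2. The hit sequences would be extracted into a FASTA file (step omitted
    #    here) and assembled into contigs/alignments with CAP3.
    hit_fasta = out / "hits.fasta"  # assumed to be produced from hits_table
    subprocess.run(["cap3", str(hit_fasta)], check=True)

    # 3. A SNP/allele caller such as QualitySNP would then be run on the CAP3
    #    alignment; its command line is not given in the abstract, so only a
    #    placeholder message is printed here.
    alignment = out / "hits.fasta.cap.ace"
    print(f"Run the SNP/allele caller on {alignment}")

if __name__ == "__main__":
    run_pipeline("my_gene.fasta", "public_est_db")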

5.
Recent developments in sequencing methods and bioinformatics analysis tools have greatly enabled the culture-independent analysis of complex microbial communities associated with environmental samples, plants, and animals. This has led to a spectacular increase in the number of studies on both the membership and the functionalities of these hitherto invisible worlds, in particular those of the human microbiome. The wide variety of available microbiome tools and platforms can be overwhelming, and drawing sound conclusions from scientific research can be challenging. Here, I review 1) the methodological and analytical hoops a good microbiome study has to jump through, including DNA extraction and the choice of bioinformatics tools, 2) the hopes this field has generated for diseases such as autism and inflammatory bowel disease, and 3) some of the hype it has created, e.g., by confusing correlation and causation, and the recent pseudoscientific commercialization of microbiome research.

6.
Current trends in the development of methods for monitoring, modeling and controlling biological production systems are reviewed from a bioengineering perspective. The ability to measure intracellular conditions in bioprocesses using genomics and other bioinformatics tools is addressed. Devices produced via micromachining techniques and new real-time optical technology are other novel methods that may facilitate biosystem engineering. Mathematical modeling of the data obtained from bioinformatics or real-time monitoring methods is necessary in order to handle the dense flows of data that are generated. Furthermore, control methods must be able to cope with these data flows in efficient ways that can be implemented in plant-wide computer communication systems. (Mini-review for the proceedings of the M3C conference.)

7.
8.
Bioinformatics is a central discipline in modern life sciences aimed at describing the complex properties of living organisms starting from large-scale data sets of cellular constituents such as genes and proteins. In order for this wealth of information to provide useful biological knowledge, databases and software tools for data collection, analysis and interpretation need to be developed. In this paper, we review recent advances in the design and implementation of bioinformatics resources devoted to the study of metals in biological systems, a research field traditionally at the heart of bioinorganic chemistry. We show how metalloproteomes can be extracted from genome sequences, how structural properties can be related to function, how databases can be implemented, and how hints on interactions can be obtained from bioinformatics.

9.
Flow cytometry (FCM) is an analytical tool widely used for cancer and HIV/AIDS research and treatment, stem cell manipulation, and the detection of microorganisms in environmental samples. Current data standards do not capture the full scope of FCM experiments, and there is a demand for software tools that can assist in the exploration and analysis of large FCM datasets. We are implementing a standardized approach to capturing, analyzing, and disseminating FCM data that will facilitate both more complex analyses and the analysis of datasets that could not previously be studied efficiently. Initial work has focused on developing a community-based guideline for recording and reporting the details of FCM experiments. Open source software tools that implement this standard are being created, with an emphasis on facilitating reproducible and extensible data analyses. In addition, tools for electronic collaboration will support integrated access to and comprehension of experiments, empowering users to collaborate on FCM analyses. This coordinated, joint development of bioinformatics standards and software tools for FCM data analysis has the potential to greatly facilitate both basic and clinical research, with impact on a notably diverse range of medical and environmental research areas.

10.
Zhang W, Duan S, Dolan ME. Bioinformation, 2008, 2(8):322-324
The International HapMap Project provides a resource of genotypic data on single nucleotide polymorphisms (SNPs), which can be used in various association studies to identify the genetic determinants of phenotypic variation. Prior to the association studies, the HapMap dataset should be preprocessed in order to reduce computation time and control the multiple-testing problem. The less informative SNPs, including those with a very low genotyping rate and those with rare minor allele frequencies in one or more populations, are removed. Some research designs also use only SNPs from a subset of HapMap cell lines. Although the HapMap website and other association software packages provide some basic tools for optimizing these datasets, a fast and user-friendly program that generates filtered genotypic data would be beneficial for association studies. Here, we present a flexible, straightforward bioinformatics program that helps prepare HapMap genotypic data for association studies by specifying cell lines and two common filtering criteria: minor allele frequency and genotyping rate. The software was developed for Microsoft Windows and written in C++. AVAILABILITY: The Windows executable and source code in Microsoft Visual C++ are available at Google Code (http://hapmap-filter-v1.googlecode.com/) or upon request. Distribution is subject to the GNU General Public License v3.
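The two filtering criteria mentioned above (minor allele frequency and genotyping rate) can be sketched in Python as follows. The genotype encoding, thresholds and data layout are illustrative assumptions; the published program itself is a C++ Windows application, not this code.

def minor_allele_frequency(genotypes):
    """genotypes: list of strings like 'AA', 'AG', 'GG', or 'NN' for missing."""
    alleles = [a for g in genotypes if "N" not in g for a in g]
    if not alleles:
        return 0.0
    counts = {a: alleles.count(a) for a in set(alleles)}
    return min(counts.values()) / len(alleles) if len(counts) > 1 else 0.0

def genotyping_rate(genotypes):
    called = sum(1 for g in genotypes if "N" not in g)
    return called / len(genotypes) if genotypes else 0.0

def filter_snps(snp_table, min_maf=0.05, min_call_rate=0.95, keep_samples=None):
    """snp_table: dict mapping rsID -> {sample_id: genotype string}."""
    kept = {}
    for rsid, sample_genos in snp_table.items():
        if keep_samples is not None:  # restrict to a chosen subset of cell lines
            sample_genos = {s: g for s, g in sample_genos.items() if s in keep_samples}
        genos = list(sample_genos.values())
        if (genotyping_rate(genos) >= min_call_rate
                and minor_allele_frequency(genos) >= min_maf):
            kept[rsid] = sample_genos
    return kept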

11.
The post-'omics' era has resulted in the development of many primary, secondary and derived databases. Many analytical and visualization bioinformatics tools have been developed to manage and analyze the data made available through large sequencing projects. The availability of heterogeneous databases and tools makes it difficult for researchers to access information from varied sources and to run the different bioinformatics tools needed to complete a desired analysis. Building integrated bioinformatics platforms is one of the most challenging tasks the bioinformatics community faces, and the integration of various databases, tools and algorithms is a difficult problem to deal with. This article describes the bioinformatics analysis workflow management systems that have been developed in the areas of gene sequence analysis and phylogeny. It will be useful for biotechnologists, molecular biologists, computer scientists and statisticians engaged in computational biology and bioinformatics research.

12.
MOTIVATION: Bioinformatics clustering tools are useful at all levels of proteomic data analysis. Proteomics studies can provide a wealth of information and rapidly generate large quantities of data from the analysis of biological specimens. The high dimensionality of the data generated by these studies requires the development of improved bioinformatics tools for efficient and accurate data analysis. For proteome profiling of a particular system or organism, a number of specialized software tools are needed; indeed, significant advances are needed in the informatics and software tools that support the analysis and management of these massive amounts of data. Clustering algorithms based on probabilistic and Bayesian models provide an alternative to heuristic algorithms: the number of clusters (diseased and non-diseased groups) is reduced to the choice of the number of components of a mixture of underlying probability distributions. The Bayesian approach is a tool for incorporating information from the data into the analysis, and it offers an estimate of the uncertainties in the data and in the parameters involved. RESULTS: We present novel algorithms that can organize, cluster and derive meaningful patterns of expression from large-scale proteomics experiments. We processed raw data using a graphically based algorithm, transforming each spectrum from a real-space to a complex-space representation with the discrete Fourier transform; we then used a thresholding approach to denoise and reduce the length of each spectrum. Bayesian clustering was applied to the reconstructed data. In comparison with several other algorithms used in this study, including K-means, Kohonen self-organizing maps (SOM) and linear discriminant analysis, the Bayesian-Fourier model-based approach consistently displayed superior performance in selecting the correct model and the number of clusters, thus providing a novel approach for accurate diagnosis of disease. Using this approach, we were able to successfully denoise proteomic spectra and reach up to a 99% reduction in the number of peaks compared to the original data. In addition, the Bayesian-based approach achieved a better classification rate than the other classification algorithms. These findings will allow us to apply the Fourier transformation to the selection of the protein profile for each sample, and to develop a novel bioinformatics strategy based on Bayesian clustering for biomarker discovery and optimal diagnosis.
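As a rough illustration of the processing steps described above (discrete Fourier transform, coefficient thresholding, clustering of the reconstructed spectra), the following Python sketch uses NumPy's FFT and a scikit-learn Gaussian mixture model as a stand-in for the paper's Bayesian clustering. The threshold, model choice and toy data are assumptions, not the published algorithm.

import numpy as np
from sklearn.mixture import GaussianMixture

def fourier_denoise(spectrum, keep_fraction=0.05):
    coeffs = np.fft.rfft(spectrum)                 # real -> complex space
    cutoff = np.quantile(np.abs(coeffs), 1 - keep_fraction)
    coeffs[np.abs(coeffs) < cutoff] = 0            # drop small coefficients
    return np.fft.irfft(coeffs, n=len(spectrum))   # reconstructed spectrum

def cluster_spectra(spectra, n_components=2, keep_fraction=0.05, seed=0):
    """spectra: 2-D array with one intensity vector per row."""
    denoised = np.vstack([fourier_denoise(s, keep_fraction) for s in spectra])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    return gmm.fit_predict(denoised), denoised

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(20, 256)) + np.linspace(0, 1, 256)  # toy spectra
    labels, _ = cluster_spectra(toy)
    print(labels)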

13.
We argue the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large. Adopting a large-scale perspective is a way to manage the problems endemic to the world of the small: constellations of incompatible tools for which the effort required to assemble an integrated system exceeds the perceived benefit of the integration. Where bioinformatics in-the-small is about data and tools, bioinformatics in-the-large is about metadata and dependencies. Dependencies represent the complexities of large-scale integration, including the requirements and assumptions governing the composition of tools. The popular make utility is a very effective system for defining and maintaining simple dependencies, and it offers a number of insights about the essence of bioinformatics in-the-large. Keeping an in-the-large perspective has been very useful to us in large bioinformatics projects. We give two fairly different examples and extract lessons from them showing how it has helped. Both examples suggest the benefit of explicitly defining and managing knowledge flows and knowledge maps (which represent metadata regarding types, flows, and dependencies), and they also suggest approaches for developing bioinformatics database systems. Generally, we argue that large-scale engineering principles can be successfully adapted from disciplines such as software engineering and data management, and that having an in-the-large perspective will be a key advantage in the next phase of bioinformatics development.
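The make-style dependency rule the authors point to (rebuild a target only when it is missing or older than one of its prerequisites) can be sketched in a few lines of Python; the file names and the rebuild action below are hypothetical.

import os

def out_of_date(target, prerequisites):
    """True if the target is missing or older than any prerequisite file."""
    if not os.path.exists(target):
        return True
    t_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > t_mtime for p in prerequisites)

def build(target, prerequisites, action):
    """action: a zero-argument callable that (re)creates the target."""
    if out_of_date(target, prerequisites):
        print(f"rebuilding {target} from {prerequisites}")
        action()
    else:
        print(f"{target} is up to date")

# Example: regenerate an annotation report whenever the alignments change.
# build("report.txt", ["alignments.sam"], lambda: open("report.txt", "w").close())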

14.
Single nucleotide polymorphisms (SNPs) represent the most abundant type of genetic variation that can be used as molecular markers. The SNPs hidden in sequence databases can be unlocked using bioinformatic tools. For efficient application of these SNPs, the sequence set should be as error-free as possible, target single loci and be suitable for the SNP scoring platform of choice. We have developed a pipeline to effectively mine SNPs from public EST databases, with or without quality information, using the QualitySNP software, to select reliable SNPs and to prepare the loci for analysis on the Illumina GoldenGate genotyping platform. The applicability of the pipeline was demonstrated using publicly available potato EST data, genotyping individuals from two diploid mapping populations and subsequently mapping the SNP markers (putative genes) in both populations. Over 7000 reliable SNPs were identified that met the criteria for genotyping on the GoldenGate platform. Of the 384 SNPs placed on the SNP array, approximately 12% dropped out. For the two potato mapping populations, 165 and 185 segregating SNP loci could be mapped on the respective genetic maps, illustrating the effectiveness of our pipeline for SNP selection and validation.
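The kind of locus-selection step described above can be sketched as a simple Python filter: keep only SNPs whose flanking sequence is long enough and free of neighbouring polymorphisms near the assayed position. The flank length and exclusion-window values are illustrative assumptions, not the actual Illumina GoldenGate design criteria used in the paper.

def suitable_for_goldengate(snp_pos, contig_length, other_variant_positions,
                            min_flank=60, exclusion_window=20):
    """snp_pos and other_variant_positions are 0-based positions on the contig."""
    left_flank = snp_pos
    right_flank = contig_length - snp_pos - 1
    if left_flank < min_flank or right_flank < min_flank:
        return False
    # No other SNP/indel allowed within the exclusion window around the target.
    return all(abs(p - snp_pos) > exclusion_window
               for p in other_variant_positions if p != snp_pos)

# Example: a SNP at position 150 of a 400 bp contig with neighbours at 10 and 300.
print(suitable_for_goldengate(150, 400, [10, 300]))  # True under these assumptions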

15.
This review summarizes important work in open-source bioinformatics software that has occurred over the past couple of years. The survey is intended to illustrate how programs and toolkits whose source code has been developed or released under an Open Source license have changed informatics-heavy areas of life science research. Rather than creating a comprehensive list of all tools developed over the last 2-3 years, we use a few selected projects encompassing toolkit libraries, analysis tools, data analysis environments and interoperability standards to show how freely available and modifiable open-source software can serve as the foundation for building important applications, analysis workflows and resources.

16.
Identifying genetic associations between complex traits and diseases can provide useful etiological insights and help prioritize probable causal relationships. Although many tools already exist for estimating genetic associations between complex traits and diseases, some have poorly readable code, the different tools are built on different programming languages, and they chain together poorly. Therefore, based on genome-wide association study (GWAS) data, this study presents SCtool, an open-source, cross-platform and user-friendly software tool. SCtool integrates three packages, ldsc, TwosampleMR and MR-BMA; its main function is to use GWAS summary-level data to identify genetic correlations between complex traits and diseases, between complex traits, and between diseases, and to explore the potential causal relationships among them. Finally, SCtool was used to reveal the genetic association between systemic iron status (ferritin, serum iron, transferrin, transferrin saturation) and the epigenetic clock GrimAge.
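SCtool itself wraps R packages (ldsc, TwosampleMR, MR-BMA). As a package-free illustration of the causal-inference step on GWAS summary data, the following Python sketch computes the standard fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate; the input effect sizes are hypothetical and this is not SCtool's code.

import numpy as np

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure's causal effect on the outcome."""
    beta_exposure = np.asarray(beta_exposure, dtype=float)
    beta_outcome = np.asarray(beta_outcome, dtype=float)
    se_outcome = np.asarray(se_outcome, dtype=float)
    ratio = beta_outcome / beta_exposure            # per-SNP Wald ratios
    weights = beta_exposure**2 / se_outcome**2      # inverse-variance weights
    estimate = np.sum(weights * ratio) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return estimate, se

# Hypothetical summary statistics for three instrumental SNPs:
est, se = ivw_mr([0.10, 0.08, 0.12], [0.02, 0.015, 0.03], [0.01, 0.012, 0.011])
print(f"IVW estimate = {est:.3f} +/- {se:.3f}")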

17.
Shen J, Deininger PL, Zhao H. Cytokine, 2006, 35(1-2):62-66
Understanding the functions of single nucleotide polymorphisms (SNPs) can greatly help us understand the genetics of human phenotypic variation, and especially the genetic basis of complex human diseases. However, identifying functional SNPs from a pool containing both functional and neutral SNPs is challenging. In this study, we used computational tools to analyze the genetic variations that can alter the expression and function of a group of cytokine proteins. We extracted 4552 SNPs in 45 cytokine proteins from the SNPper database. Of particular interest, 828 SNPs were in the 5' UTR region, 961 SNPs were in the 3' UTR region, and 85 SNPs were non-synonymous SNPs (nsSNPs), which cause an amino acid change. Evolutionary conservation analysis with the SIFT tool suggested that 8 nsSNPs may disrupt protein function. Protein structure analysis with the PolyPhen tool suggested that 5 nsSNPs might alter protein structure. Binding motif analysis with the UTResource tool suggested that 27 SNPs in the 5' or 3' UTR might change protein expression levels. Our study demonstrates the presence of naturally occurring genetic variation in cytokine proteins that may affect their expression and function, with possible roles in complex human diseases such as immune diseases.
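The region-by-region tallying described above (5' UTR, 3' UTR, non-synonymous coding SNPs) can be sketched as a simple grouping step in Python. The record format below is a hypothetical simplification, and SIFT, PolyPhen and UTResource are external tools that are not reimplemented here.

from collections import Counter

def tally_by_region(snp_records):
    """snp_records: iterable of dicts like {'rsid': 'rs0001', 'region': "5'UTR"}."""
    return Counter(rec.get("region", "other") for rec in snp_records)

snps = [
    {"rsid": "rs0001", "region": "5'UTR"},
    {"rsid": "rs0002", "region": "3'UTR"},
    {"rsid": "rs0003", "region": "nsSNP"},
    {"rsid": "rs0004", "region": "intron"},
]
print(tally_by_region(snps))  # counts of SNPs per functional category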

18.

Background  

As biology becomes an increasingly computational science, it is critical that we develop software tools that support not only bioinformaticians but also bench biologists in their exploration of the vast and complex datasets that continue to accumulate from international genomic, proteomic, and systems-biology projects. The BioMoby interoperability system was created with the goal of facilitating the movement of data from one Web-based resource to another to fulfill the requirements of non-expert bioinformaticians. In parallel with the development of BioMoby, the European myGrid project was designing Taverna, a bioinformatics workflow design and enactment tool. Here we describe the marriage of these two projects in the form of a Taverna plug-in that provides access to many of BioMoby's features through the Taverna interface.

19.
The completion of the human genome project, and of other genome sequencing projects, has spearheaded the emergence of the field of bioinformatics. Using computer programs to analyse DNA and protein information has become an important area of life science research and development. While it is not necessary for most life science researchers to develop specialist bioinformatics skills (including software development), basic skills in the application of common bioinformatics software and in the effective interpretation of results are increasingly required of all life science researchers. Training in bioinformatics increasingly occurs within the university system as part of existing undergraduate science and specialist degrees. One difficulty in bioinformatics education is the sheer number of software programs required to give students a thorough grounding in the subject. Teaching requires either a well-maintained internal server with all the required software, properly interfacing with student terminals and with sufficient capacity to handle multiple simultaneous requests, or the individual installation and maintenance of every piece of software on each computer. In both cases, there are difficult issues regarding site maintenance and accessibility. In this article, we discuss the use of BioManager, a web-based bioinformatics application integrating a variety of common bioinformatics tools, for teaching, including its role as the main bioinformatics training tool in some Australian and international universities. We discuss some of the issues involved in using a bioinformatics resource primarily created for research in an undergraduate teaching environment.

20.
PlasmoDB (http://PlasmoDB.org) is the official database of the Plasmodium falciparum genome sequencing consortium. This resource incorporates the recently completed P. falciparum genome sequence and annotation, as well as draft sequence and annotation emerging from other Plasmodium sequencing projects. PlasmoDB currently houses information from five parasite species and provides tools for intra- and inter-species comparisons. Sequence information is integrated with other genomic-scale data emerging from the Plasmodium research community, including gene expression analysis from EST, SAGE and microarray projects and proteomics studies. The relational schema used to build PlasmoDB, GUS (Genomics Unified Schema), employs a highly structured format to accommodate the diverse data types generated by sequence and expression projects. A variety of tools allow researchers to formulate complex, biologically based queries of the database. A stand-alone version of the database is also available on CD-ROM (P. falciparum GenePlot), facilitating access to the data in situations where internet access is difficult (e.g. by malaria researchers working in the field). The goal of PlasmoDB is to facilitate utilization of the vast quantities of genomic-scale data produced by the global malaria research community. The software used to develop PlasmoDB has been used to create a second Apicomplexan parasite genome database, ToxoDB (http://ToxoDB.org).
