Similar literature
 20 similar articles found
1.
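Current status of bioinformatics platforms for metagenomic research   Total citations: 2 (self-citations: 0; citations by others: 2)
The metagenome, a term proposed by Handelsman et al. (1998), refers broadly to the genomes of all species in the microbial community of a given environmental sample (for example, human and animal intestines, breast milk, soil, lakes, glaciers and oceans). Metagenomic techniques originated in environmental microbiology, and next-generation high-throughput sequencing has made their widespread application possible. As in genomics, the current bottleneck in metagenomics is how to analyze efficiently the massive data produced by high-throughput sequencing; the associated bioinformatics methods and platforms are therefore key to metagenomic research. This article introduces the main bioinformatics software and tools currently used in metagenomic research. Because the two major sequencing approaches used in metagenomics, "whole genome sequencing" and "amplicon sequencing", yield substantially different data and call for different analysis methods, the corresponding software platforms are introduced separately.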

2.
High-throughput genotyping chips have produced huge datasets for genome-wide association studies (GWAS) that have contributed greatly to discovering susceptibility genes for complex diseases. There are two strategies for performing data analysis for GWAS. One strategy is to use open-source or commercial packages that are designed for GWAS. The other is to take advantage of classic genetic programs with specific functions, such as linkage disequilibrium mapping, haplotype inference and transmission disequilibrium tests. However, most classic programs that are available are not suitable for analyzing chip data directly and require custom-made input, which results in the inconvenience of converting raw genotyping files into various data formats. We developed a powerful, user-friendly, lightweight program named SNPTransformer for GWAS that includes five major modules (Transformer, Operator, Previewer, Coder and Simulator). The toolkit not only transforms genotyping files into ten input formats for use with classic genetics packages, but also carries out useful functions such as relational operations on IDs, previewing data files, recoding data formats and simulating marker files. It bridges upstream raw genotyping data with downstream genetic programs, and can act as an in-hand toolkit for human geneticists, especially for non-programmers. SNPTransformer is freely available at http://snptransformer.sourceforge.net.

3.

Background

To reproduce and report a bioinformatics analysis, it is important to be able to determine the environment in which a program was run. It can also be valuable when trying to debug why different executions are giving unexpectedly different results.

Results

Log::ProgramInfo is a Perl module that writes a log file at the termination of execution of the enclosing program, to document useful execution characteristics. This log file can be used to re-create the environment in order to reproduce an earlier execution. It can also be used to compare the environments of two executions to determine whether there were any differences that might affect (or explain) their operation.

Availability

The source is available on CPAN (Macdonald and Boutros, Log-ProgramInfo. http://search.cpan.org/~boutroslb/Log-ProgramInfo/).

Conclusion

Using Log::ProgramInfo in programs that create result data for publishable research, and including the Log::ProgramInfo output log as part of the publication of that research, is a valuable way to help others duplicate the programming environment as a precursor to validating and/or extending that research.
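
The idea of recording the execution environment when a program terminates can be illustrated with a small sketch. The following is not Log::ProgramInfo (which is a Perl module) and does not reproduce its log format; it is a hypothetical Python analogue, and the function names `collect_run_info` and `enable_run_log`, as well as the fields that are recorded, are assumptions made for illustration only.

```python
import atexit
import json
import os
import platform
import sys
import time


def collect_run_info(start_time):
    """Gather a few execution characteristics worth keeping for reproducibility."""
    return {
        "command_line": list(sys.argv),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "working_directory": os.getcwd(),
        "elapsed_seconds": round(time.time() - start_time, 3),
    }


def enable_run_log(path):
    """Register a handler that writes the run information when the program exits."""
    start_time = time.time()

    def write_log():
        # Sorted keys keep two logs easy to compare with a plain text diff.
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(collect_run_info(start_time), handle, indent=2, sort_keys=True)

    atexit.register(write_log)
    return write_log
```

Calling `enable_run_log("run_info.json")` near the top of a script would be enough in this sketch; two such logs from different executions could then be compared to look for environment differences.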

4.

Background  

High-throughput laboratory techniques generate huge quantities of scientific data. Laboratory Information Management Systems (LIMS) are necessary for sample tracking, data storage and data reporting. Commercial LIMS solutions are available, but these can be both costly and overly complex for the task. The development of bespoke LIMS solutions offers a number of advantages, including the flexibility to fulfil all of a laboratory's requirements at a fraction of the price of a commercial system. The programming language Perl is well suited to developing LIMS applications because of its powerful but simple-to-use database and web interaction. It is also well known for enabling rapid application development and deployment, and has a very active and helpful developer community. Developing an in-house LIMS from scratch, however, can take considerable time and resources, so programming tools that enable the rapid development of LIMS applications are essential, but there are currently no LIMS development tools for Perl.

5.
6.
7.
The choice of technology and bioinformatics approach is critical in obtaining accurate and reliable information from next-generation sequencing (NGS) experiments. An increasing number of software tools and methodological guidelines are being published, but deciding which approach and experimental design to use can depend on the particularities of the species and on the aims of the study. This leaves researchers unable to make informed decisions on these central questions. To address these issues, we developed pipeliner, a tool to evaluate, by simulation, the performance of NGS pipelines in resequencing studies. Pipeliner provides a graphical interface that allows users to write and test their own bioinformatics pipelines with publicly available or custom software. It computes a number of statistics summarizing the performance in SNP calling, including the recovery, sensitivity and false discovery rate for heterozygous and homozygous SNP genotypes. Pipeliner can be used to answer many practical questions: for example, for a limited amount of NGS effort, how many more reliable SNPs can be detected by doubling coverage and halving sample size, or what false discovery rate results from different SNP calling algorithms and options. Pipeliner thus allows researchers to plan their study's sampling design carefully and to compare the suitability of alternative bioinformatics approaches for their specific study systems. Pipeliner is written in C++ and is freely available from http://github.com/brunonevado/Pipeliner.
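
Two of the statistics mentioned above, sensitivity and false discovery rate, have standard definitions that a short sketch can make concrete. The code below is not taken from Pipeliner (which is written in C++); it is an illustrative Python example that assumes SNPs are identified only by position, and the function name `snp_calling_metrics` is hypothetical. Pipeliner's "recovery" statistic is not defined in the abstract, so it is left out.

```python
def snp_calling_metrics(true_snps, called_snps):
    """Compare called SNP positions against the simulated (known) truth."""
    true_snps = set(true_snps)
    called_snps = set(called_snps)

    true_positives = len(true_snps & called_snps)
    false_positives = len(called_snps - true_snps)
    false_negatives = len(true_snps - called_snps)

    # Sensitivity: fraction of real SNPs that the pipeline found.
    sensitivity = true_positives / len(true_snps) if true_snps else float("nan")
    # False discovery rate: fraction of called SNPs that are not real.
    fdr = false_positives / len(called_snps) if called_snps else float("nan")

    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "sensitivity": sensitivity,
        "false_discovery_rate": fdr,
    }


# Made-up positions: four simulated SNPs, three calls, two of them correct.
example = snp_calling_metrics({101, 250, 377, 512}, {101, 250, 600})
```

In this made-up example the sensitivity is 2/4 and the false discovery rate is 1/3. Running the same comparison for each pipeline configuration on the same simulated data is one way to compare configurations side by side.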

8.
9.
Designing hardware for miniaturized robotics that mimics the capabilities of flying insects is of interest, because the two share similar constraints (i.e. small size, low weight, and low energy consumption). Research in this area aims to give robots similarly efficient flight and cognitive abilities. Visual processing is important to flying insects' impressive flight capabilities, but the embodiment of insect-like visual systems is currently limited by the hardware available. Suitable hardware is prohibitively expensive, difficult to reproduce, unable to simulate insect vision characteristics accurately, and/or too heavy for small robotic platforms. These limitations hamper the development of platforms for embodiment, which in turn hampers progress in understanding how biological systems fundamentally work. To address this gap, this paper proposes an inexpensive, lightweight robotic system for modelling insect vision. The system is mounted and tested on a robotic platform for mobile applications, and then the camera and insect vision models are evaluated. We analyse the potential of the system for use in the embodiment of higher-level visual processes (i.e. motion detection) and also for the development of vision-based navigation for robotics in general. Optic flow from sample camera data is calculated and compared to a perfect, simulated bee world, showing an excellent resemblance.

10.
11.
The advantages of nucleotide-resolution models over atomic-resolution and cylinder models of large RNA structure are discussed, and a toolkit of RNA substructures for use in building 'pencil' models is described. Prefabricated elements from the toolkit can be used to quickly assemble complex RNA structure models for the visualization of known RNA structures, or for exploring potential tertiary structure configurations based on secondary structure and other information.

12.
Membrane proteins are involved in numerous vital biological processes, including transport, signal transduction and the enzymes in a variety of metabolic pathways. Integral membrane proteins account for up to 30% of the human proteome and they make up more than half of all currently marketed therapeutic targets. Unfortunately, membrane proteins are inherently recalcitrant to study using the normal toolkit available to scientists, and one is most often left with the challenge of finding inhibitors, activators and specific antibodies using a denatured or detergent-solubilized aggregate. The Nanodisc platform circumvents these challenges by providing a self-assembled system that renders typically insoluble, yet biologically and pharmacologically significant, targets such as receptors, transporters, enzymes, and viral antigens soluble in aqueous media in a native-like bilayer environment that maintains a target's functional activity. By providing a bilayer surface of defined composition and structure, Nanodiscs have found great utility in the study of cellular signaling complexes that assemble on a membrane surface. Nanodiscs provide a nanometer-scale vehicle for the in vivo delivery of amphipathic drugs, therapeutic lipids, tethered nucleic acids, imaging agents and active protein complexes. This means for generating nanoscale lipid bilayers has spawned the successful use of numerous other polymer and peptide amphipathic systems. This review, in celebration of the Anfinsen Award, summarizes some recent results and provides an inroad into the current and historical literature.

13.
Mia T Levine, Harmit S Malik. Fly, 2013, 7(3): 137-141
Heterochromatin is the enigmatic eukaryotic genome compartment found mostly at telomeres and centromeres. Conventional approaches to sequence assembly and genetic manipulation fail in this highly repetitive, gene-sparse, and recombinationally silent DNA. In contrast, genetic and molecular analyses of euchromatin-encoded proteins that bind, remodel, and propagate heterochromatin have revealed its vital role in numerous cellular and evolutionary processes. Utilizing the 12 sequenced Drosophila genomes, Levine et al.¹ took a phylogenomic approach to discover new such protein "surrogates" of heterochromatin function and evolution. This paper reported over 20 new members of what was traditionally believed to be a small and static Heterochromatin Protein 1 (HP1) gene family. The newly identified HP1 proteins are structurally diverse, lineage-restricted, and expressed primarily in the male germline. The birth and death of HP1 genes follow a "revolving door" pattern, where new HP1s appear to replace old HP1s. Here, we address alternative evolutionary models that drive this constant innovation.

14.
Recently, a method to encode unnatural amino acids with diverse physicochemical and biological properties genetically in bacteria, yeast and mammalian cells was developed. Over 30 unnatural amino acids have been co-translationally incorporated into proteins with high fidelity and efficiency using a unique codon and corresponding transfer-RNA:aminoacyl-tRNA-synthetase pair. This provides a powerful tool for exploring protein structure and function in vitro and in vivo, and for generating proteins with new or enhanced properties.

15.
This paper describes a lightweight, high-performance communication protocol for the high-bandwidth, high-delay networks typical of computational Grids. One unique feature of this protocol is that it incorporates an extremely accurate classification mechanism that is efficient enough to diagnose the cause of data loss in real time, providing to the controller the opportunity to respond to different causes of data loss in different ways. The simplest adaptive response, and the one discussed in this paper, is to trigger aggressive congestion control measures only when the data loss is diagnosed as network related. However, even this very simple adaptation can have a tremendous impact on performance in a Grid setting where the resources allocated to a long-running, data-intensive application can fluctuate significantly during the course of its execution. In fact, we provide results showing that the utilization of the information provided by the classifier increased performance by over two orders of magnitude depending on the dominant cause of data loss. In this paper, we discuss the Bayesian statistical framework upon which the classifier is based and the classification metrics that make this approach highly successful. We discuss the integration of the classifier into the congestion control structures of an existing high-performance communication protocol, and provide empirical results showing that it correctly diagnosed the cause of data loss in over 98% of the experimental trials.
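
The adaptive response described above, backing off only when loss is diagnosed as network related, can be sketched with a plain application of Bayes' rule. The abstract does not specify the classifier's features or metrics, so everything concrete below is an assumption: the two causes ("network" and "other"), the observed loss pattern ("burst" or "isolated"), the probabilities, and the function names `posterior` and `should_back_off` are all hypothetical.

```python
def posterior(priors, likelihoods, observation):
    """Bayes' rule: P(cause | observation) for each candidate cause."""
    unnormalized = {
        cause: priors[cause] * likelihoods[cause][observation] for cause in priors
    }
    total = sum(unnormalized.values())
    return {cause: value / total for cause, value in unnormalized.items()}


def should_back_off(priors, likelihoods, observation, threshold=0.5):
    """Trigger congestion control only if loss is probably network related."""
    return posterior(priors, likelihoods, observation)["network"] > threshold


# Illustrative numbers only; they are not taken from the paper.
PRIORS = {"network": 0.5, "other": 0.5}
LIKELIHOODS = {
    "network": {"burst": 0.8, "isolated": 0.2},
    "other": {"burst": 0.3, "isolated": 0.7},
}

burst_posterior = posterior(PRIORS, LIKELIHOODS, "burst")
```

With these made-up numbers, a burst of losses gives a posterior of 8/11 for the network cause, so the sender would back off, while an isolated loss gives 2/9 and would leave the sending rate unchanged.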

16.
Storage of sequence data is a major concern, as the amount of data generated at several locations is growing exponentially. There is therefore a need to develop techniques that store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large amounts of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA in a distributed bioinformatics computing system for the analysis of DNA sequences. The OPTSDNA algorithm is used to store DNA sequences of various sizes in a database. The input DNA sequences varied in size from very small to very large. Both the storage size and the response time are calculated in this work. The efficiency and performance of the algorithm are high (in size calculation, expressed as a percentage) when compared with other known sequential approaches.
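
The abstract does not describe how OPTSDNA works internally, so the following is not that algorithm. It is a generic sketch of one common way to reduce DNA storage size, packing each base into two bits, shown only to illustrate how storage size can be computed and compared; the function names `pack_dna` and `unpack_dna` are hypothetical, and the sketch assumes sequences contain only A, C, G and T.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack_dna(sequence):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    packed = bytearray()
    for start in range(0, len(sequence), 4):
        chunk = sequence[start:start + 4]
        value = 0
        for base in chunk:
            value = (value << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so unpacking can read fixed positions.
        value <<= 2 * (4 - len(chunk))
        packed.append(value)
    # The length is returned too, because padding bits look like extra 'A's.
    return len(sequence), bytes(packed)


def unpack_dna(length, packed):
    """Recover the original string from its length and packed bytes."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 3])
    return "".join(bases[:length])


length, packed = pack_dna("ACGTTGCAAC")
```

Here a 10-base sequence occupies 3 bytes instead of 10 when stored as one character per base, a saving of 70% before counting the stored length.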

17.
18.
Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers.

19.
Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with the increasingly vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses the CUDA tool to implement a massively parallel computing environment on the GPU card, is becoming a very powerful, efficient, and low-cost option for achieving substantial performance gains over CPU approaches. The use of on-chip memory on the GPU efficiently lowers latency, thus circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized the ELLPACK-R sparse format to allow effective, fine-grained, massively parallel processing that copes with the sparse nature of interaction network data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop machines, which was previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
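
The two operations named above as the heart of MCL, matrix-matrix multiplication (expansion) and Markov matrix normalization (applied after inflation), can be shown in a small CPU sketch. This is not CUDA-MCL: it uses dense pure-Python lists rather than the ELLPACK-R sparse format, runs a fixed number of iterations instead of testing for convergence, and all function names are hypothetical.

```python
def normalize_columns(matrix):
    """Rescale every column to sum to 1 (a column-stochastic Markov matrix)."""
    size = len(matrix)
    result = [row[:] for row in matrix]
    for col in range(size):
        total = sum(matrix[row][col] for row in range(size))
        if total > 0:
            for row in range(size):
                result[row][col] = matrix[row][col] / total
    return result


def expand(matrix):
    """Expansion: square the matrix, i.e. take two random-walk steps."""
    size = len(matrix)
    return [
        [sum(matrix[i][k] * matrix[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def inflate(matrix, power):
    """Inflation: raise entries to a power, then renormalize the columns."""
    return normalize_columns([[value ** power for value in row] for row in matrix])


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Run a fixed number of MCL iterations and read clusters off the rows."""
    size = len(adjacency)
    # Self-loops are added before normalization, as is customary for MCL.
    matrix = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(size)]
         for i in range(size)]
    )
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    # Each row that still carries weight lists the members of one cluster.
    clusters = {
        frozenset(j for j, value in enumerate(row) if value > 1e-6)
        for row in matrix
    }
    return sorted(sorted(cluster) for cluster in clusters if cluster)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
TRIANGLES = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_cluster(TRIANGLES)
```

For this toy graph the result is the two triangles, `[[0, 1, 2], [3, 4, 5]]`. The dense triple loop in `expand` costs cubic time in the number of nodes, which is why sparse formats and parallel hardware matter for real interaction networks.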

20.
High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although the bioinformatic analysis may be performed with several public tools, many analytical pipelines offer too few options for the optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable.
