Similar articles
 20 similar articles found (search time: 19 ms)
1.
Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit.
This is a PLOS Computational Biology Software Article
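To illustrate how pedigree information can sharpen a genotype call, the following is a minimal sketch for a single father-mother-child trio. It is not FamSeq's implementation: the function names, the Hardy-Weinberg founder prior, and the default allele frequency are assumptions made for illustration. Per-individual genotype likelihoods are combined with Mendelian transmission probabilities, and the parents are summed out by brute force.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried


def transmission(child, father, mother):
    """P(child genotype | parental genotypes) under Mendelian segregation."""
    pf, pm = father / 2, mother / 2  # chance each parent passes the alt allele
    probs = {
        0: (1 - pf) * (1 - pm),
        1: pf * (1 - pm) + (1 - pf) * pm,
        2: pf * pm,
    }
    return probs[child]


def founder_prior(g, alt_freq):
    """Hardy-Weinberg prior for a founder genotype (an assumed prior)."""
    q = alt_freq
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)[g]


def child_posterior(lik_father, lik_mother, lik_child, alt_freq=0.01):
    """Posterior over the child's genotype, summing out both parents."""
    post = [0.0, 0.0, 0.0]
    for f, m, c in product(GENOTYPES, repeat=3):
        joint = (founder_prior(f, alt_freq) * lik_father[f]
                 * founder_prior(m, alt_freq) * lik_mother[m]
                 * transmission(c, f, m) * lik_child[c])
        post[c] += joint
    total = sum(post)
    return [p / total for p in post]
```

With both parents confidently homozygous reference, an ambiguous child read-out such as `lik_child=[0.4, 0.6, 0.0]` is pulled to the reference genotype, because a heterozygous child would require a de novo event that this simple model does not allow. Larger pedigrees need the Bayesian network or Elston-Stewart machinery described in the abstract rather than this brute-force sum.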

2.
Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.
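The "multiplication of two vectors" at the core of spectral library searching can be sketched as a normalized dot product between binned spectra. This is a CPU-only illustration, not the FastPaSS or SpectraST scoring function: the bin width, the vector length, and the names `bin_spectrum` and `best_match` are assumptions. Each library comparison is independent of the others, which is what makes the search easy to spread across GPU threads.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, n_bins=2000):
    """Turn (m/z, intensity) peaks into a fixed-length, unit-norm vector."""
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def best_match(query_peaks, library):
    """Return (name, score) of the library entry with the highest dot product."""
    q = bin_spectrum(query_peaks)
    scored = ((name, dot(q, bin_spectrum(peaks)))
              for name, peaks in library.items())
    return max(scored, key=lambda pair: pair[1])
```

A real search would bin the library once and reuse the vectors, and would usually restrict candidates by precursor mass before scoring.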

3.
4.
In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment) and its implementation. The algorithm uses both score-based bounding and a novel bounding technique based on the "consistency" of the alignment. A sequence order independent search tree is used in conjunction with a technique for avoiding redundant calculations inherent in the structure of the tree. This is the first program to exploit the fact that the motif alignment problem is easier for short motifs. Indeed, for a short fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program on a dataset of 300 E. coli promoter sequences and a dataset of 85 lipocalin protein sequences. For a motif width of 4, the optimal alignment of the entire set of sequences can be found. For the more natural motif width of 6, the program can align 21 sequences of length 100, more than twice the number of sequences which can be aligned by the best previous exact algorithm. The algorithm can relax the constraint of requiring each sequence to be aligned, and align 105 of the 300 promoter sequences with a motif width of 6. For the lipocalin dataset, we introduce a technique for reducing the effective alphabet size with a minimal loss of useful information. With this technique, we show that the program can find meaningful motifs in a reasonable amount of time by optimizing the score over three motif positions.

5.
A user-friendly graphical data analysis approach for performing stability analysis of genotype × environment interactions, using Tai's stability model and additive main effects and multiplicative interaction (AMMI) biplots, is presented here. This practical approach integrates statistical and graphical analysis tools available in SAS systems and provides user-friendly applications that perform complete stability analyses by running SAS macros in the background, without the user writing SAS program statements or using pull-down menu interfaces. By using this macro approach, agronomists and plant breeders can perform stability analysis effectively and spend more time on data exploration and the interpretation of graphs and output, rather than on debugging program errors. The necessary MACRO-CALL files can be downloaded from the author's home page at http://www.ag.unr.edu/gf. The nature and distinctive features of the graphics produced by these applications are illustrated using published data.

6.
The Graphics Command Interpreter (GCI) is an independent server module that can be interfaced to any program that needs interactive three-dimensional (3D) graphics capabilities. The principal advantage of GCI is its simplicity. Only a limited set of powerful features has been implemented, including object management, global and local transformations, rotation, translation, clipping, scaling, viewport operations, window management, menu handling and picking. GCI and the master (client) program it serves run concurrently, communicating over a local or remote TCP/IP network. GCI sets up socket communication and provides a 3D graphics window and a terminal emulator for the master program. Communication between the two programs is via ASCII strings over standard I/O channels. The implied language for messages is very simple. GCI interprets messages from the master program and implements them as changes of graphical objects or as text messages to the user. GCI provides the user with facilities to manipulate the view of the displayed 3D objects interactively, independently of the master program, and to communicate mouse-controlled selection of menu items or 3D points as well as keyboard strings to the master program. The program is written in C and initially implemented using the Silicon Graphics GL graphics library. As the need to link special libraries to the master program is completely avoided, GCI can very easily be interfaced to existing programs written in any language and running on any operating system capable of TCP/IP communication. The program is freely available.

7.
Positron emission tomography (PET) is an important imaging modality in both clinical usage and research studies. We have developed a compact high-sensitivity PET system consisting of two large-area panel PET detector heads, which produce more than 224 million lines of response and thus impose heavy computational demands. In this work, we employed a state-of-the-art graphics processing unit (GPU), the NVIDIA Tesla C2070, to make the reconstruction process efficient. Our approach combines the symmetry properties of the imaging system with features of the GPU architecture, including block/warp/thread assignments and effective memory usage, to accelerate the computations for ordered subset expectation maximization (OSEM) image reconstruction. The OSEM reconstruction algorithms were implemented in both CPU-based and GPU-based codes, and their computational performance was quantitatively analyzed and compared. The results showed that the GPU-accelerated scheme can drastically reduce the reconstruction time and thus can largely expand the applicability of the dual-head PET system.
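For readers unfamiliar with OSEM, the update it repeats can be sketched in a few lines. This is a toy, dense, CPU-only version under assumed names (`osem`, `system`, `counts`, `subsets`); it does not reflect the authors' GPU kernels or their use of system symmetries. Each sub-iteration forward-projects the current image onto one subset of lines of response, compares with the measured counts, and back-projects the ratio.

```python
def osem(system, counts, subsets, n_iter=10):
    """Ordered-subset EM for counts y_i ~ Poisson(sum_j a_ij * x_j).

    system  -- list of rows a_i, one per line of response
    counts  -- measured counts y_i
    subsets -- list of lists of row indices, visited in order
    """
    n_vox = len(system[0])
    x = [1.0] * n_vox  # uniform positive starting image
    for _ in range(n_iter):
        for rows in subsets:
            # Forward-project the current image onto this subset's rows.
            proj = {i: sum(a * xj for a, xj in zip(system[i], x)) for i in rows}
            for j in range(n_vox):
                sens = sum(system[i][j] for i in rows)
                if sens == 0:
                    continue  # voxel not seen by this subset
                back = sum(system[i][j] * counts[i] / proj[i]
                           for i in rows if proj[i] > 0)
                x[j] *= back / sens  # multiplicative EM update
    return x
```

With hundreds of millions of lines of response the system matrix is never stored densely like this; the forward and back projections are the parts that a GPU implementation parallelizes.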

8.
In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function: hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology) and the association of individual genes or proteins with these concepts (e.g., GO terms), our method will assign a Hierarchical Modularity Score (HMS) to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a by-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated.
We have evaluated our method using Saccharomyces cerevisiae data from KEGG and MIPS databases and several other computationally derived and curated datasets. The code and additional supplemental files can be obtained from http://code.google.com/p/functional-annotation-of-hierarchical-modularity/ (Accessed 2012 March 13).

9.
Bayesian statistical methods based on simulation techniques have recently been shown to provide powerful tools for the analysis of genetic population structure. We have previously developed a Markov chain Monte Carlo (MCMC) algorithm for characterizing genetically divergent groups based on molecular markers and geographical sampling design of the dataset. However, for large-scale datasets such algorithms may get stuck at local maxima in the parameter space. Therefore, we have modified our earlier algorithm to support multiple parallel MCMC chains, with enhanced features that enable considerably faster and more reliable estimation compared to the earlier version of the algorithm. We also consider a hierarchical tree representation, from which a Bayesian model-averaged structure estimate can be extracted. The algorithm is implemented in a computer program that features a user-friendly interface and built-in graphics. The enhanced features are illustrated by analyses of simulated data and an extensive human molecular dataset. AVAILABILITY: Freely available at http://www.rni.helsinki.fi/~jic/bapspage.html.

10.
PKB: a program system and data base for analysis of protein structure   (Total citations: 2; self-citations: 0; citations by others: 2)
S H Bryant 《Proteins》1989,5(3):233-247
PKB is a computer program system that combines a data base of three-dimensional protein structures with a series of algorithms for pattern recognition, data analysis, and graphics. By typing relatively simple commands the user may search the data base for instances of a structural motif and analyze in detail the set of individual structures that are found. The application of PKB to the study of protein folding is illustrated in three examples. The first analysis compares the conformations observed for a short sequential motif, sequences similar to the cell-attachment signal Arg-Gly-Asp. The second compares sequences observed for a conformational motif, a 16-residue beta alpha beta unit. The third analysis considers a population of substructures containing ion-pair interactions, examining the relationship of frequency of occurrence to calculated electrostatic energy.

11.
Mechanical signals of both low and high intensity are inhibitory to fat and anabolic to bone in vivo, and have been shown to directly affect mesenchymal stem cell (MSC) pools from which fat and bone precursors emerge. To identify an idealized mechanical regimen which can regulate MSC fate, low intensity vibration (LIV; <10 microstrain, 90 Hz) and high magnitude strain (HMS; 20,000 microstrain, 0.17 Hz) were examined in MSCs undergoing adipogenesis. Two 20-minute bouts of either LIV or HMS suppressed adipogenesis when there was at least a 1 h refractory period between bouts; this effect was enhanced when the rest period was extended to 3 h. Mechanical efficacy to inhibit adipogenesis increased with additional loading bouts if a refractory period was incorporated. Mechanical suppression of adipogenesis with LIV involved inhibition of GSK3β with subsequent activation of β-catenin, as has been shown for HMS. These data indicate that mechanical biasing of MSC lineage selection is more dependent on event scheduling than on load magnitude or duration. As such, a full day of rest should not be required to "reset" the mechanical responsiveness of MSCs, and incorporating several brief mechanical challenges within a 24 h period may improve salutary endpoints in vivo. That two diverse mechanical inputs are enhanced by repetition after a refractory period suggests that rapid cellular adaptation can be targeted.

12.
Shah DD  Conrad JA  Heinz B  Brownlee JM  Moran GR 《Biochemistry》2011,50(35):7694-7704
4-Hydroxyphenylpyruvate dioxygenase (HPPD) and hydroxymandelate synthase (HMS) each catalyze similar complex dioxygenation reactions using the substrates 4-hydroxyphenylpyruvate (HPP) and dioxygen. The reactions differ in that HPPD hydroxylates at the ring C1 and HMS at the benzylic position. The HPPD reaction is more complex in that hydroxylation at C1 instigates a 1,2-shift of an aceto substituent. Although multiple intermediates have been observed to accumulate in single turnover reactions of both enzymes, neither enzyme exhibits significant accumulation of the hydroxylating intermediate. In this study we employ a product analysis method based on the extents of intermediate partitioning with HPP deuterium substitutions to measure the kinetic isotope effects for hydroxylation. These data suggest that, when forming the native product homogentisate, the wild-type form of HPPD produces a ring epoxide as the immediate product of hydroxylation but that the variant HPPDs tended to also show the intermediacy of a benzylic cation for this step. Similarly, the kinetic isotope effects for the other major product observed, quinolacetic acid, showed that either pathway is possible. HMS variants show small normal kinetic isotope effects that indicate displacement of the deuteron in the hydroxylation step. The relatively small magnitude of this value argues best for a hydrogen atom abstraction/rebound mechanism. These data are the first definitive evidence for the nature of the hydroxylation reactions of HPPD and HMS.

13.
Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign.

14.
Hyperreactive malarious splenomegaly (HMS) reflects abnormal immune responses to malarial infection. The central question is whether HMS results from unusual patterns of malarial infection or from immune incompetence in the host. Family distributions of two features of the syndrome, splenomegaly and excessively high IgM levels, have been examined in a Papua New Guinea population in which HMS is exceptionally common. Segregation analysis of spleen grade shows that a major sex-linked gene controls hyperresponsiveness to malaria. This finding is supported by additional segregation analysis, which shows that an autosomal locus cannot account for a significant proportion of variation in spleen grade, and by path analysis, which rejects a model that assumes that parents contribute equally to the child's genotype. The sex-linked gene contributing to HMS was not mediated through sex linkage of a major gene for IgM concentrations, as shown by segregation analysis. It has yet to be determined whether this pattern of inheritance also applies to HMS occurring sporadically in other less severely affected populations. The applicability of these findings to the general variability in "normal" IgM responses to malaria also remains to be established.

15.

Background

The genetic basis of postzygotic isolation is a central puzzle in evolutionary biology. Evolutionary forces causing hybrid sterility or inviability act on the responsible genes while they still are polymorphic, thus we have to study these traits as they arise, before isolation is complete.

Methodology/Principal Findings

Isofemale strains of D. mojavensis vary significantly in their production of sterile F1 sons when females are crossed to D. arizonae males. We took advantage of the intraspecific polymorphism, in a novel design, to perform quantitative trait locus (QTL) mapping analyses directly on F1 hybrid male sterility itself. We found that the genetic architecture of the polymorphism for hybrid male sterility (HMS) in the F1 is complex, involving multiple QTL, epistasis, and cytoplasmic effects.

Conclusions/Significance

The role of extensive intraspecific polymorphism, multiple QTL, and epistatic interactions in HMS in this young species pair shows that HMS is arising as a complex trait in this system. Directional selection alone would be unlikely to maintain polymorphism at multiple loci, thus we hypothesize that directional selection is unlikely to be the only evolutionary force influencing postzygotic isolation.

16.
Pevzner and Sze (19) have introduced the Planted (l,d)-Motif Problem to find similar patterns (motifs) in sequences which represent the promoter regions of co-regulated genes, where l is the length of the motif and d is the maximum Hamming distance around the similar patterns. Many algorithms have been developed to solve this motif problem. However, these algorithms either have long running times or do not guarantee that the motif can be found. In this paper, we introduce new algorithms to solve this motif problem. Our algorithms can find motifs in reasonable time not only for the challenging (9, 2), (11, 3), (15, 5)-motif problems but also for even longer motifs, say (20, 7), (30, 11) and (40, 15), which have never been seriously attempted by other researchers because of the large time and space required. Besides, our algorithms can be extended to find more complicated motif structures called cis-regulatory modules (CRM).
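The problem statement above can be made concrete with a brute-force baseline. This is not the algorithm the paper introduces; it is the naive exhaustive search, with assumed function names, that enumerates every l-mer over the alphabet and keeps those occurring in every sequence within Hamming distance d. Its cost grows as 4^l for DNA, which is why instances such as (15, 5) call for the specialized algorithms the abstract describes.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """List every l-mer that occurs, with at most d mismatches, in all seqs."""
    found = []
    for letters in product(alphabet, repeat=l):
        motif = "".join(letters)
        if all(occurs_within(motif, s, d) for s in seqs):
            found.append(motif)
    return found
```

For example, with the toy sequences `["TTACGTAA", "GGACCTCC", "CAAGGTTT"]`, the 4-mer `ACGT` is reported for d = 1 because each sequence contains a window at most one mismatch away from it.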

17.
18.
In order to visualize the stereochemical aspects of the Diels-Alder model cycloaddition of ethylene on butadiene, we developed a computer graphics animated model. The structural data base was deduced from MINDO/3 calculations, and the application program makes it possible to display in detail the different steps of the reaction mechanism. The scope of the application has been enlarged by a similar representation of the Diels-Alder cycloaddition of 2,3-bis(methylene)bicyclo[2.2.1]heptane to ethylene. Both of these examples suggest that molecular graphics is an ideal tool for visualizing and understanding the stereochemistry of complex chemical reactions.

19.
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

20.
Tao Y  Zeng ZB  Li J  Hartl DL  Laurie CC 《Genetics》2003,164(4):1399-1418
Hybrid male sterility (HMS) is a rapidly evolving mechanism of reproductive isolation in Drosophila. Here we report a genetic analysis of HMS in third-chromosome segments of Drosophila mauritiana that were introgressed into a D. simulans background. Qualitative genetic mapping was used to localize 10 loci on 3R and a quantitative trait locus (QTL) procedure (multiple-interval mapping) was used to identify 19 loci on the entire chromosome. These genetic incompatibilities often show dominance and complex patterns of epistasis. Most of the HMS loci have relatively small effects and generally at least two or three of them are required to produce complete sterility. Only one small region of the third chromosome of D. mauritiana by itself causes a high level of infertility when introgressed into D. simulans. By comparison with previous studies of the X chromosome, we infer that HMS loci are only approximately 40% as dense on this autosome as they are on the X chromosome. These results are consistent with the gradual evolution of hybrid incompatibilities as a by-product of genetic divergence in allopatric populations.
