Similar articles
1.

Background  

The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability.

2.
Genome-wide association studies (GWAS) have identified a large number of single-nucleotide polymorphisms (SNPs) associated with complex traits. A recently developed linear mixed model for estimating heritability by simultaneously fitting all SNPs suggests that common variants can explain a substantial fraction of heritability, which hints at the low power of the single-variant analysis typically used in GWAS. Consequently, many multi-locus shrinkage models have been proposed under a Bayesian framework. However, most use Markov chain Monte Carlo (MCMC) algorithms, which are time-consuming and challenging to apply to GWAS data. Here, we propose a fast algorithm for the Bayesian adaptive lasso using variational inference (BAL-VI). Extensive simulations and real data analysis indicate that our model outperforms the well-known Bayesian lasso and Bayesian adaptive lasso models in accuracy and speed. BAL-VI can complete a simultaneous analysis of a lung cancer GWAS data set with ~3400 subjects and ~570,000 SNPs in about half a day.

3.
MOTIVATION: Sequences for new proteins are being determined at a rapid rate, as a result of the Human Genome Project, and related genome research. The ability to predict the three-dimensional structure of proteins from sequence alone would be useful in discovering and understanding their function. Threading, or fold recognition, aims to predict the tertiary structure of a protein by aligning its amino acid sequence with a large number of structures, and finding the best fit. This approach depends on obtaining good performance from both the scoring function, which simulates the free energy for given trial alignments, and the threading algorithm, which searches for the lowest-score alignment. It appears that current scoring functions and threading algorithms need improvement. RESULTS: This paper presents a new threading algorithm. Numerical tests demonstrate that it is more powerful than two popular approximate algorithms, and much faster than exact methods.

4.
Genomic selection uses genome-wide dense SNP marker genotyping for the prediction of genetic values, and consists of two steps: (1) estimation of SNP effects, and (2) prediction of genetic value based on SNP genotypes and estimates of their effects. For the former step, BayesB-type estimators have been proposed, which assume a priori that many markers have no effect, and some have an effect coming from a gamma or exponential distribution, i.e. a fat-tailed distribution. Whilst such estimators have been developed using Markov chain Monte Carlo (MCMC), here we derive a much faster non-MCMC-based estimator by analytically performing the required integrations. The accuracy of the genome-wide breeding value estimates was 0.011 (s.e. 0.005) lower than that of the MCMC-based BayesB predictor, which may be because the integrations were performed one-by-one instead of for all SNPs simultaneously. The bias of the new method was opposite to that of the MCMC-based BayesB, in that the new method underestimated the breeding values of the best selection candidates, whereas MCMC-BayesB overestimated their breeding values. The new method was computationally several orders of magnitude faster than MCMC-based BayesB, which will mainly be advantageous in computer simulations of entire breeding schemes, in cross-validation testing, and practical schemes with frequent re-estimation of breeding values.

5.
6.
Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally-derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal base pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering.

7.
8.
9.
A greedy algorithm for aligning DNA sequences.   Total citations: 39, self-citations: 0, citations by others: 39
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

10.
Biomineralization, the biologically controlled formation of mineral deposits, is of widespread importance in biology, medicine, and engineering. Mineralized structures are found in most metazoan phyla and often have supportive, protective, or feeding functions. Among deuterostomes, only echinoderms and vertebrates produce extensive biomineralized structures. Although skeletons appeared independently in these two groups, ancestors of the vertebrates and echinoderms may have utilized similar components of a shared genetic "toolkit" to carry out biomineralization. The present study had two goals. First, we sought to expand our understanding of the proteins involved in biomineralization in the sea urchin, a powerful model system for analyzing the basic cellular and molecular mechanisms that underlie this process. Second, we sought to shed light on the possible evolutionary relationships between biomineralization in echinoderms and vertebrates. We used several computational methods to survey the genome of the purple sea urchin Strongylocentrotus purpuratus for gene products involved in biomineralization. Our analysis has greatly expanded the collection of biomineralization-related proteins. We have found that these proteins are often members of small families encoded by genes that are clustered in the genome. Most of the proteins are sea urchin-specific; that is, they have no apparent homologues in other invertebrate deuterostomes or vertebrates. Similarly, many of the vertebrate proteins that mediate mineral deposition do not have counterparts in the S. purpuratus genome. Our findings therefore reveal substantial differences in the primary sequences of proteins that mediate biomineral formation in echinoderms and vertebrates, possibly reflecting loose constraints on the primary structures of the proteins involved. 
On the other hand, certain cellular and molecular processes associated with earlier events in skeletogenesis appear similar in echinoderms and vertebrates, leaving open the possibility of deeper evolutionary relationships.

11.
Separation of complex valued signals is a frequently arising problem in signal processing. For example, separation of convolutively mixed source signals involves computations on complex valued signals. In this article, it is assumed that the original, complex valued source signals are mutually statistically independent, and the problem is solved by the independent component analysis (ICA) model. ICA is a statistical method for transforming an observed multidimensional random vector into components that are mutually as independent as possible. In this article, a fast fixed-point type algorithm that is capable of separating complex valued, linearly mixed source signals is presented and its computational efficiency is shown by simulations. Also, the local consistency of the estimator given by the algorithm is proved.

12.

Background

With the development of high-throughput genotyping and sequencing technology, there is growing evidence of association between genetic variants and complex traits. Although thousands of genetic variants have been discovered, such genetic markers have been shown to explain only a very small proportion of the underlying genetic variance of complex traits. Gene-gene interaction (GGI) analysis is expected to unveil a large portion of the unexplained heritability of complex traits.

Methods

In this work, we propose IGENT, an Information theory-based GEnome-wide gene-gene iNTeraction method. IGENT is an efficient algorithm for identifying genome-wide gene-gene interactions (GGIs) and gene-environment interactions (GEIs). To detect significant GGIs on a genome-wide scale, it is important to reduce the computational burden significantly. Our method uses information gain (IG) and evaluates its significance without resampling.
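
As a rough illustration of the information-gain idea, the sketch below estimates IG for a single SNP pair from observed genotypes. It assumes the commonly used definition IG = I(A,B;Y) - I(A;Y) - I(B;Y); the abstract does not give IGENT's exact statistic or its significance test, so the function names, the definition, and the toy data are all illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a sequence of hashable observations."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def information_gain(snp_a, snp_b, phenotype):
    """Assumed definition: IG = I(A,B;Y) - I(A;Y) - I(B;Y)."""
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))

# Hypothetical toy data: the phenotype is the XOR of two binary genotypes,
# so neither SNP is informative alone but the pair determines the phenotype.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]
ig = information_gain(snp_a, snp_b, phenotype)
```

In this toy case each marginal mutual information is zero while the joint one is a full bit, which is the kind of purely epistatic signal an IG-based scan is meant to pick up.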

Results

Through our simulation studies, the power of IGENT is shown to be better than or equivalent to that of BOOST. The proposed method successfully detected GGIs for bipolar disorder in the Wellcome Trust Case Control Consortium (WTCCC) data and for age-related macular degeneration (AMD).

Conclusions

The proposed method is implemented in C++ and is available for Windows, Linux and Mac OS X.

13.
We have isolated a cloned segment of Drosophila genomic DNA containing a ribosomal protein gene. Hybridization analysis of the DNA in this clone indicates a complex organization of repeated elements within this cloned segment. At least one of these repeated elements is homologous to regions of rDNA. Restriction analysis of the clone shows that some of the repeated elements are present as tandem duplications and in scattered locations within the cloned DNA segment. There are also three non-ribosomal protein genes contained in this clone, each of which is expressed along with the ribosomal protein gene into RNA species present in Drosophila embryos.

14.
A new string searching algorithm is presented aimed at searching for the occurrence of character patterns in longer character texts. The algorithm, specifically designed for nucleic acid sequence data, is essentially derived from the Boyer–Moore method (Comm. ACM, 20, 762–772, 1977). Both pattern and text data are compressed so that the natural 4-letter alphabet of nucleic acid sequences is considerably enlarged. The string search starts from the last character of the pattern and proceeds in large jumps through the text to be searched. The data compression and searching algorithm allows one to avoid searching for patterns not present in the text as well as to inspect, for each pattern, all text characters until the exact match with the text is found. These considerations are supported by empirical evidence and comparisons with other methods.
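
The right-to-left comparison and large jumps described above can be sketched with the Horspool simplification of Boyer–Moore, shown here on an uncompressed DNA string. This is an illustrative stand-in, not the authors' algorithm: it omits their compression step that enlarges the 4-letter alphabet, and the function name is hypothetical.

```python
def horspool_search(text, pattern):
    """Return start indices of all exact occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character shift table built from all but the last pattern character
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, starting at the last pattern character
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Jump by the shift of the text character aligned with the pattern end
        pos += shift.get(text[pos + m - 1], m)
    return hits

hits = horspool_search("ACGTACGTTAGCACGT", "ACGT")  # [0, 4, 12]
```

With only four letters the shift table rarely allows a full-length jump, which is presumably why the abstract's method packs several bases into one symbol before searching.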

15.
The recent completion of the Drosophila genome sequence opens new avenues for neurobiology research. We screened the fly genome sequence for homologs of mammalian genes implicated directly or indirectly in exocytosis and endocytosis of synaptic vesicles. We identified fly homologs for 93% of the vertebrate genes that were screened. These are on average 60% identical and 74% similar to their vertebrate counterparts. This high degree of conservation suggests that little protein diversification has been tolerated in the evolution of synaptic transmission. Finally, and perhaps most exciting for Drosophila neurobiologists, the genomic sequence allows us to identify P element transposon insertions in or near genes, thereby allowing rapid isolation of mutations in genes of interest. Analysis of the phenotypes of these mutants should accelerate our understanding of the role of numerous proteins implicated in synaptic transmission.

16.
A fast algorithm for computing recombination is developed for model organisms with selection on haploids. Haplotype frequencies are transformed to marginal frequencies; random mating and recombination are computed; marginal frequencies are transformed back to haplotype frequencies. With L diallelic loci, this algorithm is theoretically a factor of a constant times (3/8)^L faster than standard computations with selection on diploids, and up to 16 recombining loci have been computed. This algorithm is then applied to model the opposing evolutionary forces of multilocus epistatic selection and recombination. Selection is assumed to favor haplotypes with specific alleles either all present or all absent. When the number of linked loci exceeds a critical value, a jump bifurcation occurs in the two-dimensional parameter space of the selection coefficient s and the recombination frequency r. The equilibrium solution jumps from high to low mean fitness with increasing r or decreasing s. These numerical results display an unexpected and dramatic nonlinear effect occurring in linkage models with a large number of loci.
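
The three-step scheme (haplotype frequencies to marginal frequencies, recombination, and back) can be illustrated with the textbook two-locus case. The sketch below is only that special case under stated assumptions: it ignores selection, handles two diallelic loci rather than L, and uses hypothetical names; it is not the paper's multilocus algorithm.

```python
def recombine_two_loci(x, r):
    """One generation of random mating with recombination, two diallelic loci.

    x maps haplotypes "AB", "Ab", "aB", "ab" to frequencies summing to 1;
    r is the recombination frequency between the two loci.
    """
    # Step 1: haplotype frequencies -> marginal frequencies
    p_a = x["AB"] + x["Ab"]   # frequency of allele A
    p_b = x["AB"] + x["aB"]   # frequency of allele B
    p_ab = x["AB"]            # joint frequency of A and B on one haplotype
    # Step 2: recombination leaves single-locus marginals unchanged and
    # moves the joint marginal toward the product of the allele frequencies
    p_ab_next = (1 - r) * p_ab + r * p_a * p_b
    # Step 3: marginal frequencies -> haplotype frequencies
    return {
        "AB": p_ab_next,
        "Ab": p_a - p_ab_next,
        "aB": p_b - p_ab_next,
        "ab": 1 - p_a - p_b + p_ab_next,
    }

# Hypothetical starting population with maximal linkage disequilibrium
x0 = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}
x1 = recombine_two_loci(x0, r=0.1)
```

Here the update is equivalent to the familiar decay of linkage disequilibrium, D' = (1 - r) D, which is why working in marginal coordinates keeps the recombination step simple.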

17.
A fast, sensitive pattern-matching approach for protein sequences   Total citations: 5, self-citations: 0, citations by others: 5
Pattern-matching algorithms are a powerful tool for finding similarities and relationships among the steadily growing amount of known protein sequences. We present a fast, sensitive pattern-matching algorithm that describes a pattern by its physico-chemical properties rather than by occurrence of amino acids, using a fast, dynamic programming algorithm. Selected examples will demonstrate applications and advantages of our approach.

18.
Removal of repeated sequences from hybridisation probes.   Total citations: 45, self-citations: 22, citations by others: 45
Pre-reassociation of human clone probes, containing dispersed highly repeated sequences, (e.g. Alu and KpnI families), with a large excess of sonicated total human DNA allows signal from single and low copy number components to be detected in transfer hybridisations. The signal from non-dispersed repeated sequences is reduced to single copy levels. The procedure, which is simple and quick, is illustrated using model combinations of well characterised cloned probes, and is applied to a sample of randomly chosen cosmid clones. A theoretical assessment is presented which may be useful to those wishing to use this procedure.

19.
Transcriptional measurements of mouse repeated DNA sequences.   Total citations: 4, self-citations: 0, citations by others: 4

20.
A palindrome is a sequence of characters that reads the same forwards and backwards. Since the discovery of palindromic peptide sequences two decades ago, little effort has been made to understand their structural, functional and evolutionary significance. In view of this, an algorithm has been developed to identify all perfect palindromes (excluding the palindromic subset and tandem repeats) in a single protein sequence. The proposed algorithm does not impose any restriction on the number of residues to be given in the input sequence. This algorithm will aid in the identification of palindromic peptide sequences of varying lengths in a single protein sequence.
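
A minimal sketch of perfect-palindrome detection by expanding around each center is given below. It is not the published algorithm: keeping only the longest palindrome per center is one possible reading of "excluding the palindromic subset", treating single-residue runs as tandem repeats is a further assumption, and the function name, `min_length` parameter, and example sequence are hypothetical.

```python
def maximal_palindromes(seq, min_length=3):
    """Return (start, substring) for the longest palindrome around each center."""
    found = []
    n = len(seq)
    for center in range(2 * n - 1):
        # Even centers sit on a residue, odd centers sit between two residues
        left, right = center // 2, (center + 1) // 2
        while left >= 0 and right < n and seq[left] == seq[right]:
            left -= 1
            right += 1
        start, end = left + 1, right  # seq[start:end] is the longest match here
        sub = seq[start:end]
        # Assumption: single-residue runs such as "AAAA" count as tandem repeats
        if len(sub) >= min_length and len(set(sub)) > 1:
            found.append((start, sub))
    return found

# Hypothetical peptide containing two perfect palindromes
hits = maximal_palindromes("MKAYAKLLEVEL")  # [(1, "KAYAK"), (7, "LEVEL")]
```

Center expansion runs in quadratic time in the worst case and, like the described algorithm, places no limit on the length of the input sequence.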
