20 similar documents retrieved; search took 15 ms.
1.
Nedelkova M, Maresca M, Fu J, Rostovskaya M, Chenna R, Thiede C, Anastassiadis K, Sarov M, Stewart AF. Nucleic Acids Research, 2011, 39(20): e137
Studying genetic variations in the human genome is important for understanding phenotypes and complex traits, including rare personal variations and their associations with disease. The interpretation of polymorphisms requires reliable methods to isolate natural genetic variations, including combinations of variations, in a format suitable for downstream analysis. Here, we describe a strategy for targeted isolation of large regions (~35 kb) from human genomes that is also applicable to any genome of interest. The method relies on recombineering to fish out target fosmid clones from pools and thereby circumvents the laborious need to plate and screen thousands of individual clones. To optimize the method, a new highly recombineering-efficient bacterial host, including inducible TrfA for fosmid copy number amplification, was developed. Various regions were isolated from human embryonic stem cell lines and a personal genome, including highly repetitive and duplicated ones. The maternal and paternal alleles at the MECP2/IRAK1 loci were distinguished based on identification of novel allele-specific single-nucleotide polymorphisms in regulatory regions. Additionally, we applied further recombineering to construct isogenic targeting vectors for patient-specific applications. These methods will facilitate work to understand the linkage between personal variations and disease propensity, as well as possibilities for personal genome surgery.
2.
Haplotype phasing is one of the most important problems in population genetics, as haplotypes can be used to estimate the relatedness of individuals and to impute genotype information, a commonly performed analysis when searching for variants involved in disease. The problem of haplotype phasing has been well studied. Methodologies for haplotype inference from sequencing data either combine a set of reference haplotypes and collected genotypes using a Hidden Markov Model or assemble haplotypes by overlapping sequencing reads. A recent algorithm, Hap-seq, uses both sequencing data and reference haplotypes; it is a hybrid of a dynamic programming algorithm and a Hidden Markov Model (HMM) and has been shown to be optimal. However, the algorithm requires an extremely large amount of memory, which is not practical for whole-genome datasets. The current algorithm must save intermediate results to disk and read them back when needed, which significantly affects its practicality. In this work, we propose Hap-seqX, an expedited version of the algorithm that addresses the memory issue by using a posterior probability to select the records that should be kept in memory. We show that Hap-seqX can hold all the intermediate results in memory and improves the execution time of the algorithm dramatically. Using this strategy, Hap-seqX is able to predict haplotypes from whole-genome sequencing data.
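The record-selection idea (keeping only high-posterior intermediate results in memory instead of spilling everything to disk) can be illustrated with a generic beam-pruning sketch. This is an illustrative stand-in, not Hap-seqX's actual selection rule, and the posterior values are made up:

```python
import heapq

def prune_states(states, max_keep):
    """Keep only the highest-posterior dynamic-programming records.

    `states` is a list of (posterior_probability, record) pairs; records
    below the cut are simply dropped rather than written to disk.
    """
    return heapq.nlargest(max_keep, states, key=lambda s: s[0])

# Keep the two most probable records out of three.
kept = prune_states([(0.1, "hap1"), (0.9, "hap2"), (0.5, "hap3")], max_keep=2)
print(kept)  # [(0.9, 'hap2'), (0.5, 'hap3')]
```

The memory bound is then `max_keep` records per position, at the cost of discarding low-posterior states that could, in principle, later become relevant.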
3.
4.
AlphaImpute is a flexible and accurate genotype imputation tool that was originally designed for the imputation of genotypes on autosomal chromosomes. In some species, sex chromosomes comprise a large portion of the genome. For example, chromosome Z represents approximately 8% of the chicken genome and therefore is likely to be important in determining genetic variation in a population. When breeding programs make selection decisions based on genomic information, chromosomes that are not represented on the genotyping platform will not be subject to selection. Therefore, imputation algorithms should be able to impute genotypes for all chromosomes. The objective of this research was to extend AlphaImpute so that it could impute genotypes on sex chromosomes. The accuracy of imputation was assessed using different genotyping strategies in a real commercial chicken population. The correlation between true and imputed genotypes was high in all the scenarios and was 0.96 for the most favourable scenario. Overall, the accuracy of imputation of the sex chromosome was slightly lower than that of autosomes for all scenarios considered.
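The accuracy measure used above, the correlation between true and imputed genotypes, is straightforward to compute. A minimal numpy sketch, using made-up genotype vectors rather than AlphaImpute output:

```python
import numpy as np

def imputation_accuracy(true_geno, imputed_geno):
    """Pearson correlation between true and imputed genotype dosages (0/1/2)."""
    t = np.asarray(true_geno, dtype=float)
    i = np.asarray(imputed_geno, dtype=float)
    return float(np.corrcoef(t, i)[0, 1])

# Perfect imputation yields a correlation of 1.0.
print(imputation_accuracy([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]))
```

A correlation of 0.96, as in the most favourable scenario above, means imputed dosages track true dosages very closely but not perfectly.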
5.
siRNA target site secondary structure predictions using local stable substructures
The crystal structure-based model of the catalytic center of Ago2 revealed that the siRNA and the mRNA must be able to form an A-helix for correct positioning of the scissile phosphate bond for cleavage in RNAi. This suggests that base pairing of the target mRNA with itself, i.e. secondary structure, must be removed before cleavage. Early on in siRNA design, GC-rich target sites were avoided because of their potential to be involved in strong secondary structure. It is still unclear how important a factor mRNA secondary structure is in RNAi. However, it has been established that a difference in the thermostability of the ends of an siRNA duplex dictates which strand is loaded into the RNA-induced silencing complex. Here, we use a novel secondary structure prediction method and duplex-end differential calculations to investigate the importance of secondary structure in siRNA design. We found that the differential duplex-end stabilities alone account for functional prediction of 60% of the 80 siRNA sites examined, and that secondary structure predictions improve the prediction of site efficacy. A total of 80% of the non-functional sites can be eliminated using secondary structure predictions and the duplex-end differential.
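The duplex-end differential idea can be illustrated with a crude GC-count proxy for end stability. Real siRNA design tools use nearest-neighbour free energies; the 4-base window and the GC-count heuristic here are assumptions for illustration only:

```python
def duplex_end_differential(guide, n=4):
    """GC count of the guide strand's first n bases minus its last n.

    G/C pairs are more thermostable than A/U, so a less stable 5' end
    (negative differential) favours loading of that strand into RISC.
    A toy proxy, not a nearest-neighbour free-energy calculation.
    """
    seq = guide.upper().replace("T", "U")
    gc_count = lambda s: sum(base in "GC" for base in s)
    return gc_count(seq[:n]) - gc_count(seq[-n:])

# A guide with a GC-rich 5' end and an AU-rich 3' end:
print(duplex_end_differential("GGCGAAUUACUCAGCGUAAU"))
```

Sites whose differential and predicted target-site structure are both unfavourable would be the ones eliminated first in a screen like the one described above.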
6.
7.
8.
Estimation of non-additive genetic effects in animal breeding is important because it increases the accuracy of breeding value prediction and the value of mate allocation procedures. With the advent of genomic selection, these ideas should be revisited. The objective of this study was to quantify the efficiency of including dominance effects and practising mating allocation under a whole-genome evaluation scenario. Four strategies of selection, carried out over five generations, were compared by simulation techniques. In the first scenario (MS), individuals were selected based on their own phenotypic information. In the second (GSA), they were selected based on the prediction generated by the Bayes A method of whole-genome evaluation under an additive model. In the third (GSD), the model was expanded to include dominance effects. These three scenarios used random mating to construct future generations, whereas in the fourth (GSD + MA), matings were optimized by simulated annealing. The advantage of GSD over GSA ranges from 9 to 14% of the expected response and, in addition, using mate allocation (GSD + MA) provides an additional response ranging from 6% to 22%. However, mate selection can improve the expected genetic response over random mating only in the first generation of selection. Furthermore, the efficiency of genomic selection is eroded after a few generations of selection; thus, continued collection of phenotypic data and re-evaluation will be required.
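The additive-plus-dominance genomic model that distinguishes the GSD scenario from GSA can be sketched as follows. The SNP coding and effect vectors are generic illustrations, not the simulation settings of the study:

```python
import numpy as np

def simulate_phenotypes(genotypes, add_eff, dom_eff, noise_sd=1.0, seed=0):
    """Phenotype = additive + dominance SNP effects + residual noise.

    genotypes: (n_individuals, n_snps) matrix coded 0/1/2 (allele count).
    The additive covariate is the allele count itself; the dominance
    covariate is 1 for heterozygotes and 0 otherwise.
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(genotypes, dtype=float)
    additive = G @ np.asarray(add_eff, dtype=float)
    dominance = (G == 1).astype(float) @ np.asarray(dom_eff, dtype=float)
    return additive + dominance + rng.normal(0.0, noise_sd, size=G.shape[0])
```

Fitting a model of this form (for example, Bayes A extended with dominance terms, as in GSD) recovers both effect vectors, whereas a purely additive fit (GSA) absorbs dominance into the residual.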
9.
Population genomic approaches, which take advantage of high-throughput genotyping, are powerful yet costly methods to scan for selective sweeps. DNA-pooling strategies have been widely used for association studies because they are a cost-effective alternative to large-scale individual genotyping. Here, we performed an SNP-MaP (single nucleotide polymorphism microarrays and pooling) analysis using samples from Eurasia to evaluate the efficiency of the pooling strategy in genome-wide scans for selection. By conducting simulations of allelotype data, we first demonstrated that the boxplot with average heterozygosity (HET) is a promising method to detect strong selective sweeps with a moderate level of pooling error. Based on this, we used a sliding-window analysis of HET to detect the large contiguous regions (LCRs) putatively under selective sweeps in the Eurasian datasets. This survey identified 63 LCRs in a European population. These signals were further supported by the integrated haplotype score (iHS) test using HapMap II data. We also confirmed the European-specific signatures of positive selection at several previously identified genes (KEL, TRPV5, TRPV6, EPHB6). In summary, our results not only revealed the high credibility of the SNP-MaP strategy in scanning for selective sweeps, but also provided insight into the population differentiation.
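The sliding-window HET screen reduces to averaging expected heterozygosity, 2p(1-p), along the chromosome and flagging windows with unusually low values. A minimal sketch; the window size and allele frequencies are arbitrary illustrations:

```python
import numpy as np

def sliding_het(allele_freqs, window=5):
    """Sliding-window mean of expected heterozygosity HET = 2p(1 - p).

    `allele_freqs` are pooled SNP allele frequencies ordered along a
    chromosome; windows with unusually low mean HET are candidate
    selective-sweep regions.
    """
    p = np.asarray(allele_freqs, dtype=float)
    het = 2.0 * p * (1.0 - p)
    kernel = np.ones(window) / window
    return np.convolve(het, kernel, mode="valid")

# Five SNPs at p = 0.5 give the maximal HET of 0.5 in the single window.
print(sliding_het([0.5, 0.5, 0.5, 0.5, 0.5]))
```

Runs of consecutive low-HET windows would then be merged into large contiguous regions (LCRs) like those reported above.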
10.
As more studies adopt the approach of whole-genome screening, geneticists are faced with the challenge of having to interpret results from traditional approaches that were not designed for genome-scan data. Frequently, two-point analysis by the LOD method is performed to search for signals of linkage throughout the genome, for each of hundreds or even thousands of markers. This practice has raised the question of how to adjust the significance level for the fact that multiple tests are being performed. Various recommendations have been made, but no consensus has emerged. In this article, we propose a new method, the confidence-set approach, that circumvents the need to correct the level of significance according to the number of markers tested. In the search for the gene location of a monogenic disorder, multiplicity adjustment is not needed in order to maintain the desired level of confidence. For complex diseases involving multiple genes, one needs only to adjust the level of significance according to the number of disease genes--a much smaller number than the number of markers in a genome screen--to ensure a predetermined genomewide confidence level. Furthermore, our formulation of the tests enables us to localize disease genes to small genomic regions, an extremely desirable feature that the traditional LOD method lacks. Our simulation study shows that, for sib-pair data, even when the coverage probability of the confidence set is chosen to be as high as 99%, our approach is able to implicate only the markers that are closely linked to the disease genes.
11.
Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions
MOTIVATION: In recent years, advances have been made in the ability of computational methods to discriminate between homologous and non-homologous proteins in the 'twilight zone' of sequence similarity, where the percent sequence identity is a poor indicator of homology. To make these predictions more valuable to the protein modeler, they must be accompanied by accurate alignments. Pairwise sequence alignments are inferences of orthologous relationships between sequence positions. Evolutionary distance is traditionally modeled using global amino acid substitution matrices. But real differences in the likelihood of substitutions may exist for different structural contexts within proteins, since structural context contributes to the selective pressure. RESULTS: HMMSUM (HMMSTR-based substitution matrices) is a new model for structural context-based amino acid substitution probabilities consisting of a set of 281 matrices, each for a different sequence-structure context. HMMSUM does not require the structure of the protein to be known. Instead, predictions of local structure are made using HMMSTR, a hidden Markov model for local structure. Alignments using the HMMSUM matrices compare favorably to alignments carried out using the BLOSUM matrices or structure-based substitution matrices SDM and HSDM when validated against remote homolog alignments from BAliBASE. HMMSUM has been implemented using local Dynamic Programming and with the Bayesian Adaptive alignment method.
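The core idea of context-specific substitution matrices can be shown in a few lines: instead of one global matrix (as with BLOSUM), each aligned position is scored with the matrix chosen for its predicted local-structure context. The matrices and context labels below are toy stand-ins, not the real 281 HMMSTR-derived HMMSUM matrices:

```python
def context_alignment_score(pairs, contexts, matrices):
    """Score an ungapped alignment with context-specific substitution matrices.

    pairs:    aligned residue pairs, e.g. [("A", "A"), ("A", "G")]
    contexts: predicted local-structure context label per position
    matrices: dict mapping context label -> {(res1, res2): score}
    """
    return sum(matrices[ctx][pair] for pair, ctx in zip(pairs, contexts))

# Toy matrices: a helix context rewards A-A conservation more than coil.
toy_matrices = {
    "helix": {("A", "A"): 5, ("A", "G"): -1},
    "coil":  {("A", "A"): 2, ("A", "G"): 0},
}
score = context_alignment_score([("A", "A"), ("A", "G")],
                                ["helix", "coil"], toy_matrices)
print(score)  # 5 + 0 = 5
```

In the full method, the context label per position would come from HMMSTR's local-structure prediction rather than being supplied by hand.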
12.
13.
Background
Many proteins contain disordered regions that lack fixed three-dimensional (3D) structure under physiological conditions but have important biological functions. Prediction of disordered regions in protein sequences is important for understanding protein function and for high-throughput determination of protein structures. Machine learning techniques, including neural networks and support vector machines, have been widely used in such predictions. Predictors designed for long disordered regions are usually less successful at predicting short disordered regions. Combining prediction of short and long disordered regions would dramatically increase the complexity of the prediction algorithm and make the predictor unsuitable for large-scale applications. Efficient batch prediction of long disordered regions alone is of greater interest in large-scale proteome studies.
14.
15.
ZINBA (Zero-Inflated Negative Binomial Algorithm) identifies genomic regions enriched in a variety of ChIP-seq and related next-generation sequencing experiments (DNA-seq), calling both broad and narrow modes of enrichment across a range of signal-to-noise ratios. ZINBA models and accounts for factors that co-vary with background or experimental signal, such as G/C content, and identifies enrichment in genomes with complex local copy number variations. ZINBA provides a single unified framework for analyzing DNA-seq experiments in challenging genomic contexts.
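The zero-inflated negative binomial density at the heart of such methods mixes a point mass at zero (unsequenced background) with a negative binomial count component. A generic sketch of the log-pmf; the parameterization is chosen for illustration and is not ZINBA's internal code:

```python
import math

def zinb_logpmf(k, pi, r, p):
    """Log-probability of count k under a zero-inflated negative binomial.

    pi: weight of the structural-zero component; r, p: NB size and
    success parameters, with NB pmf C(k+r-1, k) * p**r * (1-p)**k.
    """
    nb = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
          + r * math.log(p) + k * math.log(1.0 - p))
    if k == 0:
        # Zero counts can come from either mixture component.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb

# With pi = 0.3, r = 1, p = 0.5: P(0) = 0.3 + 0.7 * 0.5 = 0.65.
print(math.exp(zinb_logpmf(0, 0.3, 1.0, 0.5)))
```

Fitting the mixture weights and NB parameters per window, with covariates such as G/C content, is what lets the model separate background from genuine enrichment.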
16.
17.
We reported previously that the haploid genome of standard strains of laboratory mice contains approximately 70 copies of an amplified long genomic sequence, designated ALGS, that includes a retroposon of the gene for elongation factor 2 (MER). The length of each repeating unit is more than 60 kb, and the sequence of the unit is highly conserved among the repeats. In the present study, Southern blot analysis of the genomes of wild rodents demonstrated that the ALGS is present in all subspecies of Mus musculus and is abundant in M. spicilegus, whereas it is absent in M. spretus as well as in Rattus and other closely related genera. This result indicates that the amplification occurred after species differentiation within the genus Mus and at least prior to the differentiation of subspecies of M. musculus. To locate chromosomal positions of the ALGS, in situ hybridization was carried out with laboratory strains and wild mice. It appears that the ALGS is located in the centromeric regions of most chromosomes in laboratory mice, M. musculus and M. spicilegus, whereas no positive signals were observed with M. spretus, in accordance with the results of the Southern blot analysis.
18.
Mogens S Lund, Adrianus PW de Roos, Alfred G de Vries, Tom Druet, Vincent Ducrocq, Sébastien Fritz, François Guillaume, Bernt Guldbrandtsen, Zenting Liu, Reinhard Reents, Chris Schrooten, Franz Seefried, Guosheng Su. Genetics Selection Evolution, 2011, 43(1): 43
Background
Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to the reference population, the reliability of their phenotypes, and the relatedness of the populations that are combined.
Methods
This paper assesses the increase in reliability achieved when combining four Holstein reference populations of 4000 bulls each, from European breeding organizations, i.e. UNCEIA (France), VikingGenetics (Denmark, Sweden, Finland), DHV-VIT (Germany) and CRV (The Netherlands, Flanders). Each partner validated its own bulls using their national reference data and the combined data, respectively.
Results
Combining the data significantly increased the reliability of genomic predictions for bulls in all four populations. Reliabilities increased by 10%, compared to reliabilities obtained with national reference populations alone, when they were averaged over countries and the traits evaluated. For different traits and countries, the increase in reliability ranged from 2% to 19%.
Conclusions
Genomic selection programs benefit greatly from combining data from several closely related populations into a single large reference population.
19.
Background
Genomic BLUP (GBLUP) can predict breeding values for non-phenotyped individuals based on the identity-by-state genomic relationship matrix (G). The G matrix can be constructed from thousands of markers spread across the genome. The strongest assumption of G, and consequently of GBLUP, is that all markers contribute equally to the genetic variance of a trait. This assumption is violated for traits that are controlled by a small number of quantitative trait loci (QTL) or by individual QTL with large effects. In this paper, we investigate the performance of a weighted genomic relationship matrix (wG) that takes into consideration the genetic architecture of the trait in order to improve predictive ability for a wide range of traits. Multiple methods were used to calculate weights for several economically relevant traits in US Holstein dairy cattle. Predictive performance was tested by k-means cross-validation.
Results
Relaxing the GBLUP assumption of equal marker contribution by increasing the weight that is given to a specific marker in the construction of the trait-specific G resulted in increased predictive performance. The increase was strongest for traits that are controlled by a small number of QTL (e.g. fat and protein percentage). Furthermore, bias in prediction estimates was reduced compared to that resulting from the use of regular G. Even for traits with low heritability and lower general predictive performance (e.g. calving ease traits), the weighted G still yielded a gain in accuracy.
Conclusions
Genomic relationship matrices weighted by marker realized variance yielded more accurate and less biased predictions for traits regulated by few QTL. Genome-wide association analyses were used to derive marker weights for creating weighted genomic relationship matrices. However, this can be cumbersome and prone to low stability over generations because of erosion of linkage disequilibrium between markers and QTL. Future studies may include other sources of information, such as functional annotation and gene networks, to better exploit the genetic architecture of traits and produce more stable predictions.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-015-0100-1) contains supplementary material, which is available to authorized users.
20.
Hongding Gao, Ole F Christensen, Per Madsen, Ulrik S Nielsen, Yuan Zhang, Mogens S Lund, Guosheng Su. Genetics Selection Evolution, 2012, 44(1): 8