20 similar documents retrieved; search took 15 ms.
1.
Nedelkova M, Maresca M, Fu J, Rostovskaya M, Chenna R, Thiede C, Anastassiadis K, Sarov M, Stewart AF. Nucleic Acids Research, 2011, 39(20): e137
Studying genetic variations in the human genome is important for understanding phenotypes and complex traits, including rare personal variations and their associations with disease. The interpretation of polymorphisms requires reliable methods to isolate natural genetic variations, including combinations of variations, in a format suitable for downstream analysis. Here, we describe a strategy for targeted isolation of large regions (~35 kb) from human genomes that is also applicable to any genome of interest. The method relies on recombineering to fish out target fosmid clones from pools and thereby circumvents the laborious need to plate and screen thousands of individual clones. To optimize the method, a new highly recombineering-efficient bacterial host, including inducible TrfA for fosmid copy number amplification, was developed. Various regions were isolated from human embryonic stem cell lines and a personal genome, including highly repetitive and duplicated ones. The maternal and paternal alleles at the MECP2/IRAK1 loci were distinguished based on identification of novel allele-specific single-nucleotide polymorphisms in regulatory regions. Additionally, we applied further recombineering to construct isogenic targeting vectors for patient-specific applications. These methods will facilitate work to understand the linkage between personal variations and disease propensity, as well as possibilities for personal genome surgery.
2.
Haplotype phasing is one of the most important problems in population genetics, as haplotypes can be used to estimate the relatedness of individuals and to impute genotype information, a commonly performed analysis when searching for variants involved in disease. The problem of haplotype phasing has been well studied. Methodologies for haplotype inference from sequencing data either combine a set of reference haplotypes and collected genotypes using a Hidden Markov Model or assemble haplotypes by overlapping sequencing reads. A recent algorithm, Hap-seq, uses both sequencing data and reference haplotypes; it is a hybrid of a dynamic programming algorithm and a Hidden Markov Model (HMM) and has been shown to be optimal. However, the algorithm requires an extremely large amount of memory, which is not practical for whole-genome datasets. The current algorithm must save intermediate results to disk and read them back when needed, which significantly affects its practicality. In this work, we propose Hap-seqX, an expedited version of the algorithm that addresses the memory issue by using a posterior probability to select the records that should be kept in memory. We show that Hap-seqX can hold all the intermediate results in memory and improves the execution time of the algorithm dramatically. Using this strategy, Hap-seqX is able to predict haplotypes from whole-genome sequencing data.
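The record-selection idea (keeping only high-posterior intermediate results in memory instead of spilling everything to disk) can be illustrated with a generic beam-pruning sketch. This is an illustrative stand-in, not Hap-seqX's actual selection rule, and the posterior values are made up:

```python
import heapq

def prune_states(states, max_keep):
    """Keep only the highest-posterior dynamic-programming records.

    `states` is a list of (posterior_probability, record) pairs; records
    below the cut are simply dropped rather than written to disk.
    """
    return heapq.nlargest(max_keep, states, key=lambda s: s[0])

# Keep the two most probable records out of three.
kept = prune_states([(0.1, "hap1"), (0.9, "hap2"), (0.5, "hap3")], max_keep=2)
print(kept)  # [(0.9, 'hap2'), (0.5, 'hap3')]
```

The memory bound is then `max_keep` records per position, at the cost of discarding low-posterior states that could, in principle, later become relevant.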
3.
4.
AlphaImpute is a flexible and accurate genotype imputation tool that was originally designed for the imputation of genotypes on autosomal chromosomes. In some species, sex chromosomes comprise a large portion of the genome. For example, chromosome Z represents approximately 8% of the chicken genome and therefore is likely to be important in determining genetic variation in a population. When breeding programs make selection decisions based on genomic information, chromosomes that are not represented on the genotyping platform will not be subject to selection. Therefore, imputation algorithms should be able to impute genotypes for all chromosomes. The objective of this research was to extend AlphaImpute so that it could impute genotypes on sex chromosomes. The accuracy of imputation was assessed using different genotyping strategies in a real commercial chicken population. The correlation between true and imputed genotypes was high in all the scenarios and was 0.96 for the most favourable scenario. Overall, the accuracy of imputation of the sex chromosome was slightly lower than that of autosomes for all scenarios considered.
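The accuracy measure used above, the correlation between true and imputed genotypes, is straightforward to compute. A minimal numpy sketch, using made-up genotype vectors rather than AlphaImpute output:

```python
import numpy as np

def imputation_accuracy(true_geno, imputed_geno):
    """Pearson correlation between true and imputed genotype dosages (0/1/2)."""
    t = np.asarray(true_geno, dtype=float)
    i = np.asarray(imputed_geno, dtype=float)
    return float(np.corrcoef(t, i)[0, 1])

# Perfect imputation yields a correlation of 1.0.
print(imputation_accuracy([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]))
```

A correlation of 0.96, as in the most favourable scenario above, means imputed dosages track true dosages very closely but not perfectly.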
5.
siRNA target site secondary structure predictions using local stable substructures
The crystal structure-based model of the catalytic center of Ago2 revealed that the siRNA and the mRNA must be able to form an A-helix for correct positioning of the scissile phosphate bond for cleavage in RNAi. This suggests that base pairing of the target mRNA with itself, i.e. secondary structure, must be removed before cleavage. Early on in siRNA design, GC-rich target sites were avoided because of their potential to be involved in strong secondary structure. It is still unclear how important a factor mRNA secondary structure is in RNAi. However, it has been established that a difference in the thermostability of the ends of an siRNA duplex dictates which strand is loaded into the RNA-induced silencing complex. Here, we use a novel secondary structure prediction method and duplex-end differential calculations to investigate the importance of secondary structure in siRNA design. We found that the differential duplex-end stabilities alone account for functional prediction of 60% of the 80 siRNA sites examined, and that secondary structure predictions improve the prediction of site efficacy. A total of 80% of the non-functional sites can be eliminated using secondary structure predictions and the duplex-end differential.
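The duplex-end differential idea can be illustrated with a crude GC-count proxy for end stability. Real siRNA design tools use nearest-neighbour free energies; the 4-base window and the GC-count heuristic here are assumptions for illustration only:

```python
def duplex_end_differential(guide, n=4):
    """GC count of the guide strand's first n bases minus its last n.

    G/C pairs are more thermostable than A/U, so a less stable 5' end
    (negative differential) favours loading of that strand into RISC.
    A toy proxy, not a nearest-neighbour free-energy calculation.
    """
    seq = guide.upper().replace("T", "U")
    gc_count = lambda s: sum(base in "GC" for base in s)
    return gc_count(seq[:n]) - gc_count(seq[-n:])

# A guide with a GC-rich 5' end and an AU-rich 3' end:
print(duplex_end_differential("GGCGAAUUACUCAGCGUAAU"))
```

Sites whose differential and predicted target-site structure are both unfavourable would be the ones eliminated first in a screen like the one described above.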
6.
7.
8.
Estimation of non-additive genetic effects in animal breeding is important because it increases the accuracy of breeding value prediction and the value of mate allocation procedures. With the advent of genomic selection, these ideas should be revisited. The objective of this study was to quantify the efficiency of including dominance effects and practising mating allocation under a whole-genome evaluation scenario. Four strategies of selection, carried out over five generations, were compared by simulation techniques. In the first scenario (MS), individuals were selected based on their own phenotypic information. In the second (GSA), they were selected based on the prediction generated by the Bayes A method of whole-genome evaluation under an additive model. In the third (GSD), the model was expanded to include dominance effects. These three scenarios used random mating to construct future generations, whereas in the fourth (GSD + MA), matings were optimized by simulated annealing. The advantage of GSD over GSA ranges from 9 to 14% of the expected response and, in addition, using mate allocation (GSD + MA) provides an additional response ranging from 6% to 22%. However, mate selection can improve the expected genetic response over random mating only in the first generation of selection. Furthermore, the efficiency of genomic selection is eroded after a few generations of selection; thus, continued collection of phenotypic data and re-evaluation will be required.
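The additive-plus-dominance genomic model that distinguishes the GSD scenario from GSA can be sketched as follows. The SNP coding and effect vectors are generic illustrations, not the simulation settings of the study:

```python
import numpy as np

def simulate_phenotypes(genotypes, add_eff, dom_eff, noise_sd=1.0, seed=0):
    """Phenotype = additive + dominance SNP effects + residual noise.

    genotypes: (n_individuals, n_snps) matrix coded 0/1/2 (allele count).
    The additive covariate is the allele count itself; the dominance
    covariate is 1 for heterozygotes and 0 otherwise.
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(genotypes, dtype=float)
    additive = G @ np.asarray(add_eff, dtype=float)
    dominance = (G == 1).astype(float) @ np.asarray(dom_eff, dtype=float)
    return additive + dominance + rng.normal(0.0, noise_sd, size=G.shape[0])
```

Fitting a model of this form (for example, Bayes A extended with dominance terms, as in GSD) recovers both effect vectors, whereas a purely additive fit (GSA) absorbs dominance into the residual.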
9.
Population genomic approaches, which take advantage of high-throughput genotyping, are powerful yet costly methods to scan for selective sweeps. DNA-pooling strategies have been widely used for association studies because they are a cost-effective alternative to large-scale individual genotyping. Here, we performed an SNP-MaP (single nucleotide polymorphism microarrays and pooling) analysis using samples from Eurasia to evaluate the efficiency of the pooling strategy in genome-wide scans for selection. By conducting simulations of allelotype data, we first demonstrated that the boxplot with average heterozygosity (HET) is a promising method to detect strong selective sweeps with a moderate level of pooling error. Based on this, we used a sliding-window analysis of HET to detect the large contiguous regions (LCRs) putatively under selective sweeps in the Eurasian datasets. This survey identified 63 LCRs in a European population. These signals were further supported by the integrated haplotype score (iHS) test using HapMap II data. We also confirmed the European-specific signatures of positive selection at several previously identified genes (KEL, TRPV5, TRPV6, EPHB6). In summary, our results not only revealed the high credibility of the SNP-MaP strategy in scanning for selective sweeps, but also provided insight into the population differentiation.
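The sliding-window HET screen reduces to averaging expected heterozygosity, 2p(1-p), along the chromosome and flagging windows with unusually low values. A minimal sketch; the window size and allele frequencies are arbitrary illustrations:

```python
import numpy as np

def sliding_het(allele_freqs, window=5):
    """Sliding-window mean of expected heterozygosity HET = 2p(1 - p).

    `allele_freqs` are pooled SNP allele frequencies ordered along a
    chromosome; windows with unusually low mean HET are candidate
    selective-sweep regions.
    """
    p = np.asarray(allele_freqs, dtype=float)
    het = 2.0 * p * (1.0 - p)
    kernel = np.ones(window) / window
    return np.convolve(het, kernel, mode="valid")

# Five SNPs at p = 0.5 give the maximal HET of 0.5 in the single window.
print(sliding_het([0.5, 0.5, 0.5, 0.5, 0.5]))
```

Runs of consecutive low-HET windows would then be merged into large contiguous regions (LCRs) like those reported above.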
10.
As more studies adopt the approach of whole-genome screening, geneticists are faced with the challenge of having to interpret results from traditional approaches that were not designed for genome-scan data. Frequently, two-point analysis by the LOD method is performed to search for signals of linkage throughout the genome, for each of hundreds or even thousands of markers. This practice has raised the question of how to adjust the significance level for the fact that multiple tests are being performed. Various recommendations have been made, but no consensus has emerged. In this article, we propose a new method, the confidence-set approach, that circumvents the need to correct the level of significance according to the number of markers tested. In the search for the gene location of a monogenic disorder, multiplicity adjustment is not needed in order to maintain the desired level of confidence. For complex diseases involving multiple genes, one needs only to adjust the level of significance according to the number of disease genes--a much smaller number than the number of markers in a genome screen--to ensure a predetermined genomewide confidence level. Furthermore, our formulation of the tests enables us to localize disease genes to small genomic regions, an extremely desirable feature that the traditional LOD method lacks. Our simulation study shows that, for sib-pair data, even when the coverage probability of the confidence set is chosen to be as high as 99%, our approach is able to implicate only the markers that are closely linked to the disease genes.
11.
Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions
MOTIVATION: In recent years, advances have been made in the ability of computational methods to discriminate between homologous and non-homologous proteins in the 'twilight zone' of sequence similarity, where the percent sequence identity is a poor indicator of homology. To make these predictions more valuable to the protein modeler, they must be accompanied by accurate alignments. Pairwise sequence alignments are inferences of orthologous relationships between sequence positions. Evolutionary distance is traditionally modeled using global amino acid substitution matrices. But real differences in the likelihood of substitutions may exist for different structural contexts within proteins, since structural context contributes to the selective pressure. RESULTS: HMMSUM (HMMSTR-based substitution matrices) is a new model for structural context-based amino acid substitution probabilities consisting of a set of 281 matrices, each for a different sequence-structure context. HMMSUM does not require the structure of the protein to be known. Instead, predictions of local structure are made using HMMSTR, a hidden Markov model for local structure. Alignments using the HMMSUM matrices compare favorably to alignments carried out using the BLOSUM matrices or structure-based substitution matrices SDM and HSDM when validated against remote homolog alignments from BAliBASE. HMMSUM has been implemented using local Dynamic Programming and with the Bayesian Adaptive alignment method.
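The core idea of context-specific substitution matrices can be shown in a few lines: instead of one global matrix (as with BLOSUM), each aligned position is scored with the matrix chosen for its predicted local-structure context. The matrices and context labels below are toy stand-ins, not the real 281 HMMSTR-derived HMMSUM matrices:

```python
def context_alignment_score(pairs, contexts, matrices):
    """Score an ungapped alignment with context-specific substitution matrices.

    pairs:    aligned residue pairs, e.g. [("A", "A"), ("A", "G")]
    contexts: predicted local-structure context label per position
    matrices: dict mapping context label -> {(res1, res2): score}
    """
    return sum(matrices[ctx][pair] for pair, ctx in zip(pairs, contexts))

# Toy matrices: a helix context rewards A-A conservation more than coil.
toy_matrices = {
    "helix": {("A", "A"): 5, ("A", "G"): -1},
    "coil":  {("A", "A"): 2, ("A", "G"): 0},
}
score = context_alignment_score([("A", "A"), ("A", "G")],
                                ["helix", "coil"], toy_matrices)
print(score)  # 5 + 0 = 5
```

In the full method, the context label per position would come from HMMSTR's local-structure prediction rather than being supplied by hand.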
12.
13.
Background
Many proteins contain disordered regions that lack fixed three-dimensional (3D) structure under physiological conditions but have important biological functions. Prediction of disordered regions in protein sequences is important for understanding protein function and for high-throughput determination of protein structures. Machine learning techniques, including neural networks and support vector machines, have been widely used in such predictions. Predictors designed for long disordered regions are usually less successful at predicting short disordered regions. Combining prediction of short and long disordered regions would dramatically increase the complexity of the prediction algorithm and make the predictor unsuitable for large-scale applications. Efficient batch prediction of long disordered regions alone is of greater interest in large-scale proteome studies.
14.
15.
ZINBA (Zero-Inflated Negative Binomial Algorithm) identifies genomic regions enriched in a variety of ChIP-seq and related next-generation sequencing experiments (DNA-seq), calling both broad and narrow modes of enrichment across a range of signal-to-noise ratios. ZINBA models and accounts for factors that co-vary with background or experimental signal, such as G/C content, and identifies enrichment in genomes with complex local copy number variations. ZINBA provides a single unified framework for analyzing DNA-seq experiments in challenging genomic contexts.
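The zero-inflated negative binomial density at the heart of such methods mixes a point mass at zero (unsequenced background) with a negative binomial count component. A generic sketch of the log-pmf; the parameterization is chosen for illustration and is not ZINBA's internal code:

```python
import math

def zinb_logpmf(k, pi, r, p):
    """Log-probability of count k under a zero-inflated negative binomial.

    pi: weight of the structural-zero component; r, p: NB size and
    success parameters, with NB pmf C(k+r-1, k) * p**r * (1-p)**k.
    """
    nb = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
          + r * math.log(p) + k * math.log(1.0 - p))
    if k == 0:
        # Zero counts can come from either mixture component.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb

# With pi = 0.3, r = 1, p = 0.5: P(0) = 0.3 + 0.7 * 0.5 = 0.65.
print(math.exp(zinb_logpmf(0, 0.3, 1.0, 0.5)))
```

Fitting the mixture weights and NB parameters per window, with covariates such as G/C content, is what lets the model separate background from genuine enrichment.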
16.
17.
We reported previously that the haploid genome of standard strains of laboratory mice contains approximately 70 copies of an amplified long genomic sequence, designated ALGS, that includes a retroposon of the gene for elongation factor 2 (MER). The length of each repeating unit is more than 60 kb, and the sequence of the unit is highly conserved among the repeats. In the present study, Southern blot analysis of the genomes of wild rodents demonstrated that the ALGS is present in all subspecies of Mus musculus and is abundant in M. spicilegus, whereas it is absent in M. spretus as well as in Rattus and other closely related genera. This result indicates that the amplification occurred after species differentiation within the genus Mus and at least prior to the differentiation of subspecies of M. musculus. To locate chromosomal positions of the ALGS, in situ hybridization was carried out with laboratory strains and wild mice. It appears that the ALGS is located in the centromeric regions of most chromosomes in laboratory mice, M. musculus and M. spicilegus, whereas no positive signals were observed with M. spretus, in accordance with the results of the Southern blot analysis.
18.
Mogens S Lund, Adrianus PW de Roos, Alfred G de Vries, Tom Druet, Vincent Ducrocq, Sébastien Fritz, François Guillaume, Bernt Guldbrandtsen, Zenting Liu, Reinhard Reents, Chris Schrooten, Franz Seefried, Guosheng Su. Genetics Selection Evolution, 2011, 43(1): 43
Background
Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to the reference population, the reliability of their phenotypes, and the relatedness of the populations that are combined.
Methods
This paper assesses the increase in reliability achieved when combining four Holstein reference populations of 4000 bulls each, from European breeding organizations, i.e. UNCEIA (France), VikingGenetics (Denmark, Sweden, Finland), DHV-VIT (Germany) and CRV (The Netherlands, Flanders). Each partner validated its own bulls using their national reference data and the combined data, respectively.
Results
Combining the data significantly increased the reliability of genomic predictions for bulls in all four populations. Reliabilities increased by 10%, compared to reliabilities obtained with national reference populations alone, when they were averaged over countries and the traits evaluated. For different traits and countries, the increase in reliability ranged from 2% to 19%.
Conclusions
Genomic selection programs benefit greatly from combining data from several closely related populations into a single large reference population.
19.
Background
Genomic BLUP (GBLUP) can predict breeding values for non-phenotyped individuals based on the identity-by-state genomic relationship matrix (G). The G matrix can be constructed from thousands of markers spread across the genome. The strongest assumption of G, and consequently of GBLUP, is that all markers contribute equally to the genetic variance of a trait. This assumption is violated for traits that are controlled by a small number of quantitative trait loci (QTL) or by individual QTL with large effects. In this paper, we investigate the performance of a weighted genomic relationship matrix (wG) that takes into consideration the genetic architecture of the trait in order to improve predictive ability for a wide range of traits. Multiple methods were used to calculate weights for several economically relevant traits in US Holstein dairy cattle. Predictive performance was tested by k-means cross-validation.
Results
Relaxing the GBLUP assumption of equal marker contribution by increasing the weight that is given to a specific marker in the construction of the trait-specific G resulted in increased predictive performance. The increase was strongest for traits that are controlled by a small number of QTL (e.g. fat and protein percentage). Furthermore, bias in prediction estimates was reduced compared to that resulting from the use of regular G. Even for traits with low heritability and lower general predictive performance (e.g. calving ease traits), the weighted G still yielded a gain in accuracy.
Conclusions
Genomic relationship matrices weighted by marker realized variance yielded more accurate and less biased predictions for traits regulated by few QTL. Genome-wide association analyses were used to derive marker weights for creating weighted genomic relationship matrices. However, this can be cumbersome and prone to low stability over generations because of erosion of linkage disequilibrium between markers and QTL. Future studies may include other sources of information, such as functional annotation and gene networks, to better exploit the genetic architecture of traits and produce more stable predictions.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-015-0100-1) contains supplementary material, which is available to authorized users.
20.
Hongding Gao, Ole F Christensen, Per Madsen, Ulrik S Nielsen, Yuan Zhang, Mogens S Lund, Guosheng Su. Genetics Selection Evolution, 2012, 44(1): 8