Similar articles
20 similar articles found.
1.
We report a genome-wide, multiscale approach to simultaneously measure the effect that an increased copy number of each gene and/or operon has on a desired trait or phenotype. The method involves (i) growth selections on a mixture of several different plasmid-based genomic libraries of defined insert sizes, or SCALEs; (ii) microarray studies of the enriched plasmid DNA; and (iii) a mathematical multiscale analysis that precisely identifies the relevant genetic elements. This approach allows identification of all single open reading frames and larger multigene fragments within a genomic library that alter the expression of a given phenotype. We have demonstrated this method in Escherichia coli by monitoring, in parallel, a population of >10^6 genomic library clones of different insert sizes throughout continuous selections over a period of 100 generations.

2.
The traditional quantitative genetics model was used as the unifying approach to derive six existing and new definitions of genomic additive and dominance relationships. The theoretical differences among these definitions lay in the assumptions of equal SNP effects (equivalent to across-SNP standardization), equal SNP variances (equivalent to within-SNP standardization), and expected or sample SNP additive and dominance variances. The six definitions of genomic additive and dominance relationships were on average consistent with the pedigree relationships, but had individual genomic specificity and large variations not observed in pedigree relationships. These large variations may allow finding the least related genomes even within the same family, for minimizing genomic relatedness among breeding individuals. The six definitions of genomic relationships generally gave similar numerical results in genomic best linear unbiased predictions of additive effects (GBLUP) and similar genomic REML (GREML) estimates of additive heritability. Predicted SNP dominance effects and GREML estimates of dominance heritability were similar within the definitions assuming equal SNP effects or within the definitions assuming equal SNP variances, but differed between these two groups of definitions. We proposed a new measure of the genomic inbreeding coefficient based on the parental genomic co-ancestry coefficient and genomic additive correlation, as a genomic approach for predicting offspring inbreeding level. This genomic inbreeding coefficient had the highest correlation with the pedigree inbreeding coefficient among the four methods evaluated for calculating genomic inbreeding coefficients in a Holstein sample and a swine sample.
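To make the two standardizations concrete, here is a minimal pure-Python sketch of two widely used VanRaden-style genomic additive relationship matrices, one for each assumption named above. Genotypes are assumed to be coded as 0/1/2 copies of a counted allele, with allele frequencies estimated from the sample itself; the function names are illustrative, and the paper's own six definitions (including the dominance ones) are not reproduced here.

```python
import math

def _polymorphic(M):
    """Drop monomorphic SNPs; M is a list of individuals, each a list of 0/1/2 allele counts."""
    n = len(M)
    keep = [j for j in range(len(M[0])) if 0 < sum(row[j] for row in M) < 2 * n]
    return [[row[j] for j in keep] for row in M]

def _freqs(M):
    """Sample frequency of the counted allele at each SNP."""
    n = len(M)
    return [sum(row[j] for row in M) / (2 * n) for j in range(len(M[0]))]

def _gram(Z, scale):
    """Return ZZ' / scale as a nested list."""
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in Z] for zi in Z]

def g_across_snp(M):
    """Equal SNP effects (across-SNP standardization): G = ZZ' / (2 * sum_j p_j(1 - p_j))."""
    M = _polymorphic(M)
    p = _freqs(M)
    Z = [[x - 2 * pj for x, pj in zip(row, p)] for row in M]
    return _gram(Z, 2 * sum(pj * (1 - pj) for pj in p))

def g_within_snp(M):
    """Equal SNP variances (within-SNP standardization): G = WW' / m,
    with W_ij = (x_ij - 2p_j) / sqrt(2p_j(1 - p_j))."""
    M = _polymorphic(M)
    p = _freqs(M)
    W = [[(x - 2 * pj) / math.sqrt(2 * pj * (1 - pj)) for x, pj in zip(row, p)] for row in M]
    return _gram(W, len(p))
```

On real data the two matrices tend to agree on average, but within-SNP standardization gives rare-allele SNPs more weight, because their deviations are divided by a small sqrt(2p(1-p)).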

3.
Estimating marker effects based on routinely generated phenotypic data of breeding programs is a cost-effective strategy to implement genomic selection. Truncation selection in breeding populations, however, could have a strong impact on the accuracy of predicting genomic breeding values. The main objective of our study was to investigate the influence of phenotypic selection on the accuracy and bias of genomic selection. We used experimental data of 788 testcross progenies from an elite maize breeding program. The testcross progenies were evaluated in unreplicated field trials in ten environments and fingerprinted with 857 SNP markers. The random regression best linear unbiased prediction method was used in combination with fivefold cross-validation based on genotypic sampling. We observed a substantial loss in the accuracy of predicting genomic breeding values in unidirectionally selected populations. In contrast, estimating marker effects based on bidirectionally selected populations led to only a marginal decrease in the prediction accuracy of genomic breeding values. We concluded that bidirectional selection is a valuable approach to efficiently implement genomic selection in applied plant breeding programs.

4.
Recent technological advances have made it possible to collect high-dimensional genomic data along with clinical data on a large number of subjects. In studies of chronic diseases such as cancer, it is of great interest to integrate clinical and genomic data to build a comprehensive understanding of the disease mechanisms. Despite extensive studies on integrative analysis, it remains an ongoing challenge to model the interaction effects between clinical and genomic variables, due to the high dimensionality of the data and heterogeneity across data types. In this paper, we propose an integrative approach that models interaction effects using a single-index varying-coefficient model, in which the effects of genomic features can be modified by clinical variables. We propose a penalized approach for separate selection of main and interaction effects. Notably, the proposed methods can be applied to right-censored survival outcomes based on a Cox proportional hazards model. We demonstrate the advantages of the proposed methods through extensive simulation studies and provide applications to a motivating cancer genomic study.

5.
B. Dalby, A. J. Pereira, L. S. B. Goldstein. Genetics, 1995, 139(2): 757-766
We developed a screening approach that uses inverse polymerase chain reaction (PCR) to detect P element insertions in or near previously cloned genes in Drosophila melanogaster. We used this approach in a large-scale genetic screen in which P elements were mobilized from sites on the X chromosome to new autosomal locations. Mutagenized flies were combined in pools, and our screening approach was used to generate probes corresponding to the sequences flanking each insertion site. These probes were then hybridized to cloned genomic intervals, allowing individuals carrying insertions in those intervals to be detected. We used the same approach to perform repeated rounds of sib-selection to generate stable insertion lines. We screened 16,100 insert-bearing individuals and recovered 11 insertions in five intervals containing genes encoding members of the kinesin superfamily in Drosophila melanogaster. In addition, we recovered an insertion in the region including the Larval Serum Protein-2 gene. Southern hybridization confirms that the recovered lines carry genuine insertions in the corresponding genomic intervals. Our data indicate that this approach will be very efficient both for P element mutagenesis of new genomic regions and for detection and recovery of "local" P element transposition events. In addition, our data constitute a survey of preferred P element insertion sites in the Drosophila genome and suggest that insertion sites mutable at a rate of ~10^-4 are distributed every 40-50 kb.

6.
For the last twenty years, fragment assembly has been dominated by the "overlap-layout-consensus" algorithms used in all currently available assembly tools. However, the limits of these algorithms are being tested in the era of genomic sequencing, and it is not clear whether they are the best choice for large-scale assemblies. Although the "overlap-layout-consensus" approach proved useful in assembling clones, it faces difficulties in genomic assemblies: the existing algorithms make assembly errors even in bacterial genomes. We abandoned the "overlap-layout-consensus" approach in favour of a new Eulerian Superpath approach that outperforms the existing algorithms for genomic fragment assembly (Pevzner et al. 2001, In Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB-01), 256-26). In this paper we describe our new EULER-DB algorithm, which, like the Celera assembler, takes advantage of clone-end sequencing by using double-barreled data. However, in contrast to the Celera assembler, EULER-DB does not mask repeats but instead uses them as a powerful tool for contig ordering. We also describe a new approach to the Copy Number Problem: "How many times is a given repeat present in the genome?" For long, nearly perfect repeats this question is notoriously difficult, and some copies of such repeats may be "lost" in genomic assemblies. We describe our EULER-CN algorithm for the Copy Number Problem, which has proved successful in difficult sequencing projects.
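For readers unfamiliar with the Eulerian alternative to overlap-layout-consensus, the sketch below builds a plain de Bruijn graph from error-free reads and spells the sequence from an Eulerian path found with Hierholzer's algorithm. It is a textbook illustration only, not EULER, the Eulerian Superpath method, or EULER-DB: it assumes every distinct k-mer occurs once in the genome, so repeats longer than k-1 bases, the hard case those tools address, are out of scope.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """One edge per distinct k-mer, from its (k-1)-prefix node to its (k-1)-suffix node."""
    kmers = sorted({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm; start at a node with out-degree - in-degree == 1 if one exists."""
    indeg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            indeg[v] += 1
    nodes = sorted(set(graph) | set(indeg))
    start = next((u for u in nodes if len(graph.get(u, [])) - indeg[u] == 1), min(graph))
    adj = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())   # follow an unused edge
        else:
            path.append(stack.pop())     # dead end: node is final in remaining walk
    return path[::-1]

def assemble(reads, k):
    """Spell the sequence: first node, then the last base of each following node."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4)` returns `"ATGGCGTGCA"`. With repeats, the graph admits several Eulerian paths, which is exactly where additional constraints such as read paths or mate-pair (double-barreled) data become necessary.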

7.
An important computational technique for extracting the wealth of information hidden in human genomic sequence data is to compare the sequence with that from the corresponding region of the mouse genome, looking for segments that are conserved over evolutionary time. Moreover, the approach generalises to comparison of sequences from any two related species. The underlying rationale (which is abundantly confirmed by observation) is that a random mutation in a functional region is usually deleterious to the organism, and hence unlikely to become fixed in the population, whereas mutations in a non-functional region are free to accumulate over time. The potential value of this approach is so attractive that the public and private projects to sequence the human genome are now turning to sequencing the mouse, and you will soon be able to compare the human and mouse sequences of your favourite genomic region. We are currently witnessing an explosion of computer tools for comparative analysis of two genomic sequences. Here, the capabilities of two new network servers for comparing genomic sequences from any pair of closely related species are sketched. The Syntenic Gene Prediction Program SGP-I uses sequence comparisons to enhance the ability to locate protein-coding segments in genomic data. PipMaker attempts to determine all conserved genomic regions, regardless of their function.
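The conservation rationale can be illustrated with a simple percent-identity scan over an existing pairwise alignment, in the spirit of a percent-identity plot. This is a hypothetical sketch, not SGP-I or PipMaker: the alignment is assumed to be given as two equal-length gapped strings, and the window, step, and 70% identity threshold are arbitrary illustrative defaults.

```python
def window_identity(a, b, window, step):
    """(start, percent identity) for each alignment window; gap columns count as mismatches."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for start in range(0, len(a) - window + 1, step):
        cols = zip(a[start:start + window], b[start:start + window])
        matches = sum(1 for x, y in cols if x == y and x != "-")
        out.append((start, 100.0 * matches / window))
    return out

def conserved_windows(a, b, window=100, step=50, min_identity=70.0):
    """Windows whose identity reaches a (hypothetical) conservation threshold."""
    return [(s, pid) for s, pid in window_identity(a, b, window, step) if pid >= min_identity]
```

Under the rationale above, windows that stay highly identical between the two species are candidate functional elements; deciding whether they are coding or non-coding is what tools such as the two servers described here add on top.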

8.
We propose a network-based approach for surmising the spatial organization of genomes from high-throughput interaction data. Our strategy is based on methods for inferring architectural features of networks. Specifically, we employ a community detection algorithm to partition networks of genomic interactions. These community partitions provide an intuitive interpretation of genomic organization from interaction data. Furthermore, they recapitulate known aspects of the spatial organization of the Saccharomyces cerevisiae genome, such as the rosette conformation of the genome and the clustering of centromeres, tRNAs, and telomeres. We also demonstrate that simple architectural features of genomic interaction networks, such as cliques, can give meaningful insight into the functional role of the spatial organization of the genome. We show that inter-chromosomal clique size correlates with replication timing, as well as with cohesin enrichment. Together, these results show that our network-based approach is an effective and intuitive framework for interpreting high-throughput genomic interaction data. Importantly, this strategy has great potential, given the rich literature and extensive set of existing tools in the field of network analysis.
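The clique measure can be sketched with the Bron-Kerbosch algorithm (with pivoting) on a graph whose nodes are hypothetical `(chromosome, bin)` loci and whose edges are inter-chromosomal contacts. Scoring each locus by the size of the largest maximal clique containing it is an assumption about how "clique size" is defined; community detection itself is not shown (networkx, for example, provides `find_cliques` and modularity-based community functions).

```python
from collections import defaultdict

def _bron_kerbosch(R, P, X, adj, out):
    """Bron-Kerbosch with pivoting: append every maximal clique that extends R to out."""
    if not P and not X:
        out.append(R)
        return
    pivot = max(P | X, key=lambda v: len(adj[v] & P))
    for v in list(P - adj[pivot]):
        _bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P = P - {v}
        X = X | {v}

def maximal_cliques(edges):
    """All maximal cliques of an undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    out = []
    if adj:
        _bron_kerbosch(set(), set(adj), set(), adj, out)
    return out

def inter_chromosomal_clique_size(contacts):
    """contacts: ((chrom, bin), (chrom, bin)) pairs; intra-chromosomal pairs are discarded."""
    inter = [(a, b) for a, b in contacts if a[0] != b[0]]
    size = defaultdict(int)
    for clique in maximal_cliques(inter):
        for node in clique:
            size[node] = max(size[node], len(clique))
    return dict(size)
```

The per-locus sizes could then be correlated with a per-locus covariate such as replication timing.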

9.
Plant microRNAs (miRNAs) are single-stranded 20-22 nt small RNAs (sRNAs) that are produced from their own genes. We have developed a de novo genome-wide approach for the computational identification of novel plant miRNAs based on the integration of the complete genome sequence with sRNA libraries. It comprises three modules: the clustering module identifies genomic regions that have two closely located unidirectional sRNA clusters, the mirplan module explores the secondary structure of these genomic regions, and the duplex module predicts miRNA/miRNA* duplexes. We applied our approach to the Brachypodium genome and publicly available sRNA libraries and predicted 102 miRNAs. Our results extend the list of known miRNAs with 58 novel miRNAs and define the genomic loci of all predicted miRNAs. Because this approach takes specific features of plant miRNAs into account, it can be applied to the genomes and sRNA libraries of plant species to achieve systematic miRNA discovery.
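The idea behind the clustering module can be sketched as follows: merge same-strand sRNA reads into clusters, then report consecutive same-strand cluster pairs close enough to be the two arms of one hairpin. Reads are assumed to be `(chrom, strand, start, end)` tuples, and both distance cutoffs are hypothetical; the published module's actual rules are not given here, and the mirplan and duplex steps are omitted.

```python
def cluster_reads(reads, max_gap=50):
    """Merge reads on the same chromosome and strand whose gap to the current cluster is <= max_gap."""
    clusters = []
    for chrom, strand, start, end in sorted(reads):
        last = clusters[-1] if clusters else None
        if (last and last["chrom"] == chrom and last["strand"] == strand
                and start - last["end"] <= max_gap):
            last["end"] = max(last["end"], end)
            last["count"] += 1
        else:
            clusters.append({"chrom": chrom, "strand": strand,
                             "start": start, "end": end, "count": 1})
    return clusters

def paired_clusters(clusters, max_distance=300):
    """Candidate regions: two consecutive same-strand (unidirectional) clusters within max_distance."""
    pairs = []
    for a, b in zip(clusters, clusters[1:]):
        if (a["chrom"] == b["chrom"] and a["strand"] == b["strand"]
                and b["start"] - a["end"] <= max_distance):
            pairs.append((a["chrom"], a["strand"], a["start"], b["end"]))
    return pairs
```

Each reported region would then be passed to a secondary-structure step to test whether it folds into a hairpin.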

10.
We describe strand-specific, base-resolution detection of 5-hydroxymethylcytosine (5-hmC) in genomic DNA with single-molecule sensitivity, combining a bioorthogonal, selective chemical labeling method of 5-hmC with single-molecule, real-time (SMRT) DNA sequencing. The chemical labeling not only allows affinity enrichment of 5-hmC-containing DNA fragments but also enhances the kinetic signal of 5-hmC during SMRT sequencing. We applied the approach to sequence 5-hmC in a genomic DNA sample with high confidence.
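SMRT sequencing detects modifications through polymerase kinetics, typically as an increased inter-pulse duration (IPD) relative to unmodified DNA. A minimal sketch of that comparison follows, assuming per-position IPD lists keyed by `(strand, position)` for the labeled sample and a control; the 2.0 ratio cutoff is hypothetical, and real pipelines model IPD statistics far more carefully.

```python
import statistics

def ipd_ratios(sample, control):
    """Mean-IPD ratio (labeled sample / control) at each (strand, position) key present in both."""
    return {key: statistics.mean(sample[key]) / statistics.mean(control[key])
            for key in sample if key in control}

def flag_modified(sample, control, threshold=2.0):
    """Keys whose IPD ratio reaches the (hypothetical) threshold, sorted."""
    return sorted(key for key, r in ipd_ratios(sample, control).items() if r >= threshold)
```

Keying on strand as well as position is what keeps the calls strand-specific.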

11.
The Gibbs sampling method has been widely used for sequence analysis since it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Since then, numerous variants of the original idea have emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper, we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise-comparison-based gene prediction methods that make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene-finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.
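For reference, the sketch below implements the simplest form of the original motif-finding site sampler that the paragraph starts from: one site of width `w` per sequence, a pseudocount-smoothed profile built from all other sequences, and a uniform background. It is not the multi-species gene-finding sampler described in the abstract; names and parameters are illustrative, and sequences are assumed to be uppercase ACGT strings.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def profile(seqs, starts, w, skip, pseudo=1.0):
    """Position frequency matrix of the current motif sites, leaving out sequence `skip`."""
    counts = [Counter() for _ in range(w)]
    for i, (s, st) in enumerate(zip(seqs, starts)):
        if i != skip:
            for j in range(w):
                counts[j][s[st + j]] += 1
    n = len(seqs) - 1
    return [{a: (c[a] + pseudo) / (n + 4 * pseudo) for a in ALPHABET} for c in counts]

def gibbs_motif(seqs, w, iters=1000, seed=0):
    """Site sampler: repeatedly resample one sequence's motif start given the others' profile."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        prof = profile(seqs, starts, w, skip=i)
        s = seqs[i]
        weights = []
        for st in range(len(s) - w + 1):
            p = 1.0
            for j in range(w):
                p *= prof[j][s[st + j]] / 0.25  # likelihood ratio vs. uniform background
            weights.append(p)
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Because the sampler is stochastic and can settle on shifted versions of a motif, practical tools run several restarts and keep the highest-scoring alignment.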

12.
Genome scans have become a common approach to identify genomic signatures of natural selection and reproductive isolation, as well as the genomic bases of ecologically relevant phenotypes, based on patterns of polymorphism and differentiation among populations or species. Here, we review the results of studies taking genome scan approaches in plants, consider the patterns of genomic differentiation documented and their possible causes, discuss the results in light of recent models of genomic differentiation during divergent adaptation and speciation, and consider assumptions and caveats in their interpretation. We find that genomic regions of high divergence generally appear quite small in comparisons of both closely and more distantly related populations, and for the most part, these differentiated regions are spread throughout the genome rather than strongly clustered. Thus, the genome scan approach appears well-suited for identifying genomic regions or even candidate genes that underlie adaptive divergence and/or reproductive barriers. We consider other methodologies that may be used in conjunction with genome scan approaches, and suggest further developments that would be valuable. These include broader use of sequence-based markers of known genomic location, greater attention to sampling strategies to make use of parallel environmental or phenotypic transitions, more integration with approaches such as quantitative trait loci mapping and measures of gene flow across the genome, and additional theoretical and simulation work on processes related to divergent adaptation and speciation.
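A genome scan of the kind reviewed here typically computes a differentiation statistic along the genome and looks for outlying regions. The sketch below uses a Nei-style F_ST, (H_T - H_S)/H_T, from two populations' allele frequencies without any sample-size correction, combines SNPs within fixed windows as a ratio of sums, and takes the top fraction of windows as outliers; the estimator, window size, and cutoff are all assumptions chosen for simplicity.

```python
from collections import defaultdict

def fst_parts(p1, p2):
    """Numerator and denominator of Nei-style F_ST = (H_T - H_S) / H_T for one biallelic SNP."""
    pbar = (p1 + p2) / 2
    h_t = 2 * pbar * (1 - pbar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return h_t - h_s, h_t

def window_fst(snps, window=10_000):
    """snps: iterable of (position, p1, p2). Returns {window_start: F_ST} as a ratio of sums."""
    num, den = defaultdict(float), defaultdict(float)
    for pos, p1, p2 in snps:
        n, d = fst_parts(p1, p2)
        start = (pos // window) * window
        num[start] += n
        den[start] += d
    return {s: num[s] / den[s] for s in sorted(den) if den[s] > 0}

def outlier_windows(fst_by_window, top_fraction=0.05):
    """Start positions of the highest-F_ST windows (an arbitrary empirical-outlier rule)."""
    ranked = sorted(fst_by_window, key=fst_by_window.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sorted(ranked[:k])
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, keeps low-diversity SNPs from dominating a window.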

13.
MOTIVATION: It is known that most genomic regions of special interest, e.g. horizontally acquired sequences and genomic islands, have distinct word (m-mer) compositions. Most earlier work in this direction addressed di- and tri-nucleotide compositions. We present an approach that can be applied to analyze compositions of any given word size. The method, called the centroid approach, can reveal compositionally distinct regions in genomic sequences for any given word size. RESULTS: We applied our method to 50 bacterial genomes and demonstrated its ability to identify embedded sequences of varying lengths from distantly related organisms. We also investigated the genetic makeup of the regions identified as compositionally distinct by our method for four organisms from our dataset. Pathogenicity island (PAI) components and genes encoding strain-specific proteins are frequently found among these regions. AVAILABILITY: Program is available on request from the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
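One plausible reading of the centroid idea: represent each fixed-size window by its normalized m-mer frequency vector, take the mean vector over all windows as the genome centroid, and flag windows unusually far from it. The Euclidean distance and the mean + z*SD cutoff are assumptions for illustration, not necessarily the authors' choices; `math.dist` needs Python 3.8 or later.

```python
import math
import statistics
from itertools import product

def word_freqs(seq, m):
    """Normalized frequencies of all 4**m words (m-mers) in seq; non-ACGT words are skipped."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=m))}
    v = [0.0] * len(index)
    total = 0
    for i in range(len(seq) - m + 1):
        j = index.get(seq[i:i + m])
        if j is not None:
            v[j] += 1
            total += 1
    return [x / total for x in v] if total else v

def distinct_windows(genome, m=2, window=1000, z=2.0):
    """Start positions of windows whose distance to the centroid exceeds mean + z * SD."""
    starts = list(range(0, len(genome) - window + 1, window))
    vecs = [word_freqs(genome[s:s + window], m) for s in starts]
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    dists = [math.dist(v, centroid) for v in vecs]
    mu, sd = statistics.mean(dists), statistics.pstdev(dists)
    return [s for s, d in zip(starts, dists) if d > mu + z * sd]
```

Because `m` is a parameter, the same code handles any word size, which is the point the abstract emphasizes; larger `m` needs longer windows for the frequencies to be stable.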

14.
We propose a novel Bayesian approach that robustifies genomic modeling by leveraging expert knowledge (EK) through prior distributions. The central component is the hierarchical decomposition of phenotypic variation into additive and nonadditive genetic variation, which leads to an intuitive model parameterization that can be visualized as a tree. The edges of the tree represent ratios of variances, for example broad-sense heritability, which are quantities for which EK naturally exists. Penalized complexity priors are defined for all edges of the tree in a bottom-up procedure that respects the model structure and incorporates EK through all levels. We investigate models with different sources of variation and compare the performance of different priors implementing varying amounts of EK in the context of plant breeding. A simulation study shows that the proposed priors implementing EK improve the robustness of genomic modeling and the selection of the genetically best individuals in a breeding program. We observe this improvement in both variety selection on genetic values and parent selection on additive values; variety selection benefited the most. In a real case study, EK increases phenotype prediction accuracy in cases where the standard maximum likelihood approach did not find optimal estimates for the variance components. Finally, we discuss the importance of EK priors for genomic modeling and breeding, and point to future research on easy-to-use, parsimonious priors in genomic modeling.

15.
This study introduces a DNA microarray-based genotyping system for accessing single nucleotide polymorphisms (SNPs) directly from a genomic DNA sample. The described one-step approach combines multiplex amplification and allele-specific solid-phase PCR into an on-chip reaction platform. The multiplex amplification of genomic DNA and the genotyping reaction are both performed directly on the microarray in a single reaction. Oligonucleotides that interrogate single nucleotide positions within multiple genomic regions of interest are covalently tethered to a glass chip, allowing quick analysis of reaction products by fluorescence scanning. Due to a fourfold SNP detection approach employing simultaneous probing of sense and antisense strand information, genotypes can be automatically assigned and validated using a simple computer algorithm. We used the described procedure for parallel genotyping of 10 different polymorphisms in a single reaction and successfully analyzed more than 100 human DNA samples. More than 99% of genotype data were in agreement with data obtained in control experiments with allele-specific oligonucleotide hybridization and capillary sequencing. Our results suggest that this approach might constitute a powerful tool for the analysis of genetic variation.
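The automatic assignment and validation step can be sketched as follows, assuming each SNP yields four signals: allele A and allele B on the sense strand, and the same pair on the antisense strand. A genotype is called per strand from the allele fraction and accepted only if both strands agree. The signal floor and the 0.8 homozygote cutoff are hypothetical values, not the published algorithm.

```python
def call_strand(allele_a, allele_b, min_signal=100.0, hom_cutoff=0.8):
    """Genotype from one strand's two allele-specific signals, or None if the signal is too weak."""
    total = allele_a + allele_b
    if total < min_signal:
        return None
    frac = allele_a / total
    if frac >= hom_cutoff:
        return "AA"
    if frac <= 1 - hom_cutoff:
        return "BB"
    return "AB"

def call_genotype(sense, antisense, **cutoffs):
    """Accept a call only when the sense and antisense strands agree (the fourfold check)."""
    g_sense = call_strand(*sense, **cutoffs)
    g_anti = call_strand(*antisense, **cutoffs)
    if g_sense is None or g_sense != g_anti:
        return "no-call"
    return g_sense
```

Requiring strand agreement trades a few no-calls for fewer false genotypes, which is the kind of built-in validation the fourfold design enables.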

16.
We have developed an integrated approach, using genetic and genomic methods, in conjunction with resources from the Southwest National Primate Research Center (SNPRC) baboon colony, for the identification of genes and their functional variants that encode quantitative trait loci (QTL). In addition, we use comparative genomic methods to overcome the paucity of baboon specific reagents and to augment translation of our findings in a nonhuman primate (NHP) to the human population. We are using the baboon as a model to study the genetics of cardiovascular disease (CVD). A key step for understanding gene–environment interactions in cardiovascular disease is the identification of genes and gene variants that influence CVD phenotypes. We have developed a sequential methodology that takes advantage of the SNPRC pedigreed baboon colony, the annotated human genome, and current genomic and bioinformatic tools. The process of functional polymorphism identification for genes encoding QTLs involves comparison of expression profiles for genes and predicted genes in the genomic region of the QTL for individuals discordant for the phenotypic trait mapping to the QTL. After comparison, genes of interest are prioritized, and functional polymorphisms are identified in candidate genes by genotyping and quantitative trait nucleotide analysis. This approach reduces the time and labor necessary to prioritize and identify genes and their polymorphisms influencing variation in a quantitative trait compared with traditional positional cloning methods.

17.
Comparative genomics is a powerful tool for predicting the functional specificity of genomes and for investigating the specificity of evolution. Many bioinformatics analyses rest on computing scores for sequences and comparing them with a threshold; comparative genomic analysis involves comparing such scores across orthologous groups of genetic objects. In this paper we present a statistical approach to comparative genomic analysis based on the study of diffusion in sequence space driven by the neutral evolution of sequences. Using this approach, we present several statistics for estimating selection pressure and analyze their behavior on several biological problems. We describe how to apply these statistics to obtain new biological information. The approach is implemented as a Java class library.

18.
We present a new concept in DNA engineering based on a pipeline of serial recombineering steps in liquid culture. This approach is fast and straightforward, and it facilitates simultaneous processing of multiple samples in parallel. We validated the approach by generating green fluorescent protein (GFP)-tagged transgenes from Caenorhabditis briggsae genomic clones in a multistep pipeline that takes only 4 days. The transgenes were engineered with minimal disturbance to the natural genomic context, so that the correct level and pattern of expression are preserved after transgenesis. An example transgene for the C. briggsae ortholog of lin-59 was used for ballistic transformation in Caenorhabditis elegans. We show that the cross-species transgene is correctly expressed and rescues RNA interference (RNAi)-mediated knockdown of the endogenous C. elegans gene. The strategy that we describe adapts the power of recombineering in Escherichia coli for fluent DNA engineering to a format that can be directly scaled up for genomic projects.

19.
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with , the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
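The SFS itself is straightforward to compute; the expensive part of the method, simulating expected spectra under a demographic model, is not shown. In this sketch, sites are assumed to be lists of 0/1 derived-allele indicators over n sampled chromosomes, and the composite log-likelihood is a multinomial form over SFS entries, with a strictly positive expected spectrum supplied by the caller (for example, from coalescent simulations). Function names are illustrative.

```python
import math

def unfolded_sfs(sites):
    """sites: lists of 0/1 over n chromosomes (1 = derived). Entry i counts sites with i derived copies."""
    n = len(sites[0])
    sfs = [0] * (n + 1)
    for site in sites:
        sfs[sum(site)] += 1
    return sfs

def fold(sfs):
    """Minor-allele (folded) SFS, for when the ancestral state is unknown."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for i, count in enumerate(sfs):
        folded[min(i, n - i)] += count
    return folded

def composite_log_likelihood(observed, expected):
    """Multinomial composite log-likelihood of observed SFS counts given expected entries."""
    total = sum(expected)
    return sum(o * math.log(e / total) for o, e in zip(observed, expected) if o > 0)
```

Treating sites as independent is what makes this a composite likelihood; linked sites violate that assumption, which is why confidence intervals in such frameworks are usually obtained by resampling rather than from the likelihood curvature.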

20.
Detecting genomic structural variants from high-throughput sequencing data is a complex and unresolved challenge. We have developed a statistical learning approach, based on Random Forests, that integrates prior knowledge about the characteristics of structural variants and leads to improved discovery in high-throughput sequencing data. The implementation of this technique, forestSV, offers high sensitivity and specificity coupled with the flexibility of a data-driven approach.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417