Similar Articles
20 similar articles found.
1.

Background

Copy number variants (CNVs) are a potentially important component of the genetic contribution to risk of common complex diseases. Analysis of the association between CNVs and disease requires that uncertainty in CNV copy-number calls, which can be substantial, be taken into account; failure to consider this uncertainty can lead to biased results. Therefore, there is a need to develop and use appropriate statistical tools. To address this issue, we have developed CNVassoc, an R package for carrying out association analysis of common copy number variants in population-based studies. This package includes functions for testing for association with different classes of response variables (e.g. class status, censored data, counts) under a series of study designs (case-control, cohort, etc.) and inheritance models, adjusting for covariates. The package includes functions for inferring copy number (CNV genotype calling), but can also accept copy number data generated by other algorithms (e.g. CANARY, CGHcall, IMPUTE).

Results

Here we present a new R package, CNVassoc, that can deal with different types of CNV data arising from different platforms such as MLPA or aCGH. Through a real data example we illustrate that our method is able to incorporate uncertainty in the association process. We also show how the package can be useful for analyzing imputed SNPs. Through a simulation study we show that CNVassoc outperforms CNVtools in terms of computing time as well as convergence failure rate.

Conclusions

We provide a package that outperforms the existing ones in terms of modelling flexibility, power, convergence rate, ease of covariate adjustment, and requirements for sample size and signal quality. Therefore, we offer CNVassoc as a method for routine use in CNV association studies.
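The idea of taking calling uncertainty into account can be illustrated with a mixture likelihood. Instead of plugging in a single copy-number call, each subject's likelihood is averaged over all possible copy numbers and weighted by the call probabilities. The Python sketch below is only an illustration under stated assumptions (a binary trait, a logistic link, copy numbers 0 to 2). It is not CNVassoc's R interface or implementation, and the function names are hypothetical.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def mixture_loglik(alpha, beta, y, probs, copy_numbers=(0, 1, 2)):
    """Log-likelihood of a binary trait y[i] in {0, 1} when copy number is uncertain.

    probs[i][k] is the probability that subject i has copy number copy_numbers[k]
    (for example, posterior probabilities from a calling algorithm). The assumed
    risk model is logit P(y = 1 | CN) = alpha + beta * CN.
    """
    total = 0.0
    for yi, pi in zip(y, probs):
        # Average the Bernoulli likelihood over the possible copy numbers.
        li = 0.0
        for p_k, cn in zip(pi, copy_numbers):
            mu = logistic(alpha + beta * cn)
            li += p_k * (mu if yi == 1 else 1.0 - mu)
        total += math.log(li)
    return total


# Two subjects with uncertain calls and one with a certain call (toy numbers).
y = [1, 0, 1]
probs = [[0.1, 0.2, 0.7], [0.8, 0.2, 0.0], [0.0, 1.0, 0.0]]
print(mixture_loglik(-0.5, 0.8, y, probs))
```

You could maximize this function over (alpha, beta) with any numerical optimizer and compare the result with the fit under beta = 0 to obtain a likelihood ratio test. When every row of probs puts all its mass on a single copy number, the function reduces to an ordinary logistic log-likelihood.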

2.
affy: analysis of Affymetrix GeneChip data at the probe level (cited 32 times: 0 self-citations, 32 by others)
MOTIVATION: The processing of Affymetrix GeneChip data has been a recent focus for data analysts. Alternatives to the original procedure have been proposed, and some of these new methods are widely used. RESULTS: The affy package is an R package of functions and classes for the analysis of oligonucleotide arrays manufactured by Affymetrix. The package is currently in its second release. affy provides the user with extreme flexibility when carrying out an analysis and makes it possible to access and manipulate probe intensity data. In this paper, we present the main classes and functions in the package and demonstrate how they can be used to process probe-level data. We also demonstrate the importance of probe-level analysis when using the Affymetrix GeneChip platform.
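To show what probe-level summarization involves, the sketch below implements Tukey's median polish in plain Python. Median polish is the summarization step of RMA, one of the procedures available in affy. It fits a probes-by-arrays matrix of log intensities as overall + probe effect + array effect + residual. The function name and the stopping rule are choices made for this sketch and are not part of affy's API.

```python
from statistics import median


def median_polish(x, max_iter=10, tol=1e-6):
    """Tukey's median polish: x[i][j] = overall + row[i] + col[j] + resid[i][j].

    In an RMA-style summary, rows are probes and columns are arrays, so the
    expression estimate for array j is overall + col[j].
    """
    nrow, ncol = len(x), len(x[0])
    resid = [list(r) for r in x]
    overall = 0.0
    row = [0.0] * nrow
    col = [0.0] * ncol
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        rmed = [median(r) for r in resid]
        for i in range(nrow):
            resid[i] = [v - rmed[i] for v in resid[i]]
            row[i] += rmed[i]
        delta = median(col)
        col = [c - delta for c in col]
        overall += delta
        # Sweep column medians out of the residuals into the column effects.
        cmed = [median(resid[i][j] for i in range(nrow)) for j in range(ncol)]
        for i in range(nrow):
            resid[i] = [resid[i][j] - cmed[j] for j in range(ncol)]
        col = [col[j] + cmed[j] for j in range(ncol)]
        delta = median(row)
        row = [r - delta for r in row]
        overall += delta
        # Stop once a full sweep no longer moves anything.
        if max(abs(v) for v in rmed + cmed) < tol:
            break
    return overall, row, col, resid


# Three probes on three arrays (toy log2 intensities).
overall, row, col, resid = median_polish([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print([overall + c for c in col])  # per-array expression summaries
```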

3.
Borstein, Samuel R. Hydrobiologia, 2020, 847(20): 4285-4294
This article introduces an R package, dietr, which calculates fractional trophic levels from quantitative diet item and qualitative food item data following the routine implemented...
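The standard definition of a fractional trophic level is TL = 1 + Σ_j DC_j × TL_j, where DC_j is the fraction of prey j in the diet and TL_j is that prey's trophic level. The abstract is truncated and does not say exactly which routine dietr follows, so the Python sketch below shows only this standard formula. The function name is hypothetical and is not part of dietr.

```python
def trophic_level(diet, prey_tl):
    """Fractional trophic level of a consumer.

    diet: mapping prey -> fraction of the diet (fractions must sum to 1)
    prey_tl: mapping prey -> trophic level of that prey
    """
    total = sum(diet.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("diet fractions must sum to 1")
    # One level above the diet-weighted mean trophic level of the prey.
    return 1.0 + sum(frac * prey_tl[prey] for prey, frac in diet.items())


# A consumer eating half detritus (TL 1.0) and half zooplankton (TL 2.2).
print(trophic_level({"detritus": 0.5, "zooplankton": 0.5},
                    {"detritus": 1.0, "zooplankton": 2.2}))
```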

4.
We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge of patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with a quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.
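The impute-then-test idea can be reduced to its simplest case: one tag SNP, one untyped SNP, and phased reference haplotypes (standing in for HapMap) used to estimate how often the untyped allele travels with each tag allele. Under random mating, a study subject's expected untyped-allele dosage then follows directly from their tag genotype. This is a deliberately minimal sketch with hypothetical names. Bim-Bam models many markers jointly, and that model is not reproduced here.

```python
def conditional_freqs(ref_haplotypes):
    """Estimate P(untyped allele = 1 | tag allele) from phased reference haplotypes.

    ref_haplotypes: list of (tag_allele, untyped_allele) pairs coded 0/1.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # tag allele -> [haplotypes seen, carrying untyped allele 1]
    for tag, untyped in ref_haplotypes:
        counts[tag][0] += 1
        counts[tag][1] += untyped
    for tag, (n, _) in counts.items():
        if n == 0:
            raise ValueError(f"tag allele {tag} not seen in reference panel")
    return {tag: carriers / n for tag, (n, carriers) in counts.items()}


def impute_dosage(tag_genotype, p):
    """Expected number of untyped allele-1 copies given a tag genotype coded 0/1/2.

    tag_genotype haplotypes carry tag allele 1; the other 2 - tag_genotype carry tag allele 0.
    """
    return tag_genotype * p[1] + (2 - tag_genotype) * p[0]


reference = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
p = conditional_freqs(reference)
print([impute_dosage(g, p) for g in (0, 1, 2)])
```

The imputed dosages can then be tested for association with the phenotype in place of observed genotypes.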

5.
snp.plotter is a newly developed R package which produces high-quality plots of results from genetic association studies. The main features of the package include options to display a linkage disequilibrium (LD) plot below the P-value plot using either the r² or D' LD metric, to set the X-axis to equal spacing or to use the physical map of markers, and to specify plot labels, colors, symbols and LD heatmap color scheme. snp.plotter can plot single SNP and/or haplotype data and simultaneously plot multiple sets of results. R is a free software environment for statistical computing and graphics available for most platforms. The proposed package provides a simple way to convey both association and LD information in a single appealing graphic for genetic association studies. AVAILABILITY: Downloadable R package and example datasets are available at http://cbdb.nimh.nih.gov/~kristin/snp.plotter.html and http://www.r-project.org.
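Both LD metrics mentioned above are derived from the disequilibrium coefficient D = p_AB − p_A·p_B. D' divides D by its largest possible value given the allele frequencies, and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The Python sketch below computes both from two-locus haplotype frequencies. It illustrates the definitions only and is not snp.plotter code; the function name is hypothetical.

```python
def ld_stats(hap_freqs):
    """Return (D, D', r^2) from haplotype frequencies keyed 'AB', 'Ab', 'aB', 'ab'.

    D' keeps the sign of D; LD plots usually show its absolute value.
    """
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]  # frequency of allele A
    p_b = hap_freqs["AB"] + hap_freqs["aB"]  # frequency of allele B
    d = hap_freqs["AB"] - p_a * p_b
    # Maximum |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


print(ld_stats({"AB": 0.4, "Ab": 0.1, "aB": 0.1, "ab": 0.4}))
```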

6.
Sib-pair analysis is an increasingly important tool for genetic dissection of complex traits. Current methods for sib-pair analysis are primarily based on studying individual genetic markers one at a time and thus fail to use the full inheritance information provided by multipoint linkage analysis. In this paper, we describe how to extract the complete multipoint inheritance information for each sib pair. We then describe methods that use this information to map loci affecting traits, thereby providing a unified approach to both qualitative and quantitative traits. Specifically, complete multipoint approaches are presented for (1) exclusion mapping of qualitative traits; (2) maximum-likelihood mapping of qualitative traits; (3) information-content mapping, showing the extent to which all inheritance information has been extracted at each location in the genome; and (4) quantitative-trait mapping, by two parametric methods and one nonparametric method. In addition, we explore the effects of marker density, marker polymorphism, and availability of parents on the information content of a study. We have implemented the analysis methods in a new computer package, MAPMAKER/SIBS. With this computer package, complete multipoint analysis with dozens of markers in hundreds of sib pairs can be carried out in minutes.
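For quantitative traits, a classic way to use per-pair inheritance information is Haseman-Elston regression. It regresses the squared trait difference of each sib pair on the proportion of alleles the pair shares identical by descent (IBD) at the test position, and a negative slope suggests linkage. The abstract does not say which of its methods this corresponds to, so treat the Python sketch below as a generic illustration. It assumes the IBD proportions have already been estimated, and the function name is hypothetical.

```python
def haseman_elston(trait1, trait2, ibd):
    """Least-squares fit of (trait1 - trait2)^2 = intercept + slope * ibd over sib pairs.

    ibd[i] is the estimated proportion of alleles shared IBD by pair i at the
    test locus (0, 0.5 or 1, or a multipoint expectation between them).
    A clearly negative slope suggests linkage to a trait locus.
    """
    y = [(a - b) ** 2 for a, b in zip(trait1, trait2)]
    n = len(y)
    mean_x = sum(ibd) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(ibd, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


print(haseman_elston([2.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.5, 1.0]))
```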

7.
WHAP: haplotype-based association analysis (cited 7 times: 0 self-citations, 7 by others)
We describe a software tool to perform haplotype-based association analysis, for quantitative and qualitative traits, in population and family samples, using single nucleotide polymorphism or multiallelic marker data. A range of tests is offered: omnibus and haplotype-specific tests; prospective and retrospective likelihoods; covariates and moderators; sliding window analyses; permutation P-values. We focus on the ability to flexibly impose constraints on haplotype effects, which allows for a range of conditional haplotype-based likelihood ratio tests: for example, whether an allele has an effect independent of its haplotypic background, or whether a single variant can explain the overall association at a locus. We illustrate using these tests to dissect a multi-locus association. AVAILABILITY: WHAP is a C/C++ program, freely available from the author's website: http://pngu.mgh.harvard.edu/purcell/whap/
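Permutation P-values of the kind listed above are obtained by shuffling case/control labels and counting how often the shuffled statistic is at least as extreme as the observed one. The Python sketch below does this for a simple haplotype-specific statistic, the absolute difference in haplotype frequency between cases and controls. It assumes phase is known, which is a simplification. WHAP's likelihood-based tests are not reproduced here, and all names are hypothetical.

```python
import random


def hap_freq(individuals, target):
    """Frequency of haplotype `target` among individuals given as (hap1, hap2) tuples."""
    n = sum((h1 == target) + (h2 == target) for h1, h2 in individuals)
    return n / (2 * len(individuals))


def perm_pvalue(cases, controls, target, n_perm=1000, seed=1):
    """Permutation P-value for a case/control difference in the frequency of `target`."""
    rng = random.Random(seed)
    observed = abs(hap_freq(cases, target) - hap_freq(controls, target))
    pooled = cases + controls  # new list, so the inputs are not reordered
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        stat = abs(hap_freq(pooled[:n_cases], target) - hap_freq(pooled[n_cases:], target))
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


cases = [("AB", "AB"), ("AB", "ab"), ("AB", "Ab")] * 5
controls = [("ab", "ab"), ("AB", "ab"), ("aB", "ab")] * 5
print(perm_pvalue(cases, controls, "AB"))
```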

8.
High-density single nucleotide polymorphism microarrays (SNP chips) provide information on a subject's genome, such as copy number and genotype (heterozygosity/homozygosity) at a SNP. While fluorescence in situ hybridization and karyotyping reveal many abnormalities, SNP chips provide a higher resolution map of the human genome that can be used to detect, e.g., aneuploidies, microdeletions, microduplications and loss of heterozygosity (LOH). As a variety of diseases are linked to such chromosomal abnormalities, SNP chips promise new insights for these diseases by aiding in the discovery of such regions, and may suggest targets for intervention. The R package SNPchip contains classes and methods useful for storing, visualizing and analyzing high density SNP data. Originally developed from the SNPscan web-tool, SNPchip utilizes S4 classes and extends other open source R tools available at Bioconductor. This has numerous advantages, including the ability to build statistical models for SNP-level data that operate on instances of the class, and to communicate with other R packages that add further functionality. AVAILABILITY: The package is available from the Bioconductor web page at www.bioconductor.org. SUPPLEMENTARY INFORMATION: The supplementary material as described in this article (case studies, installation guidelines and R code) is available from http://biostat.jhsph.edu/~iruczins/publications/sm/
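One simple signal of possible loss of heterozygosity is a long run of consecutive homozygous calls along a chromosome. The Python sketch below finds such runs naively by scanning calls ordered by position. It is not SNPchip's method, the minimum run length is an arbitrary assumption, and the function name is hypothetical. Any call other than AA or BB, including a no-call, ends a run.

```python
def homozygous_runs(calls, min_len=5):
    """Return (start, end) index pairs (inclusive) of runs of >= min_len homozygous calls.

    calls: genotype strings such as 'AA', 'AB', 'BB' ordered by physical position.
    Any other value (heterozygote or no-call) breaks a run.
    """
    runs = []
    start = None
    for i, g in enumerate(calls + ["AB"]):  # sentinel heterozygote closes a final run
        if g in ("AA", "BB"):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


print(homozygous_runs(["AB", "AA", "BB", "AA", "AA", "BB", "AB", "AA", "AA"]))
```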

9.
Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
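The simplest example of the haplotype-inference problem is two SNPs, where only double heterozygotes have ambiguous phase (11/00 versus 10/01). The Python sketch below shows the classic expectation-maximization (EM) approach to estimating haplotype frequencies in that case. It is the kind of small-data method the abstract contrasts with, not Beagle's algorithm, and the function names are hypothetical.

```python
from collections import Counter

HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def _alleles(g):
    """Split a genotype coded as 0/1/2 copies of allele 1 into two alleles."""
    return {0: (0, 0), 1: (1, 0), 2: (1, 1)}[g]


def em_two_snp(genotypes, n_iter=50):
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes.

    genotypes is a list of (g1, g2) pairs. Only double heterozygotes (1, 1)
    have ambiguous phase: they carry either 11/00 or 10/01.
    """
    fixed = Counter()
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
        else:
            # At most one locus is heterozygous, so the phase is determined.
            a, b = _alleles(g1), _alleles(g2)
            fixed[(a[0], b[0])] += 1
            fixed[(a[1], b[1])] += 1
    n_haps = 2 * len(genotypes)
    freq = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is in the 11/00 phase.
        w_cis = freq[(1, 1)] * freq[(0, 0)]
        w_trans = freq[(1, 0)] * freq[(0, 1)]
        p_cis = w_cis / (w_cis + w_trans) if w_cis + w_trans > 0 else 0.5
        expected = {h: float(fixed[h]) for h in HAPLOTYPES}
        expected[(1, 1)] += n_double_het * p_cis
        expected[(0, 0)] += n_double_het * p_cis
        expected[(1, 0)] += n_double_het * (1 - p_cis)
        expected[(0, 1)] += n_double_het * (1 - p_cis)
        # M-step: re-estimate frequencies from the expected haplotype counts.
        freq = {h: expected[h] / n_haps for h in HAPLOTYPES}
    return freq


print(em_two_snp([(2, 2)] * 3 + [(0, 0)] * 3 + [(1, 1)] * 4 + [(2, 1)]))
```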

10.
Precise mapping of quantitative trait loci (QTLs) is critical for assessing genetic effects and identifying candidate genes for quantitative traits. Interval mapping and composite interval mapping have been the methods of choice for several decades, providing tools for identifying genomic regions harboring causal genes for quantitative traits. Historically, the concept was developed on the basis of sparse marker maps where genotypes of loci within intervals could not be observed. Currently, genomes of many organisms have been saturated with markers due to new sequencing technologies. Genotyping by sequencing usually generates hundreds of thousands of single nucleotide polymorphisms (SNPs), which often include the causal polymorphisms. The concept of an interval therefore no longer applies, prompting the need for a paradigm shift in QTL mapping technology to make use of high-volume genomic data. Here we developed a statistical method and a software package to map QTLs by binning markers into haplotype blocks, called bins. The new method detects associations of bins with quantitative traits. It borrows the mixed model methodology with a polygenic control from genome-wide association studies (GWAS) and can handle all kinds of experimental populations under the linear mixed model (LMM) framework. We tested the method using both simulated data and data from populations of rice. The results showed that this method has higher power than current methods. An R package named binQTL is available from GitHub.
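The abstract does not state the exact rule binQTL uses to form bins. A common and simple rule, assumed here for illustration, merges adjacent markers whose genotypes are identical across all individuals, because no recombination separates them in the sample. The Python sketch below applies that rule; the function name is hypothetical.

```python
def bin_markers(genotypes):
    """Group adjacent markers with identical genotype vectors into bins.

    genotypes: one list of individual genotype codes per marker, ordered along
    the chromosome. Returns (first_marker_index, last_marker_index) pairs.
    """
    bins = []
    start = 0
    for i in range(1, len(genotypes) + 1):
        # Close the current bin at the end of the list or when the pattern changes.
        if i == len(genotypes) or genotypes[i] != genotypes[start]:
            bins.append((start, i - 1))
            start = i
    return bins


markers = [[0, 1, 2], [0, 1, 2], [1, 1, 2], [1, 1, 2], [1, 1, 2], [0, 0, 0]]
print(bin_markers(markers))
```

Each bin can then be represented by one genotype vector in the mixed-model association test, which reduces the number of tests compared with testing every marker.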

11.
The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low‐dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group‐specific consensus phylogenies. treespace also provides a user‐friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results.
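The first half of this pipeline, a tree metric applied to every pair of trees, produces a distance matrix that multivariate methods such as multidimensional scaling can project into low dimensions. treespace offers its own metrics. The Python sketch below substitutes the rooted Robinson-Foulds distance, the number of clades found in only one of the two trees, with trees written as nested tuples. All names are hypothetical.

```python
from itertools import combinations


def clades(tree):
    """Non-trivial clades of a rooted tree written as nested tuples of leaf names."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
            found.add(leaves)
            return leaves
        return frozenset([node])

    root = walk(tree)
    found.discard(root)  # every tree on the same taxa shares the root clade
    return found


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: number of clades present in exactly one tree."""
    return len(clades(t1) ^ clades(t2))


def distance_matrix(trees):
    """Symmetric matrix of pairwise distances, ready for an ordination method."""
    n = len(trees)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = rf_distance(trees[i], trees[j])
    return d


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(distance_matrix([t1, t2, t1]))
```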

12.
MOTIVATION: Core sets are necessary to ensure that access to useful alleles or characteristics retained in genebanks is guaranteed. We have successfully developed a computational tool named 'PowerCore' that aims to support the development of core sets by reducing the redundancy of useful alleles and thus enhancing their richness. RESULTS: The program, using a new approach completely different from all previous methodologies, selects entries of core sets by the advanced M (maximization) strategy implemented through a modified heuristic algorithm. The developed core set has been validated to retain all characteristics for qualitative traits and all classes for quantitative ones. PowerCore effectively selected accessions with higher diversity that together covered the full range of variables, and it produced the same list of entries every time it was repeated. AVAILABILITY: PowerCore software uses the .NET Framework Version 1.1 environment, which is freely available for the MS Windows platform. The files can be downloaded from http://genebank.rda.go.kr/powercore/. The distribution of the package includes executable programs, sample data and a user manual.
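The maximization idea can be illustrated with the plain greedy heuristic for set cover. At each step, it adds the accession that contributes the most not-yet-covered alleles or classes, and it stops when everything in the collection is covered. This is a swapped-in textbook heuristic, not PowerCore's modified algorithm. Ties are broken alphabetically so that repeated runs return the same list, and the function name is hypothetical.

```python
def greedy_core(accessions):
    """Greedy maximization-strategy core set.

    accessions: dict mapping accession name -> set of alleles/classes it carries.
    Returns accession names in the order they were added.
    """
    target = set().union(*accessions.values())
    covered, core = set(), []
    while covered != target:
        # Pick the accession adding the most uncovered alleles (alphabetical tie-break).
        best = max(sorted(accessions), key=lambda a: len(accessions[a] - covered))
        core.append(best)
        covered |= accessions[best]
    return core


collection = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}
print(greedy_core(collection))
```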

13.
ABSTRACT: BACKGROUND: The lack of a uniform way for qualitative and quantitative evaluation of vaccine candidates under development led us to set up a standardized scheme for vaccine efficacy and safety evaluation. We developed and implemented molecular and immunology methods, and designed support tools for immunization data storage and analyses. Such a collection can create a unique opportunity for immunologists to analyse data delivered from their laboratories. RESULTS: We designed and implemented GeVaDSs (Genetic Vaccine Decision Support system), an interactive system for efficient storage, integration, retrieval and representation of data. Moreover, GeVaDSs allows for relevant association and interpretation of data, and thus for knowledge-based generation of testable hypotheses of vaccine responses. CONCLUSIONS: GeVaDSs has been tested by several laboratories in Europe and has proved useful in vaccine analysis. A case study of its application is presented in the additional files. The system is available at: http://gevads.cs.put.poznan.pl/preview/ (login: viewer, password: password).

14.
Analysis of the long-range architecture of RNA is a challenging experimental and computational problem. Local nucleotide flexibility, which directly reports underlying base pairing and tertiary interactions in an RNA, can be comprehensively assessed at single nucleotide resolution using high-throughput selective 2'-hydroxyl acylation analyzed by primer extension (hSHAPE). hSHAPE resolves structure-sensitive chemical modification information by high-resolution capillary electrophoresis and typically yields quantitative nucleotide flexibility information for 300-650 nucleotides (nt) per experiment. The electropherograms generated in hSHAPE experiments provide a wealth of structural information; however, significant algorithmic analysis steps are required to generate quantitative and interpretable data. We have developed a set of software tools called ShapeFinder to make possible rapid analysis of raw sequencer data from hSHAPE, and most other classes of nucleic acid reactivity experiments. The algorithms in ShapeFinder (1) convert measured fluorescence intensity to quantitative cDNA fragment amounts, (2) correct for signal decay over read lengths extending to 600 nt or more, (3) align reactivity data to the known RNA sequence, and (4) quantify per nucleotide reactivities using whole-channel Gaussian integration. The algorithms and user interface tools implemented in ShapeFinder create new opportunities for tackling ambitious problems involving high-throughput analysis of structure-function relationships in large RNAs.
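Step (4) quantifies reactivity from the area under each electropherogram peak. If a peak is modelled as a Gaussian with height A and width σ, its area has the closed form A·σ·√(2π). The Python sketch below checks this against numerical integration. The abstract does not describe how ShapeFinder fits peak parameters across a whole channel, so the sketch leaves fitting out, and all names are hypothetical.

```python
import math


def gaussian(x, amp, mu, sigma):
    """Peak model: height amp, centre mu, width sigma."""
    return amp * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def gaussian_area(amp, sigma):
    """Closed-form integral of the peak over the whole axis: amp * sigma * sqrt(2*pi)."""
    return amp * sigma * math.sqrt(2 * math.pi)


def trapezoid(f, a, b, n=10_000):
    """Numerical check: composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, n)) + 0.5 * f(b))


# A single band at position 50 with height 3 and width 2 (arbitrary units).
area = gaussian_area(3.0, 2.0)
numeric = trapezoid(lambda x: gaussian(x, 3.0, 50.0, 2.0), 30.0, 70.0)
print(area, numeric)
```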

15.
We present EMAN (Electron Micrograph ANalysis), a software package for performing semiautomated single-particle reconstructions from transmission electron micrographs. The goal of this project is to provide software capable of performing single-particle reconstructions beyond 10 Å as such high-resolution data become available. A complete single-particle reconstruction algorithm is implemented. Options are available to generate an initial model for particles with no symmetry, a single axis of rotational symmetry, or icosahedral symmetry. Model refinement is an iterative process, which utilizes classification by model-based projection matching. CTF (contrast transfer function) parameters are determined using a new paradigm in which data from multiple micrographs are fit simultaneously. Amplitude and phase CTF correction is then performed automatically as part of the refinement loop. A graphical user interface is provided, so even those with little image processing experience will be able to begin performing reconstructions. Advanced users can directly use the lower level shell commands and even expand the package utilizing EMAN's extensive image-processing library. The package was written from scratch in C++ and is provided free of charge on our Web site. We present an overview of the package as well as several conformance tests with simulated data.
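In projection matching, each particle image is assigned to the reference projection of the current model that it resembles most. A minimal version of this classification step scores the particle against every projection with normalized cross-correlation and keeps the best match. Real implementations also search over in-plane rotations and shifts, which this Python sketch omits. It is not EMAN code, and the function names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2D images (lists of rows)."""
    xa = [v for row in a for v in row]
    xb = [v for row in b for v in row]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    da = [v - ma for v in xa]
    db = [v - mb for v in xb]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den if den > 0 else 0.0


def classify(particle, projections):
    """Index and score of the reference projection that best matches the particle."""
    scores = [ncc(particle, proj) for proj in projections]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


references = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
print(classify([[0, 3], [3, 0.5]], references))
```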

16.
Climate-growth relationships are usually analysed using monthly climate data. The dendroTools R package also provides methodological approaches that enable climate-growth analysis for daily climate data. Such analysis reveals more complete climate signal patterns. In this article, new functions of the dendroTools R package are presented. Partial correlation coefficients are now implemented and can be used to calculate the strength of a linear relationship between two variables, while controlling for a third variable. Bootstrapped correlations can then be used to provide insights into the confidence intervals of statistical estimates. The calculation of partial and bootstrapped correlations is available for daily and monthly data. Finally, data transformation, S3 generic plotting and summary functions are also presented here.
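Both statistics can be written in a few lines. The first-order partial correlation is r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). A bootstrap interval resamples the paired observations with replacement and takes percentiles of the recomputed correlations. The Python sketch below is not dendroTools' R API. The percentile method and all names are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling linearly for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def bootstrap_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the Pearson correlation."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # resample with zero variance; skip it
    stats.sort()
    k = int((1 - level) / 2 * len(stats))
    return stats[k], stats[len(stats) - 1 - k]


ring_width = [2, 1, 4, 3, 6, 5]
temperature = [1, 3, 2, 5, 4, 7]
precipitation = [1, 2, 3, 4, 5, 6]
print(partial_corr(ring_width, temperature, precipitation))
print(bootstrap_ci(ring_width, temperature))
```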

17.
Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption—specifically, that difficult-to-impute SNPs tend to have larger effects—and assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate—their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from http://stephenslab.uchicago.edu/software.html.
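The posterior-mean approximation replaces each unknown genotype with its expected allele count, P(g=1) + 2·P(g=2), and then analyses these dosages as if they were observed genotypes. The Python sketch below shows the substitution followed by an ordinary least-squares slope. It is only an illustration of the point-estimate idea, not BIMBAM's Bayes factor computation, and the function names are hypothetical.

```python
def posterior_mean_dosage(probs):
    """Expected allele count from genotype probabilities (P(0), P(1), P(2))."""
    _, p1, p2 = probs
    return p1 + 2 * p2


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Toy imputed genotype probabilities for four individuals at one SNP.
genotype_probs = [(0.9, 0.1, 0.0), (0.2, 0.5, 0.3), (0.0, 0.1, 0.9), (0.6, 0.4, 0.0)]
dosages = [posterior_mean_dosage(p) for p in genotype_probs]
phenotype = [1.2, 2.1, 3.0, 1.5]
print(dosages, ols_slope(dosages, phenotype))
```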

18.
Over the past few decades seed physiology research has contributed to many important scientific discoveries and has provided valuable tools for the production of high quality seeds. An important instrument for this type of research is the accurate quantification of germination; however, gathering cumulative germination data is a very laborious task that is often prohibitive to the execution of large experiments. In this paper we present the germinator package: a simple, highly cost‐efficient and flexible procedure for high‐throughput automatic scoring and evaluation of germination that can be implemented without the use of complex robotics. The germinator package contains three modules: (i) design of experimental setup with various options to replicate and randomize samples; (ii) automatic scoring of germination based on the color contrast between the protruding radicle and seed coat on a single image; and (iii) curve fitting of cumulative germination data and the extraction, recap and visualization of the various germination parameters. The curve‐fitting module enables analysis of general cumulative germination data and can be used for all plant species. We show that the automatic scoring system works for Arabidopsis thaliana and Brassica spp. seeds, but is likely to be applicable to other species as well. In this paper we show the accuracy, reproducibility and flexibility of the germinator package. We have successfully applied it to evaluate natural variation for salt tolerance in a large population of recombinant inbred lines and were able to identify several quantitative trait loci for salt tolerance. germinator is a low‐cost package that allows a single person to monitor several thousand germination tests several times a day.
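Typical parameters extracted from a cumulative germination curve include maximum germination, the time to reach half of that maximum (t50), and the area under the curve. germinator derives its parameters from a fitted curve. The Python sketch below substitutes a simpler fit-free approach, linear interpolation between observations for t50 and the trapezoidal rule for the area. Treat it as an approximation, not the package's method. The function name is hypothetical.

```python
def germination_params(times, cum_pct):
    """Summaries of a cumulative germination curve.

    times: increasing observation times (e.g. hours after sowing)
    cum_pct: cumulative germination (%) observed at each time
    Returns (gmax, t50, auc): maximum germination, time to reach half of gmax
    (linear interpolation between observations) and area under the curve
    (trapezoidal rule).
    """
    gmax = max(cum_pct)
    half = gmax / 2
    t50 = None
    if cum_pct[0] >= half:
        t50 = times[0]
    else:
        for i in range(1, len(times)):
            g0, g1 = cum_pct[i - 1], cum_pct[i]
            if g0 < half <= g1:
                t0, t1 = times[i - 1], times[i]
                t50 = t0 + (half - g0) * (t1 - t0) / (g1 - g0)
                break
    auc = sum(
        (times[i] - times[i - 1]) * (cum_pct[i] + cum_pct[i - 1]) / 2
        for i in range(1, len(times))
    )
    return gmax, t50, auc


print(germination_params([0, 24, 48, 72, 96], [0, 10, 50, 80, 90]))
```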

19.
20.
Three-dimensional image analysis includes image processing, segmentation and visualization operations, which facilitate the interpretation of data. We have developed a toolbox for three-dimensional (3D) electron microscopy (EM) in Amira, a commercial software package used by many laboratories. Our toolbox integrates a number of established procedures specifically tailored for 3D EM. These include input-output, filtering, segmentation, visualization and ray-tracing functions, which can be accessed directly from a user-friendly pop-up menu. They allow denoising and segmentation tasks to be performed directly in Amira, without the need for other programs, and ultimately allow the results to be visualized at photo-realistic quality with ray-tracing. They also allow direct interaction with the data, so that, e.g., sub-tomograms can be extracted directly or segmentation areas can be selected interactively. The implemented functions are fast, reliable and intuitive, yielding a comprehensive package for visualization in EM.
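A basic form of 3D segmentation thresholds the volume and then labels connected foreground regions. The Python sketch below does this with a breadth-first flood fill using 6-connectivity, where voxels touch through faces only. It is a generic illustration, not the Amira toolbox, and the threshold and function name are hypothetical.

```python
from collections import deque

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def component_sizes(volume, threshold):
    """Threshold a 3D volume and return the voxel counts of its 6-connected components.

    volume[z][y][x] holds voxel intensities; voxels above threshold are foreground.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    sizes = []
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if volume[z][y][x] <= threshold or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill from this unvisited foreground voxel.
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                size = 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    size += 1
                    for dz, dy, dx in NEIGHBOURS:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                                and volume[nz][ny][nx] > threshold
                                and (nz, ny, nx) not in seen):
                            seen.add((nz, ny, nx))
                            queue.append((nz, ny, nx))
                sizes.append(size)
    return sizes


tomogram = [[[5, 5], [0, 0]], [[0, 0], [0, 5]]]
print(component_sizes(tomogram, threshold=1))
```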


Copyright © Beijing Qinyun Technology Development Co., Ltd. ICP registration: 京ICP备09084417号