Similar literature
1.

Background

The Sequence Read Archive (SRA) is the largest public repository of sequencing data from next-generation sequencing platforms, including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others.

Results

SRAdb is an attempt to make queries of the metadata associated with SRA submission, study, sample, experiment and run more robust and precise, and make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then utilizes this SQLite database for querying and accessing metadata. Full text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrated Genomics Viewer.

Conclusions

The SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R.
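The metadata lookup described in this abstract can be pictured with a small sketch. It is only an illustration in Python, not SRAdb code: the table `run_metadata`, its columns, the accession strings, and the helper names are all hypothetical, and a plain `LIKE` match stands in for the package's full-text search.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a hypothetical metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE run_metadata ("
        "run_accession TEXT PRIMARY KEY, study_title TEXT, platform TEXT)"
    )
    # Made-up rows for illustration; these are not real SRA records.
    rows = [
        ("RUN0001", "Breast cancer transcriptome profiling", "Illumina"),
        ("RUN0002", "Soil metagenome survey", "Roche 454"),
        ("RUN0003", "Exome sequencing of breast tumours", "Illumina"),
    ]
    conn.executemany("INSERT INTO run_metadata VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search_runs(conn, keyword):
    """Return run accessions whose study title contains the keyword."""
    cursor = conn.execute(
        "SELECT run_accession FROM run_metadata "
        "WHERE study_title LIKE ? ORDER BY run_accession",
        (f"%{keyword}%",),  # bound parameter, so the keyword is never spliced into SQL
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    connection = build_demo_db()
    print(search_runs(connection, "breast"))
```

Keeping the metadata in a single SQLite file is what makes this kind of query cheap to distribute and repeat: the whole database travels as one file and needs no server.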

2.
3.
We present a new software package (hzar) that provides functions for fitting molecular genetic and morphological data from hybrid zones to classic equilibrium cline models using the Metropolis–Hastings Markov chain Monte Carlo (MCMC) algorithm. The software applies likelihood functions appropriate for different types of data, including diploid and haploid genetic markers and quantitative morphological traits. The modular design allows flexibility in fitting cline models of varying complexity. To facilitate hypothesis testing, an autofit function is included that allows automated model selection from a set of nested cline models. Cline parameter values, such as cline centre and cline width, are estimated and may be compared statistically across clines. The package is written in the R language and is available through the Comprehensive R Archive Network (CRAN; http://cran.r-project.org/). Here, we describe hzar and demonstrate its use with a sample data set from a well‐studied hybrid zone in western Panama between white‐collared (Manacus candei) and golden‐collared manakins (M. vitellinus). Comparisons of our results with previously published results for this hybrid zone validate the hzar software. We extend analysis of this hybrid zone by fitting additional models to molecular data where appropriate.
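The idea of fitting a cline with Metropolis–Hastings can be sketched in a few lines. The following Python is a simplified illustration and not hzar's implementation: it assumes a two-parameter logistic cline (width defined as the inverse of the maximum slope), a binomial likelihood for allele counts, flat priors, and a Gaussian random-walk proposal; all function names, starting values, and step sizes are hypothetical.

```python
import math
import random


def cline_frequency(x, centre, width):
    """Logistic cline: allele frequency at distance x (numerically stable form)."""
    z = 4.0 * (x - centre) / width
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(sites, counts, n_alleles, centre, width):
    """Binomial log-likelihood (constant terms dropped) of allele counts."""
    total = 0.0
    for x, k in zip(sites, counts):
        # Clamp to keep log() finite when the cline predicts fixation.
        p = min(max(cline_frequency(x, centre, width), 1e-9), 1.0 - 1e-9)
        total += k * math.log(p) + (n_alleles - k) * math.log(1.0 - p)
    return total


def fit_cline(sites, counts, n_alleles, steps=5000, burn_in=1000, seed=1):
    """Random-walk Metropolis-Hastings; returns posterior means (centre, width)."""
    rng = random.Random(seed)
    centre, width = 30.0, 40.0  # arbitrary starting point (assumption)
    current = log_likelihood(sites, counts, n_alleles, centre, width)
    kept = []
    for step in range(steps):
        new_centre = centre + rng.gauss(0.0, 2.0)
        new_width = width + rng.gauss(0.0, 2.0)
        # Flat priors: proposals outside these bounds are rejected outright.
        if 0.0 <= new_centre <= 100.0 and 0.0 < new_width <= 200.0:
            proposed = log_likelihood(sites, counts, n_alleles, new_centre, new_width)
            # Symmetric proposal, so the acceptance ratio is the likelihood ratio.
            if rng.random() < math.exp(min(0.0, proposed - current)):
                centre, width, current = new_centre, new_width, proposed
        if step >= burn_in:
            kept.append((centre, width))
    mean_centre = sum(c for c, _ in kept) / len(kept)
    mean_width = sum(w for _, w in kept) / len(kept)
    return mean_centre, mean_width


if __name__ == "__main__":
    # Synthetic counts (out of 40 alleles) generated from a cline centred at 50.
    demo_sites = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    demo_counts = [0, 0, 0, 1, 5, 20, 35, 39, 40, 40, 40]
    print(fit_cline(demo_sites, demo_counts, 40))
```

Because the retained samples approximate the posterior, the same chain could also yield credible intervals for centre and width, which is what makes statistical comparison across clines possible.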

4.
5.
An automated procedure for the analysis of homologous protein structures has been developed. The method facilitates the characterization of internal conformational differences and inter-conformer relationships and provides a framework for the analysis of protein structural evolution. The method is implemented in bio3d, an R package for the exploratory analysis of structure and sequence data. AVAILABILITY: The bio3d package is distributed with full source code as a platform-independent R package under a GPL2 license from: http://mccammon.ucsd.edu/~bgrant/bio3d/

6.
Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package that provides a seamless R interface to a selection of popular MEME Suite tools. memes provides a novel “data aware” interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.

7.
8.
I describe an open‐source R package, multimark, for estimation of survival and abundance from capture–mark–recapture data consisting of multiple “noninvasive” marks. Noninvasive marks include natural pelt or skin patterns, scars, and genetic markers that enable individual identification in lieu of physical capture. multimark provides a means for combining and jointly analyzing encounter histories from multiple noninvasive sources that otherwise cannot be reliably matched (e.g., left‐ and right‐sided photographs of bilaterally asymmetrical individuals). The package is currently capable of fitting open population Cormack–Jolly–Seber (CJS) and closed population abundance models with up to two mark types using Bayesian Markov chain Monte Carlo (MCMC) methods. multimark can also be used for Bayesian analyses of conventional capture–recapture data consisting of a single‐mark type. Some package features include (1) general model specification using formulas already familiar to most R users, (2) ability to include temporal, behavioral, age, cohort, and individual heterogeneity effects in detection and survival probabilities, (3) improved MCMC algorithm that is computationally faster and more efficient than previously proposed methods, (4) Bayesian multimodel inference using reversible jump MCMC, and (5) data simulation capabilities for power analyses and assessing model performance. I demonstrate use of multimark using left‐ and right‐sided encounter histories for bobcats (Lynx rufus) collected from remote single‐camera stations in southern California. In this example, there is evidence of a behavioral effect (i.e., trap “happy” response) that is otherwise indiscernible using conventional single‐sided analyses.
The package will be most useful to ecologists seeking stronger inferences by combining different sources of mark–recapture data that are difficult (or impossible) to reliably reconcile, particularly with the sparse datasets typical of rare or elusive species for which noninvasive sampling techniques are most commonly employed. Addressing deficiencies in currently available software, multimark also provides a user‐friendly interface for performing Bayesian multimodel inference using capture–recapture data consisting of a single conventional mark or multiple noninvasive marks.

9.
G protein‐coupled receptor kinase 2 (GRK2) plays a central role in the regulation of a variety of important signaling pathways. Alteration of GRK2 protein level and activity has profound effects on cell physiological functions and causes diseases such as heart failure, rheumatoid arthritis, and obesity. We have previously reported that overexpression of GRK2 has an inhibitory role in cancer cell growth. To further examine the role of GRK2 in cancer, in this study, we investigated the effects of a reduced protein level of GRK2 on the insulin‐like growth factor 1 receptor (IGF‐1R) signaling pathway in human hepatocellular carcinoma (HCC) HepG2 cells. We created a GRK2 knockdown cell line using lentiviral vector‐mediated expression of GRK2‐specific short hairpin RNA (shRNA). Under IGF‐1 stimulation, HepG2 cells with a reduced level of GRK2 showed elevated total IGF‐1R protein expression as well as tyrosine phosphorylation of the receptor. In addition, HepG2 cells with a reduced level of GRK2 also demonstrated increased tyrosine phosphorylation of IRS1 at residue 612 and increased phosphorylation of Akt, indicating a stronger activation of the IGF‐1R signaling pathway. However, HepG2 cells with a reduced level of GRK2 did not display any growth advantage in culture as compared with the scramble control cells. We further detected that a reduced level of GRK2 induced a small cell cycle arrest at G2/M phase by enhancing the expression of cyclin A, B1, and E. Our results indicate that GRK2 has contrasting roles on HepG2 cell growth by negatively regulating the IGF‐1R signaling pathway and cyclins' expression. J. Cell. Physiol. 228: 1897–1901, 2013. © 2013 Wiley Periodicals, Inc.

10.
We explore the estimation of uncertainty in evolutionary parameters using a recently devised approach for resampling entire additive genetic variance–covariance matrices (G). Large‐sample theory shows that maximum‐likelihood estimates (including restricted maximum likelihood, REML) asymptotically have a multivariate normal distribution, with covariance matrix derived from the inverse of the information matrix, and mean equal to the estimated G. This suggests that sampling estimates of G from this distribution can be used to assess the variability of estimates of G, and of functions of G. We refer to this as the REML‐MVN method. This has been implemented in the mixed‐model program WOMBAT. Estimates of sampling variances from REML‐MVN were compared to those from the parametric bootstrap and from a Bayesian Markov chain Monte Carlo (MCMC) approach (implemented in the R package MCMCglmm). We apply each approach to evolvability statistics previously estimated for a large, 20‐dimensional data set for Drosophila wings. REML‐MVN and MCMC sampling variances are close to those estimated with the parametric bootstrap. Both slightly underestimate the error in the best‐estimated aspects of the G matrix. REML analysis supports the previous conclusion that the G matrix for this population is full rank. REML‐MVN is computationally very efficient, making it an attractive alternative to both data resampling and MCMC approaches to assessing confidence in parameters of evolutionary interest.

11.
Samβada is a genome–environment association software, designed to search for signatures of local adaptation. However, pre‐ and postprocessing of data can be labour‐intensive, preventing wider uptake of the method. We have now developed R.SamBada, an R package providing a pipeline for landscape genomic analysis based on Samβada, spanning from the retrieval of environmental conditions at sampling locations to gene annotation using the Ensembl genome browser. As a result, R.SamBada standardizes the landscape genomics pipeline and eases the search for candidate genes of local adaptation, enhancing reproducibility of landscape genomic studies. The efficiency and power of the pipeline are illustrated using two examples: sheep populations from Morocco with no evident population structure and Lidia cattle from Spain displaying population substructuring. In both cases, R.SamBada enabled rapid identification and interpretation of candidate genes, which are further discussed in the light of local adaptation. The package is available in the R CRAN package repository and on GitHub (github.com/SolangeD/R.SamBada).

12.
Remotely sensed data – available at medium to high resolution across global spatial and temporal scales – are a valuable resource for ecologists. In particular, products from NASA's MODerate‐resolution Imaging Spectroradiometer (MODIS), providing twice‐daily global coverage, have been widely used for ecological applications. We present MODISTools, an R package designed to improve the accessing, downloading, and processing of remotely sensed MODIS data. MODISTools automates the process of data downloading and processing from any number of locations, time periods, and MODIS products. This automation reduces the risk of human error, and the researcher effort required compared to manual per‐location downloads. The package will be particularly useful for ecological studies that include multiple sites, such as meta‐analyses, observation networks, and globally distributed experiments. We give examples of the simple, reproducible workflow that MODISTools provides and of the checks that are carried out in the process. The end product is in a format that is amenable to statistical modeling. We analyzed the relationship between species richness across multiple higher taxa observed at 526 sites in temperate forests and vegetation indices, measures of aboveground net primary productivity. We downloaded MODIS derived vegetation index time series for each location where the species richness had been sampled, and summarized the data into three measures: maximum time‐series value, temporal mean, and temporal variability. On average, species richness covaried positively with our vegetation index measures. Different higher taxa show different positive relationships with vegetation indices. Models had high R² values, suggesting higher taxon identity and a gradient of vegetation index together explain most of the variation in species richness in our data.
MODISTools can be used on Windows, Mac, and Linux platforms, and is available from CRAN and GitHub (https://github.com/seantuck12/MODISTools).
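The three per-site summaries mentioned in this abstract (maximum, temporal mean, temporal variability) can be sketched as follows. This Python fragment is an illustration only and does not use MODISTools: the function names, site labels, and index values are made up, and the sample standard deviation is an assumed stand-in for "temporal variability", which the abstract does not define.

```python
import statistics


def summarize_series(values):
    """Summarize one vegetation-index time series into three measures."""
    return {
        "maximum": max(values),
        "mean": statistics.mean(values),
        # Assumption: variability measured as the sample standard deviation.
        "variability": statistics.stdev(values),
    }


def summarize_sites(series_by_site):
    """Apply the same summary to every site in a {site: series} mapping."""
    return {site: summarize_series(values) for site, values in series_by_site.items()}


if __name__ == "__main__":
    # Hypothetical vegetation-index values for two made-up sites.
    demo = {
        "site_A": [0.2, 0.4, 0.6],
        "site_B": [0.5, 0.5, 0.7, 0.7],
    }
    for site, summary in summarize_sites(demo).items():
        print(site, summary)
```

Reducing each series to a fixed set of numbers gives one row per site, which is the shape a regression of species richness on vegetation index needs.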

13.
Cleidocranial dysplasia (CCD) is caused by haploinsufficiency in RUNX2 function. We have previously identified a series of RUNX2 mutations in Korean CCD patients, including a novel R131G missense mutation in the Runt‐homology domain. Here, we examine the functional consequences of the RUNX2R131G mutation, which could potentially affect DNA binding, nuclear localization signal, and/or heterodimerization with core‐binding factor‐β (CBF‐β). Immunofluorescence microscopy and western blot analysis with subcellular fractions show that RUNX2R131G is localized in the nucleus. Immunoprecipitation analysis reveals that heterodimerization with CBF‐β is retained. However, precipitation assays with biotinylated oligonucleotides and reporter gene assays with RUNX2 responsive promoters together reveal that the DNA‐binding activity, and consequently the transactivation potential, of RUNX2R131G is abrogated. We conclude that loss of DNA binding, but not nuclear localization or CBF‐β heterodimerization, causes RUNX2 haploinsufficiency in patients with the RUNX2R131G mutation. Retention of specific functions including nuclear localization and binding to CBF‐β of the RUNX2R131G mutation may render the mutant protein an effective competitor that interferes with wild‐type function. J. Cell. Biochem. 110: 97–103, 2010. © 2010 Wiley‐Liss, Inc.

14.
15.
In this paper, we present the package detrendeR, a Graphical User Interface to facilitate the visualization and analysis of dendrochronological data, using the R computing environment. This package offers an easy way to perform most of the traditional tasks in dendrochronology: detrending, chronology building and graphical presentation of time series. The advantage of detrendeR, compared with the program ARSTAN, is the graphical interface that provides the user with an easy way to use the R language, rich in graphics and handling routines, with no need to type commands. detrendeR uses a simple and familiar dialog-box interface and it can read Tucson decadal-format files (*.rwl and *.crn) as well as plain text files. In addition, detrendeR has the ability to test temporal changes of the common signal using moving intervals. detrendeR should make it easier to perform detrending and chronology building of tree-ring series, taking advantage of the R statistical programming environment.

16.
Many biological processes are periodic, for example cell cycle expression, circadian rhythms and calcium oscillations. However, measured time series from these processes are commonly short and noisy, and finding frequencies in such data can be challenging. Here we present BaSAR, Bayesian Spectrum Analysis in R, a package for extracting frequency information from time series data. The software uses advanced techniques of Bayesian inference that are well suited for handling typical biological data. The core functions are designed for detecting a single key frequency, without the need for data pre-processing such as detrending. The package is freely available at CRAN - The Comprehensive R Archive Network: http://cran.r-project.org/web/packages/BaSAR.
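To make the task of "detecting a single key frequency" concrete, here is a minimal Python sketch. It is a deliberately simpler, non-Bayesian stand-in: it scans a grid of candidate frequencies and picks the peak of a classical periodogram, and it does not reproduce BaSAR's posterior computation. The function names, the frequency grid, and the synthetic signal are all hypothetical.

```python
import math


def periodogram(times, values, frequency):
    """Classical periodogram power of the series at one frequency (cycles per time unit)."""
    omega = 2.0 * math.pi * frequency
    real = sum(v * math.cos(omega * t) for t, v in zip(times, values))
    imag = sum(v * math.sin(omega * t) for t, v in zip(times, values))
    return (real * real + imag * imag) / len(values)


def dominant_frequency(times, values, frequencies):
    """Return the candidate frequency with the largest periodogram power."""
    return max(frequencies, key=lambda f: periodogram(times, values, f))


if __name__ == "__main__":
    # Synthetic, noise-free oscillation with 0.1 cycles per time unit.
    demo_times = list(range(40))
    demo_values = [math.sin(2.0 * math.pi * 0.1 * t) for t in demo_times]
    grid = [i * 0.005 for i in range(1, 101)]  # 0.005 to 0.5 cycles per time unit
    print(dominant_frequency(demo_times, demo_values, grid))
```

Because the sums run over the actual sampling times, the same function works for unevenly spaced measurements, which are common in short biological time series.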

17.
18.
POLYSAT: an R package for polyploid microsatellite analysis   Total citations: 4, self-citations: 0, citations by others: 4
We present an R package to help remedy the lack of software for manipulating and analysing autopolyploid and allopolyploid microsatellite data. POLYSAT can handle genotype data of any ploidy, including populations of mixed ploidy, and assumes that allele copy number is always ambiguous in partial heterozygotes. It can import and export genotype data in eight different formats, calculate pairwise distances between individuals using a stepwise mutation and infinite alleles model, estimate ploidy based on allele counts and estimate allele frequencies and pairwise F(ST) values. This software is freely available through the Comprehensive R Archive Network (http://cran.r-project.org/) and includes a thorough tutorial.

19.
The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low‐dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group‐specific consensus phylogenies. treespace also provides a user‐friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results.

20.
The aim of the ecospat package is to make available novel tools and methods to support spatial analyses and modeling of species niches and distributions in a coherent workflow. The package is written in the R language (R Development Core Team) and contains several features, unique in their implementation, that are complementary to other existing R packages. Pre‐modeling analyses include species niche quantifications and comparisons between distinct ranges or time periods, measures of phylogenetic diversity, and other data exploration functionalities (e.g. extrapolation detection, ExDet). Core modeling brings together the new approach of ensemble of small models (ESM) and various implementations of the spatially‐explicit modeling of species assemblages (SESAM) framework. Post‐modeling analyses include evaluation of species predictions based on presence‐only data (Boyce index) and of community predictions, phylogenetic diversity and environmentally‐constrained species co‐occurrences analyses. The ecospat package also provides some functions to supplement the ‘biomod2’ package (e.g. data preparation, permutation tests and cross‐validation of model predictive power). With this novel package, we intend to stimulate the use of comprehensive approaches in spatial modelling of species and community distributions.
