Similar literature
 20 similar articles found
1.
We have developed a software package named PEAS to facilitate analyses of large data sets of single nucleotide polymorphisms (SNPs) for population genetics and molecular phylogenetics studies. PEAS reads SNP data in various formats as input and is versatile in data formatting; using PEAS, it is easy to create input files for many popular packages, such as STRUCTURE, frappe, Arlequin, Haploview, LDhat, PLINK, EIGENSOFT, PHASE, fastPHASE, MEGA and PHYLIP. In addition, PEAS fills several analysis gaps in currently available computer programs in population genetics and molecular phylogenetics. Notably, (i) it calculates genetic distance matrices with bootstrapping for both individuals and populations from genome-wide high-density SNP data, and the output can be passed to the MEGA and PHYLIP programs for further processing; (ii) it calculates genetic distances from STRUCTURE output and generates a MEGA file to reconstruct component trees; (iii) it provides tools to conduct haplotype sharing analysis for phylogenetic studies based on high-density SNP data. To our knowledge, these analyses are not available in any other computer program. PEAS for Windows is freely available for academic users from http://www.picb.ac.cn/~xushua/index.files/Download_PEAS.htm.

2.
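A practical population genetics analysis software package: GENEPOP version 3.1   (cited 5 times: 0 self-citations, 5 by others)
GENEPOP is a highly practical population genetics software package suited to analyzing large population genetic data sets. It has three main uses: 1) performing exact tests, for example of Hardy-Weinberg equilibrium, population differentiation, and linkage disequilibrium between loci; 2) estimating classical population genetic parameters, such as Fst and other related indices, and allele frequencies; 3) converting GENEPOP files into the input file formats required by commonly used population genetics packages (such as BIOSYS, FSTAT and LINKDOS). Compared with the software BIO...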

3.
convert is a user‐friendly, 32‐bit Windows program that facilitates ready transfer of codominant, diploid genotypic data amongst commonly used population genetic software packages. convert reads input files in its own ‘standard’ data format, easily produced from an Excel file of diploid, codominant marker data, and can convert these to the input formats of the following programs: gda, genepop, arlequin, popgene, microsat, phylip, and structure. convert can also read input files in genepop format. In addition, convert can produce a summary table of allele frequencies in which private alleles and the sample sizes at each locus are indicated.
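
The allele-frequency summary described in this abstract can be illustrated with a minimal Python sketch. It is not the code of convert itself: the function names, the in-memory data layout (a dict mapping each population to a list of diploid genotypes for one locus, with `None` for missing data) and the example values are all assumptions made for illustration. An allele is treated as private when it is observed in exactly one population.

```python
from collections import Counter, defaultdict


def allele_frequencies(genotypes):
    """Per-population allele frequencies at one locus.

    genotypes: dict mapping population name -> list of (allele1, allele2)
    tuples; None marks an individual with missing data (hypothetical layout).
    """
    freqs = {}
    for pop, individuals in genotypes.items():
        # Count both alleles of every genotyped (non-missing) individual.
        counts = Counter(a for g in individuals if g is not None for a in g)
        total = sum(counts.values())
        freqs[pop] = {a: c / total for a, c in counts.items()} if total else {}
    return freqs


def sample_sizes(genotypes):
    """Number of genotyped individuals per population at this locus."""
    return {pop: sum(g is not None for g in inds) for pop, inds in genotypes.items()}


def private_alleles(freqs):
    """Map each allele seen in exactly one population to that population."""
    carriers = defaultdict(set)
    for pop, pop_freqs in freqs.items():
        for allele in pop_freqs:
            carriers[allele].add(pop)
    return {a: next(iter(pops)) for a, pops in carriers.items() if len(pops) == 1}


if __name__ == "__main__":
    # Made-up microsatellite allele sizes for two populations.
    locus = {
        "north": [(120, 122), (120, 120), None],
        "south": [(122, 124), (122, 122)],
    }
    freqs = allele_frequencies(locus)
    print(freqs)
    print(sample_sizes(locus))
    print(private_alleles(freqs))
```

Keeping the sample size next to each frequency matters because a "private" allele found in a population with only a handful of genotyped individuals is weak evidence of true privacy.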

4.
There has been a great increase in both the number of population genetic analysis programs and the size of data sets being studied with them. Since the file formats required by the most popular and useful programs are variable, automated reformatting or conversion between them is desirable. formatomatic is an easy-to-use program that can read allelic data files in genepop, raw (csv) or convert formats and create data files in nine formats: raw (csv), arlequin, genepop, immanc/bayesass+, migrate, newhybrids, msvar, baps and structure. Use of formatomatic should greatly reduce time spent reformatting data sets and avoid unnecessary errors.

5.
We present eight computer programs written in the C programming language that are designed to analyze genotypic data and to support existing software used to construct genetic linkage maps. Although each program has a unique purpose, they all share the common goals of affording a greater understanding of genetic linkage data and of automating tasks to make computers more effective tools for map building. The PIC/HET and FAMINFO programs automate calculation of relevant quantities such as heterozygosity, PIC, allele frequencies, and informativeness of markers and pedigrees. PREINPUT simplifies data submissions to the Centre d'Etude du Polymorphisme Humain (CEPH) database by creating a file with genotype assignments that CEPH's INPUT program would otherwise require to be input manually. INHERIT is a program written specifically for mapping the X chromosome: by assigning a dummy allele to males in the nonpseudoautosomal region, it eliminates falsely perceived noninheritances in the data set. The remaining four programs complement the previously published genetic linkage mapping software CRI-MAP and LINKAGE. TWOTABLE produces a more readable format for the output of CRI-MAP two-point calculations; UNMERGE is the converse to CRI-MAP's merge option; and GENLINK and LINKGEN automatically convert between the genotypic data file formats required by these packages. All eight applications read input from the same types of data files that are used by CRI-MAP and LINKAGE. Their use has simplified the management of data, has increased knowledge of the content of information in pedigrees, and has reduced the amount of time needed to construct genetic linkage maps of chromosomes.
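
The quantities named for PIC/HET can be sketched in a few lines of Python. The abstract does not say which formulas the program uses, so this sketch assumes the standard textbook definitions: expected heterozygosity H = 1 - Σ p_i² and the polymorphism information content of Botstein et al., PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The function names are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) for a list of allele frequencies summing to 1."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings in which the parental origin of an allele
    # cannot be deduced even though the marker is heterozygous.
    uninformative = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - uninformative


if __name__ == "__main__":
    # A biallelic marker with equally frequent alleles.
    print(expected_heterozygosity([0.5, 0.5]))  # 0.5
    print(pic([0.5, 0.5]))                      # 0.375
```

PIC is never larger than the heterozygosity of the same marker, and the gap shrinks as the number of alleles grows.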

6.
7.
Given the growing amount of biological data, data mining methods have become an integral part of bioinformatics research. Unfortunately, standard data mining tools are often not sufficiently equipped for handling raw data such as amino acid sequences. One popular and freely available framework that contains many well-known data mining algorithms is the Waikato Environment for Knowledge Analysis (Weka). In the BioWeka project, we introduce various input formats for bioinformatics data and bioinformatics methods like alignments to Weka. This allows users to easily combine them with Weka's classification, clustering, validation and visualization facilities on a single platform and therefore reduces the overhead of converting data between different data formats as well as the need to write custom evaluation procedures that can deal with many different programs. We encourage users to participate in this project by adding their own components and data formats to BioWeka. Availability: The software, documentation and tutorial are available at http://www.bioweka.org.

8.
ABSTRACT: BACKGROUND: Meta-analysis (MA) is widely used to pool genome-wide association studies (GWASes) in order to a) increase the power to detect strong or weak genotype effects or b) as a result verification method. As a consequence of differing SNP panels among genotyping chips, imputation is the method of choice within GWAS consortia to avoid losing too many SNPs in a MA. YAMAS (Yet Another Meta Analysis Software), however, enables cross-GWAS conclusions prior to finished and polished imputation runs, which eventually are time-consuming. RESULTS: Here we present a fast method to avoid forfeiting SNPs present in only a subset of studies, without relying on imputation. This is accomplished by using reference linkage disequilibrium data from 1,000 Genomes/HapMap projects to find proxy-SNPs together with in-phase alleles for SNPs missing in at least one study. MA is conducted by combining association effect estimates of a SNP and those of its proxy-SNPs. Our algorithm is implemented in the MA software YAMAS. Association results from GWAS analysis applications can be used as input files for MA, tremendously speeding up MA compared to the conventional imputation approach. We show that our proxy algorithm is well-powered and yields valuable ad hoc results, possibly providing an incentive for follow-up studies. We propose our method as a quick screening step prior to imputation-based MA, as well as an additional main approach for studies without available reference data matching the ethnicities of study participants. As a proof of principle, we analyzed six dbGaP Type II Diabetes GWAS and found that the proxy algorithm clearly outperforms naive MA on the P-value level: for 17 out of 23 we observe an improvement on the p-value level by a factor of more than two, and a maximum improvement by a factor of 2127.
CONCLUSIONS: YAMAS is an efficient and fast meta-analysis program which offers various methods, including conventional MA as well as inserting proxy-SNPs for missing markers to avoid unnecessary power loss. MA with YAMAS can be readily conducted as YAMAS provides a generic parser for heterogeneous tabulated file formats within the GWAS field and avoids cumbersome setups. In this way, it supplements the meta-analysis process.
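
The idea of combining per-study effect estimates can be illustrated with a minimal fixed-effect, inverse-variance-weighted meta-analysis in Python. This is a generic textbook sketch, not the YAMAS implementation: the abstract does not specify the weighting scheme, and the proxy-SNP lookup is not modelled here. The assumption is simply that each study contributes one (effect estimate, standard error) pair, taken either from the SNP itself or from a proxy SNP with its in-phase allele. The function name and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per study.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Two made-up studies: the more precise one dominates the pooled estimate.
    beta, se, z, p = fixed_effect_meta([(0.10, 0.05), (0.20, 0.10)])
    print(beta, se, z, p)
```

Under this scheme a SNP missing from a study simply contributes no term for that study, which is the power loss that substituting a proxy estimate is meant to avoid.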

9.
MOTIVATION: Effective use of proteomics data, specifically mass spectrometry data, relies on the ability to read and write the many mass spectrometer file formats. Even with mass spectrometer vendor-specific libraries and vendor-neutral file formats, such as mzXML and mzData, it can be difficult to extract raw data files in a form suitable for batch processing and basic research. Introduced here is the ProteomeCommons.org Input and Output Framework, abbreviated to IO Framework, which is designed to abstractly represent mass spectrometry data. This project is a public, open-source, free-to-use framework that supports most of the mass spectrometry data formats, including current formats, legacy formats and proprietary formats that require a vendor-specific library in order to operate. The IO Framework includes an on-line tool for non-programmers and a set of libraries that developers may use to convert between various proteomics file formats. AVAILABILITY: The current source-code and documentation for the ProteomeCommons.org IO Framework is freely available at http://www.proteomecommons.org/current/531/

10.
We describe PerlMAT, a Perl microarray toolkit providing easy-to-use object-oriented methods for the simplified manipulation, management and analysis of microarray data. The toolkit provides objects for the encapsulation of microarray spots and reporters, several common microarray data file formats and GAL files. In addition, an analysis object provides methods for data processing, and an image object enables the visualisation of microarray data. This important addition to the Perl developer's library will facilitate more widespread use of Perl for microarray application development within the bioinformatics community. The coherent interface and well-documented code enable rapid analysis by even inexperienced Perl developers. AVAILABILITY: Software is available at http://sourceforge.net/projects/perlmat

11.
12.
Nmrglue, an open-source Python package for working with multidimensional NMR data, is described. When used in combination with other Python scientific libraries, nmrglue provides a highly flexible and robust environment for spectral processing, analysis and visualization and includes a number of common utilities such as linear prediction, peak picking and lineshape fitting. The package also enables existing NMR software programs to be readily tied together, currently facilitating the reading, writing and conversion of data stored in Bruker, Agilent/Varian, NMRPipe, Sparky, SIMPSON, and Rowland NMR Toolkit file formats. In addition to standard applications, the versatility offered by nmrglue makes the package particularly suitable for tasks that include manipulating raw spectrometer data files, automated quantitative analysis of multidimensional NMR spectra with irregular lineshapes such as those frequently encountered in the context of biomacromolecular solid-state NMR, and rapid implementation and development of unconventional data processing methods such as covariance NMR and other non-Fourier approaches. Detailed documentation, install files and source code for nmrglue are freely available at http://nmrglue.com. The source code can be redistributed and modified under the New BSD license.

13.
Previous studies have reported that some important loci are missed in single-locus genome-wide association studies (GWAS), especially because of the large phenotypic error in field experiments. To solve this issue, multi-locus GWAS methods have been recommended. However, only a few software packages for multi-locus GWAS are available. Therefore, we developed an R software package named mrMLM v4.0.2. This software integrates the mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO methods developed by our lab. There are four components in mrMLM v4.0.2: dataset input, parameter setting, software running, and result output. The fread function in data.table is used to quickly read datasets, especially big datasets, and the doParallel package is used to conduct parallel computation using multiple CPUs. In addition, the graphical user interface software mrMLM.GUI v4.0.2, built upon Shiny, is also available. To confirm the correctness of the aforementioned programs, all the methods in mrMLM v4.0.2 and three widely-used methods were used to analyze real and simulated datasets. The results confirm the superior performance of mrMLM v4.0.2 to other methods currently available. False positive rates are effectively controlled, albeit with a less stringent significance threshold. mrMLM v4.0.2 is publicly available at BioCode (https://bigd.big.ac.cn/biocode/tools/BT007077) or R (https://cran.r-project.org/web/packages/mrMLM.GUI/index.html) as open-source software.

14.
Genomewide association studies (GWAS) aim to identify genetic markers strongly associated with quantitative traits by utilizing linkage disequilibrium (LD) between candidate genes and markers. However, because of LD between nearby genetic markers, the standard GWAS approaches typically detect a number of correlated SNPs covering long genomic regions, making corrections for multiple testing overly conservative. Additionally, the high dimensionality of modern GWAS data poses considerable challenges for GWAS procedures such as permutation tests, which are computationally intensive. We propose a cluster‐based GWAS approach that first divides the genome into many large nonoverlapping windows and uses linkage disequilibrium network analysis in combination with principal component (PC) analysis as dimension‐reduction tools to summarize the SNP data to independent PCs within clusters of loci connected by high LD. We then introduce single‐ and multilocus models that can efficiently conduct the association tests on such high‐dimensional data. The methods can be adapted to different model structures and used to analyse samples collected from the wild or from biparental F2 populations, which are commonly used in ecological genetics mapping studies. We demonstrate the performance of our approaches with two publicly available data sets from a plant (Arabidopsis thaliana) and a fish (Pungitius pungitius), as well as with simulated data.

15.
Although approaches for performing genome‐wide association studies (GWAS) are well developed, conventional GWAS requires high‐density genotyping of large numbers of individuals from a diversity panel. Here we report a method for performing GWAS that does not require genotyping of large numbers of individuals. Instead, XP‐GWAS (extreme‐phenotype GWAS) relies on genotyping pools of individuals from a diversity panel that have extreme phenotypes. This analysis measures allele frequencies in the extreme pools, enabling discovery of associations between genetic variants and traits of interest. This method was evaluated in maize (Zea mays) using the well‐characterized kernel row number trait, which was selected to enable comparisons between the results of XP‐GWAS and conventional GWAS. An exome‐sequencing strategy was used to focus sequencing resources on genes and their flanking regions. A total of 0.94 million variants were identified and served as evaluation markers; comparisons among pools showed that 145 of these variants were statistically associated with the kernel row number phenotype. These trait‐associated variants were significantly enriched in regions identified by conventional GWAS. XP‐GWAS was able to resolve several linked QTL and detect trait‐associated variants within a single gene under a QTL peak. XP‐GWAS is expected to be particularly valuable for detecting genes or alleles responsible for quantitative variation in species for which extensive genotyping resources are not available, such as wild progenitors of crops, orphan crops, and other poorly characterized species such as those of ecological interest.

16.
17.

Background  

Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer's proprietary software or by publicly available programs. Depending on the size of a sequencing project, the number of trace files can vary from just a few to thousands of files. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages, Phred and Staden, are used by preAssemble to perform sequence quality processing.

18.
19.
SNPselector: a web tool for selecting SNPs for genetic association studies   (cited 7 times: 0 self-citations, 7 by others)
SUMMARY: Single nucleotide polymorphisms (SNPs) are commonly used for association studies to find genes responsible for complex genetic diseases. With the recent advance of SNP technology, researchers are able to assay thousands of SNPs in a single experiment. But the process of manually choosing thousands of genotyping SNPs for tens or hundreds of genes is time-consuming. We have developed a web-based program, SNPselector, to automate the process. SNPselector takes a list of gene names or a list of genomic regions as input and searches the Ensembl genes or genomic regions for available SNPs. It prioritizes these SNPs on their tagging for linkage disequilibrium, SNP allele frequencies and source, function, regulatory potential and repeat status. SNPselector outputs results in compressed Excel spreadsheet files for review by the user. AVAILABILITY: SNPselector is freely available at http://primer.duhs.duke.edu/

20.
Tandem mass spectrometry-based proteomics experiments produce large amounts of raw data, and different database search engines are needed to reliably identify all the proteins from this data. Here, we present Compid, an easy-to-use software tool that can be used to integrate and compare protein identification results from two search engines, Mascot and Paragon. Additionally, Compid enables extraction of information from large Mascot result files that cannot be opened via the Web interface and calculation of general statistical information about peptide and protein identifications in a data set. To demonstrate the usefulness of this tool, we used Compid to compare Mascot and Paragon database search results for a mitochondrial proteome sample of human keratinocytes. The reports generated by Compid can be exported and opened as Excel documents or as text files using configurable delimiters, allowing the analysis and further processing of Compid output with a multitude of programs. Compid is freely available and can be downloaded from http://users.utu.fi/lanatr/compid. It is released under an open source license (GPL), enabling modification of the source code. Its modular architecture allows for creation of supplementary software components, e.g. to enable support for additional input formats and report categories.
