Found 20 similar documents (search time: 0 ms)
1.
Incorrect paternity assignment in cattle can have a major effect on rates of genetic gain. Of the 576 Israeli Holstein bulls genotyped with the BovineSNP50 BeadChip, 204 had a genotyped sire. The results of 38 828 valid single nucleotide polymorphisms (SNPs) were used to validate paternity, determine genotyping error rates and establish criteria for deleting defective SNPs from further analysis. Based on the criterion of >2% conflicts between the genotypes of the putative sire and son, paternity was rejected for seven bulls (3.5%). The remaining bulls had one to two orders of magnitude fewer conflicts. Excluding these seven bulls, all other discrepancies between sire and son genotypes were assumed to be caused by genotyping mistakes. The frequency of discrepancies was >0.07 for nine SNPs and >0.025 for 81 SNPs. After deletion of these 81 SNPs, the overall frequency of discrepancies was reduced from 0.00017 to 0.00010, and the total expected fraction of genotyping errors was estimated at 0.05%. The paternity of bulls genotyped for genomic selection can thus be verified or traced against candidate sires at virtually no additional cost.
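A minimal sketch of the conflict criterion this abstract describes: a sire-son conflict at a SNP is a pair of opposite homozygotes, and paternity is rejected when conflicts exceed 2% of jointly called SNPs. The genotype coding and toy data are invented for illustration.

```python
# Genotypes coded as B-allele counts: 0, 1, 2, or None if missing.
# Opposite homozygotes (0 vs 2) are incompatible with direct
# paternity barring genotyping error.

def conflict_rate(sire, son):
    """Fraction of jointly called SNPs with opposite homozygotes."""
    called = conflicts = 0
    for g_sire, g_son in zip(sire, son):
        if g_sire is None or g_son is None:
            continue
        called += 1
        if {g_sire, g_son} == {0, 2}:
            conflicts += 1
    return conflicts / called if called else 0.0

def paternity_accepted(sire, son, threshold=0.02):
    """Apply the >2% rejection criterion from the abstract."""
    return conflict_rate(sire, son) <= threshold

# Toy example: one opposite-homozygote conflict in 8 SNPs (12.5%)
sire = [0, 1, 2, 2, 1, 0, 2, 1]
son  = [2, 1, 2, 1, 1, 0, 2, 0]   # conflict at the first SNP only
print(conflict_rate(sire, son))    # well above 2%: paternity rejected
```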
2.
Cino Pertoldi Małgorzata Tokarska Jan M. Wójcik Agata Kawałko Ettore Randi Torsten N. Kristensen Volker Loeschcke David Coltman Gregory A. Wilson Vivi R. Gregersen Christian Bendixen 《Acta theriologica》2010,55(2):97-108
Here we present the first attempt to use the BovineSNP50 Illumina Genotyping BeadChip for genome-wide screening of European bison Bison bonasus bonasus (EB), two subspecies of American bison, the plains bison Bison bison bison (PB) and the wood bison Bison bison athabascae (WB), and seven cattle Bos taurus breeds. Our aims were to (1) reconstruct their evolutionary relationships, (2) detect any genetic signature of past bottlenecks and quantify the consequences of bottlenecks for the genetic distances amongst bison subspecies and cattle, and (3) detect loci under positive or stabilizing selection. A Bayesian clustering procedure (STRUCTURE) detected ten genetically distinct clusters, with separation among all seven cattle breeds and between European and American bison, but no separation between plains and wood bison. A linkage disequilibrium based program (LDNE) was used to estimate the effective population size (Ne) for the cattle breeds; Ne was generally low relative to the census size of the breeds (mean Ne = 299.5, min Ne = 18.1, max Ne = 755.0). BOTTLENECK 1.2 detected signs of population bottlenecks in the EB, PB and WB populations (sign test and standardized sign test: p = 0.0001). Evidence for loci under selection was found in cattle but not in bison. All extant wild populations of bison have been shown to have survived severe bottlenecks, which likely had large effects on genetic diversity within, and differentiation among, groups.
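A hedged sketch of the linkage-disequilibrium Ne estimation behind programs such as LDNE. For unlinked loci a standard approximation is E[r²] ≈ 1/S + 1/(3Ne), where S is the sample size, giving Ne ≈ 1/(3(r̄² − 1/S)). This is a simplification: the published estimator applies additional bias corrections, and the numbers below are invented.

```python
def ne_from_ld(mean_r2, sample_size):
    """Point estimate of effective population size from mean LD r^2,
    using the approximation E[r^2] ~ 1/S + 1/(3*Ne)."""
    r2_drift = mean_r2 - 1.0 / sample_size   # remove sampling component
    if r2_drift <= 0:
        return float("inf")                  # no drift signal detected
    return 1.0 / (3.0 * r2_drift)

# With 50 sampled animals and mean r^2 = 0.03, the drift component is
# 0.01, giving Ne around 33 -- the same order as the lowest breed
# estimates reported above.
print(round(ne_from_ld(0.03, 50), 1))
```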
3.
Michelizzi VN Wu X Dodson MV Michal JJ Zambrano-Varon J McLean DJ Jiang Z 《International journal of biological sciences》2010,7(1):18-27
The Illumina BovineSNP50 BeadChip features 54,001 informative single nucleotide polymorphisms (SNPs) that uniformly span the entire bovine genome. Among them, 52,255 SNPs have locations assigned in the current genome assembly (Btau_4.0), including 19,294 (37%) intragenic SNPs (i.e., located within genes) and 32,961 (63%) intergenic SNPs (i.e., located between genes). While the SNPs represented on the BeadChip are evenly distributed along each bovine chromosome, there are over 14,000 genes with no SNPs placed on the current BeadChip. Kernel density estimation, a non-parametric method, was used in the present study to identify SNP-poor and SNP-rich regions on each bovine chromosome. With a bandwidth of 0.05 Mb, we observed that most regions have SNP densities within 2 standard deviations of the chromosome mean. The SNP density on chromosome X was the most dynamic, with more than 30 SNP-rich regions and at least 20 regions with no SNPs. Genotyping ten water buffalo with the Illumina BovineSNP50 BeadChip revealed that 41,870 of the 54,001 SNPs were fully scored on all ten animals, while 6,771 SNPs were partially scored on one to nine animals. Both fully scored and partially or unscored SNPs clearly cluster, in groups of various sizes, on each chromosome. However, of the 43,687 bovine SNPs that were successfully genotyped on nine or ten water buffalo, only 1,159 were polymorphic in that species. These results indicate that the SNP sites, but not the polymorphisms, are conserved between the two species. Overall, our study provides a solid foundation for further characterizing the SNP evolutionary process, thus improving understanding of within- and between-species biodiversity, phylogenetics and adaptation to environmental changes.
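A minimal Gaussian kernel density estimate of SNP density along a chromosome, in the spirit of the analysis above (bandwidth in Mb); grid points where the estimate falls far below the chromosome mean would flag SNP-poor regions. Positions and grid are illustrative.

```python
import math

def kde(positions, grid, bandwidth):
    """Gaussian KDE of SNP positions, evaluated at each grid point."""
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2 * math.pi))
    out = []
    for x in grid:
        s = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
                for p in positions)
        out.append(norm * s)
    return out

# SNPs clustered near 1.0 Mb, a gap around 2.0 Mb
snps = [0.9, 0.95, 1.0, 1.05, 1.1, 3.0]
grid = [1.0, 2.0, 3.0]
dens = kde(snps, grid, bandwidth=0.05)
print(dens[0] > dens[1])   # density peak at 1.0 Mb, gap at 2.0 Mb
```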
4.
Motivation
Illumina sequencing data can provide high coverage of a genome by relatively short (most often 100 bp to 150 bp) reads at low cost. Even with a low (advertised 1%) error rate, 100× coverage Illumina data on average contain an error in some read at every base in the genome. These errors complicate handling of the data because they produce a large number of low-count erroneous k-mers in the reads. However, there is enough information in the reads to correct most of the sequencing errors, making subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term "error correction" to denote the reduction in errors due both to changes in individual bases and to trimming of unusable sequence. We developed an error correction software package called QuorUM. QuorUM is aimed mainly at error-correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous k-mers in the output reads while preserving as many true k-mers as possible, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available.
Results
We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most of the metrics we use. QuorUM is efficiently implemented, making use of current multi-core computing architectures, and is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads: for the data sets investigated, they yield a factor of 1.1 to 4 improvement in N50 contig size over the original reads.
Availability
QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at http://www.genome.umd.edu.
Contact
gmarcais@umd.edu
5.
Jun-Seok Song Ha-Seung Seong Bong-Hwan Choi Chang-Woo Lee Nam-Hyun Hwang Dajeong Lim Joon-Hee Lee Jin Soo Kim Jeong-Dae Kim Yeon-Soo Park Jung-Woo Choi Jong-Bok Kim 《Genes & genomics.》2018,40(12):1373-1382
Hanwoo and Chikso are Korean native cattle breeds that are currently registered with the Food and Agriculture Organization. However, genomic studies comparing Hanwoo and Chikso populations are still lacking. The objective of this study was to perform a genome-wide analysis of Hanwoo and Chikso populations and investigate the genetic relationships between them. We genotyped a total of 319 cattle, 214 Hanwoo and 105 Chikso, sampled from the Gangwon Province Livestock Technology Research Institute, using the Illumina BovineSNP50 BeadChip. After performing quality control on the initially generated datasets, we assessed linkage disequilibrium patterns for all possible SNP pairs up to 1 Mb apart. Overall, average r2 values in Hanwoo (0.048) were lower than in Chikso (0.074). The genetic relationship between the populations was further supported by principal component analysis, which exhibited clear clusters for each of the Hanwoo and Chikso populations. Overall heterozygosity was slightly higher in Hanwoo (0.359) than in Chikso (0.345), and the inbreeding coefficient was also slightly higher in Hanwoo (−0.015) than in Chikso (−0.035). The average FST between Hanwoo and Chikso was 0.036, indicating little genetic differentiation between the two breeds. Furthermore, we found potential selection signatures, including the LRP1B and NTRK2 genes, that may be implicated in meat and reproductive traits in cattle. The results showed that neither the Hanwoo nor the Chikso population is under a severe level of inbreeding. Although the principal component analysis exhibited clear clusters for each population, we did not find clear evidence that the two populations are highly differentiated from each other.
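A Nei-style sketch of the FST quantity reported above, computed from per-SNP allele frequencies in two populations as (HT − HS)/HT averaged over loci. The study itself likely used a variance-based (Weir-Cockerham) estimator; the frequencies below are invented to show that similar frequencies yield a small FST.

```python
def fst_two_pops(freqs1, freqs2):
    """Nei-style FST over loci: (HT - HS) / HT, pooled across SNPs."""
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        p_bar = (p1 + p2) / 2.0
        h_t = 2.0 * p_bar * (1.0 - p_bar)            # total het.
        h_s = (2*p1*(1-p1) + 2*p2*(1-p2)) / 2.0      # mean within-pop het.
        if h_t > 0:
            num += h_t - h_s
            den += h_t
    return num / den if den else 0.0

# Similar allele frequencies -> small FST, as between Hanwoo and Chikso
pop_a = [0.50, 0.30, 0.80, 0.10]
pop_b = [0.55, 0.25, 0.75, 0.15]
print(round(fst_two_pops(pop_a, pop_b), 3))   # well below 0.05
```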
6.
E. Pfaffelhuber 《Biological cybernetics》1971,8(2):50-51
Summary Experimental determination of the entropy H and information rate T of experiments or information sources relies on measurements of the relative frequencies of events and thus furnishes only approximations to the exact values of H and T, which are defined by probabilities rather than relative frequencies. We derive an error estimate for the measured entropy and information rate that depends only upon the number of possible events, not upon the numerical values of their probabilities, and thereby answer the question of how often experiments should be repeated independently so that the measured entropy and information rate come sufficiently close to the exact values of H and T with probability sufficiently close to 1.
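One concrete instance of the probability-free error estimate the abstract describes: the plug-in (relative-frequency) entropy estimate, together with the classical Miller-Madow bias term (m − 1)/(2N ln 2) bits, which depends only on the number of possible events m and the number of trials N. This is an illustrative analogue, not necessarily the bound derived in the paper.

```python
import math

def plugin_entropy(counts):
    """Entropy (bits) estimated from observed event counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def miller_madow_bias(num_events, n_trials):
    """Approximate downward bias of the plug-in estimate, in bits.
    Depends only on m and N, not on the probabilities themselves."""
    return (num_events - 1) / (2.0 * n_trials * math.log(2))

counts = [25, 25, 25, 25]           # N = 100 trials, m = 4 events
print(plugin_entropy(counts))        # exactly 2 bits for this sample
print(round(miller_madow_bias(4, 100), 4))
```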
7.
Single nucleotide polymorphisms (SNPs) are growing in popularity as a genetic marker for investigating evolutionary processes. A panel of SNPs is often developed by comparing large quantities of DNA sequence data across multiple individuals to identify polymorphic sites. For non-model species, this is particularly difficult, as performing the necessary large-scale genomic sequencing often exceeds the resources available for the project. In this study, we trial the Bovine SNP50 BeadChip developed in cattle (Bos taurus) for identifying polymorphic SNPs in cervids Odocoileus hemionus (mule deer and black-tailed deer) and O. virginianus (white-tailed deer) in the Pacific Northwest. We found that 38.7% of loci could be genotyped, of which 5% (n = 1068) were polymorphic. Of these 1068 polymorphic SNPs, a mixture of putatively neutral loci (n = 878) and loci under selection (n = 190) were identified with the F(ST)-outlier method. A range of population genetic analyses were implemented using these SNPs and a panel of 10 microsatellite loci. The three types of deer could readily be distinguished with both the SNP and microsatellite datasets. This study demonstrates that commercially developed SNP chips are a viable means of SNP discovery for non-model organisms, even when used between very distantly related species (the Bovidae and Cervidae families diverged some 25.1-30.1 million years before present).
8.
Missing data are a great concern in longitudinal studies, because few subjects will have complete data and missingness could be an indicator of an adverse outcome. Analyses that exclude potentially informative observations due to missing data can be inefficient or biased. To assess the extent of these problems in the context of genetic analyses, we compared case-wise deletion to two multiple imputation methods available in the popular SAS package, the propensity score and regression methods. For both the real and simulated data sets, the propensity score and regression methods produced results similar to case-wise deletion. However, for the simulated data, the estimates of heritability for case-wise deletion and the two multiple imputation methods were much lower than for the complete data. This suggests that if missingness patterns are correlated within families, then imputation methods that do not allow this correlation can yield biased results.
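A toy contrast between case-wise deletion and regression imputation, in the spirit of the comparison above: the regression method fills a missing outcome from a least-squares fit on the complete cases rather than discarding the subject. The data are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# The outcome y is missing (None) for the last subject
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.0, 8.1, None]

# Case-wise deletion simply drops the last subject; regression
# imputation instead predicts y from the fit on complete cases.
complete = [(a, b) for a, b in zip(x, y) if b is not None]
slope, intercept = fit_line([a for a, _ in complete],
                            [b for _, b in complete])
y_imputed = [b if b is not None else slope * a + intercept
             for a, b in zip(x, y)]
print(round(y_imputed[-1], 2))   # predicted from the fitted line
```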
9.
Background
Genotype imputation can help reduce genotyping costs, particularly for the implementation of genomic selection. In applications involving large populations, recovering the genotypes of untyped loci using information from reference individuals genotyped with a higher-density panel is computationally challenging. Popular imputation methods are based on the hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach that makes use of both family and population information is presented here. All individuals are related and therefore share haplotypes, which may differ in length and frequency according to their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships.
Results
The proposed method gave higher or similar imputation accuracy than Beagle and Impute2 on cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method achieved higher accuracy than the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2: the presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference of 64,429 individuals.
Conclusions
The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature, and can therefore easily be used on large data sets where the use of other methods is impractical.
10.
The effects of different nitrogen sources on erythromycin production were first investigated in a 50 l fermenter with a multi-parameter monitoring system. Increasing the soybean flour concentration of the culture medium from 27 g/l to 37 g/l produced no obvious increase in erythromycin production, whereas adding 15 g/l corn steep liquor to the medium was beneficial: the maximum erythromycin production was 22.2% higher than that of the control. Through inter-scale observation and data association, corn steep liquor was found to regulate and enhance the oxygen uptake rate (OUR), which characterizes the activity of microbial metabolism. Both intracellular and extracellular organic acids of central metabolism were analyzed, and the overall levels of lactic acid, pyruvic acid, citric acid and propionic acid were higher than those of the control before 64 h. The consumption of amino acids that can be transformed into precursors for erythromycin synthesis (i.e. threonine, serine, alanine, glycine and phenylalanine) was elevated compared with the control during the erythromycin biosynthesis phase. The results indicated that corn steep liquor can regulate OUR to a certain level in the early phase of fermentation and enhance the metabolic flux of erythromycin biosynthesis. Erythromycin production was successfully scaled up from laboratory scale (50 l fermenter) to industrial scale (132 m³ and 372 m³) using OUR as the scale-up parameter, with industrial-scale production similar to that at laboratory scale.
11.
A genotype calling algorithm for the Illumina BeadArray platform
Teo YY Inouye M Small KS Gwilliam R Deloukas P Kwiatkowski DP Clark TG 《Bioinformatics (Oxford, England)》2007,23(20):2741-2746
MOTIVATION: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for an integrated and easy-to-use software for calling genotypes. RESULTS: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy. AVAILABILITY: The C++ executable for the algorithm described here is available by request from the authors.
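A model-free sketch of the clustering step at the heart of genotype calling: 1-D normalized B-allele intensity ratios are grouped into three clouds (AA, AB, BB) with a tiny k-means. The published algorithm is model-based and adds perturbation analysis for call quality; this only illustrates the basic idea, with invented ratios.

```python
def kmeans_1d(values, centers, iters=20):
    """Plain 1-D k-means starting from the given centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

def call_genotypes(ratios):
    """Assign AA/AB/BB by nearest fitted cluster center."""
    centers = kmeans_1d(ratios, [0.0, 0.5, 1.0])
    labels = ["AA", "AB", "BB"]
    return [labels[min(range(3), key=lambda k: abs(r - centers[k]))]
            for r in ratios]

ratios = [0.02, 0.05, 0.48, 0.52, 0.97, 0.95, 0.50]
print(call_genotypes(ratios))
```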
12.
Microarray gene expression data generally suffer from missing values due to a variety of experimental reasons. Since missing data points can adversely affect downstream analysis, many algorithms have been proposed to impute missing values. In this survey, we provide a comprehensive review of existing missing value imputation algorithms, focusing on their underlying algorithmic techniques and on how they use local or global information from within the data, or domain knowledge, during imputation. In addition, we describe how imputation results can be validated, the different ways to assess the performance of imputation algorithms, and some possible future research directions. It is hoped that this review will give readers a good understanding of the current developments in this field and inspire them to come up with the next generation of imputation algorithms.
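A bare-bones k-nearest-neighbour imputation for a gene-by-sample expression matrix with missing entries (None), one of the classic "local information" approaches such surveys cover. Distances are computed over columns observed in both genes; the matrix is invented.

```python
def knn_impute(matrix, k=2):
    """Fill each missing entry with the mean value of the k genes
    whose expression profiles are closest to the affected gene."""
    def dist(g1, g2):
        pairs = [(a, b) for a, b in zip(g1, g2)
                 if a is not None and b is not None]
        if not pairs:
            return float("inf")
        return (sum((a - b) ** 2 for a, b in pairs) / len(pairs)) ** 0.5

    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v is None:
                # genes with a value at column j, nearest first
                donors = sorted(
                    (dist(row, other), other[j])
                    for other in matrix
                    if other is not row and other[j] is not None)
                nearest = [val for _, val in donors[:k]]
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

m = [[1.0, 2.0, None],
     [1.1, 2.1, 3.0],
     [0.9, 1.9, 2.8],
     [5.0, 6.0, 9.0]]
print(knn_impute(m)[0][2])   # mean of the two closest genes' values
```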
13.
Background
DNA methylation has been found to be widely associated with complex diseases. Among the platforms for profiling DNA methylation in humans, the Illumina Infinium HumanMethylation450 BeadChip (450K) is accepted as one of the most efficient technologies. However, analysis of the DNA methylation data generated by this technology is challenging due to widespread biases.
Results
Here we propose a generalized framework for evaluating data analysis methods for the Illumina 450K array. This framework considers the following steps of a successful analysis: importing data, quality control, within-array normalization, correcting type bias, detecting differentially methylated probes or regions, and biological interpretation.
Conclusions
We evaluated five methods using three real datasets and identified the best-performing methods for Illumina 450K array data analysis. Minfi and methylumi are optimal choices when analyzing small datasets. BMIQ and RCP are well suited to correcting type bias, and their normalized results can be used to discover differentially methylated probes. The R package missMethyl is suitable for GO term enrichment analysis and biological interpretation.
14.
15.
Individual elements of many extinct and extant North American rhinocerotids display osteopathologies, particularly exostoses, abnormal textures and joint margin porosity, that are commonly associated with localized bone trauma. When we evaluated six extinct rhinocerotid species spanning 50 million years (Ma), we found that the incidence of osteopathology increases from 28% of all elements in Eocene Hyrachyus eximius to 65–80% of all elements in more derived species. The only extant species in this study, Diceros bicornis, displayed fewer osteopathologies (50%) than the more derived extinct taxa. To get a finer-grained picture, we scored each fossil for seven pathological indicators on a scale of 1–4. We estimated the average mass of each taxon using M1-3 length and compared mass to the average pathological score for each category. We found that osteopathology increases significantly with mass. We then ran a phylogenetically controlled regression analysis using a time-calibrated phylogeny of our study taxa and found that mass estimates significantly covary with abnormal foramen shape and abnormal bone textures. This pattern of osteopathological expression may reflect part of a complex system of adaptations in the Rhinocerotidae over millions of years, in which increased mass, cursoriality and/or increased life span were selected for, to the detriment of long-term bone health. This work has important implications for the future health of hoofed animals and humans alike.
16.
Laura J Corbin Andreas Kranis Sarah C Blott June E Swinburne Mark Vaudin Stephen C Bishop John A Woolliams 《Genetics Selection Evolution》2014,46(1):9
Background
Despite the dramatic reduction in the cost of high-density genotyping over the last decade, cost remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential of low-density genotyping and subsequent imputation to address this problem.
Results
Using the haplotype phasing and imputation program BEAGLE, it is possible to impute genotypes from low to high density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient at minimizing the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels suggests that a 2K SNP panel would represent good value for money.
Conclusions
Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information, in horses. In addition to offering a low-cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary now that researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy.
17.
18.
The first part of the article reviews the Data Augmentation algorithm and presents two approximations to the Data Augmentation algorithm for the analysis of missing-data problems: the Poor Man's Data Augmentation algorithm and the Asymptotic Data Augmentation algorithm. These two algorithms are then implemented in the context of censored regression data to obtain semiparametric methodology. The performances of the censored regression algorithms are examined in a simulation study. It is found, up to the precision of the study, that the bias of both the Poor Man's and Asymptotic Data Augmentation estimators, as well as the Buckley-James estimator, does not appear to differ from zero. However, with regard to mean squared error, over a wide range of settings examined in this simulation study, the two Data Augmentation estimators have a smaller mean squared error than does the Buckley-James estimator. In addition, associated with the two Data Augmentation estimators is a natural device for estimating the standard error of the estimated regression parameters. It is shown how this device can be used to estimate the standard error of either Data Augmentation estimate of any parameter (e.g., the correlation coefficient) associated with the model. In the simulation study, the estimated standard error of the Asymptotic Data Augmentation estimate of the regression parameter is found to be congruent with the Monte Carlo standard deviation of the corresponding parameter estimate. The algorithms are illustrated using the updated Stanford heart transplant data set.
19.
The Illumina Genome Analyzer generates millions of short sequencing reads. We present Ibis (Improved base identification system), an accurate, fast and easy-to-use base caller that significantly reduces the error rate and increases the output of usable reads. Ibis is faster and more robust with respect to chemistry and technology than other publicly available packages. Ibis is freely available under the GPL from .
20.
Five strategies for pre-processing intensities from Illumina expression BeadChips are assessed from the point of view of precision and bias. The strategies include a popular variance stabilizing transformation and model-based background corrections that either use or ignore the control probes. Four calibration data sets are used to evaluate precision, bias and false discovery rate (FDR). The original algorithms are shown to have operating characteristics that are not easily comparable. Some tend to minimize noise while others minimize bias. Each original algorithm is shown to have an innate intensity offset, by which unlogged intensities are bounded away from zero, and the size of this offset determines its position on the noise-bias spectrum. By adding extra offsets, a continuum of related algorithms with different noise-bias trade-offs is generated, allowing direct comparison of the performance of the strategies on equivalent terms. Adding a positive offset is shown to decrease the FDR of each original algorithm. The potential of each strategy to generate an algorithm with an optimal noise-bias trade-off is explored by finding the offset that minimizes its FDR. The use of control probes as part of the background correction and normalization strategy is shown to achieve the lowest FDR for a given bias.
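A numeric illustration of the offset idea above: adding a positive offset before log-transforming bounds unlogged intensities away from zero and damps the log-scale noise of dim probes, at the cost of compressing (biasing) low-end fold changes. The replicate intensities and the offset value are invented.

```python
import math

def log2_offset(intensities, offset):
    """Log-transform after adding a positive offset."""
    return [math.log2(x + offset) for x in intensities]

def sd(xs):
    """Sample standard deviation."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# Replicate background-corrected intensities for one dim probe
probe = [2.0, 8.0, 4.0, 1.0, 6.0]
noisy = sd(log2_offset(probe, 0))     # no offset: large log-scale noise
damped = sd(log2_offset(probe, 16))   # offset 16: much less noise
print(noisy > damped)                 # offset trades noise for bias
```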