Similar articles
 20 similar articles found (search time: 15 ms)
1.
Incorrect paternity assignment in cattle can have a major effect on rates of genetic gain. Of the 576 Israeli Holstein bulls genotyped with the BovineSNP50 BeadChip, 204 had a sire that was also genotyped. The results for 38 828 valid single nucleotide polymorphisms (SNPs) were used to validate paternity, determine genotyping error rates and establish criteria for deleting defective SNPs from further analysis. Based on the criterion of >2% conflicts between the genotypes of the putative sire and son, paternity was rejected for seven bulls (3.5%). The remaining bulls had fewer conflicts by one or two orders of magnitude. Excluding these seven bulls, all other discrepancies between sire and son genotypes were assumed to be caused by genotyping mistakes. The frequency of discrepancies was >0.07 for nine SNPs and >0.025 for 81 SNPs. The overall frequency of discrepancies fell from 0.00017 to 0.00010 after deletion of these 81 SNPs, and the total expected fraction of genotyping errors was estimated to be 0.05%. Paternity of bulls genotyped for genomic selection may be verified or traced against candidate sires at virtually no additional cost.
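The >2% conflict criterion in this abstract can be illustrated with a short sketch. It is not the authors' code: the 0/1/2 genotype coding, the reading of a "conflict" as opposing homozygotes, and the function names are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Assumed coding: count of one allele (0, 1 or 2); None marks a missing call.
Genotype = Optional[int]


def conflict_rate(sire: Sequence[Genotype], son: Sequence[Genotype]) -> float:
    """Fraction of jointly called SNPs where sire and son are opposing homozygotes."""
    compared = 0
    conflicts = 0
    for g_sire, g_son in zip(sire, son):
        if g_sire is None or g_son is None:
            continue  # skip SNPs without a valid call in both animals
        compared += 1
        # 0 vs 2 (or 2 vs 0) cannot arise from a true sire-son pair
        if abs(g_sire - g_son) == 2:
            conflicts += 1
    return conflicts / compared if compared else 0.0


def paternity_rejected(sire: Sequence[Genotype], son: Sequence[Genotype],
                       threshold: float = 0.02) -> bool:
    """Apply the >2% conflict criterion quoted in the abstract."""
    return conflict_rate(sire, son) > threshold
```

With tens of thousands of SNPs per animal, a true sire-son pair should sit far below the threshold, which matches the abstract's remark that accepted pairs had one or two orders of magnitude fewer conflicts.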

2.
Here we present the first attempt to use the BovineSNP50 Illumina Genotyping BeadChip for genome-wide screening of European bison (Bison bonasus bonasus; EB), two subspecies of American bison, the plains bison (Bison bison bison; PB) and the wood bison (Bison bison athabascae; WB), and seven cattle (Bos taurus) breeds. Our aims were to (1) reconstruct their evolutionary relationships, (2) detect any genetic signature of past bottlenecks and quantify the consequences of bottlenecks for the genetic distances among bison subspecies and cattle, and (3) detect loci under positive or stabilizing selection. A Bayesian clustering procedure (STRUCTURE) detected ten genetically distinct clusters, with separation among all seven cattle breeds and between European and American bison, but no separation between plains and wood bison. A linkage disequilibrium based program (LDNE) was used to estimate the effective population size (Ne) for the cattle breeds; Ne was generally low relative to the census size of the breeds (cattle breeds: mean Ne = 299.5, min Ne = 18.1, max Ne = 755.0). BOTTLENECK 1.2 detected signs of population bottlenecks in the EB, PB and WB populations (sign test and standardized sign test: p = 0.0001). Evidence for loci under selection was found in cattle but not in bison. All extant wild populations of bison have been shown to have survived severe bottlenecks, which has likely had large effects on genetic diversity within groups and differentiation among them.

3.
The Illumina BovineSNP50 BeadChip features 54,001 informative single nucleotide polymorphisms (SNPs) that uniformly span the entire bovine genome. Among them, 52,255 SNPs have locations assigned in the current genome assembly (Btau_4.0), including 19,294 (37%) intragenic SNPs (i.e., located within genes) and 32,961 (63%) intergenic SNPs (i.e., located between genes). While the SNPs represented on the Illumina Bovine50K BeadChip are evenly distributed along each bovine chromosome, there are over 14,000 genes that have no SNPs placed on the current BeadChip. Kernel density estimation, a non-parametric method, was used in the present study to identify SNP-poor and SNP-rich regions on each bovine chromosome. With bandwidth = 0.05 Mb, we observed that most regions have SNP densities within 2 standard deviations of the chromosome SNP density mean. The SNP density on chromosome X was the most dynamic, with more than 30 SNP-rich regions and at least 20 regions with no SNPs. Genotyping ten water buffalo using the Illumina BovineSNP50 BeadChip revealed that 41,870 of the 54,001 SNPs were fully scored on all ten water buffalo, whereas 6,771 SNPs were partially scored on one to nine animals. Both fully scored SNPs and partially scored or unscored SNPs are clearly clustered, in clusters of various sizes, on each chromosome. However, among the 43,687 bovine SNPs that were successfully genotyped on nine or ten water buffalo, only 1,159 were polymorphic in the species. These results indicate that the SNP sites, but not the polymorphisms, are conserved between the two species. Overall, our present study provides a solid foundation for further characterizing the SNP evolutionary process, thus improving understanding of within- and between-species biodiversity, phylogenetics and adaptation to environmental changes.
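The kernel density step described in this abstract can be sketched as below. The abstract gives only the bandwidth (0.05 Mb) and the 2 standard deviation rule; the Gaussian kernel, the evaluation grid and the function names are assumptions for illustration, not details taken from the study.

```python
import math
from statistics import mean, pstdev


def kde(positions_mb, grid_mb, bandwidth_mb=0.05):
    """Gaussian kernel density estimate of SNP positions, evaluated on a grid (in Mb)."""
    n = len(positions_mb)
    norm = 1.0 / (n * bandwidth_mb * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth_mb) ** 2) for p in positions_mb)
        for x in grid_mb
    ]


def classify_regions(density, n_sd=2.0):
    """Label each grid point relative to the chromosome-wide mean density."""
    mu, sd = mean(density), pstdev(density)
    labels = []
    for d in density:
        if d > mu + n_sd * sd:
            labels.append("rich")
        elif d < mu - n_sd * sd:
            labels.append("poor")
        else:
            labels.append("typical")
    return labels
```

Runs of consecutive "rich" or "poor" grid points would then correspond to the SNP-rich and SNP-poor regions reported per chromosome.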

5.
Cattle and water buffalo belong to the same subfamily, Bovinae, and share chromosome banding and gene order homology. In this study, we used the genome-wide Illumina BovineSNP50 BeadChip to analyze 91 DNA samples from three breeds of water buffalo (Nili-Ravi, Murrah and their crossbreds with local Guangxi buffalo in China), in order to demonstrate the genetic divergence between cattle and water buffalo through a large single nucleotide polymorphism (SNP) transferability study at the whole-genome level, and also performed association analysis of functional traits in water buffalo. A total of 40,766 (75.5%) bovine SNPs were found in the water buffalo genome, but 49,936 (92.5%) had only one allele, and finally 935 were identified as polymorphic and useful for association analysis in water buffalo. Therefore, the genome sequences of water buffalo and cattle share a high level of homology, but the polymorphic status of the bovine SNPs varies between the two species. The different patterns of mutations between species may be associated with their phenotypic divergence due to genome evolution. Among the 935 bovine SNPs, we identified a total of 9 and 7 SNPs significantly associated with fertility and milk production traits in water buffalo, respectively. However, further work with larger sample sizes is needed to verify these candidate SNPs for water buffalo.

6.

Motivation

Illumina sequencing data can provide high coverage of a genome with relatively short reads (most often 100 bp to 150 bp) at low cost. Even with a low (advertised 1%) error rate, 100× coverage Illumina data has, on average, an error in some read at every base in the genome. These errors make handling the data more complicated because they result in a large number of low-count erroneous k-mers in the reads. However, there is enough information in the reads to correct most of the sequencing errors, thus making subsequent use of the data (e.g. for mapping or assembly) easier. Here we use the term “error correction” to denote the reduction in errors due to both changes in individual bases and trimming of unusable sequence. We developed error correction software called QuorUM. QuorUM is mainly aimed at error correcting Illumina reads for subsequent assembly. It is designed around the novel idea of minimizing the number of distinct erroneous k-mers in the output reads while preserving the most true k-mers, and we introduce a composite statistic π that measures how successful we are at achieving this dual goal. We evaluate the performance of QuorUM by correcting actual Illumina reads from genomes for which a reference assembly is available.

Results

We produce trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. We compared QuorUM against several published error correctors and found that it is the best performer in most of the metrics we use. QuorUM is efficiently implemented, making use of current multi-core computing architectures, and it is suitable for large data sets (1 billion bases checked and corrected per day per core). We also demonstrate that a third-party assembler (SOAPdenovo) benefits significantly from using QuorUM error-corrected reads. QuorUM error-corrected reads result in a factor of 1.1 to 4 improvement in N50 contig size compared with using the original reads with SOAPdenovo for the data sets investigated.

Availability

QuorUM is distributed as an independent software package and as a module of the MaSuRCA assembly software. Both are available under the GPL open source license at http://www.genome.umd.edu.

Contact

gmarcais@umd.edu
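The Motivation paragraph of this abstract rests on two simple ideas: at a 1% error rate and 100× coverage, about one erroneous base call is expected at every genome position, and sequencing errors show up as k-mers seen only a few times. The sketch below illustrates both. It is not QuorUM's algorithm; the function names and the `min_count` cutoff are hypothetical choices for illustration.

```python
from collections import Counter


def expected_errors_per_position(error_rate: float, coverage: float) -> float:
    """Expected number of erroneous base calls covering one genome position."""
    return error_rate * coverage


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_count_kmers(counts, min_count=2):
    """k-mers seen fewer than min_count times; candidates for sequencing errors."""
    return {kmer for kmer, c in counts.items() if c < min_count}
```

For example, with the reads `ACGTAC`, `ACGTAC` and `ACGTTC` and k = 4, the two k-mers that span the single differing base in the third read are the only ones seen once, so they are the ones flagged.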

7.
Hanwoo and Chikso are Korean native cattle breeds currently registered with the Food and Agriculture Organization. However, genomic studies comparing the Hanwoo and Chikso populations are still lacking. The objective of this study was to perform a genome-wide analysis of the Hanwoo and Chikso populations and to investigate the genetic relationship between them. We genotyped a total of 319 cattle, 214 Hanwoo and 105 Chikso, sampled from the Gangwon Province Livestock Technology Research Institute, using the Illumina Bovine SNP50K BeadChip. After performing quality control on the initial datasets, we assessed linkage disequilibrium patterns for all possible SNP pairs within 1 Mb of each other. Overall, average r² values were lower in the Hanwoo (0.048) than in the Chikso (0.074) population. The genetic relationship between the populations was further examined by principal component analysis, which showed a clear cluster for each of the Hanwoo and Chikso populations. Overall heterozygosity was slightly higher for Hanwoo (0.359) than for Chikso (0.345), and the inbreeding coefficient was also slightly higher in Hanwoo (−0.015) than in Chikso (−0.035). The average FST value between Hanwoo and Chikso was 0.036, indicating little genetic differentiation between the two breeds. Furthermore, we found potential selection signatures, including the LRP1B and NTRK2 genes, that might be implicated in meat and reproductive traits in cattle. The results showed that neither the Hanwoo nor the Chikso population was under a severe level of inbreeding. Although the principal component analysis showed clear clusters for each population, we did not see any clear evidence that the two populations are highly differentiated from each other.

8.
9.
Summary: Experimental determinations of the entropy H and information rate T of experiments or information sources rely on measurements of the relative frequencies of events and thus furnish only approximations to the exact values of H and T, which are defined by probabilities rather than relative frequencies. We derive an error estimate for the measured entropy and information rate that depends only upon the number of possible events and not upon the numerical values of their probabilities, and thereby answer the question of how often experiments should be repeated independently so that the measured entropy and information rate come sufficiently close to the exact values of H and T with probability sufficiently close to 1.
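The "measured entropy" discussed in this summary is the entropy computed from relative frequencies instead of true probabilities. A minimal sketch of that estimator follows; the function name and the choice of bits (log base 2) are assumptions, and the error bound derived in the paper is not reproduced here because the summary does not state it.

```python
import math
from collections import Counter


def plugin_entropy(observations) -> float:
    """Entropy in bits, using relative frequencies in place of probabilities."""
    n = len(observations)
    counts = Counter(observations)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Because the frequencies only approximate the probabilities, repeating the experiment more often brings this value closer to the exact H, which is the question the summary addresses.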

10.
Haynes GD, Latch EK. PLoS ONE, 2012, 7(5): e36536
Single nucleotide polymorphisms (SNPs) are growing in popularity as a genetic marker for investigating evolutionary processes. A panel of SNPs is often developed by comparing large quantities of DNA sequence data across multiple individuals to identify polymorphic sites. For non-model species, this is particularly difficult, as performing the necessary large-scale genomic sequencing often exceeds the resources available for the project. In this study, we trial the Bovine SNP50 BeadChip developed in cattle (Bos taurus) for identifying polymorphic SNPs in cervids Odocoileus hemionus (mule deer and black-tailed deer) and O. virginianus (white-tailed deer) in the Pacific Northwest. We found that 38.7% of loci could be genotyped, of which 5% (n = 1068) were polymorphic. Of these 1068 polymorphic SNPs, a mixture of putatively neutral loci (n = 878) and loci under selection (n = 190) were identified with the F(ST)-outlier method. A range of population genetic analyses were implemented using these SNPs and a panel of 10 microsatellite loci. The three types of deer could readily be distinguished with both the SNP and microsatellite datasets. This study demonstrates that commercially developed SNP chips are a viable means of SNP discovery for non-model organisms, even when used between very distantly related species (the Bovidae and Cervidae families diverged some 25.1-30.1 million years before present).

11.
Missing data are a great concern in longitudinal studies, because few subjects will have complete data and missingness could be an indicator of an adverse outcome. Analyses that exclude potentially informative observations due to missing data can be inefficient or biased. To assess the extent of these problems in the context of genetic analyses, we compared case-wise deletion to two multiple imputation methods available in the popular SAS package, the propensity score and regression methods. For both the real and simulated data sets, the propensity score and regression methods produced results similar to case-wise deletion. However, for the simulated data, the estimates of heritability for case-wise deletion and the two multiple imputation methods were much lower than for the complete data. This suggests that if missingness patterns are correlated within families, then imputation methods that do not allow this correlation can yield biased results.

12.

Background

Genotype imputation can help reduce genotyping costs particularly for implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher density panel is computationally challenging. Popular imputation methods are based upon the Hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes which may differ in length and frequency based on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships.

Results

The proposed method gave imputation accuracy higher than or similar to that of Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy than the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals.

Conclusions

The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.

13.
The effects of different nitrogen sources on erythromycin production were first investigated in a 50 L fermenter equipped with a multi-parameter monitoring system. Increasing the soybean flour concentration in the culture medium from 27 g/L to 37 g/L gave no obvious increase in erythromycin production, whereas adding 15 g/L corn steep liquor to the medium was beneficial: the maximum erythromycin production was 22.2% higher than that of the control. Inter-scale observation and data association showed that corn steep liquor can regulate and enhance the oxygen uptake rate (OUR), which characterizes the activity of microbial metabolism. Both intracellular and extracellular organic acids of central metabolism were analyzed, and the overall levels of lactic acid, pyruvic acid, citric acid and propionic acid were higher than those of the control before the 64th hour. Consumption of amino acids that can be transformed into precursors for erythromycin synthesis (i.e. threonine, serine, alanine, glycine and phenylalanine) was elevated compared with the control during the erythromycin biosynthesis phase. The results indicated that corn steep liquor can regulate OUR to a certain level in the early phase of fermentation and enhance the metabolic flux of erythromycin biosynthesis. Erythromycin production was successfully scaled up from laboratory scale (50 L fermenter) to industrial scale (132 m³ and 372 m³) using OUR as the scale-up parameter. Erythromycin production at industrial scale was similar to that at laboratory scale.

14.
A genotype calling algorithm for the Illumina BeadArray platform   Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have recently been established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for integrated and easy-to-use software for calling genotypes. RESULTS: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy. AVAILABILITY: The C++ executable for the algorithm described here is available by request from the authors.

15.
Genotyping sheep for genome-wide SNPs at lower density and imputing to a higher density would enable cost-effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low-density (12k) SNP chip and evaluate the accuracy of imputation from the 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits including carcass and meat quality and omega fatty acid traits in sheep, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12 223 SNPs with a high minor allele frequency that were selected with intermarker spacing of 50–475 kb. SNPs for parentage and horned or polled tests also were represented. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three different scenarios: (1) within breed, (2) single breed from multibreed reference and (3) multibreed from a single-breed reference. The highest imputation accuracies were found with scenario 2, whereas scenario 3 was the worst, as expected. Using scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93 respectively. Imputation scenario 2 was used to impute 50k genotypes for 10 396 animals with novel meat trait phenotypes to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on only 12k data was 0.08 across traits and breeds, but accuracies varied widely. The mean GBLUP accuracies with imputed 50k data more than doubled to 0.21. Accuracies of genomic prediction were very similar for imputed and real 50k genotypes. There was no apparent impact on accuracy of GEBVs as a result of using imputed rather than real 50k genotypes, provided imputation accuracy was >90%.

16.

Background

DNA methylation has been found to be widely associated with complex diseases. Among the platforms for profiling DNA methylation in humans, the Illumina Infinium HumanMethylation450 BeadChip (450K) has been accepted as one of the most efficient technologies. However, analysis of the DNA methylation data generated by this technology is challenging because of widespread biases.

Results

Here we proposed a generalized framework for evaluating data analysis methods for Illumina 450K array. This framework considers the following steps towards a successful analysis: importing data, quality control, within-array normalization, correcting type bias, detecting differentially methylated probes or regions and biological interpretation.

Conclusions

We evaluated five methods using three real datasets and identified the best-performing methods for Illumina 450K array data analysis. Minfi and methylumi are the optimal choices when analyzing small datasets. BMIQ and RCP are suitable for correcting type bias, and their normalized results can be used to discover DMPs. The R package missMethyl is suitable for GO term enrichment analysis and biological interpretation.

17.
Microarray gene expression data generally suffer from missing values, which arise for a variety of experimental reasons. Since the missing data points can adversely affect downstream analysis, many algorithms have been proposed to impute missing values. In this survey, we provide a comprehensive review of existing missing value imputation algorithms, focusing on their underlying algorithmic techniques and how they utilize local or global information from within the data, or their use of domain knowledge during imputation. In addition, we describe how the imputation results can be validated and the different ways to assess the performance of different imputation algorithms, and we discuss some possible future research directions. It is hoped that this review will give readers a good understanding of the current development in this field and inspire them to come up with the next generation of imputation algorithms.

18.
Imputation of high-density genotypes from low- or medium-density platforms is a promising way to enhance the efficiency of whole-genome selection programs at low cost. In this study, we compared the efficiency of three widely used imputation algorithms (fastPHASE, BEAGLE and findhap) using Chinese Holstein cattle with Illumina BovineSNP50 genotypes. A total of 2108 cattle were randomly divided into a reference population and a test population to evaluate the influence of the reference population size. Three bovine chromosomes, BTA1, 16 and 28, were used to represent large, medium and small chromosome size, respectively. We simulated different scenarios by randomly masking 20%, 40%, 80% and 95% of the single-nucleotide polymorphisms (SNPs) on each chromosome in the test population to mimic different SNP density panels. Illumina Bovine3K and Illumina BovineLD (6909 SNPs) information was also used. We found that the three methods showed comparable accuracy when the proportion of masked SNPs was low. However, the difference became larger when more SNPs were masked. BEAGLE performed the best and was the most robust, with imputation accuracies >90% in almost all situations. fastPHASE was affected by the proportion of masked SNPs, especially when the masked SNP rate was high. findhap ran the fastest, whereas its accuracies were lower than those of BEAGLE but higher than those of fastPHASE. In addition, enlarging the reference population improved the imputation accuracy for BEAGLE and findhap, but did not affect fastPHASE. Considering imputation accuracy and computational requirements, BEAGLE was found to be more reliable for imputing genotypes from low- to high-density genotyping platforms.
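The masking design described in this abstract (hide a fixed share of SNPs in the test animals, impute them, then compare against the known genotypes) can be sketched as follows. The abstract does not define its accuracy measure, so plain concordance is used here as an assumption, and the function names are hypothetical; the imputation step itself would be done by an external tool such as BEAGLE and is not shown.

```python
import random


def mask_snps(n_snps: int, fraction: float, seed: int = 0):
    """Pick the SNP indices to hide, e.g. fraction=0.8 masks 80% of the SNPs."""
    rng = random.Random(seed)  # fixed seed so the masked set is reproducible
    n_masked = round(n_snps * fraction)
    return sorted(rng.sample(range(n_snps), n_masked))


def concordance(true_genotypes, imputed_genotypes, masked) -> float:
    """Share of masked SNPs whose imputed genotype equals the true genotype."""
    if not masked:
        return 1.0  # nothing was masked, so nothing can be wrong
    hits = sum(true_genotypes[i] == imputed_genotypes[i] for i in masked)
    return hits / len(masked)
```

Only the masked positions enter the score, since unmasked SNPs were observed directly and would inflate the result.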

19.
20.
Individual elements of many extinct and extant North American rhinocerotids display osteopathologies, particularly exostoses, abnormal textures, and joint margin porosity, that are commonly associated with localized bone trauma. When we evaluated six extinct rhinocerotid species spanning 50 million years (Ma), we found that the incidence of osteopathology increases from 28% of all elements of Eocene Hyrachyus eximius to 65–80% of all elements in more derived species. The only extant species in this study, Diceros bicornis, displayed fewer osteopathologies (50%) than the more derived extinct taxa. To get a finer-grained picture, we scored each fossil for seven pathological indicators on a scale of 1–4. We estimated the average mass of each taxon using M1-3 length and compared mass to the average pathological score for each category. We found that osteopathology increases significantly with increasing mass. We then ran a phylogenetically controlled regression analysis using a time-calibrated phylogeny of our study taxa. Mass estimates were found to covary significantly with abnormal foramen shape and abnormal bone textures. This pattern in osteopathological expression may reflect part of the complex system of adaptations in the Rhinocerotidae over millions of years, where increased mass, cursoriality, and/or increased life span are selected for, to the detriment of long-term bone health. This work has important implications for the future health of hoofed animals and humans alike.
