Similar Literature
20 similar documents found (search time: 31 ms)
1.
The use of high‐throughput, low‐density sequencing approaches has dramatically increased in recent years in studies of eco‐evolutionary processes in wild populations and domestication in commercial aquaculture. Most of these studies focus on identifying panels of SNP loci for a single downstream application, whereas there have been few studies examining the trade‐offs for selecting panels of markers for use in multiple applications. Here, we detail the use of a bioinformatic workflow for the development of a dual‐purpose SNP panel for parentage and population assignment, which included identifying putative SNP loci, filtering for the most informative loci for the two tasks, designing effective multiplex PCR primers, optimizing the SNP panel for performance, and performing quality control steps for downstream applications. We applied this workflow to two adjacent Alaskan Sockeye Salmon populations and identified a GTseq panel of 142 SNP loci for parentage and 35 SNP loci for population assignment. Only 50–75 panel loci were necessary for >95% accurate parentage, whereas population assignment success, with all 172 panel loci, ranged from 93.9% to 96.2%. Finally, we discuss the trade‐offs and complexities of the decision‐making process that drives SNP panel development, optimization, and testing.

2.
Genetic stock identification (GSI) is an important tool in fisheries management. Microsatellites (μSATs) have been the dominant genetic marker for GSI; however, increasing availability and numerous advantages of single-nucleotide polymorphism (SNP) markers make them an appealing alternative. We tested performance of 13 μSAT vs. 92 SNP loci in a fine-scale application of GSI, using a new baseline for Chinook salmon consisting of 49 collections (n = 4014) distributed across the Columbia River Basin. In GSI, baseline genotypes for both marker sets were used independently to analyse a real fishery mixture (n = 2731) representing the total run of Chinook salmon passing Bonneville Dam in the Columbia River. Marker sets were evaluated using three criteria: (i) ability to differentiate reporting groups, (ii) proportion of correct assignment in mixture simulation tests and baseline leave-one-out analyses and (iii) individual assignment and confidence intervals around estimated stock proportions of a real fishery mixture. The μSATs outperformed the SNPs in resolving fine-scale relationships, but all 105 markers combined provided greatest power for GSI. SNPs were ranked by relative information content based on both an iterative procedure that optimized correct assignment to the baseline and ranking by minor allele frequency. For both methods, we identified a subset of the top 50 ranked loci, which were similar in assignment accuracy, and both reached maximum available power of the total 92 SNP loci (correct assignment = 73%). Our estimates indicate that between 100 and 200 highly informative SNP loci are required to meet management standards (correct assignment > 90%) for resolving stocks in finer-scale GSI applications.

3.
The objective of this study was to use simulation to evaluate the benefits of considering haplotypes of loci when linked single nucleotide polymorphisms are used for breed assignment. Three breeds of 10 000 females each were simulated under eight scenarios that differed according to the number of generations separating the breeds, size of breed founder populations and recombination rate between linked loci. Molecular genotypes consisted of 20 groups of three linked loci each. Breed assignment was performed in the final generation and was based on the frequency method. Haplotypes were reconstructed using the expectation–maximization algorithm. Accuracy of breed assignment was based on the frequency of correct breed assignment. Assignment accuracy increased as more genotypes (loci or haplotypes) were considered and more animals were used to estimate genotypic frequencies within breed. For most scenarios, use of haplotypes yielded equal or greater accuracies than when loci were considered independent. The advantage of haplotypes tended to increase as linkage disequilibrium between adjacent loci increased. The greatest advantage for using haplotypes was observed when recombination rate was low (0.001), breeds were separated by few generations (100), and a relatively large number of founder animals (110) was used to form new breeds. In this situation, 90% accuracy of breed assignment was achieved using nine to 14 haplotypes (i.e. 27–42 loci) depending on breed, vs. 39–57 individual loci.
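
The frequency method can be sketched as a maximum-likelihood rule: estimate genotype frequencies within each breed from reference animals, then assign an animal to the breed under which its genotypes are most probable. Treating each haplotype pair as one multi-allelic "genotype" is what lets the same code score linked loci jointly. The function names, the string genotype codes and the `floor` frequency for unseen genotypes are illustrative assumptions, not details taken from the study.

```python
import math
from collections import Counter

def estimate_genotype_freqs(samples):
    """Per-unit genotype frequencies from one breed's reference animals.

    Each sample is a tuple with one hashable code per unit, e.g. "AG" for a
    single SNP or "ACT/GCT" for a pair of reconstructed haplotypes.
    """
    freqs = []
    for unit in range(len(samples[0])):
        counts = Counter(sample[unit] for sample in samples)
        total = sum(counts.values())
        freqs.append({g: c / total for g, c in counts.items()})
    return freqs

def assign(sample, breed_freqs, floor=1e-4):
    """Return the breed with the highest product of genotype frequencies.

    The product is computed as a sum of logs; genotypes never seen in a breed
    get the assumed `floor` frequency so one rare genotype cannot zero it out.
    """
    def log_score(freqs):
        return sum(math.log(freqs[i].get(g, floor)) for i, g in enumerate(sample))
    return max(breed_freqs, key=lambda breed: log_score(breed_freqs[breed]))

# Hypothetical reference data: two breeds, two units (SNPs or haplotype blocks).
reference = {
    "breed1": estimate_genotype_freqs([("AA", "CC"), ("AA", "CT"), ("AG", "CC")]),
    "breed2": estimate_genotype_freqs([("GG", "TT"), ("AG", "TT"), ("GG", "CT")]),
}
print(assign(("AA", "CC"), reference))  # breed1
```

Adding more units (loci or haplotypes) adds more log terms to each breed's score, which is why accuracy in the abstract rises as more genotypes are considered.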

4.
It is well known that statistical classification procedures should be assessed using data that are separate from those used to train the classifier. This principle is commonly overlooked when the classification procedure in question is population assignment using a set of genetic markers that were chosen specifically on the basis of their allele frequencies from amongst a larger number of candidate markers. This oversight leads to a systematic upward bias in the predicted accuracy of the chosen set of markers for population assignment. Three widely used software programs for selecting markers informative for population assignment suffer from this bias. The extent of this bias is documented through a small set of simulations. The relative effect of the bias is largest when screening many candidate loci from poorly differentiated populations. Simple unbiased methods are presented and their use encouraged.
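
The general remedy is the usual one for classifiers: choose markers with one set of individuals and measure accuracy on different ones. Below is a minimal holdout sketch, assuming two populations, 0/1/2 allele-count genotypes, a simple allele-frequency-difference ranking and Hardy-Weinberg likelihood assignment. It illustrates the principle only and is not one of the specific methods presented in the paper; scoring the same individuals that were passed to `select_loci` would reproduce the upward bias described above.

```python
import math
import random

def allele_freqs(genotypes):
    """Reference-allele frequency per locus; genotypes are tuples of 0/1/2 allele counts."""
    n = len(genotypes)
    return [sum(g[i] for g in genotypes) / (2 * n) for i in range(len(genotypes[0]))]

def select_loci(group_a, group_b, k):
    """Rank loci by allele-frequency difference, using only the individuals passed in."""
    fa, fb = allele_freqs(group_a), allele_freqs(group_b)
    ranked = sorted(range(len(fa)), key=lambda i: abs(fa[i] - fb[i]), reverse=True)
    return ranked[:k]

def log_lik(genotype, freqs, loci, eps=0.01):
    """Hardy-Weinberg log-likelihood at the chosen loci; eps keeps frequencies off 0 and 1."""
    total = 0.0
    for i in loci:
        p = min(max(freqs[i], eps), 1 - eps)
        total += math.log({0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p * p}[genotype[i]])
    return total

def holdout_accuracy(pop_a, pop_b, k, train_fraction=0.5, seed=1):
    """Select loci on a training split, then score assignment on held-out individuals only."""
    rng = random.Random(seed)
    a, b = pop_a[:], pop_b[:]
    rng.shuffle(a)
    rng.shuffle(b)
    na, nb = int(len(a) * train_fraction), int(len(b) * train_fraction)
    train_a, test_a, train_b, test_b = a[:na], a[na:], b[:nb], b[nb:]
    loci = select_loci(train_a, train_b, k)
    fa, fb = allele_freqs(train_a), allele_freqs(train_b)
    # Ties count as errors, which keeps the estimate conservative.
    correct = sum(log_lik(g, fa, loci) > log_lik(g, fb, loci) for g in test_a)
    correct += sum(log_lik(g, fb, loci) > log_lik(g, fa, loci) for g in test_b)
    return correct / (len(test_a) + len(test_b))
```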

5.
This paper reports 20 new microsatellite loci that are highly polymorphic in rhesus macaques (Macaca mulatta). We screened known human microsatellite loci to identify markers that are polymorphic in rhesus macaques, and then selected specific loci that show substantial levels of heterozygosity and robust, reliable amplification. The 20 loci reported here were chosen to include one highly informative microsatellite from each rhesus monkey autosomal chromosome. Fourteen of the 20 polymorphisms are tetranucleotide repeats, and all can be analyzed using standard PCR and electrophoresis procedures. These new rhesus markers have an average of 15.5 alleles per locus and average heterozygosity of 0.83. This panel of DNA polymorphisms will be useful for a variety of different genetic analyses, including pedigree testing, paternity analysis, and population genetic studies. Many of these loci are also likely to be informative in other closely related Old World monkey species.

6.
Angus and Hereford beef is marketed internationally for apparently superior meat quality attributes; DNA-based breed authentication could be a useful instrument to ensure consumer confidence in premium meat products. The objective of this study was to develop an ultra-low-density genotype panel to accurately quantify the Angus and Hereford breed proportion in biological samples. Medium-density genotypes (13 306 single nucleotide polymorphisms (SNPs)) were available on 54 703 commercial and 4042 purebred animals. The breed proportion of the commercial animals was generated from the medium-density genotypes, and this estimate was regarded as the gold-standard breed composition. Ten genotype panels (100 to 1000 SNPs) were developed from the medium-density genotypes; five methods were used to identify the most informative SNPs, including the Delta statistic, the fixation index (Fst) and an index combining both. Breed assignment analyses were undertaken separately for each breed, panel density and SNP selection method, using a program to infer population structure with the entire 13 306 SNP panel representing the gold-standard measure. Breed assignment was undertaken for all commercial animals (n=54 703), for animals deemed from pedigree to contain some proportion of Angus (n=5740) and for animals deemed from pedigree to contain some proportion of Hereford (n=5187). The breed proportion predicted for each animal from the lower-density panels was then compared with the gold-standard breed prediction. Panel density, SNP selection method and breed all had a significant effect on the correlation between predicted and actual breed proportion. Regardless of breed, the Index method of SNP selection numerically (but not significantly) outperformed all other selection methods in accuracy (i.e. correlation and root mean square error of prediction) when panel density was ⩾300 SNPs. The correlation between actual and predicted breed proportion increased as panel density increased. 
Using 300 SNPs (selected using the global index method), the correlation between predicted and actual breed proportion was 0.993 and 0.995 in the Angus and Hereford validation populations, respectively. When SNP panels optimised for breed prediction in one population were used to predict the breed proportion of a separate population, the correlation between predicted and actual breed proportion was 0.034 and 0.044 weaker in the Hereford and Angus populations, respectively (using the 300 SNP panel). It is necessary to include at least 300 to 400 SNPs (per breed) on genotype panels to accurately predict breed proportion from biological samples.
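
The Delta and Fst criteria named above are simple functions of per-breed allele frequencies. The sketch below, assuming a two-breed comparison with equal weights, computes both for each SNP and keeps the top k. Fst is written in the (H_T - H_S)/H_T heterozygosity form (Nei's G_ST); the combined Index method is not reproduced because its weighting is not described in the abstract, and the function names are hypothetical.

```python
def delta(p1, p2):
    """Delta statistic: absolute allele-frequency difference between two breeds."""
    return abs(p1 - p2)

def fst(p1, p2):
    """Fst for one biallelic SNP and two equally weighted breeds: (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0                         # monomorphic in both breeds: no information
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-breed heterozygosity
    return (h_t - h_s) / h_t

def top_snps(freqs_a, freqs_b, k, stat=delta):
    """Return the k SNP ids with the highest statistic.

    freqs_a and freqs_b map SNP id -> allele frequency in each breed.
    """
    scores = {snp: stat(freqs_a[snp], freqs_b[snp]) for snp in freqs_a}
    return sorted(scores, key=scores.get, reverse=True)[:k]

freqs_a = {"s1": 0.9, "s2": 0.5, "s3": 0.2}
freqs_b = {"s1": 0.1, "s2": 0.45, "s3": 0.6}
print(top_snps(freqs_a, freqs_b, 2))            # ['s1', 's3']
print(top_snps(freqs_a, freqs_b, 2, stat=fst))  # ['s1', 's3']
```

The two statistics often agree on the top SNPs but can diverge, because Fst scales a frequency difference by the pooled heterozygosity.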

7.
Genomics 2020, 112(2): 1726–1733
The cost of SNP genotyping to screen different breeds and estimate exact ancestry proportions is quite high; this can be offset by deriving a small panel of ancestry informative markers (AIMs). We therefore carried out the present study to examine ancestry levels inferred from a panel of informative markers in the crossbred Vrindavani population developed at ICAR-IVRI, India. We applied discriminant analysis of principal components (DAPC) to the Vrindavani cattle dataset for the first time. To validate the approach, we also applied DAPC to two other well-known crossbred cattle, Frieswal and Beefmaster. Three panel sizes (500, 1000 and 2000 markers) were tested for clustering individuals. Among all the panels, the 1000-marker panel selected with the DAPC-based contribution method combined a comparatively small size with the highest accuracy.

8.
To preserve genetic variability and minimize genetic subdivision among captive Macaca mulatta at each of the U.S. National Institutes of Health (NIH)-sponsored regional research colonies, the genetic structure of each colony must be characterized. To compare population genetic and demographic parameters across colonies and generations, one standard panel of highly informative genetic markers is required. We assembled a core marker set of four multiplex polymerase chain reaction (PCR) panels comprising 15 autosomal short tandem repeat (STR) loci with high information content selected from existing panels of well-characterized markers that are currently used for parentage assessment and genetic management of rhesus macaques. We then assessed the effectiveness of these loci for providing high probabilities of individual identification and parentage resolution, and for estimating population genetic parameters that are useful for genetic management.

9.
Genetic linkage analyses with genotypic data obtained from four CEPH reference families initially assigned 24 new PCR-based markers to chromosome 17 and located the markers at specific intervals of an existing genetic map of chromosome 17p. Each marker was additionally genotyped with an ordered set of obligate, phase-known recombinant chromosomes. The breakpoint-mapping panels for each family consisted of two parents, one sib with a nonrecombinant chromosome, and one or more sibs with obligate recombinant chromosomes. The relative order of markers was determined by sorting segregation patterns of new markers and ordered anchor markers and by minimizing double-recombination events. Consistency of segregation patterns with multiple flanking loci constituted support for order. A genetic map of chromosome 17p was completed with 39 markers in 23 clusters, with an average spacing of 3 cM between clusters. The collection of informative genotypes was highly efficient, requiring fivefold fewer genotypes than would be collected with all the CEPH families. Given the availability of large numbers of highly informative PCR-based markers, meiotic breakpoint mapping should facilitate construction of a human genomic map with 1-cM resolution.
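
Ordering by minimizing double recombinants can be illustrated with a brute-force search: for each candidate order, count the switches in grandparental origin along every phase-known chromosome and keep the order with the fewest. This is a simplified sketch, assuming each chromosome is coded as "P" or "M" per marker with no missing genotypes; it enumerates every order, so it only suits a handful of markers, unlike the anchor-based sorting the study used for 39 markers.

```python
from itertools import permutations

def count_crossovers(chromosome, order):
    """Count switches in grandparental origin along one marker order.

    chromosome maps marker name -> "P" or "M" (origin of the allele it carries).
    """
    states = [chromosome[m] for m in order]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)

def best_order(markers, chromosomes):
    """Return the order with the fewest total crossovers (an order and its reverse tie)."""
    return min(permutations(markers),
               key=lambda order: sum(count_crossovers(c, order) for c in chromosomes))

# Hypothetical obligate recombinants, each with a single breakpoint in the true order A-B-C-D.
recombinants = [
    {"A": "P", "B": "P", "C": "M", "D": "M"},
    {"A": "P", "B": "M", "C": "M", "D": "M"},
    {"A": "P", "B": "P", "C": "P", "D": "M"},
]
print(best_order(["A", "B", "C", "D"], recombinants))
```

A wrong order such as A-C-B-D turns the first chromosome into an apparent triple recombinant, so minimizing crossovers penalizes exactly the spurious double recombinants that incorrect orders create.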

10.
This report describes a set of 23 informative SNPs (BARCSoySNP23) distributed on 19 of the 20 soybean linkage groups that can be used for soybean cultivar identification. Selection of the SNPs to include in this set was made based upon the information provided by each SNP for distinguishing a diverse set of soybean genotypes as well as the linkage map position of each SNP. The genotypes included the ancestors of North American cultivars, modern North American cultivars and a group of Korean cultivars. The procedure used to identify this subset of highly informative SNP markers resulted in a significant increase in the power of identification versus any other randomly selected set of equal number. This conclusion was supported by a simulation which indicated that the 23-SNP panel can uniquely distinguish 2,200 soybean cultivars, whereas sets of randomly selected 23-SNP panels allowed the unique identification of only about 50 cultivars. The 23-SNP panel can efficiently distinguish each of the genotypes within four maturity group sets of additional cultivars/lines that have identical classical pigmentation and morphological traits. By comparison, the 13 trinucleotide SSR set published earlier (BARCSoySSR13) has more power on a per-locus basis because of the multi-allelic nature of SSRs. However, the assay of bi-allelic SNP loci can be multiplexed using non-gel-based techniques, allowing rapid determination of the SNP alleles present in soybean genotypes and thereby compensating for their relatively low information content. BARCSoySNP23 and BARCSoySSR13 were highly congruent in identifying genotypes and in estimating population genetic differences.
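
Whether a panel "uniquely distinguishes" a set of cultivars can be checked directly by comparing multilocus profiles, which is also the quantity one would compare between a selected panel and randomly drawn panels of the same size. A minimal sketch, assuming complete, unambiguous genotype calls (real data would also need to handle missing calls and residual heterozygosity); the cultivar names and calls are hypothetical.

```python
from collections import Counter

def uniquely_identified(profiles, panel):
    """Count cultivars whose genotype calls at the `panel` SNPs match no other cultivar.

    profiles maps cultivar name -> {SNP id: genotype call}.
    """
    keys = {name: tuple(calls[snp] for snp in panel) for name, calls in profiles.items()}
    counts = Counter(keys.values())
    return sum(1 for key in keys.values() if counts[key] == 1)

# Hypothetical calls for four cultivars at three SNPs.
profiles = {
    "cv1": {"s1": "AA", "s2": "CC", "s3": "GG"},
    "cv2": {"s1": "AA", "s2": "TT", "s3": "GG"},
    "cv3": {"s1": "TT", "s2": "CC", "s3": "GG"},
    "cv4": {"s1": "TT", "s2": "CC", "s3": "AA"},
}
print(uniquely_identified(profiles, ["s1", "s2"]))        # 2 (cv3 and cv4 collide)
print(uniquely_identified(profiles, ["s1", "s2", "s3"]))  # 4
```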

11.
A strategy for using multiple linked markers for genetic counseling. (Total citations: 12; self-citations: 6; citations by others: 6)
A strategy for using multiple linked markers for genetic counseling is to test sequentially individual markers until a diagnosis can be made. We show that in order to minimize the number of tests performed per case while diagnosing all informative cases the order in which the markers are to be tested is critical. We describe an algorithm to obtain this order using the parameter "I," the frequency of informative cases. The I value for a specific locus used depends on the marker frequency, association with the disease locus, and also on the informativeness of the marker loci already tested. Realizing that a direct assay for the beta S gene already exists, and that most cases of beta-thalassemia in Mediterraneans can be directly diagnosed using synthetic oligonucleotide probes, we illustrate the above technique by examining nine DNA polymorphisms in the human beta-globin cluster for their ability to diagnose sickle-cell anemia in American blacks and beta-thalassemia in Mediterraneans. This analysis shows that 95.39% of all sickle-cell pregnancies can be diagnosed by testing a subset of only six markers chosen by our algorithm. Furthermore, six markers can also diagnose 88.03% of beta-thalassemia in Greeks and 83.56% of beta-thalassemia in Italians. The test set is different from that suggested by the individual informative frequencies due to nonrandom associations between the restriction sites.
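
The sequential-testing idea can be sketched as a greedy ordering: the next marker tested is the one informative in the largest number of cases not already diagnosed by earlier markers, which mirrors the conditional informativeness described above. This sketch assumes informativeness is given as an explicit set of case ids per marker, rather than the population-frequency parameter I derived from marker frequencies and disease association, and it is not the authors' published algorithm; the marker names are hypothetical.

```python
def greedy_marker_order(informative):
    """Order markers so each next test diagnoses the most still-undiagnosed cases.

    informative maps marker -> set of case ids in which that marker is informative.
    Returns (marker, cumulative fraction of informative cases diagnosed) pairs;
    markers that add no new cases are never tested.
    """
    remaining = set().union(*informative.values())
    total = len(remaining)
    pool = dict(informative)
    order, diagnosed = [], 0
    while remaining and pool:
        marker = max(pool, key=lambda m: len(pool[m] & remaining))
        gain = pool.pop(marker) & remaining
        if not gain:
            break
        remaining -= gain
        diagnosed += len(gain)
        order.append((marker, diagnosed / total))
    return order

# m2 is individually more informative than m3 but adds nothing once m1 has been tested.
print(greedy_marker_order({"m1": {1, 2, 3, 4}, "m2": {1, 2, 3}, "m3": {5, 6}}))
```

Ranking by individual frequencies alone would test m2 before m3; conditioning on what is already diagnosed changes the test set, as the abstract reports for the beta-globin restriction sites.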

12.
Determining how many and which codominant marker loci are required for accurate parentage assignment is not straightforward because levels of marker polymorphism, linkage, allelic distributions among potential parents and other factors produce differences in the discriminatory power of individual markers and sets of markers. p-loci software identifies the most efficient set of codominant markers for assigning parentage at a user-defined level of success, using either simulated or actual offspring genotypes of known parentage. Simulations can incorporate linkage among markers, mating design and frequencies of null alleles and/or genotyping errors. p-loci is available for Windows systems at http://marineresearch.oregonstate.edu/genetics/ploci.htm.

13.
We examined the distribution of alleles at 63 microsatellite loci distributed across 29 linkage groups in broodstock females from a commercial population of rainbow trout spawning on different dates throughout the season (August to January). A total of 368 females were used to detect marker–trait associations: 184 and 117 females from the two tail-ends of the spawning distribution, respectively, and a subsample of 67 females spawning in the middle. Twenty-one loci in a subset of genomic regions (RT-5, 7, 8, 10, 12, 14, 15, 22, 23, 24, 25, 29, 30, and 31) were significantly associated with variation in spawning date. Many of these markers localize to regions with known spawning date quantitative trait loci based on previous studies. An individual assignment analysis was used to test how well the molecular data could assign individuals to their correct spawning group, and markers were ranked by their contribution to the accuracy of assignment. The 15 top-ranked markers assigned the majority of females to the correct spawning group from genotype alone, with an average accuracy of 76%. The most likely genes that could contribute to these differences in spawning date are discussed. Together, these data indicate that the loci could be incorporated into a selection index with phenotype data to increase the accuracy of selection for spawning date.

14.
DNA marker technology represents a promising means for determining the genetic identity and kinship of an animal. Compared with other types of DNA markers, single nucleotide polymorphisms (SNPs) are attractive because they are abundant, genetically stable, and amenable to high-throughput automated analysis. In cattle, the challenge has been to identify a minimal set of SNPs with sufficient power for use in a variety of popular breeds and crossbred populations. This report describes a set of 32 highly informative SNP markers distributed among 18 autosomes and both sex chromosomes. Informativity of these SNPs in U.S. beef cattle populations was estimated from the distribution of allele and genotype frequencies in two panels: one consisting of 96 purebred sires representing 17 popular breeds, and another with 154 purebred American Angus from six herds in four Midwestern states. Based on frequency data from these panels, the estimated probability that two randomly selected, unrelated individuals will possess identical genotypes for all 32 loci was 2.0 × 10⁻¹³ for multi-breed composite populations and 1.9 × 10⁻¹⁰ for purebred Angus populations. The probability that a randomly chosen candidate sire will be excluded from paternity was estimated to be 99.9% and 99.4% for the same respective populations. The DNA immediately surrounding the 32 target SNPs was sequenced in the 96 sires of the multi-breed panel and found to contain an additional 183 polymorphic sites. Knowledge of these additional sites, together with the 32 target SNPs, allows the design of robust, accurate genotype assays on a variety of high-throughput SNP genotyping platforms.
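
Both summary figures are products over loci of simple per-SNP quantities. A sketch, assuming Hardy-Weinberg proportions and independent loci: the per-locus probability of identity is the sum of squared genotype frequencies, p⁴ + (2pq)² + q⁴, and the per-locus paternity exclusion probability for a biallelic SNP when the mother's genotype is known is pq(1 − pq). The abstract does not state which exclusion formulation was used, so the known-mother case here is an assumption, and the 32 equal frequencies in the example are hypothetical.

```python
import math

def pi_locus(p):
    """Probability that two unrelated individuals share a genotype at one biallelic SNP."""
    q = 1 - p
    return p ** 4 + (2 * p * q) ** 2 + q ** 4

def pe_locus(p):
    """Paternity exclusion probability at one biallelic SNP, mother's genotype known."""
    q = 1 - p
    return p * q * (1 - p * q)

def combined(freqs):
    """Multilocus probability of identity and exclusion, assuming independent loci."""
    pi = math.prod(pi_locus(p) for p in freqs)
    pe = 1 - math.prod(1 - pe_locus(p) for p in freqs)
    return pi, pe

pi, pe = combined([0.5] * 32)   # 32 maximally informative SNPs (hypothetical)
print(f"PI = {pi:.2e}, PE = {pe:.4f}")
```

Because PI is a product, every added informative SNP shrinks it multiplicatively, which is how a few dozen biallelic markers reach values on the order of 10⁻¹³.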

15.
Genotype-Imputation Accuracy across Worldwide Human Populations (Total citations: 2; self-citations: 0; citations by others: 2)
A current approach to mapping complex-disease-susceptibility loci in genome-wide association (GWA) studies involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and tested for disease association. This imputation strategy has been successful for GWA studies in populations well represented by existing reference panels. We used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the “portability” of the HapMap reference panels for imputation in studies of diverse populations. When a single HapMap panel was leveraged for imputation of randomly masked genotypes, European populations had the highest imputation accuracy, followed by populations from East Asia, Central and South Asia, the Americas, Oceania, the Middle East, and Africa. For each population, we identified “optimal” mixtures of reference panels that maximized imputation accuracy, and we found that in most populations, mixtures including individuals from at least two HapMap panels produced the highest imputation accuracy. From a separate survey of additional SNPs typed in the same samples, we evaluated imputation accuracy in the scenario in which all genotypes at a given SNP position were unobserved and were imputed on the basis of data from a commercial “SNP chip,” again finding that most populations benefited from the use of combinations of two or more HapMap reference panels. Our results can serve as a guide for selecting appropriate reference panels for imputation-based GWA analysis in diverse populations.

16.

Background

We explored the imputation performance of the program IMPUTE in an admixed sample from Mexico City. The following issues were evaluated: (a) the impact of different reference panels (HapMap vs. 1000 Genomes) on imputation; (b) potential differences in imputation performance between single-step vs. two-step (phasing and imputation) approaches; (c) the effect of different INFO score thresholds on imputation performance and (d) imputation performance in common vs. rare markers.

Methods

The sample from Mexico City comprised 1,310 individuals genotyped with the Affymetrix 5.0 array. We randomly masked 5% of the markers directly genotyped on chromosome 12 (n = 1,046) and compared the imputed genotypes with the microarray genotype calls. Imputation was carried out with the program IMPUTE. The concordance rates between the imputed and observed genotypes were used as a measure of imputation accuracy and the proportion of non-missing genotypes as a measure of imputation efficacy.

Results

The single-step imputation approach produced slightly higher concordance rates than the two-step strategy (99.1% vs. 98.4% when using the HapMap phase II combined panel), but at the expense of a lower proportion of non-missing genotypes (85.5% vs. 90.1%). The 1000 Genomes reference sample produced concordance rates similar to those of the HapMap phase II panel (98.4% for both datasets, using the two-step strategy). However, the 1000 Genomes reference sample substantially increased the proportion of non-missing genotypes (94.7% vs. 90.1%). Rare variants (<1%) had lower imputation accuracy and efficacy than common markers.

Conclusions

The program IMPUTE performed very well for common alleles in an admixed sample from Mexico City, which has primarily Native American (62%) and European (33%) contributions. Genotype concordances were at least 98.4% with all the imputation strategies, even though no Native American samples are present in the HapMap and 1000 Genomes reference panels. The best balance of imputation accuracy and efficacy was obtained with the 1000 Genomes panel. Rare variants were not captured effectively by any of the available panels, emphasizing the need for caution in interpreting association results for imputed rare variants.
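
The two evaluation measures defined in the Methods are easy to state in code: mask known genotypes, impute them, then compare. A sketch, assuming each masked genotype record carries the INFO score of its SNP and that genotypes from SNPs below the INFO threshold are treated as missing; the 0.5 default threshold is a placeholder, not a value from the study.

```python
def imputation_metrics(observed, imputed, info, threshold=0.5):
    """Concordance (accuracy) and proportion of non-missing genotypes (efficacy).

    observed:  true genotype calls at the masked positions (0/1/2 allele counts).
    imputed:   imputed calls at the same positions.
    info:      INFO score of the SNP each call belongs to; below `threshold` -> missing.
    """
    kept = [(o, i) for o, i, s in zip(observed, imputed, info) if s >= threshold]
    efficacy = len(kept) / len(observed)
    concordance = sum(o == i for o, i in kept) / len(kept) if kept else float("nan")
    return concordance, efficacy

concordance, efficacy = imputation_metrics(
    observed=[0, 1, 2, 1], imputed=[0, 1, 1, 1], info=[0.9, 0.9, 0.8, 0.3])
print(concordance, efficacy)
```

Raising the threshold usually trades efficacy for concordance, the same tension the Results report between the single-step and two-step strategies.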

17.
Commercial single nucleotide polymorphism (SNP) arrays have recently been developed for several species and can be used to identify informative markers that differentiate breeds or populations for several downstream applications. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this work, we compared several SNP preselection methods (Delta, Fst and principal component analysis (PCA)) combined with Random Forest classification to analyse SNP data from six dairy cattle breeds, including cosmopolitan breeds (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna). From these classifications, two panels of 96 and 48 SNPs containing the most discriminant SNPs were created for each preselection method. These panels were evaluated for their ability to discriminate breeds overall and breed by breed, as well as for linkage disequilibrium within each panel. For the 48-SNP panels, the error rate increased mainly for the autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure and ascertainment bias in the construction of the SNP chip. The 96-SNP panels were generally better able to discriminate all breeds. The panel derived by PCA-chrom (preselection performed chromosome by chromosome) identified informative SNPs that were particularly useful for assigning the minor breeds: it reached the lowest out-of-bag (OOB) error, even for Cinisara, whose error was quite high with all other panels. This panel also contained the fewest SNPs in linkage disequilibrium. Several selected SNPs are located near genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits. 
In general, our results demonstrated the usefulness of Random Forest in combination with other reduction techniques to identify population-informative SNPs.

18.
Single nucleotide polymorphisms (SNPs) able to describe population differences can be used for important applications in livestock, including breed assignment of individual animals, authentication of mono-breed products and parentage verification, among several other applications. Several methods have been used to identify the most discriminating SNPs among the thousands of markers on the available commercial SNP chips. Random forest (RF) is a machine learning technique that has been proposed for this purpose. In this study, we used RF to analyse PorcineSNP60 BeadChip genotyping data obtained from a total of 2737 pigs of 7 Italian pig breeds (3 cosmopolitan-derived breeds: Italian Large White, Italian Duroc and Italian Landrace, and 4 autochthonous breeds: Apulo-Calabrese, Casertana, Cinta Senese and Nero Siciliano) to identify breed-informative, reduced SNP panels using the Mean Decrease in Gini Index and Mean Decrease in Accuracy parameters with stability evaluation. Other reduced informative SNP panels were obtained using the Delta, fixation index and principal component analysis statistics, and their performance was compared with that of the RF-defined panels using RF classification, its out-of-bag (OOB) error rates and the proportion of correct predictions. In total, the performance of six reduced panels was evaluated. The correct assignment of animals to their breed was close to 100% for all tested approaches. Porcine chromosome 8 harboured the largest number of selected SNPs across all panels. Many SNPs were included in genomic regions in which previous studies identified signatures of selection or genes (e.g. ESR1, KITL and LCORL) that could contribute to explaining, at least in part, phenotypically or economically relevant traits that might differentiate cosmopolitan and autochthonous pig breeds. 
When used as a preselection statistic, RF highlighted informative SNPs that differed from those identified by the other methods, which might be due to specific features of this machine learning methodology. It will be interesting to explore whether RF methods adapted to identify selection signature regions could describe population-specific features that other approaches do not capture.

19.
We established two mouse interspecific backcross DNA panels, one containing 94 N2 animals from the cross (C57BL/6J × Mus spretus)F1 × C57BL/6J, and another from 94 N2 animals from the reciprocal backcross (C57BL/6J × SPRET/Ei)F1 × SPRET/Ei. We prepared large quantities of DNA from most tissues of each animal to create a community resource of interspecific backcross DNA for use by laboratories interested in mapping loci in the mouse. Initial characterization of the genetic maps of both panels has been completed. We used MIT SSLP markers, proviral loci, and several other sequence-defined genes to anchor our maps to other published maps. The BSB panel map (from the backcross to C57BL/6J) contains 215 loci and is anchored by 45 SSLP and 32 gene sequence loci. The BSS panel map (from the backcross to SPRET/Ei) contains 451 loci and is anchored by 49 SSLP loci, 43 proviral loci, and 60 gene sequence loci. To obtain a high density of markers, we used motif-primed PCR to fingerprint the panel DNAs. We constructed two maps, each representing one of the two panels. All new loci can be located with a high degree of certainty on the maps at current marker density. Segregation patterns in these data reveal several examples of transmission ratio distortion and permit analysis of the distribution of crossovers on individual chromosomes.

20.
The development of accurate clinical biomarkers has been challenging in part due to the diversity between patients and diseases. One approach to account for the diversity is to use multiple markers to classify patients, based on the concept that each individual marker contributes information from its respective subclass of patients. Here we present a new strategy for developing biomarker panels that accounts for completely distinct patient subclasses. Marker State Space (MSS) defines “marker states” based on all possible patterns of high and low values among a panel of markers. Each marker state is defined as either a case state or a control state, and a sample is classified as case or control based on the state it occupies. MSS was used to define multi-marker panels that were robust in cross validation and training-set/test-set analyses and that yielded similar classification accuracy to several other classification algorithms. A three-marker panel for discriminating pancreatic cancer patients from control subjects revealed subclasses of patients based on distinct marker states. MSS provides a straightforward approach for modeling highly divergent subclasses of patients, which may be adaptable for diverse applications.
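
The marker-state idea can be sketched in a few lines: binarize each marker at a threshold, treat the resulting high/low pattern as the sample's state, label each state from the training samples that fall in it, and classify new samples by the state they occupy. The fixed thresholds, the majority-vote labelling and defaulting unseen states to "control" are assumptions for illustration, not the published MSS procedure.

```python
from collections import Counter, defaultdict

def marker_state(sample, thresholds):
    """Encode a sample as a tuple of high (1) / low (0) calls, one per marker."""
    return tuple(int(value > t) for value, t in zip(sample, thresholds))

def train_mss(samples, labels, thresholds):
    """Label each observed state 'case' or 'control' by majority vote (ties -> 'control')."""
    votes = defaultdict(Counter)
    for sample, label in zip(samples, labels):
        votes[marker_state(sample, thresholds)][label] += 1
    return {state: ("case" if c["case"] > c["control"] else "control")
            for state, c in votes.items()}

def classify(sample, state_labels, thresholds):
    """Classify by occupied state; states never seen in training default to 'control'."""
    return state_labels.get(marker_state(sample, thresholds), "control")

# Hypothetical two-marker data: cases are high on marker 1 OR on marker 2, not both.
thresholds = [1.0, 1.0]
samples = [(2, 0), (2, 0.5), (0, 2), (0, 0), (0.5, 0.2)]
labels = ["case", "case", "case", "control", "control"]
states = train_mss(samples, labels, thresholds)
print(classify((3, 0), states, thresholds), classify((0, 3), states, thresholds))
```

In this toy example the cases fall into two distinct states (only marker 1 high, or only marker 2 high), the kind of patient-subclass structure that a single linear combination of markers can blur.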


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417