Similar literature
20 similar articles found
1.
Pedigrees, depicting genealogical relationships between individuals, are important in several research areas. Molecular markers allow inference of pedigrees in wild species where relationship information is impossible to collect by observation. Marker data are analysed statistically using methods based on Mendelian inheritance rules. There are numerous computer programs available to conduct pedigree analysis, but most software is inflexible, both in terms of assumptions and data requirements. Most methods only accommodate monogamous diploid species using codominant markers without genotyping error. In addition, most commonly used methods use pairwise comparisons rather than a full-pedigree likelihood approach, which considers the likelihood of the entire pedigree structure and allows the simultaneous inference of parentage and sibship. Here, we describe COLONY, a computer program implementing full-pedigree likelihood methods to simultaneously infer sibship and parentage among individuals using multilocus genotype data. COLONY can be used for both diploid and haplodiploid species; it can use dominant and codominant markers, and can accommodate, and estimate, genotyping error at each locus. In addition, COLONY can carry out these inferences for both monoecious and dioecious species. The program is available as a Microsoft Windows version, which includes a graphical user interface, and a Macintosh version, which uses an R-based interface.

2.
Many plants and some animal species are polyploids. Nondisomically inherited markers (e.g. microsatellites) in such species cannot be analysed directly by standard population genetics methods developed for diploid species. One solution is to transform the polyploid codominant genotypes to pseudodiploid‐dominant genotypes, which can then be analysed by standard methods for various purposes such as spatial genetic structure, individual relatedness and relationship. Although this data transformation approach has been used repeatedly in the literature, no systematic study has been conducted to investigate how efficient it is, how much marker information is lost and thus how much analysis accuracy is reduced. More specifically, it is unknown whether or not the transformed data can be used to infer parentage and sibship jointly, and how different sampling schemes (number and polymorphism of markers, number of individuals) and ploidy level affect the inference accuracy. This study analyses both simulated and empirical data to examine the effects of polyploid levels, actual pedigree structures and marker number and polymorphism on the accuracy of joint parentage and sibship assignments in polyploid species. We show that sibship, parentage and selfing rates in polyploids can be inferred accurately from a typical set of microsatellite loci. We also show that inferences can be substantially improved by allowing for a small genotyping error rate to accommodate the distortion in assumed Mendelian inheritance of the converted markers when large sibship groups are involved. The results are discussed in the context of polyploid data analysis in molecular ecology.
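
The data transformation mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the recoding in which every allele observed at a codominant locus becomes its own presence/absence (dominant) pseudo-locus, and the function name and data layout are hypothetical.

```python
def to_pseudodiploid_dominant(genotypes):
    """Recode one polyploid codominant locus as dominant presence/absence markers.

    genotypes: dict mapping individual -> set of alleles observed at the locus.
    Allele dosage is usually unknown in polyploids, so only the allele set is used.
    Returns (alleles, table), where table maps individual -> list of 0/1 flags,
    one flag per allele in the sorted allele list.
    """
    # Every distinct allele seen in the sample becomes one dominant pseudo-locus.
    alleles = sorted(set().union(*genotypes.values()))
    table = {
        ind: [1 if allele in observed else 0 for allele in alleles]
        for ind, observed in genotypes.items()
    }
    return alleles, table


# Hypothetical microsatellite locus scored in three tetraploid individuals.
locus = {"ind1": {120, 124, 128}, "ind2": {124}, "ind3": {120, 132}}
alleles, table = to_pseudodiploid_dominant(locus)
for ind, flags in table.items():
    print(ind, dict(zip(alleles, flags)))
```

Each pseudo-locus records only whether an allele is present, so dosage information is discarded. That discarded information is the loss the abstract sets out to quantify.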

3.
FaMoz (an acronym for father/mother) is software for reconstructing parentage from dominant, codominant and uniparentally inherited markers. It is written in C and Tcl/Tk and is available for Unix, Linux and Windows systems at http://www.pierroton.inra.fr/genetics/labo/Software/Famoz/index.html . Parameters and assumptions used in the calculations are few and simple. Exclusion and identity probabilities, log‐likelihoods of any genetic relationship, potential father and parent or parent pair, half‐ and full‐sibship are calculated based on real or simulated data. Error rates for genotypic mistyping can be introduced. Simulations can be done to build statistical tests for parentage assignment.

4.
J Wang 《Heredity》2013,111(2):165-174
Many methods have been proposed to reconstruct the pedigree of a sample of individuals from their multilocus marker genotypes. These methods, like those in other fields of statistical inference, may suffer from both type I (falsely related) and type II (falsely unrelated) errors. In sibship reconstruction, type I errors come from the spurious fusion of two or more small sibships into a single sibship, and type II errors originate from the spurious splitting of a large sibship into two or more small sibships. In this study I investigate the tendencies of both types of errors made by the likelihood methods in sibship reconstruction, using both analytical and simulation approaches. I propose an improvement on the likelihood methods to reduce sibship splitting, and thus type II errors, by downscaling the number of inferred siblings sharing the same genotype at a locus. Simulations are then conducted to compare the accuracy of the original and improved likelihood methods in sibship reconstruction of a large sample of individuals in full-sib families of the same small size, the same large size and highly variable sizes, using a variable number of loci with a variable number of alleles per locus. The methods were also applied to the analysis of a salmon data set. I show that my scaling scheme effectively prevents the splitting of large sibships, and greatly reduces type II errors with little increase in type I errors. As a result, it improves the overall accuracy of sibship assignments, except when sibships are expected to be uniformly small or marker information is unrealistically scarce.

5.
Liu PY  Lu Y  Deng HW 《Genetics》2006,174(1):499-509
Sibships are commonly used in genetic dissection of complex diseases, particularly for late-onset diseases. Haplotype-based association studies have been advocated as powerful tools for fine mapping and positional cloning of complex disease genes. Existing methods for haplotype inference using data from relatives were originally developed for pedigree data. In this study, we proposed a new statistical method for haplotype inference for multiple tightly linked single-nucleotide polymorphisms (SNPs), which is tailored for extensively accumulated sibship data. This new method was implemented via an expectation-maximization (EM) algorithm without the usual assumption of linkage equilibrium among markers. Our EM algorithm does not incur extra computational burden for haplotype inference using sibship data when compared with using unrelated parental data. Furthermore, its computational efficiency is not affected by increasing sibship size. We examined the robustness and statistical performance of our new method in simulated data created from an empirical haplotype data set of human growth hormone gene 1. The utility of our method was illustrated with an application to the analyses of haplotypes of three candidate genes for osteoporosis.

6.
Captive breeding programs are an important tool for the conservation of endangered species. These programs are commonly managed using pedigrees containing information about the history of each individual's family, such as breeding pairs and parentage. However, there are some species that are kept in groups where it is hard to distinguish between particular individuals within the group, making it very difficult to record any information at an individual level. Currently, software and methods commonly used for registering and analyzing pedigrees to help manage populations at an individual level are not adequate for managing these group‐living species. Therefore, there is a need to further develop these tools and methodologies for pedigree analysis to better manage group‐living species. PMx is a program used for the management of ex situ populations in zoos and aquariums. We adapted the pedigree analysis method implemented in PMx to analyze pedigrees (records of descendant lineages) of group‐living species. In addition, we developed a group pedigree data entry sheet and group2PMx, a converter program that enables group datasets to be imported into PMx. We show how pedigree analysis of a group‐living species can be used for population management using the studbook of the endangered Texas blind cave salamander Eurycea rathbuni. Such analyses of the pedigree of groups can improve the management of group‐living species in ex situ breeding programs. Firstly, it enables better management decisions based on more accurate genetic measures between groups, allowing for greater control of inbreeding. Secondly, it can improve the conditions in which group‐living species are held by adapting husbandry practices to better reflect conditions of these species living in the wild. 
The use of the spreadsheet and group2PMx extends the application of PMx, allowing conservation managers and other institutions outside the zoo and aquarium community to easily import and analyze their pedigree data.

7.
Kinship plays a fundamental role in the evolution of social systems and is considered a key driver of group living. To understand the role of kinship in the formation and maintenance of social bonds, accurate measures of genetic relatedness are critical. Genotype‐by‐sequencing technologies are rapidly advancing the accuracy and precision of genetic relatedness estimates for wild populations. The ability to assign kinship from genetic data varies depending on a species’ or population's mating system and pattern of dispersal, and empirical data from longitudinal studies are crucial to validate these methods. We use data from a long‐term behavioural study of a polygynandrous, bisexually philopatric marine mammal to measure accuracy and precision of parentage and genetic relatedness estimation against a known partial pedigree. We show that with moderate but obtainable sample sizes of approximately 4,235 SNPs and 272 individuals, highly accurate parentage assignments and genetic relatedness coefficients can be obtained. Additionally, we subsample our data to quantify how data availability affects relatedness estimation and kinship assignment. Lastly, we conduct a social network analysis to investigate the extent to which accuracy and precision of relatedness estimation improve statistical power to detect an effect of relatedness on social structure. Our results provide practical guidance for minimum sample sizes and sequencing depth for future studies, as well as thresholds for post hoc interpretation of previous analyses.

8.

Background

Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data.

Methods

A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information.

Results

The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computation efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, or SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% fewer phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available.

Conclusions

The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.

9.
Wang J 《Heredity》2007,99(2):205-217
Parentage exclusion probabilities are now routinely calculated in genetic marker-assisted parentage analyses to indicate the statistical power of the analyses achievable for a given set of markers, and to measure the informativeness of a set of markers for parentage inference. Previous formulas invariably assume that parentage is to be sought for a single offspring, while in practice multiple full siblings might be sampled (for example, seeds, eggs or young from a pair of monogamous parents) and their father, mother or both are to be assigned among a number of candidates. In this study, I derive formulas for parentage exclusion probabilities for an arbitrary number (n) of full sibs, which reduce to previous equations for the special case of n=1. I also derive sibship exclusion probabilities, and investigate the power of differentiating half-sib, avuncular and grandparent-grandoffspring relationships using unlinked autosomal markers among different numbers of tested individuals. Applications of the formulas are demonstrated using both theoretical and empirical data sets of allele frequencies. The results from the study highlight the conclusion that the power of genealogical relationship inferences can be enhanced enormously by analysing multiple individuals for a given set of markers. The equations derived in this study allow more accurate determination of marker information and of the power of a parentage/sibship analysis. In addition, they can be used to guide experimental designs of parentage analyses in selecting markers and determining the number of offspring to be sampled and genotyped.

10.
CREATE is a Windows program for the creation of new and conversion of existing data input files for 52 genetic data analysis software programs. Programs are grouped into areas of sibship reconstruction, parentage assignment, genetic data analysis, and specialized applications. CREATE is able to read in data from text, Microsoft Excel and Access sources and allows the user to specify columns containing individual and population identifiers, birth and death data, sex data, relationship information, and spatial location data. CREATE's only constraints on source data are that one individual is contained in one row, and the genotypic data are contiguous. CREATE is available for download at http://www.lsc.usgs.gov/CAFL/Ecology/Software.html.

11.
Knowledge of the parentage of individuals is required to address a variety of questions concerning the evolutionary dynamics of wild populations. A major advance in parentage inference in natural populations has been the use of molecular markers and the development of statistical methods to analyse these data. Cervus, one of the most widely used parentage inference programs, uses molecular data to determine parent–offspring relationships. However, Cervus does not make use of all available information: additional phenotypic information may exist predicting parent–offspring relationships, and additional genetic information may be exploited by simultaneously considering multiple types of relationships rather than just pairwise or just parent–offspring relationships. Here we reanalyse data from a wild red deer population using two programs capable of using this additional information, MasterBayes and COLONY2, and quantify the impact of these alternative approaches by comparison with a ‘known pedigree’ estimated using a larger suite of microsatellite markers for a subset of the population. The use of phenotypic information and multiple relationships increased the number of correct assignments. We highlight the differences between programs, particularly the use of population‐ rather than individual‐level statistical confidence in Cervus. We conclude that the use of additional information allows MasterBayes and COLONY2 to assign more correct paternities, whereas their use of individual‐ rather than population‐level confidence generates fewer erroneous assignments. We suggest that maximal information may be gained by combining outputs from different programs. Higher accuracy and completeness of pedigree information will improve parameters estimated from pedigree information in studies of natural populations.

12.
For wildlife populations, it is often difficult to determine biological parameters that indicate breeding patterns and population mixing, but knowledge of these parameters is essential for effective management. A pedigree encodes the relationship between individuals and can provide insight into the dynamics of a population over its recent history. Here, we present a method for the reconstruction of pedigrees for wild populations of animals that live long enough to breed multiple times over their lifetime and that have complex or unknown generational structures. Reconstruction was based on microsatellite genotype data along with ancillary biological information: sex and observed body size class as an indicator of relative age of individuals within the population. Using body size‐class data to infer relative age has not been considered previously in wildlife genealogy and provides a marked improvement in accuracy of pedigree reconstruction. Body size‐class data are particularly useful for wild populations because they are much easier to collect noninvasively than absolute age data. This new pedigree reconstruction system, PR‐genie, performs reconstruction using maximum likelihood with optimization driven by the cross‐entropy method. We demonstrated pedigree reconstruction performance on simulated populations (comparing reconstructed pedigrees to known true pedigrees) over a wide range of population parameters and under assortative and intergenerational mating schema. Reconstruction accuracy increased with the presence of size‐class data and as the amount and quality of genetic data increased. We provide recommendations as to the amount and quality of data necessary to provide insight into detailed familial relationships in a wildlife population using this pedigree reconstruction technique.

13.
Paterson T  Law A 《Animal genetics》2011,42(5):560-562
Datapoint errors in pedigree genotype data sets are difficult to identify and adversely affect downstream genetic analyses. We present GenotypeChecker, a desktop software tool for assisting data cleansing. The application identifies likely data errors in pedigree/genotype data sets by running an inheritance-checking algorithm for each marker across the pedigree, and highlights inconsistently inherited genotypes in an exploratory user interface. By 'masking' suspect datapoints and rechecking inheritance consistency, erroneous datapoints can be confirmed and cleansed from the data set. The software, examples and documentation are freely available at http://bioinformatics.roslin.ac.uk/genotypechecker.
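
The kind of inheritance check described in this abstract can be illustrated with a short Python sketch. It is a simplified, assumption-laden example rather than GenotypeChecker's actual algorithm: it tests a single diploid codominant marker one parent-offspring trio at a time, treats missing genotypes as uninformative, and all function names and the data layout are hypothetical.

```python
def can_transmit(parent, allele):
    """A parent with a missing genotype (None) cannot rule anything out."""
    return parent is None or allele in parent


def is_consistent(child, sire, dam):
    """True if the child could have received one allele from each parent."""
    if child is None:
        return True
    a, b = child
    return (can_transmit(sire, a) and can_transmit(dam, b)) or (
        can_transmit(sire, b) and can_transmit(dam, a)
    )


def flag_inconsistencies(pedigree, genotypes):
    """Return the offspring whose genotype conflicts with their recorded parents.

    pedigree:  dict child -> (sire, dam)
    genotypes: dict individual -> (allele1, allele2); absent means not genotyped
    """
    return [
        child
        for child, (sire, dam) in pedigree.items()
        if not is_consistent(
            genotypes.get(child), genotypes.get(sire), genotypes.get(dam)
        )
    ]


# Hypothetical single-marker data for one family.
pedigree = {"kid1": ("sire", "dam"), "kid2": ("sire", "dam")}
genotypes = {
    "sire": ("A", "B"),
    "dam": ("B", "C"),
    "kid1": ("A", "C"),
    "kid2": ("C", "C"),
}
flagged = flag_inconsistencies(pedigree, genotypes)
print(flagged)
```

The 'masking' step in the abstract would correspond to setting a suspect genotype to missing and rerunning the check to see whether the remaining data become consistent.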

14.
With the exponential increase in genotyping capability, it is fundamental to check data consistency and improve genotype management. Atlas is a Java-based application for managing genotypes that also provides a series of tools useful in traceability, parentage testing, and identification, as well as pedigree and marker visualization.

15.
Abney M 《Genetics》2008,179(3):1577-1590
Computing identity-by-descent sharing between individuals connected through a large, complex pedigree is a computationally demanding task that often cannot be done using exact methods. What I present here is a rapid computational method for estimating, in large complex pedigrees, the probability that pairs of alleles are IBD given the single-point genotype data at that marker for all individuals. The method can be used on pedigrees of essentially arbitrary size and complexity without the need to divide the individuals into separate subpedigrees. I apply the method to do qualitative trait linkage mapping using the nonparametric sharing statistic S_pairs. The validity of the method is demonstrated via simulation studies on a 13-generation 3028-person pedigree with 700 genotyped individuals. An analysis of an asthma data set of individuals in this pedigree finds four loci with P-values < 10⁻³ that were not detected in prior analyses. The mapping method is fast and can complete analyses of approximately 150 affected individuals within this pedigree for thousands of markers in a matter of hours.

16.
The dog is a valuable model species for the genetic analysis of complex traits, and the use of genotype imputation in dogs will be an important tool for future studies. It is of particular interest to analyse the effect of factors like single nucleotide polymorphism (SNP) density of genotyping arrays and relatedness between dogs on imputation accuracy due to the acknowledged genetic and pedigree structure of dog breeds. In this study, we simulated different genotyping strategies based on data from 1179 Labrador Retriever dogs. The study involved 5826 SNPs on chromosome 1 representing the high density (HighD) array; the low‐density (LowD) array was simulated by masking different proportions of SNPs on the HighD array. The correlations between true and imputed genotypes for a realistic masking level of 87.5% ranged from 0.92 to 0.97, depending on the scenario used. A correlation of 0.92 was found for a likely scenario (10% of dogs genotyped using HighD, 87.5% of HighD SNPs masked in the LowD array), which indicates that genotype imputation in Labrador Retrievers can be a valuable tool to reduce experimental costs while increasing sample size. Furthermore, we show that genotype imputation can be performed successfully even without pedigree information and with low relatedness between dogs in the reference and validation sets. Based on these results, the impact of genotype imputation was evaluated in a genome‐wide association analysis and genomic prediction in Labrador Retrievers.
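
The accuracy measure reported in this abstract, the correlation between true and imputed genotypes, can be computed as in the following minimal Python sketch. It assumes genotypes coded as allele counts (0, 1 or 2) and a plain Pearson correlation; the toy data and the function name are hypothetical, and the abstract does not specify how the authors computed their correlations.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy example: genotypes at eight masked SNPs for one dog,
# coded as the number of copies of the reference allele.
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 2]
imputed_genotypes = [0, 1, 2, 1, 1, 2, 1, 2]  # one SNP imputed incorrectly

accuracy = pearson(true_genotypes, imputed_genotypes)
print(round(accuracy, 3))
```

In practice the correlation would be taken over all masked SNPs and all dogs in the validation set, not over a single individual.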

17.
J. Wang  A. W. Santure 《Genetics》2009,181(4):1579-1594
Likelihood methods have been developed to partition individuals in a sample into sibling clusters using genetic marker data without parental information. Most of these methods assume either both sexes are monogamous to infer full sibships only or only one sex is polygamous to infer full sibships and paternal or maternal (but not both) half sibships. We extend our previous method to the more general case of both sexes being polygamous to infer full sibships, paternal half sibships, and maternal half sibships and to the case of a two-generation sample of individuals to infer parentage jointly with sibships. The extension not only expands enormously the scope of application of the method, but also increases its statistical power. The method is implemented for both diploid and haplodiploid species and for codominant and dominant markers, with mutations and genotyping errors accommodated. The performance and robustness of the method are evaluated by analyzing both simulated and empirical data sets. Our method is shown to be much more powerful than pairwise methods in both parentage and sibship assignments because of the more efficient use of marker information. It is little affected by inbreeding in parents and is moderately robust to nonrandom mating and linkage of markers. We also show that individually much less informative markers, such as SNPs or AFLPs, can reach the same power for parentage and sibship inferences as the highly informative marker simple sequence repeats (SSRs), as long as a sufficient number of loci are employed in the analysis.

18.
Low reproductive productivity of young red deer (Cervus elaphus) hinds on New Zealand deer farms appears to reflect high incidences of puberty failure at 16 months of age. This is despite the general attainment of average liveweights 15–25 kg in excess of the accepted minimum threshold for puberty in subspecies of western European origin (scoticus, elaphus and hippelaphus) that form the basis of the national herd. The present study tests the hypotheses that (1) introgression of the larger North American wapiti subspecies (nelsoni, manitobensis and roosevelti) into breeding herds can be assessed from morphological features of individuals, (2) there is a relationship between the level of wapiti parentage and non-pregnancy rate at 18 months of age (a proxy for puberty failure) and (3) minimum liveweight thresholds for puberty increase with increasing levels of wapiti parentage.

A total of 4329 18-month-old hinds across four “red” deer farms in southern New Zealand were scanned for pregnancy status. Each hind was assigned a wapiti score (WS) as a subjective assessment of the obviousness of wapiti features. Various body measurements were additionally recorded for each hind. A hair sample was collected for DNA analysis (14 markers) to objectively assign subspecies pedigree (i.e. “Elkmeter”) on a subset of 1258 individuals. A total of 506 (11.7%) hinds were not pregnant at 18 months of age, with rates varying between 4.1 and 37.3% between farms and years. Mean WS differed significantly between farms and reflected the genetic management policy of each farm. WS was positively correlated with Elkmeter for each farm/year (P<0.05), although regression slopes varied significantly. WS could be adjusted for these differences to assign a corrected WS (CWS) for all 4329 individuals that estimated the proportion of wapiti parentage. Discriminant analysis of morphological variables relative to Elkmeter supported the first hypothesis and showed that shoulder height and body length were good indicators of the degree of wapiti parentage within individuals. This enabled the development of an objective estimate of wapiti parentage (EWP). The actual level of such parentage within herds ranged from <5 to >55%. There was a significant negative association between wapiti parentage and pregnancy, which was strongly influenced by liveweight, supporting the second and third hypotheses. This was manifest as marked displacement of pregnancy probability curves in relation to liveweight between genotype groups, particularly for those groups with >20% wapiti parentage. For example, predicted threshold liveweights required to achieve a 90% pregnancy rate for EWP values that represent 0, 10, 20, 30, 40 and 50% wapiti parentage were 81, 81, 85, 106, 127 and 137 kg, respectively. 
Within the study herds, the majority of hinds of 0–20% wapiti parentage exceeded the predicted 90% threshold liveweights for their genotype cohort. However, hinds with higher levels of wapiti parentage generally fell below the predicted threshold for their genotype group. The data strongly suggest that under liveweight performance levels measured for red deer, hinds with >20% wapiti parentage are at high risk of puberty failure.


19.
Parentage studies and family reconstructions have become increasingly popular for investigating a range of evolutionary, ecological and behavioural processes in natural populations. However, a number of different assignment methods have emerged in common use and the accuracy of each may differ in relation to the number of loci examined, allelic diversity, incomplete sampling of all candidate parents and the presence of genotyping errors. Here, we examine how these factors affect the accuracy of three popular parentage inference methods (COLONY, FaMoz and an exclusion‐Bayes’ theorem approach by Christie (Molecular Ecology Resources, 2010a, 10, 115)) to resolve true parent–offspring pairs using simulated data. Our findings demonstrate that accuracy increases with the number and diversity of loci. These were clearly the most important factors in obtaining accurate assignments, explaining 75–90% of variance in overall accuracy across 60 simulated scenarios. Furthermore, the proportion of candidate parents sampled had a small but significant impact on the susceptibility of each method to either false‐positive or false‐negative assignments. Within the range of values simulated, COLONY outperformed FaMoz, which outperformed the exclusion‐Bayes’ theorem method. However, with 20 or more highly polymorphic loci, all methods could be applied with confidence. Our results show that for parentage inference in natural populations, careful consideration of the number and quality of markers will increase the accuracy of assignments and mitigate the effects of incomplete sampling of parental populations.

20.
In the context of parentage assignment using genomic markers, key issues are genotyping errors and an absence of parent genotypes because of sampling, traceability or genotyping problems. Most likelihood‐based parentage assignment software programs require a priori estimates of genotyping errors and the proportion of missing parents to set up meaningful assignment decision rules. We present here the R package APIS, which can assign offspring to their parents without any prior information other than the offspring and parental genotypes, and a user‐defined, acceptable error rate among assigned offspring. Assignment decision rules use the distributions of average Mendelian transmission probabilities, which enable estimates of the proportion of offspring with missing parental genotypes. APIS has been compared to other software (CERVUS, VITASSIGN) on a real European seabass (Dicentrarchus labrax) single nucleotide polymorphism data set. The type I error rate (false positives) was lower with APIS than with other software, especially when parental genotypes were missing, but the true positive rate was also lower, except when the theoretical exclusion power reached 0.99999. In general, APIS provided assignments that satisfied the user‐set acceptable error rate of 1% or 5%, even when tested on simulated data with high genotyping error rates (1% or 3%) and up to 50% missing sires. Because it uses the observed distribution of Mendelian transmission probabilities, APIS is best suited to assigning parentage when numerous offspring (>200) are genotyped. We have demonstrated that APIS is easy‐to‐use and reliable software for parentage assignment, even when up to 50% of sires are missing.
