Similar Literature
20 similar articles found.
1.
A pedigree is a directed graph that displays the relationship between individuals according to their parentage. We derive a combinatorial result showing how any pedigree, up to individuals who have no extant (present-day) descendants, can be reconstructed from (sex-labelled) pedigrees that describe the ancestry of single extant individuals and of pairs of extant individuals. Furthermore, this reconstruction can be done in polynomial time. We also provide an example showing that the corresponding reconstruction result does not hold for pedigrees that are not sex-labelled. We then show how any pedigree can also be reconstructed from two functions that describe certain circuits in the pedigree. Finally, we obtain an enumeration result for pedigrees that is relevant to the question of how many segregating sites are needed to reconstruct pedigrees.
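Individuals who leave no extant descendants never appear in the ancestry of any extant individual, which is why they fall outside any reconstruction built from extant ancestries. The toy sketch below makes that concrete under a strong simplifying assumption that the paper does not make: every vertex carries a unique label, so collecting the ancestral pedigree of each extant individual and taking the union recovers exactly the individuals with at least one extant descendant. The pedigree and the helper names (`ancestors`, `recoverable`) are hypothetical; this is not the paper's polynomial-time reconstruction algorithm.

```python
def ancestors(parents, person):
    """Return `person` plus every ancestor reachable through the parent map."""
    seen, stack = set(), [person]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))  # founders have no entry
    return seen

def recoverable(parents, extant):
    """Union of the (labelled) ancestral pedigrees of all extant individuals."""
    found = set()
    for person in extant:
        found |= ancestors(parents, person)
    return found

# Hypothetical pedigree: child -> (father, mother)
parents = {
    "c1": ("f1", "m1"),
    "c2": ("f1", "m1"),
    "c3": ("f2", "m2"),   # c3, f2 and m2 have no extant descendants
    "e1": ("c1", "x1"),
    "e2": ("c2", "x2"),
}
extant = ["e1", "e2"]
print(sorted(recoverable(parents, extant)))
# ['c1', 'c2', 'e1', 'e2', 'f1', 'm1', 'x1', 'x2']
```

Individuals `c3`, `f2` and `m2` are absent from the result because no extant individual descends from them.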

2.
In this work we develop a novel algorithm for reconstructing the genomes of ancestral individuals, given genotype or sequence data from contemporary individuals and an extended pedigree of family relationships. A pedigree with complete genomes for every individual enables the study of allele frequency dynamics and haplotype diversity across generations, including deviations from neutrality such as transmission distortion. When studying heritable diseases, ancestral haplotypes can be used to augment genome-wide association studies and to track disease inheritance patterns. The building blocks of our reconstruction algorithm are segments of Identity-By-Descent (IBD) shared between two or more genotyped individuals. The method alternates between identifying a source for each IBD segment and assembling the IBD segments placed within each ancestral individual. Unlike previous approaches, our method can accommodate complex pedigree structures with hundreds of individuals genotyped at millions of SNPs. We apply our method to an Old Order Amish pedigree from Lancaster, Pennsylvania, whose founders came to North America from Europe during the early 18th century. The pedigree includes 1338 individuals from the past 12 generations, 394 of them with genotype data. The motivation for reconstruction is to understand the genetic basis of diseases segregating in the family by tracking haplotype transmission over time. Using our algorithm, thread, we are able to reconstruct an average of 224 ancestral individuals per chromosome, and for these ancestral individuals we reconstruct on average 79% of their haplotypes. We also identify a region on chromosome 16 that is difficult to reconstruct; we find that this region harbors a short Amish-specific copy number variation and the gene HYDIN. Although thread was developed for endogamous populations, it can be applied to any extensive pedigree whose recent generations have been genotyped.
We anticipate that this type of practical ancestral reconstruction will become more common and more necessary for understanding rare and complex heritable diseases in extended families.

3.
Pedigrees are directed acyclic graphs that represent ancestral relationships between individuals in a population. Based on a schematic recombination process, we describe two simple Markov models for sequences evolving on pedigrees—Model R (recombinations without mutations) and Model RM (recombinations with mutations). For these models, we ask an identifiability question: is it possible to construct a pedigree from the joint probability distribution of extant sequences? We present partial identifiability results for general pedigrees: we show that when the crossover probabilities are sufficiently small, certain spanning subgraph sequences can be counted from the joint distribution of extant sequences. We demonstrate how pedigrees that earlier seemed difficult to distinguish are distinguished by counting their spanning subgraph sequences.

4.
For wildlife populations, it is often difficult to determine biological parameters that indicate breeding patterns and population mixing, but knowledge of these parameters is essential for effective management. A pedigree encodes the relationships between individuals and can provide insight into the dynamics of a population over its recent history. Here, we present a method for reconstructing pedigrees of wild animal populations whose members live long enough to breed multiple times over their lifetime and that have complex or unknown generational structures. Reconstruction was based on microsatellite genotype data along with ancillary biological information: sex, and observed body size class as an indicator of the relative age of individuals within the population. Using body size-class data to infer relative age has not previously been considered in wildlife genealogy, and it provides a marked improvement in the accuracy of pedigree reconstruction. Body size-class data are particularly useful for wild populations because they are much easier to collect noninvasively than absolute age data. This new pedigree reconstruction system, PR-genie, performs reconstruction by maximum likelihood, with optimization driven by the cross-entropy method. We demonstrated pedigree reconstruction performance on simulated populations (comparing reconstructed pedigrees with the known true pedigrees) over a wide range of population parameters and under assortative and intergenerational mating schemes. Reconstruction accuracy increased when size-class data were present and as the amount and quality of genetic data increased. We provide recommendations on the amount and quality of data needed to gain insight into detailed familial relationships in a wildlife population using this pedigree reconstruction technique.
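The abstract names the cross-entropy method as PR-genie's optimizer but gives no details. The sketch below shows only the generic cross-entropy loop over binary vectors, applied to a toy objective (number of bits matching a hidden target) rather than a pedigree likelihood: sample candidates from independent Bernoulli distributions, keep the elite fraction, and move each sampling probability toward the elite bit frequency with some smoothing. The function name, parameter values and objective are assumptions for illustration, not PR-genie's code.

```python
import random

def cross_entropy_maximize(score, n_bits, n_samples=200, elite_frac=0.1,
                           smoothing=0.7, n_iter=30, seed=1):
    """Generic cross-entropy method for maximizing `score` over bit vectors."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                       # Bernoulli parameter per bit
    n_elite = max(1, int(n_samples * elite_frac))
    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        batch = [[1 if rng.random() < pi else 0 for pi in p]
                 for _ in range(n_samples)]
        batch.sort(key=score, reverse=True)  # best candidates first
        if score(batch[0]) > best_score:
            best, best_score = batch[0], score(batch[0])
        elite = batch[:n_elite]
        for i in range(n_bits):
            freq = sum(x[i] for x in elite) / n_elite
            # Smoothed update keeps probabilities from collapsing too early.
            p[i] = smoothing * freq + (1 - smoothing) * p[i]
    return best, best_score

# Toy objective: how many bits match a hidden target vector.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

def matches(x):
    return sum(a == b for a, b in zip(x, target))

best, best_score = cross_entropy_maximize(matches, len(target))
print(best_score, best == target)
```

In a pedigree setting the candidates would instead encode parent assignments and `score` would be a log-likelihood; that mapping is not shown here.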

5.
The DNA at human centromeric regions was characterized by using a repetitive sequence, 308, which localizes in situ exclusively to the centromeres of all chromosomes. We previously noted that this sequence is enriched on chromosome 6 and has a chromosome-specific organization on chromosomes 6, 3, 7, 14, X, and Y. In addition to this basic organization, sequences homologous to 308 are polymorphic among normal individuals. The variants are transmitted in a Mendelian manner within a family. To determine the chromosomal origin of the variants, we studied their linkage to markers on various chromosomes. Linkage analysis of one pedigree segregating two polymorphisms shows that the 2.6-kilobase (kb) BamHI and 2.6-kb TaqI fragments are linked to each other and to the HLA loci on chromosome 6. Data from another family show that the 2.8-kb TaqI, 4.0-kb TaqI, and 1.3-kb BamHI polymorphic fragments are linked and are probably near the Fy locus on chromosome 1. By dot blot analysis, we determined that the relative amount of these sequences in the genome is not measurably different between unrelated individuals. Thus, the polymorphisms represent changes in homologous 308 sequences on specific chromosomes and can be used as chromosome-specific markers. Linkage studies using polymorphisms of repeated sequences will be most useful within a kindred, especially one from an inbred population, because polymorphic repeats of the same restriction size may be heterogeneous in origin.
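The abstract does not say how linkage was scored. As a hedged illustration of the standard measure, the sketch below computes the two-point LOD score for phase-known, fully informative meioses, LOD(theta) = log10(theta^R (1 - theta)^(N - R) / 0.5^N), where R of N meioses are recombinant; a LOD of 3 or more is the conventional threshold for declaring linkage. The counts used are hypothetical, and real pedigrees with unknown phase or missing genotypes need full likelihood software rather than this closed form.

```python
import math

def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares linkage at recombination fraction `theta` with free
    recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    r, n = recombinants, meioses
    likelihood = theta ** r * (1 - theta) ** (n - r)
    if likelihood == 0.0:
        return float("-inf")  # theta = 0 but at least one recombinant seen
    return math.log10(likelihood / 0.5 ** n)

# Hypothetical data: 1 recombinant among 10 scored meioses.
grid = [i / 100 for i in range(0, 51)]
best_theta = max(grid, key=lambda t: lod_score(1, 10, t))
print(best_theta, round(lod_score(1, 10, best_theta), 2))
```

For phase-known data the maximum lies at theta = R/N, here 0.1.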

6.
Estimates of population size are critical for conservation and management, but accurate estimates are difficult to obtain for many species. Noninvasive genetic methods are increasingly used to estimate population size, particularly in elusive species such as large carnivores, which are difficult to count by most other methods. In most such studies, genotypes are treated simply as unique individual identifiers. Here, we develop a new estimator of population size based on pedigree reconstruction. The estimator accounts for individuals that were directly sampled, individuals that were not sampled but whose genotype could be inferred by pedigree reconstruction, and individuals that were not detected by either of these methods. Monte Carlo simulations show that the population estimate is unbiased and precise if sampling is of sufficient intensity and duration. Simulations also identified sampling conditions that can cause the method to overestimate or underestimate true population size; we present and discuss methods to correct these potential biases. The method detected 2–21% more individuals than were directly sampled across a broad range of simulated sampling schemes. Genotypes are more than unique identifiers, and the information about relationships in a set of genotypes can improve estimates of population size.

7.
We have determined the nucleotide sequence of a class II yeast transposon (Ty 1-17), which is found just centromere-distal to the LEU2 structural gene on chromosome III of Saccharomyces cerevisiae. The complete element is 5961 bp long and is bounded by two identical, directly repeated delta sequences of 332 bp each. The sequence organization indicates that Ty 1-17 is a retrotransposon, like the class I elements characterized previously. It contains two long open reading frames, TyA (439 amino acids) and TyB (1349 amino acids). In this paper, the sequences of the two classes of yeast transposon are compared with one another and with analogous elements, such as retroviral proviruses, cauliflower mosaic virus and copia sequences. Features of the Ty 1-17 sequence that may be important to its mechanism of transposition and its genetic action are discussed.
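Long open reading frames such as TyA and TyB are typically located by scanning each reading frame for a start codon followed by an in-frame stop codon. The sketch below does this on the forward strand only, with the standard stop codons TAA, TAG and TGA; it ignores the reverse complement, alternative start codons and any translational frameshifting, and the function name and threshold are hypothetical. ORF length is reported in codons excluding the stop, which equals the number of encoded amino acids.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; an ORF is kept only if
    it spans at least `min_codons` codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG
    return orfs

# Toy sequence: ATG + 5 GCT codons + TAA, i.e. a 6-codon ORF in frame 2.
seq = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(seq, min_codons=5))  # [(2, 23, 2)]
```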

8.
MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers, and they can be used to impute missing data and check for genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members are excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition-ligation technique is used to estimate haplotype frequencies based on the haplotype configurations inferred in the first two steps. Only compatible haplotype configurations whose haplotypes have frequencies greater than a threshold are retained. RESULTS: We tested the effectiveness and efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees; in this case, almost all families have a unique haplotype configuration.
In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.
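For completely linked markers (no recombination), each parent transmits one of its two haplotypes intact, which is the kind of constraint HAPLORE's rule-based and elimination steps exploit. The sketch below applies that single constraint to one parent-offspring trio: given the parents' phased haplotype pairs and the child's unphased multilocus genotype, it returns every (paternal, maternal) haplotype pair for the child that is consistent with the data, and an empty set flags an inconsistency such as a genotyping error. It is not HAPLORE's actual rule set; the function names and haplotype strings are hypothetical.

```python
from itertools import product

def genotype_of(h1, h2):
    """Unphased multilocus genotype: one sorted allele pair per locus."""
    return tuple(tuple(sorted(pair)) for pair in zip(h1, h2))

def compatible_child_configs(father, mother, child_genotype):
    """Child (paternal, maternal) haplotype pairs consistent with the data.

    Assumes completely linked markers, so each parent transmits one of its
    two haplotypes without recombination.
    """
    target = tuple(tuple(sorted(pair)) for pair in child_genotype)
    configs = set()
    for pat, mat in product(father, mother):
        if genotype_of(pat, mat) == target:
            configs.add((pat, mat))
    return configs

child = [("A", "a"), ("B", "b")]  # heterozygous at both loci, phase unknown
print(compatible_child_configs(("AB", "ab"), ("Ab", "ab"), child))  # {('AB', 'ab')}
print(compatible_child_configs(("AB", "ab"), ("Ab", "aB"), child))  # set(): incompatible
print(len(compatible_child_configs(("AB", "ab"), ("AB", "ab"), child)))  # 2
```

In the last case the child's phase (AB/ab) is determined, but which parent contributed which haplotype is not.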

9.
10.
Inheritance of mitochondrial DNA (mtDNA) in Holstein cattle was characterized by pedigree analysis of nucleotide sequence variation. mtDNA was purified from leukocytes of 174 individuals representing 35 independent maternal lineages and analyzed for nucleotide sequence variation by restriction fragment length polymorphism analysis and direct sequence determination. These data revealed 11 maternal lineages in which leukocytes from some individuals seemingly were homoplasmic for the reference mtDNA sequence at nucleotide 364, whereas those from other individuals were homoplasmic for a sequence variant at this position. Both alternative alleles were detected in all branches of these 11 lineages, suggesting that mutation at nucleotide 364 and fixation of the variant sequence occurred frequently in independent events. Thirteen mother-daughter pairs were detected in which the leukocytes of the two animals seemingly were homoplasmic for different alleles at nucleotide 364, demonstrating that the bovine mitochondrial genome can be replaced completely by a nucleotide sequence variant within a single generation. The two alternative sequences seemingly arose de novo at similar frequencies, ruling out replicative advantage or other selective bias as the explanation for the rapid fixation of mutations at nucleotide 364. Another instance of intralineage sequence variation was detected at nucleotide 5602; this variation was found in only one of the lineages examined and evidently arose within three generations.

11.
Even though parasitic flatworms are one of the most species-rich groups of hermaphroditic organisms, we know virtually nothing about their mating systems (selfing or kin-mating rates) in nature. Hence, we lack an understanding of the role of inbreeding in parasite evolution. The natural mating systems of parasitic flatworms have remained elusive because of the inherent difficulty of generating progeny-array data in many parasite systems. New developments in pedigree reconstruction allow direct inference of realized selfing rates in nature from nothing more than a sample of genotyped individuals. We built upon this advance by exploiting the closed mating systems of endoparasites, that is, their individual hosts. In particular, we created a novel means of using pedigree reconstruction data to estimate potential kin-mating rates. With data from natural populations of a tapeworm, we demonstrated how our newly developed methods can be used to test for cosibling transmission and inbreeding depression. We then showed how independent estimates of the two mating-system components, selfing and kin-mating rates, account for the observed levels of inbreeding in the populations. Thus, our results suggest that these natural parasite populations are in inbreeding equilibrium. Pedigree reconstruction analyses, along with the new companion methods we developed, will be broadly applicable across a myriad of parasite species. As such, we foresee a new frontier in which the diverse life histories of flatworm parasites are used in comparative evolutionary studies to address the ecological factors or life-history traits that drive mating systems, and hence inbreeding, in natural populations.

12.
Can we find the family trees, or pedigrees, that relate the haplotypes of a group of individuals? Collecting genealogical information on how individuals are related is a very time-consuming and expensive process. Methods for automating the construction of pedigrees could streamline this process. While constructing single-generation families is relatively easy given whole-genome data, reconstructing multi-generational, possibly inbred, pedigrees is much more challenging. This article addresses the important question of reconstructing monogamous, regular pedigrees, where a pedigree is regular when individuals mate only with other individuals of the same generation. This article introduces two multi-generational pedigree reconstruction methods: one for inbreeding relationships and one for outbreeding relationships. In contrast to previous methods, which focused on independently estimating the relationship distance between every pair of typed individuals, the methods presented here aim to reconstruct the entire pedigree. We show that both of our methods outperform the state of the art and that the outbreeding method is capable of reconstructing pedigrees at least six generations back in time with high accuracy. The two programs are available at http://cop.icsi.berkeley.edu/cop/.

13.
The increasing availability of genomic tools improves our ability to investigate the patterns of genetic diversity and relatedness among individuals. The pedigrees of many apple cultivars are completely unknown, which often reduces the efficiency of breeding programs. Using a multilocus simple sequence repeat dataset, we applied a novel multi-generation pedigree-network reconstruction procedure based on the software FRANz to a Malus × domestica collection (101 cultivated and 22 wild apples) with partially known pedigree relationships. The procedure produced 78 parent–offspring relationships organized into three networks and showed high power for detecting real pedigree links (98.5%) and a low false-positive rate (9.0%). The largest reconstructed pedigree network spanned four generations and involved 65 cultivars. The availability of detailed pedigree connections confirmed that recent genealogical relationships affect population genetic structure in apple. Finally, our analysis enabled us to confirm or discard several pedigrees known only anecdotally; among them, the cultivar Grimes Golden was validated as a parent of the widely grown cultivar Golden Delicious. The pedigree reconstruction protocol described here will be broadly applicable to other collections and crop species.
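A basic building block behind parent–offspring inference from SSR data is Mendelian exclusion: a true parent must share at least one allele with its offspring at every locus. The sketch below counts such mismatches and keeps candidates at or below a tolerance (allowing one mismatch can absorb a genotyping error, null allele or mutation). This simple exclusion rule is a swapped-in illustration, not FRANz's own scoring, and the locus names, allele sizes and cultivar labels are hypothetical.

```python
def mendelian_mismatches(parent, offspring):
    """Count loci at which a putative parent and offspring share no allele.

    Genotypes are dicts mapping locus name -> (allele1, allele2); loci that
    are missing in either individual are skipped.
    """
    mismatches = 0
    for locus, alleles in parent.items():
        if locus in offspring and not set(alleles) & set(offspring[locus]):
            mismatches += 1
    return mismatches

def candidate_parents(offspring, candidates, max_mismatches=0):
    """Names of candidates that are not excluded as parents of `offspring`."""
    return [name for name, geno in candidates.items()
            if mendelian_mismatches(geno, offspring) <= max_mismatches]

# Hypothetical SSR genotypes (allele sizes in bp)
offspring = {"L1": (120, 124), "L2": (200, 210), "L3": (88, 92)}
candidates = {
    "cv1": {"L1": (120, 130), "L2": (210, 214), "L3": (92, 92)},  # 0 mismatches
    "cv2": {"L1": (126, 130), "L2": (200, 204), "L3": (90, 94)},  # 2 mismatches
    "cv3": {"L1": (124, 124), "L2": (196, 204), "L3": (88, 90)},  # 1 mismatch
}
print(candidate_parents(offspring, candidates))                    # ['cv1']
print(candidate_parents(offspring, candidates, max_mismatches=1))  # ['cv1', 'cv3']
```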

14.
We compared the nucleotide sequences of 3 yeast invertase genes in regions where the homology is better than 90%. In the noncoding region, 40 gaps of 1–61 bases were found, about half the number of nucleotide substitutions in the same sequences. We grouped the gaps into 5 categories by their length and the characteristics of their sequences. Group I gaps are about 20 nucleotides long and are flanked by a repeated 6-base sequence, which may trigger the deletion of one of the repeats together with the sequence between them. Group II gaps are characterized by a short repeated sequence that is missing from one of the invertase genes. Group III comprises gaps that occur in sequences made up exclusively of one of the 4 bases. The 4 gaps in group IV show none of these sequence characteristics, and each is just one base long. A 61-nucleotide sequence found in only one of the invertase genes seems to be of complex origin. We conclude that short repeated sequences and monotonous sequences are prone to deletion or insertion mutations.
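The group I mechanism (deletion of one copy of a short direct repeat plus the sequence between the copies) can be sketched in two steps: find pairs of identical k-mers a short distance apart, then remove one copy and the spacer. The sketch assumes 6-base repeats, as in the abstract, and a hypothetical maximum spacing; the toy sequence and function names are illustrative, not taken from the invertase genes.

```python
def direct_repeats(seq, k=6, max_span=60):
    """Pairs (i, j), i < j, with seq[i:i+k] == seq[j:j+k].

    Only non-overlapping copies with k <= j - i <= max_span are reported.
    """
    positions = {}
    pairs = []
    for j in range(len(seq) - k + 1):
        kmer = seq[j:j + k]
        for i in positions.get(kmer, []):
            if k <= j - i <= max_span:
                pairs.append((i, j))
        positions.setdefault(kmer, []).append(j)
    return pairs

def delete_between_repeats(seq, i, j):
    """Remove one repeat copy plus the spacer, i.e. seq[i:j]; one copy remains."""
    return seq[:i] + seq[j:]

# Toy sequence: GATCCA repeated around an 8-base spacer.
seq = "AA" + "GATCCA" + "TGCATGAC" + "GATCCA" + "CC"
pairs = direct_repeats(seq)
print(pairs)                                  # [(2, 16)]
print(delete_between_repeats(seq, *pairs[0])) # AAGATCCACC (14 bases deleted)
```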

15.
Sequencing by hybridization is a method for reconstructing a DNA sequence from its k-mer content. This content, called the spectrum of the sequence, can be obtained by hybridization with a universal DNA chip. However, even with a sequencing chip containing all 4^9 9-mers and assuming no hybridization errors, only sequences about 400 bases long can be reconstructed unambiguously. Drmanac et al. (1989) suggested sequencing long DNA targets by obtaining the spectra of many short overlapping fragments of the target, inferring their relative positions along the target, and then computing the spectra of subfragments that are short enough to be uniquely recoverable. Drmanac et al. do not treat the realistic case of errors in the hybridization process. In this paper, we study the effect of such errors. We show that the probability of ambiguous reconstruction in the presence of (false negative) errors is close to the probability in the errorless case. More precisely, the ratio between these probabilities is 1 + O(p/(1 - p)^4 · 1/d), where d is the average length of subfragments and p is the probability of a false negative. We also obtain lower and upper bounds for the probability of unambiguous reconstruction based on an errorless spectrum. For realistic chip sizes, these bounds are tighter than those given by Arratia et al. (1996). Finally, we report results of simulations with real DNA sequences, showing that even in the presence of 50% false negative errors, a target of cosmid length can be recovered with less than 0.1% miscalled bases.
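Why reconstruction from a spectrum can be ambiguous is easy to see on a toy case: when a (k-1)-mer occurs several times, the segments between its occurrences can be swapped without changing the set of k-mers. The sketch computes the spectrum as a set (ignoring multiplicities and hybridization errors, both simplifying assumptions) and exhibits two different sequences with identical 3-mer spectra; it also prints the number of probes on a complete 9-mer chip. The function name and sequences are hypothetical.

```python
def spectrum(seq, k):
    """The k-mer content (spectrum) of a DNA sequence, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# The 2-mer "AC" occurs three times, so the segments "G" and "T" between
# its occurrences can be swapped without changing the 3-mer spectrum.
s1, s2 = "ACGACTAC", "ACTACGAC"
print(spectrum(s1, 3) == spectrum(s2, 3), s1 == s2)  # True False
print(4 ** 9)  # 262144 distinct 9-mer probes on a complete chip
```

Longer targets contain more repeated (k-1)-mers, which is why only fairly short sequences are uniquely recoverable from a 9-mer spectrum.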

16.
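Extracting (quantifying) features is a key step in predicting DNA methylation status; however, different methods use different features, and the concrete process of quantifying the features is computationally tedious. This paper integrates the important features reported in the literature and designs and implements a software tool for extracting features from DNA sequences. The software encapsulates the feature computations, so the relevant features of target sequences can conveniently be computed in batches, facilitating subsequent data analysis and mining.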
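The abstract does not list which features the tool computes. Purely as an illustration, the sketch computes three sequence features commonly used in methylation-related analyses: GC content, the CpG observed/expected ratio ((#CG × length) / (#C × #G)), and normalized k-mer frequencies in a fixed order, concatenated into one vector per sequence so that many sequences can be processed in a batch. The function names and the choice of features are assumptions, not the tool's interface, and non-empty sequences are assumed.

```python
from collections import Counter
from itertools import product

def gc_content(seq):
    """Fraction of G and C bases (sequence assumed non-empty)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cg * len(seq) / (c * g)

def kmer_frequencies(seq, k=2):
    """Normalized counts of all 4**k k-mers, in lexicographic ACGT order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(1, len(seq) - k + 1)
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]

def feature_vector(seq, k=2):
    """GC content, CpG o/e ratio, then the 4**k k-mer frequencies."""
    return [gc_content(seq), cpg_obs_exp(seq)] + kmer_frequencies(seq, k)

batch = ["ACGT", "CCGGCGTA"]  # hypothetical target sequences
features = [feature_vector(s) for s in batch]
print(len(features[0]))  # 18 = 2 scalar features + 16 dinucleotide frequencies
```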

17.
In a previous paper we obtained ten (orthogonal) factors, linear combinations of which can express the properties of the 20 naturally occurring amino acids. In this paper, we assume that the most important properties (linear combinations of these ten factors) that determine the three-dimensional structure of a protein are conserved properties, i.e., are those that have been conserved during evolution. Two definitions of a conserved property are presented: (1) a conserved property for an average protein is defined as that linear combination of the ten factors that optimally expresses the similarity of one amino acid to another (hence, little change during evolution), as given by the relatedness odds matrix of Dayhoff et al.; (2) a conserved property for each position in the amino acid sequence (locus) of a specific family of homologous proteins (the cytochrome c family or the globin family) is defined as that linear combination of the ten factors that is common among a set of amino acids at a given locus when the sequences are properly aligned. When the specificity at each locus is averaged over all loci, the same features are observed for three expressions of these two definitions, namely the conserved property for an average protein, the average conserved property for the cytochrome c family, and the average conserved property for the globin family; we find that bulk and hydrophobicity (information about packing and long-range interactions) are more important than other properties, such as the preference for adopting a specific backbone structure (information about short-range interactions). We also demonstrate that the sequence profile of a conserved property, defined for each locus of a protein family (definition 2), corresponds uniquely to the three-dimensional structure, while the conserved property for an average protein (definition 1) is not useful for the prediction of protein structure.
The amino acid sequences of numerous proteins are searched to find those that are similar, in terms of the conserved properties (definition 2), to sequences of the same size from one of the homologous families (cytochrome c and globin, respectively) for whose loci the conserved properties were defined. Many similar sequences are found, the number of similarities decreasing with increasing size of the segment. However, the segments must be rather long (15 residues) before the comparisons become meaningful. As an example, one sufficiently large sequence (20 residues) from a protein of known structure (apo-liver alcohol dehydrogenase, which is not a member of either family) is found to be similar in the conserved properties to a particular sequence of a member of the family of human hemoglobin chains, and the two sequences have similar structures. This means that, since conserved properties are expected to be structure determinants, we can use the conserved properties to predict an initial protein structure for subsequent energy minimization for a protein for which the conserved properties are similar to those of a family of proteins with a sufficiently large number of homologous amino acid sequences; such a large number of homologous sequences is required to define a conserved property for each locus of the homologous protein family.

18.
A protein sequence database (PFDB) containing about 11,000 entries is available for Macintosh computers. The PFDB can easily be updated by importing sequences from the PIR collection over the internet. The most important feature of the database is its organization into families of closely related sequences, each family being characterized by its average dipeptide composition [Petrilli (1993), Comput. Appl. Biosci. 2, 89–93]. This allows a rapid and sensitive protein similarity search, performed by comparing the precalculated family dipeptide composition with that of the query sequence using a linear correlation coefficient. An example application is reported in which a new protein was classified using the sequence of a fragment just 19 residues long.
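The family search described above can be sketched as follows: represent each sequence by its 400-dimensional dipeptide composition, average the compositions of a family's members, and rank families by the linear (Pearson) correlation between the family average and the query. The families, sequences and function names below are hypothetical toys; PFDB's own normalization details may differ.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def dipeptide_composition(seq):
    """Relative frequencies of the 400 dipeptides in a protein sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(1, len(seq) - 1)
    return [counts[d] / total for d in DIPEPTIDES]

def family_profile(members):
    """Average dipeptide composition of a family of related sequences."""
    comps = [dipeptide_composition(s) for s in members]
    return [sum(column) / len(comps) for column in zip(*comps)]

def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical families; profiles are precalculated once, as in a database.
families = {
    "family_A": ["MKTAYIAKQR", "MKTAYIAKQL"],
    "family_B": ["GGSGGSGGSW", "GGSGGSGGSA"],
}
profiles = {name: family_profile(seqs) for name, seqs in families.items()}
query = dipeptide_composition("MKTAYIAKQK")
scores = {name: pearson(query, prof) for name, prof in profiles.items()}
print(max(scores, key=scores.get))  # family_A
```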

19.
The control region sequence, an mtDNA marker, is usually used in phylogenetic analysis at the species level or in studies of genetic structure among populations. In this study, motivated by its maternal inheritance in vertebrates, we used the control region sequence as a matrilineage marker for the Elliot's pheasant (Syrmaticus ellioti) population of Ningbo Zoo. At Ningbo Zoo, 36 Elliot's pheasants were descendants of three female founders introduced in 1988. Three control region haplotypes (Ha, Hb, Hc) were identified from six variable nucleotide positions in the control region sequences of the 36 individuals. The number of haplotypes matched the number of female founders. In total, 20 individuals (C04, C06, C08-11, C14, C20, C21, C23-29, C32, C34-36) shared haplotype a, 12 individuals (C01, C05, C07, C12, C13, C16-19, C22, C30, C33) shared haplotype b, and 4 individuals (C02, C03, C15, C31) shared haplotype c. Individuals sharing the same haplotype descended from the same female founder; in other words, there were three maternal lineages, and the maternal relationships among individuals were thereby revealed. These results suggest that the control region sequence is a useful marker for identifying matrilineages in this population. Matrilineage information may also serve as supplementary data for the breeding management of captive species that lack pedigree records.
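The grouping step can be sketched directly: find the alignment columns where the sequences differ, define each individual's haplotype as its bases at those variable sites, and collect individuals by haplotype. Each group is then read as a maternal lineage, which assumes strict maternal inheritance and no new mutations during the captive generations. The alignment below is a hypothetical toy, not the study's data, and the function names are illustrative.

```python
from collections import defaultdict

def variable_sites(aligned):
    """Column indices at which the aligned sequences are not all identical."""
    length = len(next(iter(aligned.values())))
    return [i for i in range(length)
            if len({seq[i] for seq in aligned.values()}) > 1]

def group_by_haplotype(aligned):
    """Map each haplotype (bases at variable sites) to the individuals carrying it."""
    sites = variable_sites(aligned)
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups["".join(seq[i] for i in sites)].append(name)
    return sites, dict(groups)

aligned = {  # hypothetical toy alignment of equal-length sequences
    "C01": "ACGTACGTAC",
    "C02": "ACGTTCGTAC",
    "C03": "ACGTTCGTAC",
    "C04": "ATGTACGTAG",
}
sites, groups = group_by_haplotype(aligned)
print(sites)   # [1, 4, 9]
print(groups)  # {'CAC': ['C01'], 'CTC': ['C02', 'C03'], 'TAG': ['C04']}
```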

20.
Exobiology, the study of the origin, evolution and distribution of life (including life on earth) within the context of cosmic evolution, is being given a remarkable boost by genome sequencing projects, which are now making the evolutionary histories of protein families routinely available. These histories comprise a multiple alignment for their protein sequences and the corresponding DNA sequences, an evolutionary tree showing the pedigree of these sequences, and reconstructed ancestral sequences for each node in the tree. In a post-genomic world having genomic sequences from an unlimited number of organisms, these histories will be used to connect structure, chemical reactivity, and physiological function to these families. This paper describes several “post-genomic” tools that exploit these evolutionary histories. They can be used to confirm or deny long distance homology between two protein families, identify proteins within a family that have new functions, and identify specific in vitro properties of the protein that are important for its physiological role. Evolution-based data structures for organizing large sequence databases are also described.
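The abstract does not say how the ancestral sequences at each tree node are reconstructed. One classic method is Fitch parsimony, sketched below for a single alignment column on a rooted binary tree written as nested pairs; it performs only the bottom-up pass, returning the candidate states at the root and the minimum number of changes, and omits the top-down pass that picks one state per node. Likelihood-based reconstruction is also widely used; Fitch is shown here only because it is short. The tree encoding and function name are assumptions.

```python
def fitch(tree):
    """Bottom-up Fitch small-parsimony pass for one aligned site.

    `tree` is either a leaf state (a one-character string) or a pair
    (left, right). Returns (candidate states at this node, minimum changes).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No overlap: take the union and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1

print(fitch((("A", "A"), ("A", "C"))))  # ({'A'}, 1)
```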


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license: 京ICP备09084417号